Please don't depend on pandas

schireson / pytest-mock-resources

Pytest Fixtures that let you actually test against external resource (Postgres, Mongo, Redshift...) dependent code.

https://pytest-mock-resources.readthedocs.io/en/latest/quickstart.html

MIT License

179 stars, 19 forks

Please don't depend on pandas #42

Closed: DanCardin closed this issue 5 years ago

DanCardin commented 5 years ago

Pandas is an annoying and large dependency that should be simple to avoid by using Python's built-in csv module.

oakhan3 commented 5 years ago

Unfortunately, we leverage pandas for its IO capability with SQL databases, to emulate COPY and UNLOAD statements in Redshift engines/conns, so it isn't avoidable.

We could rewrite it to not do this, but it would be quite a bit of work.

We could also make "redshift" an extras requirement that installs pandas, but that might not be ideal, as people would have to switch to the extras version when updating PMR.
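For illustration, an extras requirement of this kind is usually declared through setuptools' `extras_require`. The fragment below is only a sketch: the core dependency list is an assumption, not PMR's actual `setup.py`, and the extra is named `redshift` simply because that is the name floated above.

```python
# Hypothetical fragment of a setup.py; the dependency lists are illustrative
# assumptions, not the project's real requirements.
INSTALL_REQUIRES = ["pytest", "sqlalchemy"]  # assumed core dependencies, no pandas

# pandas is only pulled in when the "redshift" extra is requested.
EXTRAS_REQUIRE = {"redshift": ["pandas"]}

# In setup.py these would be passed along as:
#   setup(..., install_requires=INSTALL_REQUIRES, extras_require=EXTRAS_REQUIRE)
```

With a layout like this, `pip install "pytest-mock-resources[redshift]"` would install pandas, while a plain `pip install pytest-mock-resources` would not.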

DanCardin commented 5 years ago

It's totally avoidable. You already need a connection for pandas, and you'd only be using it to avoid having to use Python's built-in csv package!

I see 3 usages:
• read_sql into to_csv, which is no different from engine.execute into csv.DictReader
• that again in a different file
• SQLTable into insert, which is just a glorified sqlalchemy Table and insert wrapper
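The usages listed above can be sketched without pandas roughly as follows. This is not PMR's code: to stay runnable with the standard library alone, it uses `sqlite3` as a stand-in for the SQLAlchemy engine, and the helper names `unload_to_csv` and `copy_from_csv` are hypothetical. Writing query results out uses `csv.writer`, and loading a file back in uses `csv.DictReader`.

```python
import csv
import io
import sqlite3


def unload_to_csv(conn, query, out):
    """Write the rows returned by `query` to the file-like `out` as CSV."""
    cursor = conn.execute(query)
    writer = csv.writer(out)
    # cursor.description holds one tuple per column; the name is item 0.
    writer.writerow(col[0] for col in cursor.description)
    writer.writerows(cursor)


def copy_from_csv(conn, table, src):
    """Insert every row of the CSV in `src` into `table`, using its header row."""
    reader = csv.DictReader(src)
    columns = reader.fieldnames
    placeholders = ", ".join("?" for _ in columns)
    # Table and column names are interpolated directly, so this is only
    # suitable for trusted input such as test fixtures.
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    conn.executemany(sql, ([row[c] for c in columns] for row in reader))


# Round trip: dump one table to CSV, then load it into another.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ada"), (2, "grace")])

buffer = io.StringIO()
unload_to_csv(conn, "SELECT id, name FROM users ORDER BY id", buffer)

conn.execute("CREATE TABLE users_copy (id INTEGER, name TEXT)")
buffer.seek(0)
copy_from_csv(conn, "users_copy", buffer)
rows = conn.execute("SELECT id, name FROM users_copy ORDER BY id").fetchall()
```

With a SQLAlchemy connection the shape would be the same, with the query results iterated into the writer and a `Table.insert()` taking the place of the hand-built INSERT statement.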

langelgjm commented 5 years ago

Is it possible to somehow have an option like [nopandas] that would omit it, and you'd just lose the COPY/UNLOAD capability? I too felt sad to have to install pandas and numpy for a project that did not require it purely to use PMR.
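One common way to get this behaviour is to import the optional dependency lazily and fail with a clear message only when the feature is actually used. The sketch below assumes that approach; `optional_import` is a hypothetical helper, not something PMR provides.

```python
import importlib


def optional_import(module_name, feature):
    """Return the imported module, or raise an error naming the feature that needs it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(
            f"{feature} requires the optional dependency '{module_name}'; "
            "install it to enable this feature."
        ) from exc


# Hypothetical call site: pandas would only be imported when COPY/UNLOAD runs.
#   pandas = optional_import("pandas", "COPY/UNLOAD emulation")
```

Everything else in the package would import and run normally without pandas installed; only code paths that reach the call site would raise.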

ryan-at-schireson commented 5 years ago

Or maybe switch the default and have the flag be [withpandas]

I just ran into an issue where a pandas dependency is missing and it's blocking my use of PMR

DanCardin commented 5 years ago

Will be fixed by https://github.com/schireson/schireson-pytest-mock-resources/pull/58