DanCardin closed this issue 5 years ago.
Unfortunately, we use pandas for its I/O capabilities with SQL databases, to emulate COPY and UNLOAD statements on Redshift engines and connections, so it isn't avoidable.
We could rewrite it to not do this, but it would be quite a bit of work.
We could also make "redshift" an extras requirement that installs pandas, but that may not be desirable, since people would have to switch to installing the extras version when they update PMR.
It's totally avoidable. You already need a connection for pandas, and you're only using pandas to avoid having to use Python's built-in csv module!
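A minimal sketch of the UNLOAD direction (query results written out as CSV) using only the standard library. It assumes a DB-API connection; stdlib sqlite3 stands in for the Redshift/Postgres connection here, and query_to_csv is a hypothetical helper name, not part of PMR.

```python
import csv
import io
import sqlite3


def query_to_csv(conn, query, out):
    """Write the result of `query` to the file-like `out` as CSV with a header row.

    Hypothetical stand-in for pandas.read_sql(...).to_csv(...): the DB-API cursor
    supplies the column names and csv.writer handles quoting.
    """
    cursor = conn.execute(query)
    # cursor.description holds one tuple per column; its first item is the name.
    columns = [col[0] for col in cursor.description]
    writer = csv.writer(out)
    writer.writerow(columns)
    writer.writerows(cursor)  # rows stream straight from the cursor, no DataFrame


# Usage: an in-memory SQLite database plays the role of the mocked Redshift.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

buf = io.StringIO()
query_to_csv(conn, "SELECT id, name FROM users ORDER BY id", buf)
print(buf.getvalue())
```

The same loop works on rows from an SQLAlchemy result, since each row can be passed to csv.writer as a sequence.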
I see 3 usages:
• read_sql into to_csv, which is no different from engine.execute into csv.DictWriter
• that again, in a different file
• SQLTable into insert, which is just a glorified wrapper around a sqlalchemy Table and insert.
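For the COPY direction (CSV loaded into a table), csv.DictReader yields one dict per row, which is exactly the shape an executemany-style insert expects. A minimal sketch, again with stdlib sqlite3 standing in for the real connection; csv_to_table is a hypothetical name. With SQLAlchemy the equivalent would be passing the list of row dicts to conn.execute(table.insert(), rows).

```python
import csv
import io
import sqlite3


def csv_to_table(conn, table, source):
    """Insert every row of the CSV file-like `source` into `table`.

    Hypothetical stand-in for building a pandas SQLTable and calling insert():
    the header row gives the column names and DictReader yields a dict per row.
    `table` and the header names are interpolated into SQL, so they must be trusted.
    """
    reader = csv.DictReader(source)
    columns = reader.fieldnames  # reads the header row
    placeholders = ", ".join(f":{name}" for name in columns)
    statement = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    with conn:  # commit on success, roll back on error
        conn.executemany(statement, reader)


# Usage: CSV values arrive as strings; SQLite's INTEGER affinity converts "1" to 1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
csv_to_table(conn, "users", io.StringIO("id,name\n1,alice\n2,bob\n"))
print(conn.execute("SELECT id, name FROM users ORDER BY id").fetchall())
# [(1, 'alice'), (2, 'bob')]
```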
Is it possible to have an option like [nopandas] that would omit it, so you'd just lose the COPY/UNLOAD capability? I too was sad to have to install pandas and numpy, purely to use PMR, in a project that didn't otherwise need them.
Or maybe switch the default and have the flag be [withpandas].
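One relevant detail: pip extras can only add requirements, never remove them, so an opt-in extra such as [withpandas] is the workable shape, declared under extras_require in the package metadata. The code side is then a lazy import, so only the COPY/UNLOAD path needs pandas. A minimal sketch, assuming a hypothetical helper require_optional and treating "withpandas" as the name proposed above rather than an existing PMR extra:

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency on first use instead of at package import time.

    Hypothetical helper: raises ImportError naming the extra to install when the
    module is missing, so users without pandas only fail on COPY/UNLOAD.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is required for this feature; install it with "
            f"pip install 'pytest-mock-resources[{extra}]'"
        ) from exc


# Usage: the COPY/UNLOAD code path would call
#     pd = require_optional("pandas", "withpandas")
# while everything else imports cleanly without pandas installed.
csv_module = require_optional("csv", "unused")  # stdlib module, always present
print(csv_module.__name__)
```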
I just ran into an issue where a pandas dependency is missing, and it's blocking my use of PMR.
Will be fixed by https://github.com/schireson/schireson-pytest-mock-resources/pull/58
Pandas is an annoying, large dependency that should be simple to avoid by using Python's built-in csv module.