hellonarrativ / spectrify

Export Redshift data and convert to Parquet for use with Redshift Spectrum or other data warehouses.
https://aws.amazon.com/blogs/big-data/narrativ-is-helping-producers-monetize-their-digital-content-with-amazon-redshift/
MIT License
116 stars 25 forks source link

Python2 unicode support. #30

Closed nirmo-lightricks closed 5 years ago

nirmo-lightricks commented 6 years ago

Try to solve the unicode code issue (#16 )

nirmo-lightricks commented 6 years ago

@c-nichols I used generator in order to load each line at a time. I tried to use unicodecsv instead but for some reason it failed.. (i used unicodecsv in the tests as well). This solution works fine for me.

what do you think?

nirmo-lightricks commented 6 years ago

Hi @c-nichols I solved the unicode issue using unicodecsv package. what do you think?

nirmo-lightricks commented 6 years ago

@c-nichols Hi, I would like to hear your opinion... what do you think about the current solution? Thanks

c-nichols commented 5 years ago

Hi @nirmo-lightricks sorry I have been out of commission for the past couple weeks due to health reasons -- I'll get to this soon!

nirmo-lightricks commented 5 years ago

@c-nichols No problem ;) thanks 👍

nirmo-lightricks commented 5 years ago

@c-nichols ping..

c-nichols commented 5 years ago

Hi @nirmo-lightricks sorry for the delay. I'd like for the usage of unicodecsv csv reader to be optional (maybe enabled with an environment variable), and for the default to be the standard Python CSV reader. The reason I suggest this is because unicodecsv can have a pretty serious performance impact. Are you open to adding that feature toggle?

nirmo-lightricks commented 5 years ago

@c-nichols sure. I made some changes.. Please have a look.

nirmo-lightricks commented 5 years ago

🎉 @eliran-lightricks Note that the unicode feature has merged!