taraslayshchuk / es2csv

Export from an Elasticsearch into a CSV file
Apache License 2.0
510 stars, 191 forks

Problem with UTF8 characters in geoip fields #3

Closed by brablc 8 years ago

brablc commented 8 years ago

We use the geoip filter, and for 85.237.234.8 we get geoip.city_name = 'Kysucké Nové Mesto'. The non-ASCII character (é) causes a crash:

Traceback (most recent call last):
  File "/usr/local/bin/es2csv", line 11, in <module>
    sys.exit(main())
  File "/Library/Python/2.7/site-packages/es2csv.py", line 248, in main
    es.write_to_csv()
  File "/Library/Python/2.7/site-packages/es2csv.py", line 212, in write_to_csv
    csv_writer.writerow(json.loads(line))
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py", line 152, in writerow
    return self.writer.writerow(self._dict_to_list(rowdict))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 6: ordinal not in range(128)

Would it be possible to fix the writer to work with UTF8?

Thanks for the script, it is exactly what we were looking for!

taraslayshchuk commented 8 years ago

It was already fixed in #1 by @pokab. I have merged these changes and updated the pip package. Could you please upgrade your version and provide feedback? If you are using pip, you can simply run:

 $ pip install --upgrade es2csv

brablc commented 8 years ago

Great! Thank you!
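
For readers who hit the same UnicodeEncodeError: in Python 2.7 the csv module does not support Unicode input directly, so a value such as u'Kysucké Nové Mesto' is implicitly encoded as ASCII and fails on 'é' (u'\xe9'). The sketch below is not the change merged in #1 (its contents are not shown in this thread). It is a minimal Python 3 illustration of the general idea: let the output stream encode text as UTF-8 instead of relying on ASCII. The function name write_rows_utf8 and the sample row are assumptions made for this example.

```python
import csv
import io
import json


def write_rows_utf8(json_lines, fieldnames, stream):
    """Write JSON-encoded rows to a binary stream as UTF-8 encoded CSV.

    Hypothetical helper for illustration only; it is not part of es2csv.
    """
    # Wrap the binary stream so csv works with text and the wrapper
    # performs the UTF-8 encoding. newline="" is what the csv docs recommend.
    text = io.TextIOWrapper(stream, encoding="utf-8", newline="")
    writer = csv.DictWriter(text, fieldnames=fieldnames)
    writer.writeheader()
    for line in json_lines:
        writer.writerow(json.loads(line))
    text.flush()
    text.detach()  # leave the underlying stream open for the caller


# Sample row modelled on the value reported in this issue.
sample = ['{"geoip.city_name": "Kysuck\\u00e9 Nov\\u00e9 Mesto"}']

# Encoding the same value as ASCII reproduces the class of error
# seen in the traceback.
try:
    json.loads(sample[0])["geoip.city_name"].encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

buf = io.BytesIO()
write_rows_utf8(sample, ["geoip.city_name"], buf)
csv_text = buf.getvalue().decode("utf-8")
```

On Python 2.7 the usual workaround was different: encode each unicode value to a UTF-8 byte string before passing the row to the csv writer. Upgrading es2csv as suggested above remains the recommended route.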