taraslayshchuk / es2csv

Export from an Elasticsearch into a CSV file
Apache License 2.0
510 stars, 191 forks

Encoding issue while writing into csv #35

Status: closed. abidulrmdn closed this issue 6 years ago.

abidulrmdn commented 7 years ago

Exporting a field with this mapping:

"name": {
    "type": "keyword"
},

Command I ran:

es2csv -i index -D type -f name --verify-certs -u https://userwithurl -q '*' -o database.csv

The error was:

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/bin/es2csv", line 11, in <module>
    load_entry_point('es2csv==5.2.1', 'console_scripts', 'es2csv')()
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/es2csv.py", line 284, in main
    es.write_to_csv()
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/es2csv.py", line 237, in write_to_csv
    line_dict_utf8 = {k: v.encode('utf8') if isinstance(v, unicode) else v for k, v in line_as_dict.items()}
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/es2csv.py", line 237, in <dictcomp>
    line_dict_utf8 = {k: v.encode('utf8') if isinstance(v, unicode) else v for k, v in line_as_dict.items()}
NameError: name 'unicode' is not defined

taraslayshchuk commented 7 years ago

Python 3.5 is not supported at the moment. Please use Python 2.7.

abidulrmdn commented 7 years ago

Why not just use str instead? @taraslayshchuk
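A minimal sketch of why a plain rename of `unicode` to `str` in the dict comprehension from the traceback above is probably not enough. It assumes Python 3 and a row shaped like es2csv's `line_as_dict`; the sample data and the helpers `to_csv_value` and `write_rows` are hypothetical, not es2csv's code. On Python 3, `str` is already Unicode text and the stdlib `csv` module expects `str`, so encoding every value to bytes makes `csv` write the bytes' repr instead of the text. Guarding the encode behind a version check keeps the Python 2 behaviour and passes text through unchanged on Python 3.

```python
import csv
import io
import sys

PY2 = sys.version_info[0] == 2


def to_csv_value(v):
    """Hypothetical helper: UTF-8-encode text on Python 2 only.

    On Python 3 the stdlib csv module expects str, so values pass through.
    `unicode` is only evaluated when PY2 is true (short-circuit `and`).
    """
    if PY2 and isinstance(v, unicode):  # noqa: F821 (builtin exists only on Python 2)
        return v.encode("utf8")
    return v


def write_rows(rows, fieldnames, convert):
    """Hypothetical helper: write dict rows to an in-memory CSV (Python 3)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        writer.writerow({k: convert(v) for k, v in row.items()})
    return buf.getvalue()


rows = [{"name": "Kåre"}]

# Naive rename: every str becomes bytes, and csv calls str() on the bytes.
naive = write_rows(rows, ["name"], lambda v: v.encode("utf8") if isinstance(v, str) else v)
# naive == "name\r\nb'K\\xc3\\xa5re'\r\n"

# Version-aware conversion: text stays text on Python 3.
fixed = write_rows(rows, ["name"], to_csv_value)
# fixed == "name\r\nKåre\r\n"
```

The design choice here is to branch on the interpreter version rather than on the value's type alone, since on Python 3 the correct action for text is to do nothing.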

katafrakt commented 7 years ago

@taraslayshchuk I'm using es2csv with Python 2.7, yet I'm getting this Unicode-related error:

Traceback (most recent call last):
  File "/home/ubuntu/.local/bin/es2csv", line 11, in <module>
    sys.exit(main())
  File "/home/ubuntu/.local/lib/python2.7/site-packages/es2csv.py", line 284, in main
    es.write_to_csv()
  File "/home/ubuntu/.local/lib/python2.7/site-packages/es2csv.py", line 221, in write_to_csv
    csv_writer.writeheader()
  File "/usr/lib/python2.7/csv.py", line 141, in writeheader
    self.writerow(header)
  File "/usr/lib/python2.7/csv.py", line 152, in writerow
    return self.writer.writerow(self._dict_to_list(rowdict))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 8: ordinal not in range(128)

So I guess using 2.x is not a solution to the problem. I'm not sure which document is problematic, but \xe5 is å.
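To find which document (and which part of it) contains the offending character, one option is to walk each hit's decoded `_source` and report non-ASCII keys and values separately. This matters because the traceback above fails inside `csv.DictWriter.writeheader()`, which writes the field names, so a non-ASCII key is a likely suspect. The function `find_non_ascii` and the sample document are hypothetical; the sketch assumes Python 3.7+ (for `str.isascii`) and a JSON-decoded document (dicts, lists, scalars, string keys).

```python
def find_non_ascii(obj, path=""):
    """Yield (path, kind, text) for every non-ASCII key or string value.

    `obj` is assumed to be a JSON-decoded Elasticsearch `_source`;
    `kind` is "key" or "value".
    """
    if isinstance(obj, dict):
        for key, value in obj.items():
            child = f"{path}.{key}" if path else key
            if not key.isascii():
                yield child, "key", key
            yield from find_non_ascii(value, child)
    elif isinstance(obj, list):
        for i, item in enumerate(obj):
            yield from find_non_ascii(item, f"{path}[{i}]")
    elif isinstance(obj, str) and not obj.isascii():
        yield path, "value", obj


# Hypothetical document with a non-ASCII value, list item, and key.
doc = {"name": "Kåre", "tags": ["ok", "blå"], "på": 1}
findings = list(find_non_ascii(doc))
# [("name", "value", "Kåre"), ("tags[1]", "value", "blå"), ("på", "key", "på")]
```

Running this over each hit of a scroll and printing the hit's `_id` alongside any findings would narrow the problem down to specific documents and fields.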

katafrakt commented 7 years ago

I solved it by using unicodecsv instead of csv:

diff --git a/es2csv.py b/es2csv.py
index b948843..509e5ea 100755
--- a/es2csv.py
+++ b/es2csv.py
@@ -16,7 +16,7 @@ import sys
 import time
 import argparse
 import json
-import csv
+import unicodecsv as csv
 import elasticsearch
 import progressbar
 from functools import wraps
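The diff above swaps in the third-party `unicodecsv` package as a drop-in replacement for `csv` on Python 2. For comparison, a minimal sketch assuming the export runs on Python 3 instead, where the stdlib `csv` module handles Unicode natively once the output file is opened in text mode with an explicit encoding (and `newline=""`, as the `csv` documentation recommends). The field names, values, and file name are hypothetical; the non-ASCII field name mirrors the `writeheader()` failure above.

```python
import csv
import os
import tempfile

# Hypothetical field names and rows, including a non-ASCII key.
fieldnames = ["name", "på"]
rows = [{"name": "Kåre", "på": "blå"}]

path = os.path.join(tempfile.mkdtemp(), "database.csv")

# Text mode with an explicit encoding; newline="" lets the csv writer
# control line endings itself.
with open(path, "w", encoding="utf-8", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()  # non-ASCII header is fine on Python 3
    writer.writerows(rows)

# Read the file back to check the round trip.
with open(path, encoding="utf-8", newline="") as f:
    read_back = list(csv.DictReader(f))

with open(path, "rb") as f:
    raw = f.read()
```

This only shows the stdlib behaviour on Python 3; it is not a description of how es2csv itself was changed.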

taraslayshchuk commented 7 years ago

@abdrmdn The answer to your question is in @katafrakt's comment, which shows in more detail why such methods are needed.

taraslayshchuk commented 7 years ago

Hello @katafrakt, you definitely have a different error. It looks like your document contains a Unicode name; to be precise, it is in a key name, which this tool does not expect.

abidulrmdn commented 6 years ago

Since this is already resolved, I'll close the issue.

taraslayshchuk commented 6 years ago

Fixed in 5.5.2.