Closed MJones36 closed 2 years ago
I think you may be using the wrong encoding. The 0xed byte appears to be the í character (U+00ED), which is how ISO-8859-1 encodes it. I would recommend trying the iso-8859-1 encoding.
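To illustrate the diagnosis, here is a minimal sketch using only the Python standard library. The sample bytes are made up for illustration and are not taken from the files in question. It shows a lone 0xed byte failing under UTF-8 and decoding to í under ISO-8859-1.

```python
# Hypothetical CSV row containing a single 0xed byte (illustrative data only).
raw = b"Mart\xedn,42\n"

# Under UTF-8, 0xed starts a multi-byte sequence, so the plain ASCII byte
# that follows it is rejected as an invalid continuation byte.
utf8_error = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    utf8_error = str(exc)

# ISO-8859-1 maps every byte to the code point with the same value,
# so 0xed becomes U+00ED (í).
text = raw.decode("iso-8859-1")
print(utf8_error)
print(text)
```

Windows-1252, which is a common default for Excel on Windows, maps 0xed to the same character, so it may be worth trying as well if ISO-8859-1 produces odd punctuation elsewhere in the file.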
Additionally, here's a gist for how I would try to load those files. It requires the method added by #510.
Hope this helps.
@MJones36 does this address the issue or should we leave this open and mark as a bug?
While trying to load some files from Box into Redshift using the `box.get_table_by_file_id` method, we got the following error: `UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 5561: invalid continuation byte`. We realized that we were only getting this error on Excel files that had not been saved as CSV UTF-8 by a third party.
Using the chardet package, we attempted to detect the encoding of the Excel-saved CSVs. The package reported, with a confidence of 1.0, that the files were ascii, but when we passed that encoding to `parsons.from_csv` and `petl.fromcsv` we still got a UnicodeDecodeError, implying that ascii is also not correct.
We would love to have a way to reliably detect, anticipate, or at least attempt typical Excel encodings and pass them through to methods that use `.fromcsv`.
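One way to "attempt typical Excel encodings" could be a small fallback loop. This is only a sketch, not part of Parsons or petl: the helper name `decode_with_fallback` and the candidate list are assumptions chosen for illustration. It tries each encoding in order and returns the first one that decodes without error.

```python
from typing import Iterable, Tuple

# Hypothetical candidate list: UTF-8 (with or without BOM) first, then two
# encodings commonly produced by Excel exports. Adjust for your own data.
DEFAULT_ENCODINGS = ("utf-8-sig", "cp1252", "iso-8859-1")


def decode_with_fallback(
    data: bytes, encodings: Iterable[str] = DEFAULT_ENCODINGS
) -> Tuple[str, str]:
    """Return (decoded_text, encoding_used) for the first encoding that works.

    Hypothetical helper, not an existing Parsons or petl function.
    """
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # Try the next candidate.
    raise ValueError("none of the candidate encodings could decode the data")


text, used = decode_with_fallback(b"Mart\xedn,42\n")
print(used, text)
```

Because iso-8859-1 accepts every byte value, it never raises and so acts as a last resort; a successful decode does not prove the encoding is the right one. The name returned in `used` could then be passed as the encoding to `petl.fromcsv`, as was tried with ascii above.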