larsyencken / csvdiff

Generate a diff between two tabular datasets expressed in CSV files.
BSD 3-Clause "New" or "Revised" License

diff csv files encode with utf8 #58

Closed: wangbinaaa closed this issue 4 years ago

wangbinaaa commented 5 years ago

Traceback (most recent call last):
  File "C:/Users/firsi/PycharmProjects/sql_compare/operation.py", line 64, in <module>
    compare_common(db_list1, db_list2)
  File "C:/Users/firsi/PycharmProjects/sql_compare/operation.py", line 54, in compare_common
    diff = csvdiff.diff_files(file1, file2, [(index.split(',')[0])])
  File "C:\Users\firsi\AppData\Local\Programs\Python\Python37\lib\site-packages\csvdiff\__init__.py", line 44, in diff_files
    ignore_columns=ignored_columns)
  File "C:\Users\firsi\AppData\Local\Programs\Python\Python37\lib\site-packages\csvdiff\patch.py", line 204, in create
    from_indexed = records.index(from_records, index_columns)
  File "C:\Users\firsi\AppData\Local\Programs\Python\Python37\lib\site-packages\csvdiff\records.py", line 53, in index
    for r in record_seq
  File "C:\Users\firsi\AppData\Local\Programs\Python\Python37\lib\site-packages\csvdiff\records.py", line 51, in <dictcomp>
    obj = {
  File "C:\Users\firsi\AppData\Local\Programs\Python\Python37\lib\site-packages\csvdiff\records.py", line 38, in __iter__
    for lineno, r in enumerate(self.reader, 2):
  File "C:\Users\firsi\AppData\Local\Programs\Python\Python37\lib\csv.py", line 111, in __next__
    self.fieldnames
  File "C:\Users\firsi\AppData\Local\Programs\Python\Python37\lib\csv.py", line 98, in fieldnames
    self._fieldnames = next(self.reader)
UnicodeDecodeError: 'gbk' codec can't decode byte 0xad in position 85: illegal multibyte sequence

When I diff two files that I wrote with UTF-8 encoding, PyCharm raises this error.

larsyencken commented 4 years ago

Interesting. It looks like your OS is setting a default encoding other than UTF-8 (the traceback shows Python decoding the file with 'gbk'). You might try loading the records yourself and then diffing the records instead of the files.
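
The suggestion above could look roughly like the following. This is a minimal sketch, not the library's own code: load_records and diff_utf8_files are hypothetical helper names, and it assumes csvdiff.diff_records takes two lists of dict records plus the index columns, as the project README describes. The key point is passing encoding="utf-8" to open() explicitly, so Python does not fall back to the OS locale default ('gbk' in the traceback).

```python
import csv


def load_records(path, encoding="utf-8"):
    """Read a CSV file into a list of dicts, decoding with an explicit encoding.

    Hypothetical helper: the explicit encoding replaces the locale default
    that open() would otherwise use on this platform.
    """
    # newline="" is what the csv module documentation recommends for reader input.
    with open(path, encoding=encoding, newline="") as stream:
        return list(csv.DictReader(stream))


def diff_utf8_files(from_path, to_path, index_columns):
    """Diff two UTF-8 CSV files by loading the records first (hypothetical helper)."""
    # Imported lazily so load_records can be used without csvdiff installed.
    import csvdiff

    from_records = load_records(from_path)
    to_records = load_records(to_path)
    # Assumption: diff_records(from_records, to_records, index_columns)
    # is the record-level counterpart of diff_files.
    return csvdiff.diff_records(from_records, to_records, index_columns)
```

Usage would then be something like diff_utf8_files(file1, file2, ['id']), where 'id' stands in for whatever index column the files share. If the files were saved with a byte order mark, encoding="utf-8-sig" may be the better choice for load_records.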