larsyencken / csvdiff

Generate a diff between two tabular datasets expressed in CSV files.
BSD 3-Clause "New" or "Revised" License

Not working with Large files or Unicode Chars #41

Open rsrini7 opened 5 years ago

rsrini7 commented 5 years ago

Large File Error:

File "i:\anaconda3\lib\site-packages\csvdiff\records.py", line 53, in
    for r in record_seq
MemoryError

Unicode Decode Error:

return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 5148: character maps to <undefined>
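
A 'charmap' decode error like the one above usually means the file was opened with the platform's default code page (for example cp1252 on Windows, where byte 0x81 is unassigned) rather than its real encoding. As a minimal sketch, assuming the CSV is actually UTF-8 (the issue does not say), the file can be read with an explicit encoding. The helper name `read_rows` is hypothetical and is not part of csvdiff:

```python
import csv


def read_rows(path, encoding="utf-8"):
    """Read a CSV file into a list of dicts using an explicit encoding.

    Hypothetical helper for illustration; the UTF-8 default is an assumption.
    """
    # newline="" is what the csv module documentation recommends for file objects.
    with open(path, newline="", encoding=encoding) as f:
        return list(csv.DictReader(f))
```

If the file is in some other encoding, passing that name as `encoding` is the only change needed. Note that this still loads every row into memory.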

larsyencken commented 5 years ago

Csvdiff is meant for datasets that fit into memory. It's not a streaming comparison. How big are the files you're trying to compare?
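
For files too large to hold in memory as full records, one possible workaround outside csvdiff is to read each file row by row and keep only a short digest per key. The sketch below is not how csvdiff works. It assumes both files are UTF-8, share the same column order, and have a unique key column. The names `row_digests` and `diff_keys` are hypothetical:

```python
import csv
import hashlib


def row_digests(path, key, encoding="utf-8"):
    """Map each row's key value to a digest of the whole row, one row at a time."""
    digests = {}
    with open(path, newline="", encoding=encoding) as f:
        reader = csv.reader(f)
        header = next(reader)
        key_idx = header.index(key)
        for fields in reader:
            if not fields:
                continue  # skip blank lines
            # Join fields with a separator that is unlikely to appear in the data.
            payload = "\x1f".join(fields).encode("utf-8")
            digests[fields[key_idx]] = hashlib.sha1(payload).digest()
    return digests


def diff_keys(old_path, new_path, key):
    """Return (added, removed, changed) lists of key values."""
    old = row_digests(old_path, key)
    new = row_digests(new_path, key)
    added = sorted(k for k in new if k not in old)
    removed = sorted(k for k in old if k not in new)
    changed = sorted(k for k in new if k in old and new[k] != old[k])
    return added, removed, changed
```

This only reports which keys differ, not which fields changed. Memory use grows with the number of rows rather than with the size of each row.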

rsrini7 commented 5 years ago

OK, thanks. I am comparing files of around 80 MB. Data from: https://blog.majestic.com/development/majestic-million-csv-daily/ http://downloads.majestic.com/majestic_million.csv