simonw / csv-diff

Python CLI tool and library for diffing CSV and JSON files
Apache License 2.0

Error when column name contains `.` (dot) #7

Closed antopolskiy closed 3 years ago

antopolskiy commented 4 years ago
$ csv-diff --version
csv-diff, version 0.6

Consider the following example, which works correctly:

import csv_diff
from io import StringIO
csv_diff.compare(csv_diff.load_csv(StringIO("id,a,b,c c,d\n0,2,3,4,5"), key="id"),
                 csv_diff.load_csv(StringIO("id,a,b,c c,d\n0,2,4,5,5"), key="id"))

Output:

{'added': [],
 'removed': [],
 'changed': [{'key': '0', 'changes': {'b': ['3', '4'], 'c c': ['4', '5']}}],
 'columns_added': [],
 'columns_removed': []}

If I add a `.` to the column name, changing "c c" to "c. c", it breaks:

import csv_diff
from io import StringIO
csv_diff.compare(csv_diff.load_csv(StringIO("id,a,b,c. c,d\n0,2,3,4,5"), key="id"),
                 csv_diff.load_csv(StringIO("id,a,b,c. c,d\n0,2,4,5,5"), key="id"))

Output:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-5902626c5fa9> in <module>
      2 from io import StringIO
      3 csv_diff.compare(csv_diff.load_csv(StringIO("id,a,b,c. c,d\n0,2,3,4,5"), key="id"),
----> 4                  csv_diff.load_csv(StringIO("id,a,b,c. c,d\n0,2,4,5,5"), key="id"))

~/anaconda3/envs/py37/lib/python3.7/site-packages/csv_diff/__init__.py in compare(previous, current)
     65                         "changes": {
     66                             field: [prev_value, current_value]
---> 67                             for _, field, (prev_value, current_value) in d
     68                         },
     69                     }

~/anaconda3/envs/py37/lib/python3.7/site-packages/csv_diff/__init__.py in <dictcomp>(.0)
     65                         "changes": {
     66                             field: [prev_value, current_value]
---> 67                             for _, field, (prev_value, current_value) in d
     68                         },
     69                     }

TypeError: unhashable type: 'list'

From what I can understand, when the column name contains a `.`, the field becomes a list instead of a str (e.g. ["c. c"] instead of "c. c"), and that breaks the construction of the dictionary, because a list cannot be used as a dict key.
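The failure can be reproduced without csv-diff installed. This is only a minimal sketch: it assumes the diff entries have the `(action, field, (prev_value, current_value))` shape seen in the traceback, and that the dotted column arrives as the list `["c. c"]`. The sample tuples and the `build_changes` helper are hand-written for illustration, not real library output.

```python
# Hand-written stand-ins for diff entries; the tuple shape is taken from
# the dict comprehension shown in the traceback.
entries_ok = [("change", "b", ("3", "4")), ("change", "c c", ("4", "5"))]
entries_bad = [("change", "b", ("3", "4")), ("change", ["c. c"], ("4", "5"))]


def build_changes(entries):
    """Mirror the dict comprehension from the traceback (hypothetical helper)."""
    return {
        field: [prev_value, current_value]
        for _, field, (prev_value, current_value) in entries
    }


# String field names work as dict keys.
changes_ok = build_changes(entries_ok)

try:
    build_changes(entries_bad)
    error_message = None
except TypeError as exc:
    # A list is not hashable, so it cannot be used as a dict key.
    error_message = str(exc)

print(changes_ok)
print(error_message)
```

If that assumption holds, the second call fails with the same kind of `TypeError` as in the report, which points at the field's type rather than at the CSV parsing.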

simonw commented 3 years ago

This seems to be weird behavior from dictdiffer. I can work around it.
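One possible shape for such a workaround is sketched below. It assumes the field arrives either as a plain string or as a list wrapping the column name, such as `["c. c"]`. `normalize_field` and `build_changes` are hypothetical names, not part of csv-diff or dictdiffer, and the sample entries are hand-written rather than real diff output.

```python
def normalize_field(field):
    """Return a hashable column name from a str or a list of path parts."""
    if isinstance(field, list):
        if len(field) == 1:
            # Assumption: a single-element list wraps one column name.
            return field[0]
        # Longer paths are kept intact but made hashable.
        return tuple(field)
    return field


def build_changes(entries):
    """Build the 'changes' mapping, normalizing each field name first."""
    return {
        normalize_field(field): [prev_value, current_value]
        for _, field, (prev_value, current_value) in entries
    }


# Hand-written entries in the (action, field, (prev, current)) shape.
entries = [("change", "b", ("3", "4")), ("change", ["c. c"], ("4", "5"))]
changes = build_changes(entries)
print(changes)
```

Normalizing at the point where the dictionary is built keeps the output format the same for column names with and without a dot.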