larsyencken / csvdiff

Generate a diff between two tabular datasets expressed in CSV files.
BSD 3-Clause "New" or "Revised" License

KeyError when comparing csv files #60

Open CaptainDaVinci opened 3 years ago

CaptainDaVinci commented 3 years ago

I have the following two CSV files (headers shown):

view-1.csv

As of Date,Business Title,Email,Employee Type,Employee_ID

view-2.csv

'As of Date',As of Date,Business Title,Email,Employee Type,Employee_ID

Running

$ csvdiff Employee_ID view-1.csv view-2.csv 

Throws the following error,

Traceback (most recent call last):
  File "/Users/.pyenv/versions/3.7.5/bin/csvdiff", line 8, in <module>
    sys.exit(csvdiff_cmd())
  File "/Users/.pyenv/versions/3.7.5/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/.pyenv/versions/3.7.5/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/.pyenv/versions/3.7.5/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/.pyenv/versions/3.7.5/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/.pyenv/versions/3.7.5/lib/python3.7/site-packages/csvdiff/__init__.py", line 160, in csvdiff_cmd
    significance=significance)
  File "/Users/.pyenv/versions/3.7.5/lib/python3.7/site-packages/csvdiff/__init__.py", line 172, in _diff_files_to_stream
    diff = diff_files(from_csv, to_csv, index_columns, sep=sep, ignored_columns=ignored_columns)
  File "/Users/.pyenv/versions/3.7.5/lib/python3.7/site-packages/csvdiff/__init__.py", line 44, in diff_files
    ignore_columns=ignored_columns)
  File "/Users/.pyenv/versions/3.7.5/lib/python3.7/site-packages/csvdiff/patch.py", line 211, in create
    return create_indexed(from_indexed, to_indexed, index_columns)
  File "/Users/.pyenv/versions/3.7.5/lib/python3.7/site-packages/csvdiff/patch.py", line 222, in create_indexed
    index_columns)
  File "/Users/.pyenv/versions/3.7.5/lib/python3.7/site-packages/csvdiff/patch.py", line 252, in _assemble
    key=_change_key)
  File "/Users/.pyenv/versions/3.7.5/lib/python3.7/site-packages/csvdiff/patch.py", line 251, in <genexpr>
    for k in changed),
  File "/Users/.pyenv/versions/3.7.5/lib/python3.7/site-packages/csvdiff/patch.py", line 264, in record_diff
    from_ = lhs[k]
KeyError: "'As of Date'"

I was expecting the output to be something like columns removed/added: 1, 'As of Date'.

larsyencken commented 3 years ago

csvdiff is a row-based diff rather than a column-based diff, so it doesn't detect column changes. In this case it errors because it expects both files to have the same columns. I agree the error message could be better.
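To illustrate the failure mode, here is a simplified sketch of a row comparison that assumes both rows share the same columns. It is not csvdiff's actual code; `naive_record_diff` and the row values are hypothetical, and only the `lhs[k]` lookup mirrors the last frame of the traceback.

```python
# Simplified illustration only; not csvdiff's actual implementation.
def naive_record_diff(lhs, rhs):
    """Compare two rows (dicts keyed by column name), assuming identical columns."""
    changes = {}
    for k in rhs:
        from_ = lhs[k]  # raises KeyError when the column exists only in rhs
        to_ = rhs[k]
        if from_ != to_:
            changes[k] = {"from": from_, "to": to_}
    return changes


# Hypothetical rows: the second one carries the extra "'As of Date'" column.
row_v1 = {"As of Date": "2020-10-01", "Employee_ID": "1"}
row_v2 = {"'As of Date'": "2020-10-01", "As of Date": "2020-10-01", "Employee_ID": "1"}

missing = None
try:
    naive_record_diff(row_v1, row_v2)
except KeyError as exc:
    missing = exc.args[0]
    print("column missing from the first file:", missing)
```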

If you're able to generate the same column names, you should get a good diff out of it in JSON format.
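One way to check whether the column names match before running csvdiff is to compare the header rows yourself. The sketch below uses only Python's standard `csv` module; `read_header` and `header_diff` are hypothetical helper names, not part of csvdiff, and the headers are inlined from the report instead of being read from disk.

```python
import csv
import io


def read_header(stream):
    """Return the first row of a CSV stream as a list of column names."""
    return next(csv.reader(stream), [])


def header_diff(from_header, to_header):
    """Return (removed, added) column names, preserving file order."""
    removed = [c for c in from_header if c not in to_header]
    added = [c for c in to_header if c not in from_header]
    return removed, added


# Headers copied from view-1.csv and view-2.csv in the report above.
view_1 = io.StringIO("As of Date,Business Title,Email,Employee Type,Employee_ID\n")
view_2 = io.StringIO("'As of Date',As of Date,Business Title,Email,Employee Type,Employee_ID\n")

removed, added = header_diff(read_header(view_1), read_header(view_2))
print("columns removed:", removed)
print("columns added:", added)
```

With real files, replace the `io.StringIO` objects with `open(path, newline="")`. If both lists come back empty, the two files have the same column names.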
