larsyencken / csvdiff

Generate a diff between two tabular datasets expressed in CSV files.
BSD 3-Clause "New" or "Revised" License

invalid column name 'id' as key #51

Open wyhwow opened 5 years ago

wyhwow commented 5 years ago

In the API `diff_files` example, the call succeeds with column 'name' as the key but fails with column 'id':

Traceback (most recent call last):
  File "differ.py", line 3, in <module>
    patch = csvdiff.diff_files('Skill.csv', 'Skill_1.csv', ['id'])
  File "/usr/local/lib/python3.6/dist-packages/csvdiff/__init__.py", line 44, in diff_files
    ignore_columns=ignored_columns)
  File "/usr/local/lib/python3.6/dist-packages/csvdiff/patch.py", line 204, in create
    from_indexed = records.index(from_records, index_columns)
  File "/usr/local/lib/python3.6/dist-packages/csvdiff/records.py", line 58, in index
    raise InvalidKeyError('invalid column name {k} as key'.format(k=k))
csvdiff.records.InvalidKeyError: invalid column name 'id' as key

Columns 'id' and 'name' are both present in my test files.

larsyencken commented 5 years ago

Thanks for the bug report. Can you provide some fake example data that it fails for?


wyhwow commented 5 years ago

skill.csv

id name desc
int string string
技能ID 技能名称 技能描述
1001 小恶魔普攻 attack01
1002 小恶魔普攻 attack01
1003 夏提雅技能 skill01

skill_1.csv

id name desc
int string string
技能ID 技能名称 技能描述
1001 小恶魔普攻 attack01
1002 小恶魔普攻 attack01
1003 夏提雅技能 skill01
1004 夏提雅奥义 skill02
1005 雅尔贝德普攻 attack01
_1 百合折 attack01
1006 雅尔贝德技能 attack02

If the example data can't reproduce the bug, I can email the original files. @larsyencken

simbo1905 commented 5 years ago

I cannot reproduce this:

 2019-05-22 05:46:02 ⌚  |2.4.4| MacBook-Pro-3 in ~/projects/csvdiff
± |master ?:27 ✗| → head skills*
==> skills.csv <==
id,name,desc
int,string,string
技能ID,技能名称,技能描述
1001,小恶魔普攻,attack01
1002,小恶魔普攻,attack01
1003,夏提雅技能,skill01

==> skills1.csv <==
id,name,desc
int,string,string
技能ID,技能名称,技能描述
1001,小恶魔普攻,attack01
1002,小恶魔普攻,attack01
1003,夏提雅技能,skill01
1004,夏提雅奥义,skill02
1005,雅尔贝德普攻,attack01
_1,百合折,attack01
1006,雅尔贝德技能,attack02

 2019-05-22 05:46:22 ⌚  |2.4.4| MacBook-Pro-3 in ~/projects/csvdiff
± |master ?:27 ✗| → csvdiff --style=summary id skills.csv skills1.csv
0 rows removed (0.0%)
4 rows added (80.0%)
0 rows changed (0.0%)

 2019-05-22 05:46:29 ⌚  |2.4.4| MacBook-Pro-3 in ~/projects/csvdiff

kpalka92 commented 3 years ago

I had a similar problem. In my case, I had exported the data from Excel, which saved the file with "UTF-8 BOM" encoding. This caused csvdiff to pick up extra Unicode characters in the name of the first column: instead of "id", csvdiff treated it as "\u010f\u00bb\u017cid". The problem was solved when I changed the encoding to plain "UTF-8".
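
For anyone hitting the same error, here is a minimal sketch of how a byte order mark ends up glued to the first column name, and two ways to get rid of it. It uses only the Python standard library and does not touch csvdiff itself; the helper names `header_columns` and `strip_utf8_bom` and the sample data are made up for illustration, and it assumes the file is otherwise valid UTF-8.

```python
import codecs
import csv
import io


def header_columns(raw: bytes, encoding: str = "utf-8") -> list:
    """Return the header row of CSV content decoded with the given encoding."""
    text = raw.decode(encoding)
    return next(csv.reader(io.StringIO(text)))


def strip_utf8_bom(raw: bytes) -> bytes:
    """Drop a leading UTF-8 BOM, if present, leaving the rest untouched."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw


# Hypothetical sample mimicking a CSV saved with a UTF-8 BOM.
sample = codecs.BOM_UTF8 + "id,name,desc\n1001,foo,attack01\n".encode("utf-8")

# Decoded as plain UTF-8, the BOM stays attached to the first column name.
print(header_columns(sample))

# After stripping the BOM bytes, the first column is a clean 'id'.
print(header_columns(strip_utf8_bom(sample)))

# Decoding with 'utf-8-sig' skips the BOM without modifying the bytes.
print(header_columns(sample, "utf-8-sig"))
```

To fix a file on disk, one option is to read it with `open(path, encoding="utf-8-sig", newline="")` and write the text back out with `encoding="utf-8"`; that should give the same result as re-saving it as plain UTF-8 in an editor.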