simonw / csv-diff

Python CLI tool and library for diffing CSV and JSON files
Apache License 2.0
292 stars 47 forks

BUG: Duplicate key causes undefined behaviour, user not warned #31

Open corneliusroemer opened 2 years ago

corneliusroemer commented 2 years ago

While trying to figure out what happens in #30, I discovered that csv-diff behaves oddly when several rows share the same key.

It would be good to test for duplicate keys and raise an error or warning (maybe allowing user to pick how to deal with this).
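As a rough illustration of the kind of check suggested here, the sketch below counts the values in the key column and reports any that occur more than once. It uses only the Python standard library; `find_duplicate_keys` is a hypothetical helper written for this example and is not part of csv-diff.

```python
import csv
import io
from collections import Counter


def find_duplicate_keys(fp, key):
    """Return {key_value: count} for key values that occur more than once.

    Hypothetical helper for illustration only; not a csv-diff function.
    """
    reader = csv.DictReader(fp)
    counts = Counter(row[key] for row in reader)
    return {value: n for value, n in counts.items() if n > 1}


# Same content as a.csv in the reproduction in this issue.
a_csv = "a,b,c,d\n1,2,3,4\n1,2,3\n3,2,3,4"

duplicates = find_duplicate_keys(io.StringIO(a_csv), "a")
if duplicates:
    # A real tool could raise an error here instead, or let the user choose.
    print(f"warning: duplicate values in key column 'a': {duplicates}")
```

Whether such a check should warn, raise, or be configurable is the open question of this issue; the sketch only shows that detection itself is cheap.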

What shouldn't happen is that an incomplete row causes some sort of undefined behaviour, as here:

printf "a,b,c,d\n1,2,3,4\n1,2,3\n3,2,3,4" >a.csv
printf "a,b,c,d\n1,2,3,4\n1,2,3,4\n3,2,3,4" >b.csv
csv-diff b.csv a.csv --key a

The output makes no sense:

1 column removed

1 column removed

  d
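The incomplete row in a.csv (the second data row has three fields under a four-column header) could also be flagged before diffing. The following is a minimal sketch using only the standard library; `find_short_rows` is a hypothetical name, and nothing here is meant to describe how csv-diff reads files internally.

```python
import csv
import io


def find_short_rows(fp):
    """Return (line_number, row) pairs for rows with fewer fields than the header.

    Hypothetical helper for illustration; assumes no field spans multiple
    lines, so the row index maps directly to a line number.
    """
    reader = csv.reader(fp)
    header = next(reader)
    return [
        (line_no, row)
        for line_no, row in enumerate(reader, start=2)  # header is line 1
        if len(row) < len(header)
    ]


# Same content as a.csv in the reproduction above.
a_csv = "a,b,c,d\n1,2,3,4\n1,2,3\n3,2,3,4"

for line_no, row in find_short_rows(io.StringIO(a_csv)):
    print(f"warning: line {line_no} has {len(row)} fields, expected 4: {row}")
```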