wireservice / csvkit

A suite of utilities for converting to and working with CSV, the king of tabular file formats.
https://csvkit.readthedocs.io
MIT License

csvjson -key option processing #1248

Closed wiluite closed 1 month ago

wiluite commented 1 month ago

For the key option to work, all values in the key column must be unique. However, csvjson treats equal numeric values that are written as different strings as different keys, while equal dates and durations are correctly rejected as duplicates. Is that really correct?

a,b,c
1/1/2020,1s,3.
01/01/2020,1 sec,3.0
$ csvjson dummy.csv -k a
ValueError: Value 2020-01-01 is not unique in the key column.
$ csvjson dummy.csv -k b
ValueError: Value 0:00:01 is not unique in the key column.
$ csvjson dummy.csv -k c
{"3": {"a": "2020-01-01", "b": "0:00:01", "c": 3.0}, "3.0": {"a": "2020-01-01", "b": "0:00:01", "c": 3.0}}

jpmckinney commented 1 month ago

Hmm, yes, and values that differ only in the number of decimal places show the same behavior:

$ printf 'a,b,c\n1/1/2020,1s,3.00\n01/01/2020,1 sec,3.0' | csvjson -k c
{"3.00": {"a": "2020-01-01", "b": "0:00:01", "c": 3.0}, "3.0": {"a": "2020-01-01", "b": "0:00:01", "c": 3.0}}

jpmckinney commented 1 month ago

Aha, it is because Decimal('3.00') == Decimal('3.0') but str(Decimal('3.00')) != str(Decimal('3.0')). Fixed in agate.
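
The mismatch described above can be reproduced with Python's standard `decimal` module alone. The sketch below is only an illustration and is not necessarily how agate fixed it: `key_string` and `find_duplicate_keys` are hypothetical helpers, and normalizing before stringifying is an assumed approach for making numerically equal values share one key.

```python
from decimal import Decimal


def key_string(value):
    """Hypothetical helper: render a Decimal so that equal values give equal strings."""
    # normalize() strips trailing zeros; the "f" format avoids exponent
    # notation such as "1E+2" that normalize() can otherwise produce.
    return format(value.normalize(), "f")


def find_duplicate_keys(values):
    """Hypothetical helper: return the values whose normalized key was already seen."""
    seen = set()
    duplicates = []
    for value in values:
        key = key_string(value)
        if key in seen:
            duplicates.append(value)
        seen.add(key)
    return duplicates


a, b = Decimal("3.00"), Decimal("3.0")
print(a == b)                         # the values compare equal
print(str(a) == str(b))               # but their string forms differ
print(key_string(a) == key_string(b)) # normalized keys match
print(find_duplicate_keys([a, b]))    # the second value is reported as a duplicate
```

With keys built this way, the `-k c` example in this thread would raise the same "not unique" error as the date and duration columns instead of emitting two keys for the same number.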