csvjson -key option processing

wiluite commented 6 months ago

For the key option to work, all values in the key column must be unique. However, csvjson treats the same numeric value written as different strings as distinct keys. Is that really correct?

a,b,c
1/1/2020,1s,3.
01/01/2020,1 sec,3.0

csvjson dummy.csv -k a
ValueError: Value 2020-01-01 is not unique in the key column.

csvjson dummy.csv -k b
ValueError: Value 0:00:01 is not unique in the key column.

csvjson dummy.csv -k c

{"3": {"a": "2020-01-01", "b": "0:00:01", "c": 3.0}, "3.0": {"a": "2020-01-01", "b": "0:00:01", "c": 3.0}}

jpmckinney commented 6 months ago

Hmm, yes, and values with a different number of decimal places behave the same way:

$ printf 'a,b,c\n1/1/2020,1s,3.00\n01/01/2020,1 sec,3.0' | csvjson -k c
{"3.00": {"a": "2020-01-01", "b": "0:00:01", "c": 3.0}, "3.0": {"a": "2020-01-01", "b": "0:00:01", "c": 3.0}}

jpmckinney commented 6 months ago

Aha, it is because Decimal('3.00') == Decimal('3.0') but str(Decimal('3.00')) != str(Decimal('3.0')). Fixed in agate.
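
The mismatch described above can be reproduced with the standard library alone. The snippet below is a minimal sketch, not the actual change made in agate: `key_for` is a hypothetical helper, and normalizing the Decimal before converting it to a string is just one possible way to make equal values produce the same key.

```python
from decimal import Decimal

a = Decimal("3.00")
b = Decimal("3.0")

# The values compare equal, so a uniqueness check on the parsed
# values sees one value...
print(a == b)        # True
print(len({a, b}))   # 1

# ...but their string forms differ, so using str() as the JSON key
# yields two distinct keys.
print(str(a), str(b))             # 3.00 3.0
print(len({str(a), str(b)}))      # 2


def key_for(value):
    """Hypothetical helper (an assumption, not agate's API): build a
    string key that is identical for numerically equal Decimals."""
    if isinstance(value, Decimal):
        # normalize() strips trailing zeros; the "f" format avoids
        # exponent notation such as 3E+2 for Decimal("300").
        return format(value.normalize(), "f")
    return str(value)


print(key_for(a), key_for(b))     # 3 3
```

With a key built this way, `3.00` and `3.0` would collide, and the duplicate could be reported the same way it is for the date and timedelta columns.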

wireservice / csvkit

csvjson -key option processing #1248