Open eadmaster opened 4 years ago
I think this is a reasonable addition, given the existence of a good Go CSV library. Suggestions welcome!
Converting CSV to JSON should be a no-brainer, while the opposite is not, because CSV only supports a flat schema.
Using dot notation seems to be the most common solution for nested records.
Btw, since this is rarely used, I think you could just make the conversion fail in that case.
Encoding is indeed a huge pain. It may be best if you just start with decoding only.
When I tried to implement this I found it to be a pain, because you basically have to decide how to handle non-arrays and arrays of non-homogeneous objects. It's probably okay to simply fail in many cases, but that limits the usefulness quite a bit.
For decoding, you'll also need to think about the following:
* Do you produce an array of arrays (eg via `--decoder-opts="csv=array-of-arrays"`) or an array of objects? An array of objects should be nicer.
* How do you handle headers missing if you're producing an array of objects? Eg probably just set the objects' fields to something like `0`, `1` or `field1`, `field2`.
> How do you handle headers missing if you're producing an array of objects?

You can omit the fields in the objects or leave them as empty strings. E.g. you have this CSV:
name,children,thing
1,2,3
,,2,3
1,,3
This is the catmandu convert CSV to JSON output:
[
{
"thing": "3",
"name": "1",
"children": "2"
},
{
"children": "",
"name": "",
"thing": "2"
// note the 3rd value is omitted here because it has no header
},
{
"children": "",
"name": "1",
"thing": "3"
}
]
Alternative with omitted missing fields:
[
{
"thing": "3",
"name": "1",
"children": "2"
},
{
"thing": "2"
},
{
"name": "1",
"thing": "3"
}
]
I wasn't talking about values missing, but headers. Eg: What does this CSV produce as JSON?
1,2,3
,,2,3
1,,3
catmandu just assumes the first row is always the header.
$ cat test.csv
1,2,3
,,2,3
1,,3
$ catmandu convert CSV to JSON < test.csv | jq .
[
{
"1": "",
"2": "",
"3": "2"
},
{
"3": "3",
"2": "",
"1": "1"
}
]
Another command that is able to convert csv->json is rows (you can install it via the python-rows package):
$ rows convert test.csv test.json
$ cat test.json | jq
[
// same behavior here
{
"field_1": null,
"field_2": null,
"field_3": 2
},
{
"field_1": 1,
"field_2": null,
"field_3": 3
}
]
With the headers I get this, which looks a bit confusing to me:
[
{
"name": 1,
"children": 2,
"thing": 3,
"field_3": null
},
{
"name": null,
"children": null,
"thing": 2,
"field_3": 3
},
{
"name": 1,
"children": null,
"thing": 3,
"field_3": null
}
]
I think the 2nd output is more what I was describing. I don't think it's wise to assume every file has headers, hence the point I was trying to make about having decoder specific options to control this kind of behavior. It's probably fine to start with assuming headers are set. Someone can add headers like this:
faq -fcsv -ojson '.' <(echo col1,col2,col3; cat my-csv-file.csv)
With separator auto-guessing and optional overrides: https://en.wikipedia.org/wiki/Comma-separated_values
faq -f json -o csv < file.json
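Separator auto-guessing could start as simply as counting candidate delimiters in the first line. A naive Go sketch (the heuristic and names are my assumptions; real sniffers also account for quoting and check consistency across several lines):

```go
package main

import (
	"fmt"
	"strings"
)

// sniffDelimiter guesses the separator by counting candidate runes in
// the first line of the sample and picking the most frequent one,
// defaulting to a comma. Quoted fields are ignored by this heuristic.
func sniffDelimiter(sample string) rune {
	line := sample
	if i := strings.IndexByte(sample, '\n'); i >= 0 {
		line = sample[:i]
	}
	best, bestCount := ',', 0
	for _, cand := range []rune{',', ';', '\t', '|'} {
		if n := strings.Count(line, string(cand)); n > bestCount {
			best, bestCount = cand, n
		}
	}
	return best
}

func main() {
	fmt.Printf("%q\n", sniffDelimiter("a;b;c\n1;2;3\n")) // ';'
}
```

The guessed rune could then be assigned to `csv.Reader`'s `Comma` field before parsing, with a decoder option overriding the guess.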