jzelinskie / faq

Format Agnostic jQ -- process various formats with libjq
Apache License 2.0
439 stars 14 forks source link

CSV #79

Open eadmaster opened 4 years ago

eadmaster commented 4 years ago

With separators auto-guessing and optional overrides: https://en.wikipedia.org/wiki/Comma-separated_values

faq -f json -o csv < file.json

jzelinskie commented 4 years ago

I think this is a reasonable addition, given the existence of a good Go CSV library. Suggestions welcome!

eadmaster commented 4 years ago

converting csv->json should be a no brainer, while the opposite is not because csv only supports a flat schema.

Using the dot notation seem the most common solution for nested records:

Btw since this is rarely used i think you could just make the conversion fail in this case.

chancez commented 4 years ago

Encoding is indeed a huge pain. It may be best if you just start with decoding only.

When I tried to implement this and I found it to be a pain because you have to basically decide how to handle non-arrays and arrays of non-homogenous objects. It's probably okay to simply fail in many cases, but limits the usefulness quite a bit.

For decoding, you'll also need to think about the following:

eadmaster commented 4 years ago
* How do you handle headers missing if you're producing an array of objects? Eg probably just set the objects fields to something like `0`, `1` or `field1`, `field2`.

An array of objects should be nicer.

* How do you handle headers missing if you're producing an array of objects?

you can omit the fields in the objects or leave them as empty strings.

e.g. you have this csv:

name,children,thing
1,2,3
,,2,3
1,,3

This is the catmandu convert CSV to JSON output:

[
  {
    "thing": "3",
    "name": "1",
    "children": "2"
  },
  {
    "children": "",
    "name": "",
    "thing": "2"
    // note the 3rd value is omitted here because it has no header
  },
  {
    "children": "",
    "name": "1",
    "thing": "3"
  }
]

Alternative with omitted missing fields:

[
  {
    "thing": "3",
    "name": "1",
    "children": "2"
  },
  {
    "thing": "2"
  },
  {
    "name": "1",
    "thing": "3"
  }
]
chancez commented 4 years ago

I wasn't talking about values missing, but headers. Eg: What does this CSV produce as JSON?

1,2,3
,,2,3
1,,3
eadmaster commented 4 years ago

catmandu just assume the first row is always the header.

$ cat test.csv
1,2,3
,,2,3
1,,3

$ catmandu convert CSV to JSON < test.csv  | jq .                                                                                                                                                                                                                                   
[
  {
    "1": "",
    "2": "",
    "3": "2"
  },
  {
    "3": "3",
    "2": "",
    "1": "1"
  }
]

Another command that is able to convert csv->json is rows (you can install via the python-rows package) :

$ rows  convert  test.csv  test.json
$ cat test.json | jq
[
  // same behavior here
  {
    "field_1": null,
    "field_2": null,
    "field_3": 2
  },
  {
    "field_1": 1,
    "field_2": null,
    "field_3": 3
  }
]

With the headers i get this, which looks a bit confusing to me:

[
  {
    "name": 1,
    "children": 2,
    "thing": 3,
    "field_3": null
  },
  {
    "name": null,
    "children": null,
    "thing": 2,
    "field_3": 3
  },
  {
    "name": 1,
    "children": null,
    "thing": 3,
    "field_3": null
  }
]
chancez commented 4 years ago

I think the 2nd output is more what I was describing. I don't think it's wise to assume every file has headers, hence the point I was trying to make about having decoder specific options to control this kind of behavior. It's probably fine to start with assuming headers are set. Someone can add headers like this:

faq -fcsv -ojson '.' <(echo col1,col2,col3; cat my-csv-file.csv)