multiprocessio / dsq

Commandline tool for running SQL queries against JSON, CSV, Excel, Parquet, and more.
Other
3.71k stars 151 forks source link

dsq failed to recognize nested field, if its parent object absents in some records #110

Open mon-jai opened 1 year ago

mon-jai commented 1 year ago

Describe the bug and expected behavior

dsq failed to recognize nested field if its parent object absents in some records.

Reproduction steps

dsq failed to retrieve location.city, if location is absent in some records.

user_addresses.json

[
  {"name": "Agarrah", "location": {"city": "Toronto", "address": { "number": 1002 }}},
  {"name": "Minoara", "location": {"city": "Mexico City", "address": { "number": 19 }}},
  {"name": "Fontoon"}
]

Result:

dsq user_addresses.json 'SELECT name, "location.city" FROM {}'
[{"name":"Agarrah","\"location.city\"":"location.city"},
{"name":"Minoara","\"location.city\"":"location.city"},
{"name":"Fontoon","\"location.city\"":"location.city"}]

Expected behavior:

dsq user_addresses.json 'SELECT name, "location.city" FROM {}'
[{"name":"Agarrah","location.city":"Toronto"},
{"name":"Minoara","location.city":"Mexico City"},
{"name":"Fontoon","location.city":null}]

Versions

eatonphil commented 1 year ago

Here's a way you can work around this:

$ dsq user_addresses.json 'select location->>"address" FROM {}'
[{"location->>\"address\"":"{\"number\":1002}"},
{"location->>\"address\"":"{\"number\":19}"},
{"location->>\"address\"":null}]
mon-jai commented 1 year ago

@eatonphil That is great. But it won't work if every record contains the location field, so won't work consistently on other datasets with the same schema.

Maybe we can add a flag to also include the string representation of objects, regardless of whether or not it can be extracted to sub-fields, like something called --preserve-json-string?

# location field will present in all situations
$ dsq user_addresses.json 'select location->>"address" FROM {}' --preserve-json-string
eatonphil commented 1 year ago

But it won't work if every record contains the location field, so won't work consistently on other datasets with the same schema.

I don't understand what you mean by this?