jqlang / jq

Command-line JSON processor
https://jqlang.github.io/jq/

Querying Nested objects #319

Closed vito-c closed 10 years ago

vito-c commented 10 years ago

Let's say I have: echo '{ "a":{"b":{"c":{"d":10}}}}' | jq '.[][][].d', which returns the value 10. Is there any way to do this with some kind of wildcard? Like echo '{ "a":{"b":{"c":{"d":10}},"e":{"d":5}}}' | jq '.*.d'? :)

Also, is there a way to have it query all objects that match? echo '{ "a":{"b":{"c":{"d":10}},"e":{"d":5}}}' | jq '.*.d'

Lastly, I was wondering what you guys think of JSONPath. It would be neat if you could incorporate that into jq.

nicowilliams commented 10 years ago

In master you can use .. as the wildcard you want: jq '..|.d', and it will return both d's in your example.

Which JSON path do you mean? http://goessner.net/articles/JsonPath/ ? Or JSON Pointer?

vito-c commented 10 years ago

I just installed master with brew install --HEAD jq. Here are the results:

echo '{ "a":{"b":{"c":{"d":10}},"e":{"d":5}}}' | jq '..|.d'
null
null
null
10
jq: error: Cannot index number with string
5
jq: error: Cannot index number with string

echo '{ "a":{"b":{"c":{"d":10}},"e":{"d":5}}}' | jq '..d'
error: syntax error, unexpected IDENT, expecting $end
..d
1 compile error

Yes, the JSONPath from the goessner.net article.

ghost commented 10 years ago

I don’t think that’s a bug: to improve readability, you’re supposed to do .. | .d instead of ..d.

vito-c commented 10 years ago

@slapresta, could you paste the command you ran and its output? Using '.. | .d' gives me the same output:

echo '{ "a":{"b":{"c":{"d":10}},"e":{"d":5}}}' | jq '.. | .d'
null
null
null
10
jq: error: Cannot index number with string
5
jq: error: Cannot index number with string

And this:

echo '{ "a":{"b":{"c":{"d":10}},"e":{"d":5}}}' | jq '..d'

just gives me the syntax error stated above.

ghost commented 10 years ago

Oh, right! Sorry, I misunderstood you.

.. operates recursively over every value in . (the input). This means that, when you run .. | .d on your example, it will run the instruction .d on each of the following values:

{"a":{"b":{"c":{"d":10}},"e":{"d":5}}}
{"b":{"c":{"d":10}},"e":{"d":5}}
{"c":{"d":10}}
{"d":10}
10
{"d":5}
5
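To check that traversal yourself, jq's path(..) prints the path to every value that .. visits, in visiting order. This is a minimal sketch. It assumes a jq build whose path() accepts .. as a path expression; jq 1.5 and later should.

```shell
input='{"a":{"b":{"c":{"d":10}},"e":{"d":5}}}'
# path(..) emits the path to each value that .. visits, in visiting order;
# -c prints each path compactly on its own line
echo "$input" | jq -c 'path(..)'
```

The paths ["a","b","c","d"] and ["a","e","d"] point at the numbers 10 and 5. Those are the two places where .d fails with "Cannot index number".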

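This should explain the errors you're getting. To make the recurse operator useful, the ? operator was added recently. .d? behaves like .d, but it suppresses the error when the value cannot be indexed by a string, so the numbers 10 and 5 simply produce no output. See it in action: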

jq '.. | .d?'

null
null
null
10
5

If you want to get rid of those pesky nulls, try filtering it with select(type != "null").

vito-c commented 10 years ago

I see, that works, but it seems like a lot to write when you want to use a recursive wildcard; the whole point is for it to be short.

echo '{ "a":{"b":{"c":{"d":10}},"e":{"d":5}}}' | jq '..|.d? | select(type != "null")'
10
5

I was hoping jq '..d' would do that

nicowilliams commented 10 years ago

You can write ..|.d?|select(. != null); . != null is equivalent to (but shorter than) type != "null". I don't think that's a lot to type, but we may end up adding shorthands for select(<type>) that make it shorter. I don't think we'd want the .field? syntax to omit nulls: what if ..|.d did have a null value?
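The concern about genuine null values can be made concrete. values, the jq builtin equivalent to select(. != null), also discards a d that really is null. Testing for the key with has keeps that value. The sample input below is made up for illustration.

```shell
input='{"a":{"d":null},"b":{"d":1},"c":2}'
# values drops every null, including the real null stored at .a.d
echo "$input" | jq -c '.. | .d? | values'
# keeping only objects that actually have a "d" key preserves the real null
echo "$input" | jq -c '.. | objects | select(has("d")) | .d'
```

The first command prints only 1. The second prints null and then 1.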

vito-c commented 10 years ago

Maybe there should be a --prune option or something that automatically strips out nulls? Then it would be a flag the user has to set knowingly. How would I use select(. != null) here?

echo '{ "id":123,"a":{"b":{"c":{"d":10}},"e":{"d":5}}}' | jq '{ id:.id, d:..|.d? }'

nicowilliams commented 10 years ago

.a can produce null if either . is an object and has no value named "a", or if it does have such a value and the value is null. This is ambiguous, and this is bothering you. It has bothered me too, actually. jq has a has builtin that helps, but it's not really enough. Maybe the .a? form should not produce null when . doesn't have an "a" name... I'll think about it; at first glance that seems like it would be useful. I would insist on .a outputting null when .a really is a null value though.
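The sketch below shows the ambiguity and what has can tell you. It uses a small made-up document with one null-valued key.

```shell
doc='{"a":null,"b":1}'
# Indexing cannot tell a null value from a missing key: both come out as null
echo "$doc" | jq -c '[.a, .c]'
# has/1 checks for the key itself, so it distinguishes the two cases
echo "$doc" | jq -c '[has("a"), has("c")]'
```

The first command prints [null,null]. The second prints [true,false].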

ghost commented 10 years ago

@vito-c on solving your specific use case, you'd put select(. != null) inside the d value:

echo '{ "id": 123,"a": {"b": {"c": {"d": 10}},"e": {"d": 5}}}' |
  jq '{ id: .id, d: .. | .d? | select(. != null) }'
{
  "id": 123,
  "d": 10
}
{
  "id": 123,
  "d": 5
}
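The d: value is a generator, so the object constructor emits one object per value it produces. To get a single object instead, wrap the generator in [...] to collect the values into an array. This sketch uses values, the builtin shorthand for select(. != null).

```shell
input='{ "id": 123,"a": {"b": {"c": {"d": 10}},"e": {"d": 5}}}'
# [ ... ] gathers every non-null .d into one array, so only one object is built
echo "$input" | jq -c '{ id: .id, d: [.. | .d? | values] }'
```

This prints {"id":123,"d":[10,5]}.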

vito-c commented 10 years ago

@slapresta Thanks! I didn't realize you could put a query in the { } ... jq is super powerful!

@nicowilliams is there a way (in the code) to distinguish between a field whose value is null and a field that doesn't exist?

If there is, maybe you could make a query for a field that doesn't exist display undefined.

For example, currently:

> echo '{ "a":null ,"b":1}' | jq '.a'
null
> echo '{ "a":null ,"b":1}' | jq '.c'
null

proposed:

> echo '{ "a":null ,"b":1}' | jq '.a'
null
> echo '{ "a":null ,"b":1}' | jq '.c'
undefined

Then you could have a prune flag to hide all undefined outputs? Or maybe it's enough to have a flag that lets you prune all nulls?

lluchs commented 10 years ago

undefined isn't valid JSON though.

vito-c commented 10 years ago

@lluchs, according to jsonlint.com neither is a bare null ;) but {"a":null} is valid JSON. However, the issue we are trying to address is what happens when you filter/query a JSON string and ask for a field that does not exist in it.

> echo '{ "a":null ,"b":1}' | jq '.c'
?????

If we assume that the default value of any field on a JSON object is null, then I can see why echo '{ "a":null ,"b":1}' | jq '.c' = null, but the discrepancy I see here is that c never existed in the original string. By returning null here, we are implying that the original string was '{"a":null, "b":1,"c":null}'.

That's why I suggested undefined as a way to differentiate between a null value and a field that doesn't exist.

The other suggestion was to add a flag to prune all nulls
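Pruning nulls can already be written as an ordinary filter, with no new flag needed. This is a sketch. It assumes a jq where del accepts a recursive path expression (jq 1.5 or later should). nulls is the builtin type filter that selects null values.

```shell
input='{"a":null,"b":1,"c":{"d":null,"e":2}}'
# ..|nulls selects every null anywhere in the document; del removes those paths
echo "$input" | jq -c 'del(.. | nulls)'
```

This prints {"b":1,"c":{"e":2}}. Unlike a dedicated undefined value, it cannot tell a real null from a missing key. It simply removes every key whose value is null.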

nicowilliams commented 10 years ago

@vito-c JSON now allows non-array/object top-level values, see https://tools.ietf.org/html/rfc7159 . So jsonlint.com is wrong :)

Anyways, jq's object index operation deliberately outputs null if the name (key) doesn't exist in the object. There is a has built-in you can use to test if a key exists, so you could write .a?.b?|select(.!=null and has("c")).c instead of .a?.b?.c? to distinguish "c doesn't exist in .a?.b?" from "c exists in .a?.b? and its value is null".
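The sketch below runs that expression against three made-up inputs, next to the plain .a?.b?.c? form, so you can see the difference. The inputs cover c present with a null value, c absent, and c present with a number.

```shell
# Three newline-separated JSON texts; jq processes each one in turn
inputs='{"a":{"b":{"c":null}}}
{"a":{"b":{}}}
{"a":{"b":{"c":3}}}'
# The plain form prints null for both of the first two inputs
echo "$inputs" | jq -c '.a?.b?.c?'
# The has("c") guard skips the input where c is absent and keeps the real null
echo "$inputs" | jq -c '.a?.b? | select(. != null and has("c")).c'
```

The plain form prints null, null, 3. The guarded form prints null, 3.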

jq only deals with values that can be expressed as JSON texts, therefore adding an "undefined" value that is distinct from null is NOT an option.

The only options here would be:

or

I think we should be very reluctant to add new syntax unless it's a very clear improvement (@slapresta's suggestion of CoffeeScript-style ? was very good). I don't really find this ambiguity as to key existence to be a problem for me, so I'd be loath to add new syntax. I might go for changing the ? syntax's semantics though. Convince me :)

vito-c commented 10 years ago

From http://coffeescript.org The Existential Operator: CoffeeScript's existential operator ? returns true unless a variable is null or undefined, which makes it analogous to Ruby's nil? also relevant is the accessor variant of the existential operator ?.

From the Site: The accessor variant of the existential operator ?. can be used to soak up null references in a chain of properties. Use it instead of the dot accessor . in cases where the base value may be null or undefined. If all of the properties exist then you'll get the expected result, if the chain is broken, undefined is returned instead of the TypeError that would be raised otherwise.

My knowledge of CoffeeScript is pretty limited, but it sounds like ?. might be the syntax we are looking for?

nicowilliams commented 10 years ago

@vito-c Sure, but the CoffeeScript analogy only goes so far. First, I made ? a filter, not a predicate; second, jq has no notion of undefined; third, their concept of a variable is very different; and fourth, CoffeeScript's existential operator has the same ambiguity as jq's!

"unless a variable is null or undefined" -- same ambiguity :)

vito-c commented 10 years ago

That's why they have the ?. syntax in CoffeeScript, to "soak up null references". However, I feel like this syntax is combining two operators. Also, . has a very special meaning in jq, so I don't think this syntax fits very well.

Since we are talking about question mark operators I did some searching on google: http://en.wikipedia.org/wiki/Null_coalescing_operator http://en.wikipedia.org/wiki/Elvis_operator

It's funny how the ternary operator is shorthand for an if statement, and then these are shorthand for a ternary operator. The best syntax I could think of is having ☠ be shorthand for | select(. != null); then you could append it to any filter, like .d?☠ or even just .d☠. You'd probably have to... I'm not sure what a good operator would be, so I just put ☠. What about ^?
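jq already has an operator in the null-coalescing family. a // b produces the values of a that are neither null nor false, and falls back to b if there are none. The sketch below uses a small made-up document. Note that // treats a real null and a missing key the same way, and that it also replaces false.

```shell
doc='{"a":null,"b":1}'
echo "$doc" | jq -c '.b // "missing"'   # .b is 1, so it is kept
echo "$doc" | jq -c '.c // "missing"'   # there is no key c, so it falls back
echo "$doc" | jq -c '.a // "missing"'   # a real null also falls back
```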

nicowilliams commented 10 years ago

Hmmm, GitHub seems to have lost the gist of your suggestion... View this with a browser: https://github.com/stedolan/jq/issues/319

(It's even worse in the github e-mail.)

nicowilliams commented 10 years ago

I think something like .a?: would be awful. I'm not sure what to do about this, if anything. I'll await @stedolan's input.

vito-c commented 10 years ago

@nicowilliams Not sure I follow you about GitHub losing the gist of my suggestion? I didn't post an actual gist, or do you just mean my previous comment? I did accidentally close and reopen the issue, though :) The special character I posted probably got all mangled.

nicowilliams commented 10 years ago

I meant "gist" in the sense of the English word, not in the sense of "GitHub gist".

Ah, you meant to use a non-ASCII character. Well. I've certainly been waiting for the day popular programming languages start using a larger repertoire of operator/punctuation characters, but the lack of universally simple and memorable input methods for such things is enough of a problem that that day still seems far off :(

vito-c commented 10 years ago

Oh no, I wasn't saying to use a non-ASCII character like ☠, though that would be pretty hilarious! Just some character (any character, really) to allow for shorthanding the stripping of nulls.

vito-c commented 10 years ago

@slapresta @nicowilliams how do I filter out [] from the list of results using select?

ghost commented 10 years ago

How about select(. != [])?

vito-c commented 10 years ago

ok so this works: ... select( . != null ) | select( . != [])' but not this: .. select( . != null && . != [] )'

ghost commented 10 years ago

Use and instead of &&.
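Here is a quick check of the combined form on a made-up array. jq spells logical conjunction and (and disjunction or); && is not part of jq's syntax.

```shell
# Keep only the values that are neither null nor an empty array
echo '[null, [], 1, [2]]' | jq -c '.[] | select(. != null and . != [])'
```

This prints 1 and then [2].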

vito-c commented 10 years ago

@nicowilliams

If you do decide to add extra syntax to soak up these null references, like CoffeeScript's ?., should it also filter out empty arrays and empty objects?

ghost commented 10 years ago

No way. [] and {} are valid values. Empty, but valid. It's like asking for it to filter '' or 0.

vito-c commented 10 years ago

@slapresta Yeah, I guess you are right, but when I'm filtering large JSON objects, {} and [] don't really provide anything useful, so I'll just filter them out. I think I'll make a function in bash that strips them all out.
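Instead of a bash wrapper, jq's own def syntax can give the filter a short name inside the program. nonempty below is a hypothetical helper name, not a jq builtin, and the input is made up.

```shell
input='{"a":{"d":[]},"b":{"d":{}},"c":{"d":7},"e":{"d":null}}'
# nonempty is a user-defined (hypothetical) helper: it drops null, [] and {}
echo "$input" | jq -c 'def nonempty: select(. != null and . != [] and . != {}); .. | .d? | nonempty'
```

Only 7 survives. The empty array, the empty object and the null are all dropped.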

nicowilliams commented 10 years ago

There's a difference between absence and emptiness. Sometimes making that ambiguous might help, but by and large I'd not want to add more such cases to jq.

vito-c commented 10 years ago

I'm also getting [ 0 ] and [] as output =( What does the [0] mean?

gberche-orange commented 9 years ago

I had a similar need to select a field within a nested JSON graph and prune out empty arrays and nulls. The syntax proposed in this thread is fine except that it's verbose, so I shorthanded it into a bash function:

# jqe: "jq nested expression"
function jqe { jq ".. | ${1}? | select(. != []) | select(. != null)"; }
$ cf curl /v2/service_plans/aa5704e9-1957-4e20-8cec-242fb803f8e9/service_instances  | head -n 20
{
   "total_results": 9,
   "total_pages": 1,
   "prev_url": null,
   "next_url": null,
   "resources": [
      {
         "metadata": {
            "guid": "40df65dc-571b-422c-9a41-6cb5218f0f7d",
            "url": "/v2/service_instances/40df65dc-571b-422c-9a41-6cb5218f0f7d",
            "created_at": "2014-12-02T17:30:38Z",
            "updated_at": null
         },
         "entity": {
            "name": "mysql-spring-travel",
            "credentials": {},
            "service_plan_guid": "aa5704e9-1957-4e20-8cec-242fb803f8e9",
            "space_guid": "8d74f1c3-e005-42ad-bd31-675d5dc0e1de",
            "gateway_data": null,
[...]

$ cf curl /v2/service_plans/aa5704e9-1957-4e20-8cec-242fb803f8e9/service_instances  | jqe .space_guid
"8d74f1c3-e005-42ad-bd31-675d5dc0e1de"
"32056146-258b-48cf-b0be-a85f059331d6"
"0551cebc-847b-491b-b408-072e20d7485d"
"c2a89e41-2494-4d7b-a59f-2011e1624cd1"
"7c7906b5-305a-40fb-8004-20805a689d67"
"ec7a94e7-ed24-4a9b-a81f-b7b66665481a"
"ec7a94e7-ed24-4a9b-a81f-b7b66665481a"
"a7e6d95a-dc3e-497c-8a4b-e6ae21e9ad00"
"90d6b238-3f79-46f4-82f7-8dc93e9dfda8"

This lets me script the browsing of REST APIs nicely:

 cf curl /v2/service_plans/aa5704e9-1957-4e20-8cec-242fb803f8e9/service_instances  | jqe .space_url | xargs -n 1 cf curl | jqe .name

Maybe a new jq command-line option would be useful to some users, to avoid such custom shorthands.

pkoppstein commented 9 years ago

@gberche-orange wrote:

The syntax proposed in this thread is fine except that it's verbose.

Defining a bash function as you have done is a good way to avoid verbosity.

Your jq could be made slightly less verbose by using 'values':

..  |  .space_url? | values | select(. != []) 

or a single select:

..  |  .space_url? | select( . != null and . != []) 

A case can certainly be made for something shorter than .space_url? | values but, as has been pointed out in this thread and elsewhere, in the JSON world there is nothing special about the combination (null or []). In jq and elsewhere, null and false are paired as the "non-truthy" values, but 0, [] and {} are all "truthy".
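The truthiness point is easy to check. select(.) keeps a value only when it is truthy, so on this made-up array it drops null and false but keeps 0, [], {} and "".

```shell
# select(.) passes through every value other than null and false
echo '[null, false, 0, [], {}, ""]' | jq -c '.[] | select(.)'
```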

By the way, for anyone who may not have noticed, jq "master" provides scalars_or_empty in addition to the type-based filters provided by jq 1.4.