jqlang / jq

Command-line JSON processor
https://jqlang.github.io/jq/
Other
30.46k stars 1.58k forks source link

Behavior clarification #212

Closed np closed 10 years ago

np commented 10 years ago

I'm not sure how to explain the following behavior for combinations of (,) and (+).

Why the first filter is not equivalent to the two others?

$ jq -n -c '[("a","b") | .+.]'      
["aa","bb"]
$ jq -n -c '["a" | (.,"b")+(.,"b")]'
["aa","ba","ab","bb"]
$ jq -n -c '[("a","b")+("a","b")]' 
["aa","ba","ab","bb"]

This question occured to me while toying with an implementation of jq in Haskell https://github.com/np/json-tools/blob/master/jq.hs

In the following I consider filters to be functions from list of values to list of values.

A natural semantics would be to have H=F+G defined as H input = [x+y | x <- F input, y <- G input]. The meaning of it would be to return x+y where x iterates over each of the results of F input, y iterates over the results of G input. One get to look at the results of G for each result of F. This semantics would make the first example above behave as the others.

The first behavior has a form of "zipping" semantics, see https://en.wikipedia.org/wiki/List_comprehension#Parallel_list_comprehension for an explanation.

I look forward to read more details about the semantics of these filters.

nicowilliams commented 10 years ago

The first case sends "a" then "b" to .+. (which then adds "a" to itself, then "b" to itself), and the results of that ("aa", then "bb") are gathered into an array.

The second and third cases involve a cartesian product because of the way , and + work. This is harder to explain, but let me show you something that might make it easier to understand

% jq -n '_plus(("a","b");("a","b"))'
"aa"
"ba"
"ab"
"bb"
% 

Another name for + is _plus. It's a function of three arguments that ignores its first argument (the one that is "piped" to it). In this case its two arguments are expressions that each produces two results, so the logical thing to do is to invoke _plus once for each pair of values output by the argument expressions. What jq is doing, effectively, is to run two co-routines to produce those two streams of values for _plus to add, but the second co-routine gets re-run once for each output of the first. Here's another example that might make things clearer:

% jq -n '_plus(("a","b");("1","2","3"))'
"a1"
"b1"
"a2"
"b2"
"a3"
"b3"
% 

I hope that clarifies things. The comma and semi-colon operators are kinda special (; isn't an operator, really, but anyways).

stedolan commented 10 years ago

A better semantics is to consider jq filters as functions from values to sets of values. The composition operator | is then roughly map-and-concat.

("a","b") | .+. is the composition of ("a","b"), which produces two results regardless of input, and .+., which produces one result (its input doubled). So, we get two outputs. It is equivalent to ("a" | .+.), ("b" | .+.).

Your second example, "a" | (.,"b")+(.,"b") is equivalent to your third, ("a","b")+("a","b"). The semantics of + (and function calls and other operators and indexing and essentially every operation in jq) are roughly a cartesian product: P + Q has the meaning \x -> [p + q | p <- P x, q <- Q x].

For the monadically inclined, that's bind in the monad ReaderT List. The prototype version of jq was in fact implemented in Haskell, and it's on the haskell-version branch of this repo if you want to poke around.

stedolan commented 10 years ago

OK, while we're talking about semantics: A jq filter is given a meaning in the domain of functions from JSON values to lists of JSON values. I'll use [[x]] for semantic brackets, and Haskell syntax for list operations. The semantics of a function [[f]] is an n-ary function on JSON values. Operators are another syntax for functions.

[[ . ]] x = [x]
[[ P, Q ]] x = P x ++ Q x
[[ P | Q ]] x = [ z | z <- Q y, y <- P x]
[[ f(P; Q; R; ...) ]] x = [ [[f]](p, q, r, ...) | p <- P x, q <- Q x, r <- R x, ...]
[[ [ P ] ]] x = [P x]
[[ .[] ]] x = x

There's a few more bits (and lots of builtin functions), but that's the gist.

np commented 10 years ago

Thanks for the explanations. So far I'm using functions from list to list (of JSON value), however I'm getting convinced otherwise. In particular you just made me figure out that my array building was wrong as I was collecting all the inputs.

Lukily I also figured out how to get the right semantics (so far) in my Haskell implementation.

np commented 10 years ago

Thanks to your comments I now have a much nicer implementation. I manage to support --run-tests such that I can also run your test suite. Although it doesn't support all the features, all the other tests passes, as shows this output:

https://github.com/np/json-tools/blob/master/tests/hjq-test.out