Open pkoppstein opened 6 years ago
The documentation is inaccurate. The RHS of update-assignment doesn't receive the LHS as input. The actual behaviour is more like [length] as $_ | .VALUE |= .+$_
.
It probably makes more sense to update the documentation than break compatibility, since this is arguably the more useful operation: you're more likely to be doing arithmetic on values at the same level than, say, merging a field of an object with the object itself. e.g. jq -nc '{a:3,b:4}|.a+=.b'
And the alternatives to achieve this behaviour are all much longer, compared to just two extra characters to achieve the documented behaviour.
Actually, I think this is a bug indeed!
$ jq -c '(.VALUE) += [debug|length]' <<< '{"DATE":"September","VALUE":[5,17,8,9]}'
["DEBUG:",{"DATE":"September","VALUE":[5,17,8,9]}]
$ jq -c '(.VALUE) |= .+[debug|length]' <<< '{"DATE":"September","VALUE":[5,17,8,9]}'
["DEBUG:",[5,17,8,9]]
{"DATE":"September","VALUE":[5,17,8,9,4]}
The problem is that gen_update()
doesn't put the val
block in the update lambda, but computes it first with the same input (.
) as the LHS:
static block gen_update(block object, block val, int optype) {
block tmp = gen_op_var_fresh(STOREV, "tmp");
return BLOCK(gen_op_simple(DUP),
val,
tmp,
gen_call("_modify", BLOCK(gen_lambda(object),
gen_lambda(gen_binop(gen_noop(),
gen_op_bound(LOADV, tmp),
optype)))));
}
Compare to the |=
operator:
Exp "|=" Exp {
$$ = gen_call("_modify", BLOCK(gen_lambda($1), gen_lambda($3)));
} |
I don't know if we can fix this now. It's always been thus. @stedolan?
I think this should not be considered a bug.
I have always used non-|=
assignment operators assuming the right hand side is evaluated once using .
as input, just like for =
.
Maybe this is not how it worked in jq 1.4
or earlier, but I don't think this behaviour should be changed now.
Re-thinking this now, lhs += rhs
is really taking whatever rhs
is and adding it to whatever lhs
is, but what should be the input to rhs
? Well, "who cares" is a pretty good answer -- it could be anything at all and the +=
operator should still work as intended, as long as the rhs
ignores its input. Typically the rhs
will just be a literal, and literal values ignore their input. Thus, whatever that input is today, that is just fine.
I think perhaps it would have been best to pass the current value of the lhs
to the rhs
so that one could write expressions like .a += if iseven then 0 else 1 end
.
On the other hand, we have documented what the input to the rhs
:
I think most likely we could make the code match the docs and it should more than likely not break any existing jq code because most jq code using +=
and friends probably use RHS expressions that ignore their input.
OTOH, this has been like this for a long time, and we could just leave it as-is and instead fix the docs to say that the input to the RHS is unspecified, thus letting us wait until whatever release follows 1.7 to change this.
For reference https://github.com/jqlang/jq/issues/2407 might similar or dup of this? also https://github.com/jqlang/jq/issues/2410 about //=
might be interesting
For what it's worth, I stand by the position that it's more useful to document the current behavior for the arith-assign ops so that e.g. {a:1,b:2}|.a+=.b
yields {"a":3,"b":2}
, which is an occasionally useful operation that's substantially less intuitive to express otherwise, especially if the lhs is a nontrivial expression. I'm having a lot more trouble coming up with a use for the documented behavior for the arith ops.
For the //= issue, my (not too considered) thought is that the best unifying semantic might be for lhs op= rhs
to be the equivalent of . as $_|lhs |= . op ($_|rhs)
, which if I'm not misthinking something should provide identical results to current behavior for the ops that always evaluate the rhs, while having a more useful behavior for //=.
For reference #2407 might similar or dup of this? also #2410 about
//=
might be interesting
~Yes, //=
in particular is really busted:~
//=
would be silly to send the current value to the rhs
since that value would be null
or the lhs
is not even set.
@nicowilliams A common way I use +=
is .foo += (-1, 0, 1)
; example in my day 14 solution for advent of code 2022 to generate the possible directions in which the grain of sand can fall (this is the version that draws an animation in the terminal if used as ./foo.jq input | awk -v RS= '{ system("clear") } 1; { system("sleep .1") }'
)
#!/bin/sh --
# \
exec jq -nRrf "$0" -- "$@"
[
inputs / " -> " |
map(. / "," | map(tonumber)) |
range(0; length - 1) as $i |
.[$i:$i + 2]
] |
(map(.[][0]) | min) as $minx |
(map(.[][0]) | max) as $maxx |
(map(.[][1]) | max) as $maxy |
[ 500 - $minx, 0 ] as $sandIn |
.[][][0] -= $minx |
reduce .[] as $line ([];
def xrange: $line | map(.[0]) | range(min; max + 1);
def yrange: $line | map(.[1]) | range(min; max + 1);
.[xrange][yrange] = "#"
) |
def gridToString:
transpose |
(.[][] | nulls) = "." |
map(join("")) |
join("\n");
{ grid: ., $sandIn, units: [] } |
while(.grid[.sandIn[0]][.sandIn[1]] == null;
.units += [ .sandIn ] |
def directions($grid):
select(.[1] <= $maxy) |
.[1] += 1 |
.[0] += (0, -1, 1) |
select(.[0] < 0 or $grid[.[0]][.[1]] == null);
reduce (.units | keys_unsorted[]) as $k ({ offset: 0, v: . };
def unit: .v.units[$k + .offset];
.v.grid as $grid |
[ unit | first(directions($grid)) ] as $dir |
if $dir == [] then
.v.grid[unit[0]][unit[1]] //= "o" |
del(unit) |
.offset -= 1
else
unit = $dir[0] |
select($dir[0][0] < 0) |= (
.v.grid |= [ [] ] + . |
(.v | .sandIn, .units[])[0] += 1
)
end
) |
.v
) |
reduce .units[] as $u (.grid; .[$u[0]][$u[1]] = "%") |
gridToString,
""
Some example inputs (describe the map in which the sand falls):
498,4 -> 498,6 -> 496,6
503,4 -> 502,4 -> 502,9 -> 494,9
480,10 -> 500,10
510,10 -> 520,10
480,20 -> 490,20
500,20 -> 500,18 -> 520,18
But I have many other more pratical examples in which I use .foo += manyvalues
to generate multiple values with a modified .foo
. This is way more useful than having it work as .foo |= . + (-1, 0, 1)
so as just .foo += -1
.
In my scripts, I can also find many instances of me using .foo += [ if something then x else y end ]
where something
, uses .
as input and x
, y
are either constants or expressions that use .
as input, not .foo
.
Also, if you need something like .a += if iseven then 0 else 1 end
, you should just use (.a | select(iseven | not)) += 1
;)
Oh, hmm, no, //=
is not particularly busted.
@emanuele6 and @muhmuhten have the winning arguments. Let's fix the docs then.
//=
would be silly to send the current value to the rhs since that value would be null or the lhs is not even set.`
It could also be false
, sadly //
checks trurthiness, not non-null :(
Note that all non-|=
assignment operators evalute the rhs
first, and stop running if rhs
is empty, and run multiple times if rhs
returns multiple values since rhs
is a $
argument for _assign/2
.
What used to confuse me a bit is that rhs
is evaluated even if lhs
doesn't generate any paths, but that kind of make sense to be honest; also I often exploit that behaviour as:
empty = (.bar | map(.baz) | debug)
As a more convenient way to use debug
with an expressions instead of using (.bar | map(.baz) | debug | empty),
as I mentioned in https://github.com/jqlang/jq/pull/2478#issuecomment-1236186328; so I kind of like that behaviour now.
Oh, yeah, I definitely overlooked |=
's (extremely janky, mind—check your expectation of what [range(10)]|.[] |= empty
should yield; I feel like that behavior is only predictable by knowing exactly the implementation detail that |= updates by key in forward order) behaviour with non-one rhs output values. Perhaps the documentation should be more along the lines of "lhs op= rhs
is equivalent to lhs = lhs op b
, except that lhs is evaluated only once(?)`?
Unifying //= to not evaluate in a way that doesn't bork += manyvalues
looks something like lhs op= rhs
→ . as $o|reduce path(lhs) as $p (.; setpath($p; getpath($p) op ($o|rhs)))
, which is probably a new builtin and ... honestly, I'm not sure if that's worth a breaking change.
And yeah, sending empty to the rhs of //= would be spectacularly silly :p
//=
would be silly to send the current value to the rhs since that value would be null or the lhs is not even set.`It could also be
false
, sadly//
checks trurthiness, not non-null :(
I'm as annoyed by that as you.
Note that all non-
|=
assignment operators evalute therhs
first, and stop running ifrhs
is empty, and run multiple times ifrhs
returns multiple values sincerhs
is a$
argument for_assign/2
.
Yes.
What used to confuse me a bit is that
rhs
is evaluated even iflhs
doesn't generate any paths, but that kind of make sense to be honest; also I often exploit that behaviour as:
That's because null | path(.a)
matches just so you can set values at paths that don't exist yet.
empty = (.bar | map(.baz) | debug)
I'm failing to make sense of that.
As a more convenient way to use
debug
with an expressions instead of using(.bar | map(.baz) | debug | empty),
as I mentioned in #2478 (comment); so I kind of like that behaviour now.
I often write def debug($msg): ("\($msg): \(.)|debug|empty), .;
. That should be a builtin.
If someone is using .[] //= empty
to try delete null
s in the input array/object, they should use del(.[] | nulls)
or .[] |= values
(now that it is fixed). Even if //=
worked as .[] |= (. // empty)
, that would also delete false
so it is not good.
empty = (.bar | map(.baz) | debug)
I'm failing to make sense of that.
(.bar | map(.baz) | debug)
is always evaluted, and its result is discarded since empty
does not generate any path, so as long as you write an expression that always returns one value, you can use empty = (.bar | map(.baz) | debug) | ...
to print to debug output only the result of .bar | map(.baz)
instead of the entire input.
I find that more convenient to write than having to use (.bar | map(.baz) | debug | empty), (...)
, so I usually use that.
Oh, yeah, I definitely overlooked
|=
's (extremely janky, mind—check your expectation of what[range(10)]|.[] |= empty
should yield; [...]
@itchyny fixed that! It now yields []
.
Unifying //= to not evaluate in a way that doesn't bork
+= manyvalues
looks something likelhs op= rhs
→. as $o|reduce path(lhs) as $p (.; setpath($p; getpath($p) op ($o|rhs)))
, which is probably a new builtin and ... honestly, I'm not sure if that's worth a breaking change.
I've struggled with this before. Basically I try never to use non-literals as the RHS of op=
, and I mostly use |=
for everything else.
An interesting change we could make in the future would be to let the RHS see all of the value being "modified", the path where it's modified, and the .
of the LHS. To do this we'd have well-known $bindings
like $__path__
and $__dot__
. We'd have to be very careful to drop the $__dot__
reference before the setpath()
in _modify
/_assign
. The LHS value would be the input to the RHS, which would be a breaking change for op=
, but necessary if we want the RHS to see all three things and avoid unnecessary copies.
And yeah, sending empty to the rhs of //= would be spectacularly silly :p
:)
empty = (.bar | map(.baz) | debug)
I'm failing to make sense of that.
(.bar | map(.baz) | debug)
is always evaluted, and its result is discarded sinceempty
does not generate any path, so as long as you write an expression that always returns one value, you can useempty = (.bar | map(.baz) | debug) | ...
to print to debug output only the result of.bar | map(.baz)
instead of the entire input. I find that more convenient to write than having to use(.bar | map(.baz) | debug | empty), (...)
, so I usually use that.
: ; jq -cn '{bar:[{baz:1,foo:2}]}|(empty = (.bar | map(.baz) | debug))'
["DEBUG:",[1]]
{"bar":[{"baz":1,"foo":2}]}
?
Interesting!
Well, anyways, we should add a nicer version of debug
, because I do something very similar (see above).
@nicowilliams wrote:
we should add a nicer version of debug
I've found this debug/1 to be both simple and serviceable:
def debug(msg): (msg|debug) as $debug | .;
The nice thing is that it does not preclude a debug/2 :-)
Any other suggestions based on the existing debug
?
I've added wiki pages on the internals of assignments and path expressions.
@emanuele6 wrote:
Maybe this is not how it worked in jq 1.4 or earlier, but I don't think this behaviour should be changed now.
The above remark was not made w.r.t. |=
but (after all the bug fixes that have taken place since 1.5), we seem to have reached a stable equilibrium w.r.t. |=
too.
Please also consider that there has been a nice alignment between jq, gojq and jaq, in that all three give the same results for:
{a:1,b:2} | .a += .b
{a:1,b:2} | .a += empty
{a:1,b:2} | .a += (1,2)
In short, I think something like the following ought to be in the documentation:
V | (E += F) is identical to V | . as $V | (E = ($V|E) + ($V|F))
After all, jq is no longer in alpha :-)
p.s. Maybe these too:
V | (E = F) is identical to V | . as $V | (E = ($V|F))
V | (E |= F) is identical to V | . as $v | (E = ($V|E|F))
I've found this debug/1 to be both simple and serviceable:
def debug(msg): (msg|debug) as $debug | .;
I believe that will hold on to a reference on the message $debug
for as long as whatever is "to the right of" debug/1
is running.
def debug(msg): (msg|debug|empty), .;
should not.
But don't quote me on this. I'm a bit rusty.
@nicowilliams out of curiosity: i've usually used this def debug(f): . as $c | f | debug | $c;
also bad reference wise i guess? i'm still new to how jv
references and things are designed
@nicowilliams out of curiosity: i've usually used this
def debug(f): . as $c | f | debug | $c;
also bad reference wise i guess? i'm still new to howjv
references and things are designed
If you have stuff | debug(msg) | other_stuff
, the reference to $c
will survive until other_stuff
runs out of outputs, and also until f
(msg
) runs out of outputs. That's why I prefer to use (f | debug | empty), .
.
I've created #2708 related to this. The idea is that maybe //=
and <op>=
just have to stay the way they are but we could do something closer to |=
with new assignment operators. Maybe that's too ugly -- we can always just use functions considering that mostly that's what actually happens anyways.
ASIDE: Imagine a "pure" jq where there's no syntactic sugars. No if/then/elif-then/else/end
, no try
/catch
, no binary operators, no assignment operators. We could have a cond/3
builtin that takes the place of if
, and a try/2
and try/3
builtins that take the place of try
/catch
, and plain old jq-coded "core" functions like _assign
and _modify
(but with public names). That might be ugly compared to having special forms like if
/try
/<assignment-operators>
/<binary-operators>
, but as with Haskell and Lisp it may prove useful to have such a tiny core.
If you have
stuff | debug(msg) | other_stuff
, the reference to$c
will survive untilother_stuff
runs out of outputs, and also untilf
(msg
) runs out of outputs. That's why I prefer to use(f | debug | empty), .
.
Continued curiosity: would some basic static analysis be able to see that $c
don't "escape" outside debug
making def debug(f): . as $c | f | debug | $c
work equally well?
Continued curiosity: would some basic static analysis be able to see that $c don't "escape" outside debug making def debug(f): . as $c | f | debug | $c work equally well?
Yes, that could be done, and it would be amazing. This is why I have a branch with a JSON dump of block representation of programs: to help me reason about static analysis.
(Sorry I edited your comment, @wader, I meant to quote-reply and flubbed up. I've restored your comment.)
@nicowilliams no worries!
CORRECT:
INCORRECT:
This behavior can be observed with jq1.4, jq 1.5, and the master version (jq-1.5rc2-245-g7b81a83).