jqlang / jq

Command-line JSON processor
https://jqlang.github.io/jq/
Other
30.33k stars 1.57k forks source link

Constants (def $foo: EXP;) #826

Open pkoppstein opened 9 years ago

pkoppstein commented 9 years ago

Sometimes one would like to be able to invoke a constant function multiple times without having to incur the cost of computation more than once. A trivial but perhaps fairly common case is:

def pi: 1|atan *4;

A more important example would be a dictionary that must be constructed.

Of course, there are workarounds, but a simple solution consistent with jq semantics would be to allow a function of arity 0 to be annotated so that once its value has been computed, no further computation is required, e.g.

def final pi: 1|atan * 4;

Two other minimally disruptives options would be (using the 'def pi' example):

let pi: 1|tan*4; # Swift-like

or:

defconst pi: 1|atan * 4;

I realize this could be generalized, but the case for the more generalized version is at best more complex.

ghost commented 9 years ago

I like the idea of using "let" for filters that can't use the . argument.

nicowilliams commented 9 years ago

I've been thinking that i want something like def $name: EXP; as the syntax for this. With built-ins that can do I/O, like inputs checking that an expression is constant is tricky...

This is for after 1.5.

nicowilliams commented 9 years ago

With better constant expression folding we could totally do what @pkoppstein wants, though def $name: ... will yield faster code, and won't require writing code to detect that . is not used. The thing is that we'll have either defer computing the value till first run, run these expressions at compile time, or make constant folding really smart. If i code this I'll go for the latter, i think, but it does seem a bit brittle.

pkoppstein commented 9 years ago

@nicowilliams - Your idea of 'def $name' fits in nicely with the existing '--arg/--argfile/--argjson' framework, and I suppose that means compile-time evaluation, though I did have lazy (on-first-invocation) evaluation in mind.

For compile-time evaluation, it would be important to ensure that the --arg/--argfile/--argjson variables be setup first so that the constant-functions can use them.

Perhaps we could have 'def $name' in jq 1.5, with the possibility of a different syntax for lazy evaluation for some future release?

nicowilliams commented 9 years ago

Hmmm, if we want --arg* globals to be visible we'd have to either change how they are bound (we'd have to pass along an object containing all those vars to a lot of the functions in compile.c, probably all of them), or we'd have to eval on first reference, which is very hard because you see, constants compile down to a single opcode (LOADK) with an immediate operand (the constant). But the first part doesn't work either because of the order in which bindings take place (right to left): we can't know at constant-fold time whether a symbol reference will be global or not. So this is all rather complex. Constant folding might have to move to block_compile() so that it can work with fully-bound programs. Ay.

Don't expect this any time soon.

nicowilliams commented 9 years ago

We could have def $name: <constant exp>; in 1.5, but in 1.5 it could only support simple arithmetic and nothing else (because anything else would require much more work).

nicowilliams commented 9 years ago

Moving constant folding to block_compile() will have other benefits, such as allowing one to redefine binary operation builtins (though one should not do this).

nicowilliams commented 9 years ago

One other important benefit of deferring constant folding as much as possible is to avoid unnecessary computation, since we drop unreferenced symbols' implementations.

What should happen in a case like def $foo: range(5);? $foo can only have one value (EDIT: either throw an error or take just the first value). Allowing the full range of expressions in the definition of a constant also means we have no choice but to interpret its byte-code to evaluate it the first time. Perhaps the best way to handle this then is to have an opcode like "CALLK" that provides a slot for memoizing a constant. Then we can keep the existing simple constant folding as-is for now.

We'll need to label all C-coded builtins as to whether they are deterministic. Currently we only really have one non-deterministic C-coded builtin: _input.

pkoppstein commented 9 years ago

What should happen in a case like def $foo: range(5);

An even worse case is def $foo: empty; since in that case there is no first item in the stream to use as the constant.

From my perspective, though, it's far more important to support the creation of valid constant functions than to flag invalid attempts to create a constant function. In fact, taking @slapresta's observation one step further, I'd suggest that 'let : ...;' be viewed as a declaration that is a constant function. If an invalid function body is specified, then the behavior can be declared to be "undefined".

nicowilliams commented 9 years ago

@pkoppstein

If it's a constant value, then the name should be $whatever. Functions, recall, take inputs, so we'd have to be careful that the input isn't actually used. Whereas $foo is always a constant.

let $foo: ...; or def $foo: ...; differs only in a keyword.

How we handle errors is rather important, IMO. Adding obscure error cases is not a good idea.

Clearly, if there's more than one value we can just take the first or error.

If there are no values we can error.

Either way, undefined it won't be.

pkoppstein commented 9 years ago

@nicowilliams wrote:

If it's a constant value, then the name should be $whatever.

I'm fine with that, but since (as you say) "functions take inputs, so we'd have to be careful that the input isn't actually used", I think (along with @slapresta), that 'let $foo: ...' would be wise. (We could alternatively adopt the rule that 'let' sets the input to null if you like.)

Clearly, if there's more than one value we can just take the first or error. What you wrote seemed to indicate you wanted an exclusively (?) compile-time analysis.

nicowilliams commented 9 years ago

Don't focus on the keyword. That's not important. I like def $foo: ...; just because.

The code that outputs the constant does need an input value, yes. That input value should be null.

I think the only question is how to implement. I'll not think about that for a while, so we're done for now until someone submits a PR.

nicowilliams commented 9 years ago

I don't know how to write a better analysis than running the bytecode (but only if the constant is referenced). I think that's fine.

pkoppstein commented 9 years ago

I think that's fine.

Yep. Thanks!

nicowilliams commented 9 years ago

This would be a very convenient way to implement #820.

nicowilliams commented 9 years ago

I think the high-level implementation plan is as follows:

The details are another story.

EDIT: This means that multiple outputs (def $foo: range(5);) will be allowed, but only one will ever be produced -- only one will be taken. Not outputting any values (def $foo: empty;) will be an error.

nicowilliams commented 9 years ago

Implementation-wise, perhaps just have one new opcode that is like LOADV and differs only in that it skips the immediately following CALL_JQ and STOREV instructions when the var has already been set, else it does not skip them. The call would be to first(EXP), and if EXP produces no values... well, it's OK because we can then have a special value that backtracks (like Icon's &fail, oddly enough). I.e., def $foo: empty; . + $foo` -> nothing, no value, no error.

nicowilliams commented 9 years ago

I thought this might be easy enough, but there's tricky bits, so just so you know, there's no chance this will make 1.5.

I fixed one of @pkoppstein's module issues and I think we're ready for 1.5rc2. I plan on making the 1.5rc2 release sometime this week and we'll let it steep for a month.

I still need to finish merging @joelpurra's PRs, I know. They may make 1.5 but not 1.5rc2.

Bang away, especially at the streaming bits.

pkoppstein commented 9 years ago

@nicowilliams wrote:

I fixed one of @pkoppstein's module issues

Thank you for restoring the ability to import into the caller's namespace. That's great, but since you seem to have a few moments to spend on jq, I was hoping you'd be able to ponder the still less-than-ideal state of affairs. As I understand things, currently the importer has to choose between:

a) import "library" as whatever; # perfect

b) import "library"; # import into current namespace

c) import "library" as library; # a tad tedious

I've always felt that the module writer should be able to specify a default namespace name; within the current jq framework, that could come from module {"name": _}. If the default namespace name comes from the module declaration, then import "library"; would most naturally mean "use the default".

Thus even if the implementation of a mechanism for supporting a default namespace name is postponed, I think it would be better to adopt some other syntax for importing into the current namespace. My current favorite is import STRING as null.

nicowilliams commented 9 years ago

How about include "stuff"; for importing into the namespace and import "stuff"; for importing with the module's basename as the prefix?

nicowilliams commented 9 years ago

@pkoppstein I'm out of time in 3, 2, 1, ...

EDIT: I'll build a 1.5rc2, and then wait a while.

nicowilliams commented 9 years ago

I want to declare a feature freeze. From here until 1.5 is release, only build and test improvements will go in, module system improvements, and bug fixes.

pkoppstein commented 9 years ago

How about include "stuff"; for importing into the namespace and import "stuff"; for importing with the module's basename as the prefix?

Having one's cake and eating it too? Beautiful!

nicowilliams commented 9 years ago

aight, that'll be it.

nicowilliams commented 9 years ago

@pkoppstein include pushed.

pkoppstein commented 9 years ago
$ jq -n -r 'include "Abracadabra"; hi'
Thank you!
Sincerely,
~/.jq/Abracadabra/Abracadabra.jq
nicowilliams commented 8 years ago

We should mark builtins like inputs as being non-deterministic and then have the compiler do much more optimizations -- more constant folding. Currently that's hard because the compiler would have to do the same things that the interpreter does to evaluate code, but still. The point is, I don't think I want users to have to ask for the compiler to define constants based on arbitrary expressions; I want the compiler to just do it when it can.

nicowilliams commented 8 years ago

And we really should move all bytecode optimizations to work on blocks.

nicowilliams commented 7 years ago

My current thinking on this is that the compiler should automatically bracket constant expressions (ones using only constants and only deterministic functions) with two special instructions: one to elide (i.e., jump past) the whole computation if it has been completely memoized, and the other to save an output from the computation. A computation would be completely memoized only when the first instruction is backtracked past. There should be a maximum number of memoized outputs; if too many are produced then memoization is disabled for that expression.

This can take care of constant folding.

One option would be to not do this by default and instead provide const/1 builtin that applies memoization to a given expression (passing it null as its only input). This would apply even to non-deterministic expressions, thus allowing one to have a constant random number, for example (assuming we have a random builtin).

emanuele6 commented 10 months ago

This predated as $foo I think

pkoppstein commented 10 months ago

@emanuele6 wrote:

This predated as $foo I think

No, and I'd like this issue or an equivalent one (regarding optimizing constant expressions) to remain open.

@nicowilliams suggested one option would be to introduce def $foo: so taking that as a starting point, let me highlight some of the differences between def foo: and the hypothetical def $foo::

def $foo: would be somewhat akin to--argjson foo in that $foo would be constant and computed just once.
jq allows multiple occurrences of def foo: but would only allow one top-level def $foo:.

The body of def $foo:, however, would allow some computation to take place, e.g. involving variables defined on the command line.

The tight restrictions on the body of def $foo: and other considerations argue in favor of adopting a different syntax, perhaps:

const $foo: BODY;

emanuele6 commented 10 months ago

@pkoppstein

I don't see how this makes sense.

It is either exactly the same as BODY as $foo, or exactly the same as [ BODY ] as $foo if you use $foo[].

In the latter case, I don't see how it could end up being more optimised than $foo[] if you use a special syntax.

pkoppstein commented 10 months ago

@emanuele6 - Take the simple case I mentioned at the outset: the expression 1|atan *4. Let's call that expression BODY, which we suppose evaluates to a single value and does not involve any input, though it need not be purely functional, e.g. it might be now.

What this ER essentially calls for is being able to do the equivalent of jq --argjson pi $(jq -n BODY) ... that is, to have BODY evaluated just once AND to have it globally available AND constant (no redefinition allowed).

No doubt you have in mind an alternative along the lines suggested by:

(1|atan*4) as $pi
| def x($a): $a*$pi;
def y($a): $a*$pi;
(x(1), y(2))

But as an alternative to a declaration such as const $foo: BODY; that seems rather obscure and awkward at best, and wouldn't be much use in general, e.g. if one wanted to have $pi defined in ~/.jq, or available in several included files.

emanuele6 commented 10 months ago

I see. So the only reason to add this is to allow declaring variables in modules and ~/.jq.

nicowilliams commented 9 months ago

This is really just memoization.

Maybe we could have a memoize(exp) function, and then def $foo: exp; could be just syntactic sugar for def foo: memoize(exp);. That wouldn't save us the function call to foo/0, but it would be easy enough to add. What would be neater is if the compiler could evaluate foo/0 at compile-time, but that's significantly more work. Neater than the first and not as neat as the second would be to have def $foo: exp; automatically evaluate exp with null as input the first time that $foo is referenced, but otherwise have $foo be a global binding as if provided with --argjson foo ....

I don't see why we should disallow shadowing of these global constants -- we don't disallow shadowing for anything else.

nicowilliams commented 9 months ago

Discouraging search: https://www.google.com/search?q=overlength+c+string+literals