jsonata-js / jsonata

JSONata query and transformation language - http://jsonata.org
MIT License
2.05k stars 217 forks source link

Inconsistency in array construction #160

Open xogeny opened 6 years ago

xogeny commented 6 years ago

I've been having quite a bit of trouble understanding the evaluation semantics for arrays (in particular in combination with paths). I think I've narrowed down my concern to something relatively simple. Consider the following expressions and their evaluations (no inputs are necessary):

Expression Result
[["hello", "hello"], ["world", "world"]] [["hello", "hello"], ["world", "world"]]
[(["hello", "hello"]), (["world", "world"])] ["hello", "hello", "world", "world"]

Now I think it is completely intuitive and desirable that wrapping any expression in (...) should not change evaluation. The documentation seems to substantiate this when it says:

... the result of the last expression is returned from the block.

However, as we can see from the table above, this is not the case in the current jsonata implementation. I strongly suspect that such issue arise because of lines like this and this (among others).

Frankly, to me this is a bug and I've run into more complicated cases in the test suite that don't evaluate the way I think they should for what I think are similar reasons. But interactions between multiple features are involved in those cases so it is harder to explain those cases.

Looking at the bigger picture, the problem I see with the current evaluation code (in a couple of places) is that nodes in the AST are not evaluated in a context free way. Instead, the evaluation code is "peeking" into its children to examine their type and, in some cases, tagging them with special properties (e.g., consarray, predicate, keepArray, keepSingletonArray).

In my evaluation code, I'm not doing any "peeking". Furthermore, I've promoted predicate information to first class AST nodes. Finally, attributes like consarray, keepArray and keepSingletonArray are necessarily present in some form in order to exactly mimic the semantics of version the 1.5.x release but I only tag results of evaluation with these attributes, never the expression nodes.

It is important to note that my goal here is not to be critical of the current semantics or code but rather to make sure that the semantics are intuitive and relatively easy to implement with an absolute minimum of extra "bookkeeping" on the evaluation results.

So bottom line...this at best violates the principle of least surprise but really should be classified as a bug I think since this behavior (i.e., adding parentheses around an atomic expression changes the evaluation semantics) is pretty much unprecedented in any programming language I can think of.

Comments?

xogeny commented 6 years ago

I think there is a strong connection between this and #170.