@bartbutenaers, that's a really good question. Am I right in thinking that you want to iterate over the key/value pairs in an object, apply some kind of logic, and aggregate and return the result as an object? I think @andrew-coleman or @mattbaileyuk would be the best people to answer that.
Hi George,
thanks for your response!
Yes indeed, currently I can e.g. get all the values at once using a simple, powerful Jsonata expression:
However it would be great if a similar expression could be used to update the JSON data (i.e. update the original JSON data or return a transformed copy of the original JSON data). For example *.last_value := 'new value'
to get the following result:
Don't think this is currently possible with this library, but I may be mistaken ...
Kind regards, Bart
JSONata was designed to be a functional language in that the output of an expression (or part thereof) is dependent only on its input, and has no side effects (including the modification of the input). There is no guaranteed order of execution of parts of an expression, and the engine is free to process parts of the input data in parallel (although the current javascript implementation doesn't do this). The := operator is a variable binding operator rather than the general assignment operator that you have in imperative languages. The latter would modify the state of an object and break the functional paradigm.
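For illustration, a minimal sketch of this distinction using the JavaScript API (the input object and field names are made up, and the synchronous evaluate() of the 1.x line is assumed):

const jsonata = require('jsonata');

const data = { sensor: { last_value: 17 } };

// ':=' binds a variable inside the expression's own scope; it never
// writes back into the input document.
const expr = jsonata('( $v := sensor.last_value; $v * 2 )');

console.log(expr.evaluate(data));     // 34
console.log(data.sensor.last_value);  // still 17 - the input is untouched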
This functional design was inspired by XQuery, which has very similar semantics for querying and transforming XML data. XQuery does have an extension language called XQuery Update Facility which introduces extra concepts and syntax for modifying the input data (adding, removing & changing values). It does this without breaking the functional programming model by introducing the concept of a Pending Update List, which is returned by the expression and then applied to the input data. If there is sufficient interest, then I could investigate how we might do this with JSONata.
Hello Andrew,
thanks for your feedback!
Hopefully the "Jsonata Update Facility" will be added to the roadmap in the near future ...
Kind regards, Bart
I, too, have been looking for ways to update selected values within a JSON structure while leaving the rest unchanged. Something similar to using an XSLT template to copy every element and attribute in an XML document as-is, with the exception of those that match a certain XPATH selector...
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
<xsl:output method="xml" indent="yes" />
<!-- Copy everything -->
<xsl:template match="/ | @* | node()">
<xsl:copy>
<xsl:apply-templates select="* | @* | node()" />
</xsl:copy>
</xsl:template>
<!-- Update something -->
<xsl:template match="Product[@weight > 10]/@surcharge">
<xsl:attribute name="surcharge">
<xsl:value-of select="number(.) + 10" />
</xsl:attribute>
</xsl:template>
</xsl:stylesheet>
This would be very useful when e.g. adding a surcharge to all the Products in my Order where Weight > 10. Although I can use a JSONata expression to locate and operate on those Products, I don't see a good way to leave all the other objects/attributes/levels unaltered. The new $merge() function coming in 1.3 will help with modifying each matching Product, but will not help with keeping the enclosing Order structure intact.
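To make that limitation concrete, a small sketch assuming the $merge() signature that eventually shipped ($merge(array of objects)); the order document below is made up:

const jsonata = require('jsonata');

// Hypothetical order document
const order = {
  Order: { OrderID: 'ord-1', Product: { SKU: 'x-1', weight: 12, surcharge: 5 } }
};

// $merge returns only the merged Product object - the enclosing Order
// wrapper is not part of the result, which is the problem described above.
const expr = jsonata('$merge([Order.Product, { "surcharge": Order.Product.surcharge + 10 }])');

console.log(expr.evaluate(order));
// { SKU: 'x-1', weight: 12, surcharge: 15 }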
There seem to be several attempts at what is loosely called JSONT (templating rules for JSON) available in github and npm. I guess I'm not really suggesting that they could be incorporated into the JSONata engine - but it seems that a similar utility transforming one JSON structure into another based on JSONata expressions would be very useful (and satisfy the OP's initial use case).
Steve
P.S -- IF something like this were to be made available, PLEASE don't follow suit and call it "JSONTata" ;*)
Interesting. I need to do some thinking on this and play around with a few options. Thanks for all the discussion - keep it coming!
@bartbutenaers @shrickus As promised, I've done some thinking and come up with a proposal that I hope will meet this requirement. In branch 1.4 I have added some new syntax that allows you to copy an object with modifications applied to the copy. The modifications are defined using a location path to select the part(s) of the object that should be changed, followed by a structure to define the changes. The syntax uses the vertical bar | to separate the parts of the expression, in a similar manner to regular expressions using a slash '/' to separate the /search/replace/ terms. Think of it like |search|replace| for objects.
Example: |Account.Order.Product|{'Price': Price * 1.2}| defines a transform that will return a deep copy of the object passed to it, but with each Product object modified such that its Price property has had its value increased by 20%. The first part of the expression is the path location that specifies all of the objects within the overall object to change, and the second part defines an object that will get merged into the object(s) matched by the first part. The merging semantics is the same as that of the $merge() function.
This transform definition syntax creates a jsonata function which you can either assign to a variable and use multiple times, or invoke inline.
Example: payload ~> |Account.Order.Product|{'Price': Price * 1.2}|
or: $increasePrice := |Account.Order.Product|{'Price': Price * 1.2}|
This also has the benefit that multiple transforms can be chained together for more complex transformations.
In common with $merge(), multiple changes (inserts or updates) can be made to an object.
Example: |Account.Order.Product|{'Price': Price * 1.2, 'Total': Price * Quantity}|
Note that the Total will be calculated using the original price, not the modified one (JSONata is declarative, not imperative).
I've also been thinking how to support the removal of properties from objects. The merge semantics doesn't work for this, so I'm proposing to add an optional extension to the second part of the transform expression where you can specify which properties to delete (by name). This would be in the form of a string or array of strings.
Example: $ ~> |Account.Order.Product|{'Total': Price * Quantity}, ['Price', 'Quantity']|
This copies the input, but for each Product it inserts a Total and removes the Price and Quantity properties.
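A runnable sketch of both forms, using the JavaScript API with a made-up input (the synchronous evaluate() of the 1.x line is assumed):

const jsonata = require('jsonata');

// Hypothetical input - shape only, field names invented
const data = {
  Account: { Order: { Product: { SKU: 'p1', Price: 10, Quantity: 3 } } }
};

// Update: Total is computed from the *original* Price (10 * 3 = 30), even
// though Price is replaced in the same transform.
const update = jsonata("$ ~> |Account.Order.Product|{'Price': Price * 1.2, 'Total': Price * Quantity}|");
console.log(update.evaluate(data).Account.Order.Product);
// { SKU: 'p1', Price: 12, Quantity: 3, Total: 30 }

// Update plus delete: insert Total, then drop Price and Quantity.
const reshape = jsonata("$ ~> |Account.Order.Product|{'Total': Price * Quantity}, ['Price', 'Quantity']|");
console.log(reshape.evaluate(data).Account.Order.Product);
// { SKU: 'p1', Total: 30 }

// The transform returns a deep copy - the original input is never modified.
console.log(data.Account.Order.Product.Price);  // still 10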
I'd welcome feedback on any of this, including the usability of the syntax. Just grab the 1.4 branch.
Looks like you've done more than just "some thinking"...
I have to admit, my first impression was that the new | search | replace | syntax was too hard to read -- and this comes from someone who likes building complex regex expressions for fun! Actually, the idea is growing on me... but it would help me if we could discuss the shortcomings of using more established JSONata syntax to accomplish the same results (if possible).
What does that even mean? Well, let's take your first example that increases the prices of all products by 20%. The two things we need for that are:
Without the new syntax, and using an imaginary $replace() function, I would expect to write something like this:
$replace(Account.Order.Product, { "Price": Price * 1.20 })
or
Account.Order.Product~>$replace({ "Price": Price * 1.20 })
and since you agree that the replacement action is essentially a merge operation, why not this?
$merge(Account.Order.Product, { "Price": Price * 1.20 })
Implementation details aside, I would think any number of updates/deletes could be defined as:
Account.Order.Product~>$replace({ "Total": Price * Quantity, "Price": undefined, "Quantity": undefined })
Since you are proposing the new search/replace syntax, I'm assuming the above is insufficient. SO, if the new syntax IS adopted, I'd like to see it be generally available for all input data types, not just for replacing bits of JSON objects. Perhaps this is already your intention, in which case please take this as affirmation of your design ;*)
In your opinion, which of these alternate usages would make sense, and which would be problematic or already have better solutions?
# Update incorrect text within the Product Names
Account.Order.Product."Product Name" ~> | "Trilbey" | "Trilby" |
# Round up all Product weights from lbs to whole oz.
Account.Order.Product.Description ~> | Weight | $ceiling(Weight * 16) |
# Rename an object field, retaining its value
Account.Order.Product.Description ~> | Colour | { "Color": $ } |
# Use a function to aggregate child object field values onto the parent
Account.Order.Product ~> | Description.$max([Height,Width,Depth]) > 500 | { "Oversized": true } |
I realize some of these are a bit contrived, and perhaps all of these could be rewritten to work properly using the current implementation -- I guess I'm thinking that the returned data type of the search side would be used to figure out how to apply the replace results. And the search context would be applied to the incoming results on the left of the ~> chain operator. I think this would mean that the new syntax represents a function with 3 inputs: context, search, and replace.
Sorry for the length of this -- and if you got this far, thanks for reading it!
Steve
This is great feedback - thanks for taking the time!
So to address the first part of it -- why not just have a $transform() function? Perhaps something like:
$transform(payload, Account.Order.Product, {'Price': Price * 1.2})
instead of:
payload ~> |Account.Order.Product|{'Price': Price * 1.2}|
- payload is required because that is the object that we want to copy (and modify)
- Account.Order.Product selects the part(s) within the payload object that should be modified. The problem is that the function arguments are evaluated before the function is invoked (eager evaluation), and so Account.Order.Product will be evaluated relative to the parent context rather than the result of evaluating payload.
- Likewise, Price * 1.2 will be evaluated relative to the parent rather than relative to each Product.

The only way I can think of doing this using functions alone is for the $transform function to take callback functions as its second and third arguments so that their evaluation can be delayed and invoked with the right context. Something like:
$transform(payload,
function($obj) {
$obj.Account.Order.Product
},
function($match) {
$merge($match, {'Price': $match.Price * 1.2})
})
(I haven't tried this BTW, I'm just making it up as I go along).
It might be good to provide this higher-order function as well, in the same way that we have a $sort function as well as the 'order-by' syntax.
Comments on the |...|...| syntax:
I'm also slightly worried the syntax might be confusing - I'm open to any better ideas. I did try <search | replace>, but I thought that looked even more confusing when mixed with the ~> operator.
# Update incorrect text within the Product Names
Account.Order.Product."Product Name" ~> | "Trilbey" | "Trilby" |
In this case the transform construct is being handed the Product Name, which is just a string. So it could only return a string - it's lost the reference to the outer object that we might want to deep copy. Same with the other examples. But I take your point about wanting to be able to change a single value without having to merge an object. I will give that some more thought.
Thanks again for the thoughtful feedback.
P.S - Interesting to see that you used #... to add comments to your examples. JSONata doesn't yet support comments, but if it did, would you prefer # to //?
Hi Andrew,
I thought that my question would be classified in the section 'maybe later, maybe never'. But as Steve already mentioned, this is indeed much more than only thinking ... Thank you very much already for all the effort !!!!!!!!! I really appreciate it.
I will start experimenting with it next weekend, more specifically in a Node-Red context. Since Jsonata became integrated into Node-Red, a lot of custom function nodes could already be replaced by standard nodes containing simple Jsonata expressions (to read data). However, lots of custom function nodes are still being used just to write values into JSON objects.
I would be very pleased if such Jsonata transforms could also be integrated somehow into Node-Red.
E.g. integration in the Change node, to allow it to copy the message payload (A) to some location(s) inside some global/flow/msg JSON object (B), using some new syntax:
I know that Nick (@knolleary) and Dave(@dceejay) are already overloaded, but I'm pretty sure they could have very valuable input for your transform syntax. At least for my purpose :-) Sorry guys for mentioning! If you don't like the idea of integrating this syntax in node-red, I will try to wrap a contribution around it. But I would prefer to have it some day inside at least the Change node.
It's not clear to me at this moment whether my setup is even possible, since you are returning a deep clone of the original object (instead of changing the original object).
Will follow this topic with great interest, and will come back on this...
Kind regards, Bart
Andrew,
I did not know about the "eager evaluation" used by the function processing, so thanks for that explanation. If the | search | replace | syntax would help avoid all the callback function code, then I'm all for it! And as I said, it's starting to grow on me -- it's different enough from the rest of the punctuation in use that it stands out, and does look like a regex in a way. Incidentally, that's specifically why I did NOT choose // to denote a comment, and the # is not being used yet as far as I've seen. But hey, you got my point, eh? It would indeed be nice to be able to comment some of my more heinous expressions...
The most important bit (IMO) kind of got lost at the tail end of my last rant -- that there need to be 3 inputs, which I'm calling context, search, and replace. The context (typically the expr before the chain operator, or $$ if none is given) would be used as $ during the evaluation of the search expression. The results of the search expression would be used as $ during the replacement evaluation. That may be a bit simplistic, especially when the search matches an array -- does $ refer to the whole array, or each element?
Ooh, now that I'm thinking about it, if the context is missing, would it be possible to implement a deep search (similar to the XSLT descendant axis, where **/Product finds all Products at any level)? Feel free to tell me that's a bridge too far... but it would enable some interesting options, like:
# Add a domain field to each email record
| eMail | { "Domain": $substringAfter(eMail, "@") } |
Essentially, it's like Unix pipes that transform the data flowing from step to step -- you even chose the | symbol. Coincidence? I don't think so! Maybe this is where your "streaming" jsonata idea comes into play? Maybe we should treat the pipe symbol in JSONata to mean "stream the preceding results into the next expression evaluator" -- create an evaluator for each piece, and hook the output of each one into the input of the next one. Then we could put as many expressions together as necessary, instead of limiting it to a "search" step and a "replace" step.
Bart, I don't see any problems with integrating this new feature into the node-red change node. The current implementation simply evaluates the JSONata expression, gets the resulting JS objects, and replaces the changed variable with this output. Whether it's a deep clone or not should not matter.
I'm with you -- this is such an interesting topic, and could have huge impact on msg transformations within node-red. In fact, in my case it already has... this new feature just simplifies the expressions we have to build when we need to keep most of the msg structure intact, but change selected parts of it.
Steve
Some quick responses that are a bit of a muddle between the jsonata specific feature being discussed and how it may apply in the context of node-red.
I'm not 100% sure on the |...|...| syntax - but I don't have an instinctive sense of what would be clearer. Harking back to my Perl days, the regex syntax of /.../.../ is a bit more familiar - although given we aren't talking regexes here, reusing that bit of the syntax may only serve to confuse.
As per the screenshot in @bartbutenaers' previous comment, the Change node's current UI separates out the 'which value you want to change' and 'what value you want to set it to' into separate inputs. Currently the latter can be an expression (and thus dynamic) - but the former (the 'where') can only be a hardcoded value. The proposed syntax combines the where and what into a single expression - so the current UI would not work. Given we're now talking about side-effects of expressions, this would have to be exposed as another option in the Change node: evaluate an expression against the message and take the result as the new message to pass on.
But there are some challenges with applying this to node-red. When applied to pure JSON objects, deep copying the original is fine. When applied to a node-red msg object, there may well be properties that cannot be deep copied (such as an HTTP Request object). We already have to special case some of this stuff when we clone messages within the runtime. We can't have JSONata deep cloning node-red message objects.
Applying it to global/flow context is a non-starter - it isn't an object that can be manipulated like this.
On the matter of comments, we rely on the fact we can strip newline characters from a JSONata expression without modifying its meaning - this is because we have to present the expression in both a multi-line text editor and a single line text input (which strips newlines). As soon as you add single-line comments we won't be able to do that and we'll have to come up with a whole new UI for expressions (our problem, not yours...). Of course, comments are an important feature. If you're considering # vs //, I'd urge consistency with JavaScript - // (or the multiline /* ... */, which would avoid the newline issue...)
Steve,
I think we are mostly agreeing with each other, but I just want to try and clarify a few points to make sure we are on the same wavelength. The current implementation in branch 1.4 has the following structure:
head ~> | location | update |
I've intentionally changed all the terminology here in an attempt not to overly compare it with regexes.
- head evaluates to the object that is to be copied and transformed
- location evaluates to the part(s) within the copied object that are to be updated. The location expression is evaluated relative to the result of head. At the moment the result of evaluating location must be of type object, but we can revisit this.
- update evaluates to an object that is merged into the object matched by the location. update is evaluated relative to the result of location, and if location matched multiple objects, then the update gets evaluated for each one of these. The result of (each) update is merged into the result of location.

The ~> operator is an existing operator for function chaining and passes the value on the left hand side to the function on the right hand side as its first argument. The expression on the right hand side must evaluate to a function, hence the |...|...| syntax generates a function with one argument.
The following illustrates the effect of moving the location path progressively from the location to the head. They all work on the same input data:
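The original examples aren't reproduced above, but the idea can be sketched with made-up data; each variant applies the same update, and the head decides how much of the document the (copied) result contains:

const jsonata = require('jsonata');

const data = { Account: { Order: { Product: { SKU: 'p1', Price: 10 } } } };

[
  "$ ~> |Account.Order.Product|{'Price': Price * 1.2}|",   // whole document, with Product updated
  "Account.Order ~> |Product|{'Price': Price * 1.2}|",     // just the Order subtree
  "Account.Order.Product ~> |$|{'Price': Price * 1.2}|"    // just the Product itself
].forEach(src => console.log(JSON.stringify(jsonata(src).evaluate(data))));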
The result of the expression on the left hand side of the ~> determines the top level of the object that is going to be transformed.
Your observation about doing a 'deep search' transformation is definitely possible, and what I had in mind for this. E.g. **.Product ~> |...|...|
On the subject of 'pipes', that was why I chose to make it generate a function, so that you could specify a pipeline of transformations using the ~> operator:
... ~> |...|...| ~> |...|...| ~> |...|...|
and because ~> does proper function composition, you can build complex higher-order functions very simply:
(
$transform1 := |...|...|;
$transform2 := |...|...|;
$transform3 := |...|...|;
$complex_transform := $transform1 ~> $transform2 ~> $transform3;
$complex_transform(my_obj)
)
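A concrete, runnable version of that sketch (the field names are made up, and the synchronous 1.x evaluate() is assumed):

const jsonata = require('jsonata');

const data = { Account: { Order: { Product: { SKU: 'p1', Price: 10, Quantity: 3 } } } };

// Each |location|update| is itself a function, so ~> composes them into a
// single reusable transform.
const expr = jsonata(`(
  $addTotal := |Account.Order.Product|{'Total': Price * Quantity}|;
  $applyVat := |Account.Order.Product|{'Total': Total * 1.2}|;
  $finalise := $addTotal ~> $applyVat;
  $finalise($)
)`);

console.log(expr.evaluate(data).Account.Order.Product);
// { SKU: 'p1', Price: 10, Quantity: 3, Total: 36 }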
@knolleary just to pick up on a few of your points:
- The |...|...| syntax might be confusing. I don't disagree, but I can't think of anything better at the moment. I'll keep thinking though.

Yes, it appears we are on the same page after all -- bravo! It looks like what you have working in the new branch is already well on the way to a good solution...
As for embedded comments, I think no matter what delimiters you choose, there will be a clash with some other language it's ported to. So while I agree with Nick that the JS style would be good for node-red, I guess you will just have to pick a style and go with it -- or make it pluggable.
Steve
I've split the requirement for comments into a separate issue since it's not directly related to this.
I found a simple way to do this by adding an assign function.
I simply bound a new function to the staticFrame:
staticFrame.bind('assign',defineFunction(functionAssign, '
@wnm3 the problem with this is that the function is modifying the input object, which could have undesirable side effects in other parts of the query that invokes it. Doing a deep copy of the object argument at the top of the function will make it safe.
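A side-effect-free variant of that idea, sketched with the public registerFunction() API rather than the internal staticFrame binding shown above (the function name, data, and naive deep-copy approach are illustrative only):

const jsonata = require('jsonata');

const expr = jsonata('$assign($, "status", "done")');

// The custom function deep-copies its input before setting the property,
// so the original document is never mutated.
expr.registerFunction('assign', (obj, key, value) => {
  const copy = JSON.parse(JSON.stringify(obj));  // naive deep copy, fine for plain JSON
  copy[key] = value;
  return copy;
});

const input = { status: 'open', id: 42 };
console.log(expr.evaluate(input));  // { status: 'done', id: 42 }
console.log(input.status);          // 'open' - untouched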
Given all the feedback and discussion above, I've been looking for alternative ways to achieve these 'modified copy' semantics, but with a more acceptable syntax. The discussion with @shrickus on eager vs lazy evaluation of function arguments also got me thinking... if we could arrange for the delayed evaluation of function arguments, would that help this situation? I think it could, but at what cost?
To recap the current 'eager' behaviour: if a function func has two arguments arg1 and arg2, then at evaluation time the expressions represented by arg1 and arg2 are both evaluated (in no guaranteed order), and then the resulting values of these argument expressions are passed to the implementation of func (i.e. the function body). arg1 and arg2 are both evaluated in the context of the outer environment; the func body can have no influence over that.
In the following example:
(
$func := function($arg1, $arg2) { $arg1 & ': ' & $arg2 };
Account.Order.Product.(
$func(SKU, Price)
)
)
is equivalent to
(
$func := function($arg1, $arg2) { $arg1 & ': ' & $arg2 };
Account.Order.Product.(
$val1 := SKU;
$val2 := Price;
$func($val1, $val2)
)
)
The crucial point here is that both SKU and Price are evaluated in the context of each Product, and $func cannot affect that.
Now imagine a world where these two expressions are not equivalent and in the first expression, the evaluation of the two arguments is under the control of the $func
function. If it could control not only the timing of the argument evaluation, but more importantly, the context under which it is evaluated, then this would add a new dimension to the kinds of functions we could write.
The end goal of this would be to replace
$ ~> | Account.Order.Product | {'Price': Price * 1.2} |
with
$ ~> $transform(Account.Order.Product, {'Price': Price * 1.2})
which is syntactic sugar for:
$transform($, Account.Order.Product, {'Price': Price * 1.2})
where $transform is one of these special functions that controls the evaluation of its arguments. This function would make a modified copy of the object referenced by evaluating the first argument; it would select all the parts to be modified by evaluating the second argument in the context of the object from the first arg; then, for each Product matched by the second argument, taking that Product as the new context, it would evaluate the expression in the third argument to be merged into the copy.
So, which is more palatable - the extra syntax |...|...|, or the familiar function syntax but with radically different evaluation semantics?
BTW, I prototyped this lazy function behaviour and the $transform function as described above and added it to the 1.4 branch a couple of weeks ago. I've just gotten round to writing about it. As ever, I value any feedback.
The $transform(...) syntax is probably more confusing, and it would be harder to document the lazy evaluation that happens on only some of the arguments. I like the alternate piped syntax, which is visually more indicative of what is happening -- the left side is being evaluated and passed to the function on the right (unless I'm totally confused and misunderstood your design!)
Great progress on this, and a very interesting discussion... Would you be able to make the current development branch available on a special URL, like try.jsonata.org/1.4 or even dev.jsonata.org, so we could try it live on our data?
@andrew-coleman my purpose is to change the original object so I don't want a deep copy. I want subsequent rules to work on the modified copy on purpose. I'm attempting to use JSONata as a "rules" language to transform context for a conversation so the changes are intended to alter the context data.
@shrickus thanks - I'm glad you said that, I think I'm coming to the same conclusion. I'll see what I can do about getting the exerciser to pick up the 1.4 branch.
@wnm3 The thing with functional languages (like this) is that functions must not have side effects and that the value returned by a function is dependent only on its arguments (the timestamp and random functions are slightly annoying exceptions to this). Internal optimisation might skip the invocation of a function if it has already been invoked with the same parameters previously, and there is no guaranteed order of execution of any of the parts of the expression unless explicitly documented.
I suspect a small modification to your expression will get round this: assign the result of your modify/copy function to a variable, then use this variable in your subsequent expressions. I'm happy to look at your expressions/rules to make suggestions.
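For example, one way to phrase that pattern, assuming the 1.4 transform operator and an invented rules/context structure:

const jsonata = require('jsonata');

const data = { context: { status: 'open', attempts: 1 } };

// Bind the modified copy to a variable once, then let the subsequent
// "rules" read from that variable instead of the (unchanged) input.
const expr = jsonata(`(
  $ctx := context ~> |$|{ "status": "escalated" }|;
  $ctx.status = 'escalated' ? $ctx ~> |$|{ "attempts": attempts + 1 }| : $ctx
)`);

console.log(expr.evaluate(data));  // { status: 'escalated', attempts: 2 }
console.log(data.context);         // { status: 'open', attempts: 1 } - untouched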
Yes, simply put, I can do the assignment outside of JSONata and only use JSONata to create new structures from the existing one. It seems a bit wasteful to have to always make the clone, then make the alterations on the clone, then outside of JSONata overwrite the original with the altered clone version. But if this keeps the language pure, I guess I'll have to take that approach.
It just seemed so easy to invoke the assign function for changes to Object properties that allowing it in the language would be nice. I would have thought the function call was atomic (e.g., that the concern over the need for guaranteed order of execution would not apply) and that JSONata's interpretation by its parser would produce an expected order of operation by pushing outer manipulations to a stack so inner operations can be performed first. I didn't notice the logic for internal optimization in the code I'd reviewed but since I wasn't looking for it I may have missed it altogether.
One of my favorite sayings is: in theory, theory and practice are the same. In practice they are not.
Is it possible we are worried about a theoretical problem that in practice is not implemented in JSONata?
@wnm3 you're right, this implementation of JSONata doesn't yet perform function memoization, parallel execution, copy-on-write data structures, and all the other possibilities available to functional language runtimes, but I would like to reserve the right to implement them in the future.
To that end, I have tests in the test suite today that assert that built-in functions (e.g. $sort) do not modify the input. E.g.
describe('[[$], [$sort($)], [$]]', function () {
it('should return result object', function () {
var expr = jsonata('[[$], [$sort($)], [$]]');
var result = expr.evaluate([1,3,2]);
var expected = [
[1, 3, 2],
[1, 2, 3],
[1, 3, 2]
];
expect(result).to.deep.equal(expected);
});
});
and alternative JSONata implementations (I know of one being developed in Java at the moment) could choose to evaluate those three sub-expressions on separate threads without the constraints imposed by an imperative language.
We have now open-sourced the jsonata exerciser (https://github.com/jsonata-js/jsonata-exerciser), so you can clone it and run it locally.
I have also added the ability to switch which version of JSONata the exerciser loads. This includes tagged releases and any branch in GitHub.
@shrickus you can try out the 1.4 branch in the exerciser now by typing the following 'command' in the JSONata input box:
J:branch 1.4
followed by a newline
This has now been added to the v1.4 release
Hi Andrew,
just learnt about the transform, pretty cool. As a total noob I'm failing though at the following challenge (http://try.jsonata.org/BkGi25HJz):
My goal: subtract the first value of the values array (1490852901454) from all values (including the first one, i.e. resulting in 0 for the first value, 1490852901590 - 1490852901454 = 136 for the second one, etc.).
Thanks a lot for your help in advance, much appreciated!
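The linked expressions aren't reproduced here, but one possible approach looks roughly like this (the shape of the data is assumed):

const jsonata = require('jsonata');

// Assumed shape: an object with an array of timestamp values
const data = { values: [1490852901454, 1490852901590, 1490852901731] };

// Bind the first value once, then rebase every element against it.
const expr = jsonata('( $first := values[0]; values.($ - $first) )');

console.log(expr.evaluate(data));  // [ 0, 136, 277 ]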
Finally got it (http://try.jsonata.org/SkrU3nByf):
Final thought: why is the transform functionality not mentioned in the documentation (yet)?
{ "output": $reduce(response, function($acc, $item) {
    $exists := $acc.output[enterpriseid = $item.enterpriseid]
    $output := $exists ? $acc.output[$acc.output[enterpriseid = $item.enterpriseid]._index] : {}
    {
      "output": $exists ? (
        $output + {
          "bookingvalue": $output.bookingvalue + $item.bookingvalue,
          "foodmiles": $output.foodmiles + $item.foodmiles
}
)
: (
$acc.output + {
"enterpriseid": $item.enterpriseid,
"name": $item.enterprise,
"bookingvalue": $item.bookingvalue,
"foodmiles": $item.foodmiles
}
)
}
}, { "output": [] }).output }
In this code I am getting a syntax error:
{ "code": "S0202", "token": "output", "position": 132, "message": "Expected \"}\", got \"output\"" }
This is my JSON data:
{ "response": [ {
"enterpriseid": 53898,
"enterprise": " Absolute F and B Facilities Management LLC ",
"bookingqty": 2,
"foodmiles": 3088.25,
"bookingvalue": 46,
},
{
"enterpriseid": 53898,
"enterprise": " Absolute F and B Facilities Management LLC ",
"bookingqty": 5,
"foodmiles": 0,
"bookingvalue": 275,
},
]
}
I am performing an aggregation: if a duplicate enterpriseid is found, then keep the enterprise name unique and add up the rest of the values.
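For what it's worth, the S0202 error above appears to come from having several expressions inside the lambda body without wrapping them in a '( ...; ... )' block with ';' separators. A simpler way to express this kind of aggregation is JSONata's group-by form; a sketch against a trimmed copy of the data above:

const jsonata = require('jsonata');

const data = {
  response: [
    { enterpriseid: 53898, enterprise: 'Absolute F and B Facilities Management LLC', bookingqty: 2, foodmiles: 3088.25, bookingvalue: 46 },
    { enterpriseid: 53898, enterprise: 'Absolute F and B Facilities Management LLC', bookingqty: 5, foodmiles: 0, bookingvalue: 275 }
  ]
};

// Records sharing the same enterprise name fall into one group, and the
// aggregate functions then run over that group.
const expr = jsonata(`response{ enterprise: {
  "enterpriseid": $max(enterpriseid),
  "bookingvalue": $sum(bookingvalue),
  "foodmiles": $sum(foodmiles),
  "bookingqty": $sum(bookingqty)
} }`);

console.log(expr.evaluate(data));
// { 'Absolute F and B Facilities Management LLC':
//     { enterpriseid: 53898, bookingvalue: 321, foodmiles: 3088.25, bookingqty: 7 } }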
Dear,
I was wondering if there is any way to use Jsonata for changing JSON data, or returning a transformed version of the JSON data (containing the changes).
For example, assume the following input:
That needs to be transformed to the following output (based on some Jsonata expression):
Since Jsonata is a query and transformation language (and not a manipulation language), I assume it is not possible... I have been debugging the entire expression evaluation, but couldn't find anything relevant.
But perhaps there is a workaround, e.g. using a user defined function (that logs everything from the original data, except for the changes)?
Thanks in advance for your time !!! Bart Butenaers