mulesoft-labs / data-weave-rfc

RFC for the data weave language

Object Factories #23

Open machaval opened 3 years ago

machaval commented 3 years ago

Object Factories

Currently an object value can be constructed by using {}. This way of creating an object simply collects all the fields inside the object, so repeated fields are kept as they are. This is very useful for formats where repeated fields are supported, like XML, QueryParams, etc., but not so much for things like JSON.

The problem we found is that common patterns, like overriding an object's fields with another object's fields, or composing an object from two other objects, started generating problems. So we started to see expressions like {name: "Mariano", lastName:"Mendez"} - "name" ++ {name: "Agustin"} just to replace a field. For these cases we added the update operator

{name: "Mariano", lastName:"Mendez"} update {
    case .name -> "Agustin"
}
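
For reference, with the input above this update expression should evaluate to something like:

{name: "Agustin", lastName: "Mendez"}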

But this doesn't fix the case where you just want to override one object with another object's fields.

var agus = {name: "Agustin", lastName:"Mendez", spouse: null, married:false}

var spouse = {spouse: "Viky", married: true}

In order to merge these two objects we added the function dw::core::Objects::mergeWith.

agus mergeWith spouse does the job, but to be honest I don't think this is very declarative.
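
For reference, assuming the mergeWith semantics documented in dw::core::Objects (entries from the right-hand object replace same-named entries from the left), agus mergeWith spouse should give roughly:

{
    name: "Agustin",
    lastName: "Mendez",
    spouse: "Viky",
    married: true
}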

For the last couple of years I've been working with TypeScript on some small projects, and there people use the spread operator for things like this.

{
    ...agus,
    ...spouse
}

And I started to like this way of doing things. We have a similar kind of spread operator:

{
    (agus),
    (spouse)
}

Now the problem with this is that our objects support duplicated keys, so instead of ending up with just one spouse field you end up with two.

So for this I propose adding a new object constructor that creates an object without duplicated keys, where keys at the bottom override keys at the top, like a Map or Dictionary in other languages.

The suggested syntax is {{ }}

{{
    (agus),
    (spouse)
}}

Now this starts to work just like in TypeScript. Other kinds of things, like conditional fields, also work (see the sketch below). It is just a different object factory that guarantees that the key names are unique.
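
A sketch of how that could look, combining the proposed {{ }} form with today's conditional-field idiom (the {{ }} syntax itself is still hypothetical, and the nickname field is made up for the example):

{{
    (agus),
    (spouse),
    (nickname: "Agus") if (agus.name == "Agustin")
}}
// => {name: "Agustin", lastName: "Mendez", spouse: "Viky", married: true, nickname: "Agus"}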

I like this because it makes working with JSON much cleaner and uses a pattern that is already tested and plays well with the rest of the language.

menduz commented 3 years ago

I love the examples heh

In Clojure we have a function called assoc for that purpose: https://clojuredocs.org/clojure.core/assoc. Everything is a list in Clojure, like in other Lisp languages; maps are lists with keys at even indexes and values at odd indexes, so duplicated keys are allowed. Optimizations aside (Clojure maps are closer to maps than to lists in the generated JVM code), there are functions to handle the operations over maps.

assoc is a function that takes two lists of KeyValuePairs and returns a new map combining both; since it is a map, no keys are duplicated.

This problem can be solved both syntactically and idiomatically, I'd prefer to first solve it with an idiom (function) and then create the sugar syntax on top of it once the semantic of the function is battle tested.

Regarding the sugar syntax: Why not copy JS approach?

My proposal is to have an assoc function that deduplicates the keys (keeping the last occurrence of each). Typing this function will be interesting. Then the ... prefix in values would reduce all the previous KeyValuePairs with the new ones.

%dw 2.0
fun assoc<T1, T2>(keyValuePairs: T1, newKeyValuePairs: T2): ??? = ???

// macro: { ...[A] } => assoc({}, A)
// macro: { [HEAD], ...[A] } => assoc(HEAD, A)
// macro: { [HEAD], ...[A], [~TAIL] } => assoc(HEAD, A) ++ TAIL
// macro: { ...[A], [~TAIL] } => assoc({}, A) ++ TAIL
{ ...agus, ...spouse }
// => assoc(assoc({}, agus), spouse)
fun toJsonObject(value) = { ...value } 
// => assoc({}, value)
{ x: 'a', ...objA } 
// => assoc({ x: 'a' }, objA)
{ x: 'a', ...objA, test: 1, asd: 2 } 
// => assoc({ x: 'a' }, objA) ++ { test: 1, asd: 2 }
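
Leaving the open typing question aside, here is one possible (purely hypothetical) shape for assoc over plain Objects, built only from existing core functions:

%dw 2.0
output application/json
import * from dw::core::Objects
// Hypothetical sketch: concatenate both objects, then fold the entries so that
// later occurrences of a key replace earlier ones, yielding unique keys.
fun assoc(keyValuePairs: Object, newKeyValuePairs: Object): Object =
    ((keyValuePairs ++ newKeyValuePairs) pluck ((value, key) -> {(key): value}))
        reduce ((entry, acc = {}) -> acc mergeWith entry)
---
assoc(assoc({}, {name: "Agustin", spouse: null}), {spouse: "Viky", married: true})
// => {name: "Agustin", spouse: "Viky", married: true}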

Considering {{ and }} as the "map" constructor feels odd, because {} is for the XML model and {{}} is for JSON objects. But it would certainly be easier to type.


Side note: as far as I remember, the fact that { ({a: 1, b: 2}) } works is a bug, although I recall creating tests to make sure we don't break the bug-feature. I think the original design only contemplated a single KeyValuePair inside the parentheses, like { (b: 2) }, and that became a bug-feature when we handled the optional `if` in { (b: 2) if CONDITION }.
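
For context, a quick illustration of the two existing idioms mentioned here (output shown as I understand today's behaviour):

%dw 2.0
output application/json
var extra = {a: 1, b: 2}
---
{
    (b: 2) if (1 < 2), // conditional key-value pair: kept only when the condition holds
    (extra)            // the "bug-feature": expands every entry of extra into the enclosing object
}
// => {"b": 2, "a": 1, "b": 2}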

machaval commented 3 years ago

I keep hitting this idiom over and over, so I would like to keep pushing on it until we find a proper solution ;). The other day I found a user doing:

{
  (user - "description"),
  (account - "description"),
  "description" : "Some description taken from other variable"
}

All this in the context of json. I see a few problems with this idiom.

  1. The need to use - "description": the key name has to be duplicated, and it is a string, so no validation is being done.
  2. If account and user have any other common property, both are going to remain, and this is going to generate JSON with duplicated entries, which is going to end up being a problem (see the sketch below).
  3. The ( ) idiom is not very clear about what it does. We are using parentheses for something different than syntactic grouping.

This is a common idiom and we need to improve it.
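
A sketch of problem 2, with made-up user and account contents, just to show the collision:

%dw 2.0
output application/json
var user    = {id: 1, description: "A user"}
var account = {id: 42, description: "An account"}
---
{
    (user - "description"),
    (account - "description"),
    "description": "Some description taken from other variable"
}
// => {"id": 1, "id": 42, "description": "Some description taken from other variable"}
// The duplicated "id" entries survive, which is usually a problem for JSON consumers.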

One of the problems that we have detected with the previous solution is this:

{{
    (user),
    (account)
}}

Let's suppose that user and account have duplicated keys. This operation has two effects.

  1. Account fields will override the user fields.
  2. Any duplicated fields within user will be removed, and the same within account.

So it is still not very declarative about the intent of what we want to do.

Very declarative approach

{
      append user,
      overwriteWith account
}

This looks a little too verbose, but it is very clear about what it does.

A very short way

{
      ++ user,
      << account
}

We could use this. It is not very declarative, but each operator inside the braces would have a concrete standalone representation and behave the same way as that operator.

So {} ++ user << account would have the same result, provided we make the << operator overwrite from right to left; ++ is already present in the language.
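
A hypothetical walk-through with made-up values, assuming user = {name: "Agustin", role: "dev"} and account = {role: "admin", active: true} (the << operator and its overwrite semantics do not exist yet):

// step 1: {} ++ user      => {name: "Agustin", role: "dev"}
// step 2: ... << account  => {name: "Agustin", role: "admin", active: true}
// The proposed { ++ user, << account } object form would evaluate to the same result.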

What do we think?

martincousido commented 3 years ago

I've been giving this a lot of thought. For me it feels a bit weird that this:

{
      ++ user,
      << account
}

Would be the same as this: {} ++ user << account

You are using the same operators in different contexts and expecting them to do exactly the same thing. In the second one I can clearly see the steps, I can divide the problem into pieces in my mind, and I get a clear path of execution.

In the first one I feel like there's a lot of magic involved. What is user being ++'d to? Could it be confused with an addition operation?

I think I would prefer something like this:

{
      ++ (user),
      << (account)
}

Expressing something like "append or merge with the contents of this object (its key-value pairs)". What I've said about the operations still stands, though.

I know what we've discussed about {{ }}, but I'm going to contradict myself and say that something like that might be the best choice, with a small change.

Why?

  1. Let's say you have something like this:
    
    %dw 2.0
    output application/json
    var input1 = {"foo" : "bar", "foo" : "bar2"}
    var input2 = {"foo" : "bar3"}

{ ++ input1, << input2 }


Here you are still building an object with duplicate keys. I would assume that only the second duplicated key would be replaced, giving `{"foo" : "bar", "foo" : "bar3"}`. If you are in the context of JSON, is that really something you want? I think the problem here is the ambiguity in the definition of `Object`, which may or may not have duplicated keys. This is a counterexample to what we discussed the other day, because here `.foo` would still give you `bar`, but maybe that is something you don't want, as you are merging it with another object that already has a definition of foo.

2. I think we were thinking about `{{}}` the wrong way: if we think of it only as a deduplication of keys, the intention is clear. Here are two examples:

%dw 2.0
output application/json
var input1 = {"foo" : "bar", "foo" : "bar2"}
var input2 = {"foo" : "bar3", "bar": "test"}

{ (input1), (input2), "foo": "lala" }

Today this produces:

{ "foo": "bar", "foo": "bar2", "foo": "bar3", "bar": "test", "foo": "lal" }


Let's say now we use the new `{{}}` (side note, I don't like that symbol combination though)

%dw 2.0
output application/json
var input1 = {"foo" : "bar", "foo" : "bar2"}
var input2 = {"foo" : "bar3", "bar": "test"}

{{ (input1), (input2), "foo": "lala" }}


This would produce:

{ "bar": "test", "foo": "lala" }



And I think it's easier to wrap your head around it and to think about how this works step by step: first you have all the fields added to your object in the order in which you specify each object expansion or field, and then you eliminate duplicates, keeping the last one. If you come from the JS world you know that eventually you will have to eliminate your duplicates, and you know that using `{{}}` would have that effect.
So when we say:

>1. Account fields will override the user fields.
>2. Any duplicated fields within user will be removed, and the same within account.

Are those really two effects, or is it just one:
Any duplicated field will be removed.

machaval commented 3 years ago

Let's take a look at your example

%dw 2.0
output application/json
var input1 = {"foo" : "bar", "foo" : "bar2"}
var input2 = {"foo" : "bar3", "bar": "test"}

{{
    (input1),
    (input2),
    "foo": "lala"
}} 

This would produce:

{
  "bar": "test",
  "foo": "lala"
} 

So we say {{ removes duplication. Now one of the arguments is whether it should keep the first one or the last one. Taking into account what Teo said, it was unintuitive for him that the last one is kept, since the selector picks the first one. So for him this should be:

{
  "foo": "bar",
  "bar": "test"
}

So if your intention is to have merge, extend or override kind of semantics, you need to flip the order, and for me that is very weird.

For me there are two reasons in favor of keeping the first one:

  1. It is consistent with the selector.
  2. We are an immutable language, so keeping the first one is more aligned with that.

For me the second reason is quite strong, though that doesn't mean we can't ignore them ;)

But for me there are two questions.

We have explored the options: fix only the first, fix only the second, or fix the first and let that give us a semantic for the second one.

martincousido commented 3 years ago

I think those are valid points as well, and I agree with the consistency issue between the selector and the merge; we might have to handle that explicitly somehow (might be worth researching a bit). I don't see how the option with << or ++ works that out, though.

{
      ++ user,
      << account
}

Going back to this case: what happens if user has two repeated keys and you are overriding that key with one inside the account object? Which one will account be replacing? The first one, the second one, or both? You still have to make that choice implicitly, right? Also, what will happen with this:

{
      << account,
      ++ user
}

Is account extending the empty object and then user adding its fields to it? The order also matters here, doesn't it? Or is this some kind of indicator of which one is the important object you are extending?

If we look at the JS style, the order also matters: { ...agus, ...spouse }

There's another thing to discuss: how much control over the construction of the object do we want to give the user? The first two options you would think of are to let the first or the last prevail, if we are talking about whole objects, but maybe you also want to control which one is going to prevail per key-value (this might be out of scope; I'm just mentioning it, so feel free to disregard it).
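
One hypothetical shape for that per-key control, sketched with existing core functions only (the mergeResolving name and signature are made up for illustration, and it assumes unique keys on the left-hand object):

%dw 2.0
output application/json
import * from dw::core::Objects
// Hypothetical helper: merge b into a, asking a caller-supplied function which
// value should win whenever a key is present in both objects.
fun mergeResolving(a: Object, b: Object, resolve: (key: String, left: Any, right: Any) -> Any): Object =
    a mergeWith (b mapObject ((value, key) ->
        if (namesOf(a) contains (key as String))
            {(key): resolve(key as String, a[key as String], value)}
        else
            {(key): value}))
---
mergeResolving(
    {name: "Agustin", score: 10},
    {score: 7, active: true},
    (k, left, right) -> if (k == "score") left else right)
// => {name: "Agustin", score: 10, active: true}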

One more thing: do we need something to declare unique-key objects? Even with that declaration, at some point you still have to make a choice about which field is going to prevail, although if we do this for some kind of type checking and force the user to make that choice, I think that's better.

I agree with the problems of making the choice ourselves. Thinking a bit further, whether the most important field is the first that appears or the last depends on the semantics of the data, and on where and how that duplication originated.

teofr commented 3 years ago

I'll just leave some thoughts on this.

{{}}

I think using {{}} and giving preference always to the last key is quite bad, because it goes against how the selector works today:

var payload = {foo: 1, foo: "bar"}
---
[payload.foo, {{ payload }}.foo ]
// == [1, "bar"]

Another option is taking the first key, from the last object, but that also would be bad:

[ {{ ({foo: 1, foo: "bar"}) }}.foo, {{ (foo: 1), (foo: "bar") }}.foo ]
// == [1, "bar"]

So, if we go with {{}}, I think going from left to right is the only reasonable way to go, maybe with a better syntax that makes that obvious.

<< ++

On the other hand, I think the ++ and << connectors are quite confusing when used inside a {}. I think the reason is that those connectors, even if they look like they only modify one of the entries, actually interact with the object factory around them. For instance, I'd like to be able to replace << account with something that has the same value, and for it to keep working:

var accountOver = << account
---
{
  ++ user,
  accountOver
}

I think that if we don't have that property, it will compose badly with other stuff.

What would be the type of accountOver ?

Overwriting or merging

Today I saw a piece of code that did something similar to what this issue proposes, but instead of overwriting the value, it recursively merged it. So, if I merge (>>) two objects

{ foo: 1, bar: {name: "Cito", lastName: "Freund"} }
  >> { foo: 2, bar: {name: "Teo"} }

I'd get:

{ foo: 2, bar: {name: "Teo", lastName: "Freund"} }

So maybe we should look at this behaviour on one side, and on the other, a dedup whose only job is to delete repeated keys. I think this kind of merging is also quite useful.
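
That recursive behaviour could be sketched with existing functions; this deepMerge helper is hypothetical (not part of dw::core::Objects) and assumes plain objects without duplicated keys:

%dw 2.0
output application/json
import * from dw::core::Objects
// Hypothetical sketch: values from b win, except when both sides hold objects,
// in which case they are merged recursively.
fun deepMerge(a: Object, b: Object): Object =
    a mergeWith (b mapObject ((value, key) ->
        if ((a[key as String] is Object) and (value is Object))
            {(key): deepMerge(a[key as String] as Object, value as Object)}
        else
            {(key): value}))
---
{ foo: 1, bar: {name: "Cito", lastName: "Freund"} }
    deepMerge { foo: 2, bar: {name: "Teo"} }
// => { foo: 2, bar: { lastName: "Freund", name: "Teo" } }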

A module?

Finally, I think we could experiment with this in an external module (I know, the type information will be bad), as menduz said:

This problem can be solved both syntactically and idiomatically, I'd prefer to first solve it with an idiom (function) and then create the sugar syntax on top of it once the semantic of the function is battle tested.

Also, we need to come up with 5 DataWeave modules, so we could give this one some extra priority.

martincousido commented 3 years ago

On the other hand, I think the ++ and << connectors are quite confusing when used inside a {}. I think the reason is that those connectors, even if they look like they only modify one of the entries, actually interact with the object factory around them. For instance, I'd like to be able to replace << account with something that has the same value, and for it to keep working:

100% agreed. It seems like we are building those things only for this particular case; it's similar to the feeling I get from update.

Today I saw a piece of code that did something similar to what this issue proposes, but instead of overwriting the value, it recursively merged it. So, if I merge (>>) two objects

I really like this approach the best, but I don't know if it follows the same line of thought as the first approach presented, where you are kind of describing the structure of your object rather than operating on different objects (?).

Maybe we should just let people build their objects with () and {} and use a dedup afterwards.

In the case of the dedup should we also make explicit which key should prevail?

Another example to think about:

{ foo: 1, bar: {name: "Cito", lastName: "Freund"}, bar: { name: "AnotherName"} }
  >> { foo: 2, bar: {name: "Teo"} }

In this case, which bar would you replace?

Today if you use mergeWith

%dw 2.0
output application/json
import mergeWith from dw::core::Objects
---
{ foo: 1, bar: {name: "Cito", lastName: "Freund"}, bar: { name: "AnotherName"} }
  mergeWith { foo: 2, bar: {name: "Teo"} }

you get:

{
  "foo": 2,
  "bar": {
    "name": "Teo"
  }
}
menduz commented 3 years ago

The more symbols I read, the more I am convinced we should create functions with clear names inside dw::core::Objects, enabling more flexibility and customization.

// this one could be an alias to shallowMergeWith(obj, obj)
{ a: 1, b: { value: true } } mergeWith { a: 2, b: { } } 
// => { a: 2, b: { } }
{ a: 1, b: { value: true } } deepMergeWith { a: 2, b: { test: 1 } } 
// => { a: 2, b: { value: true, test: 1 } }
pickLast({ a: 1, a: 2, a: 3 })
// => { a: 3 }
// objectToJson: returns json object, recursive. selects last value in duplicated keys, removes namespaces.
objectToJson({ a: 1, c: 1 } ++ { a: 2, b: 2 })
// => { a: 2, b: 2, c: 1 }

This makes the functions easier to document and develop, resulting in less maintenance of the grammar, parser and compilation phases. It also makes the code more portable across different versions of the compiler.