pdenno / interop-mapping

LaTeX for OAGi Interoperable Mapping paper

General Feedback on Mapping Specification #1

Open ryanlunka opened 2 years ago

ryanlunka commented 2 years ago

Just wanted to provide some feedback that came to mind as I read through the current (I know early) specification. This is all top of mind without much deep thought...

Nulls From Arrays

In 2.2.1 you mention that when there are no matches in a vector, no value is returned, and that this is a JSONata behavior you don't like. I agree and disagree. We actually make use of that behavior quite regularly in JSONata, but it's not always intuitive that it would behave that way. And sometimes, if I'm not paying attention, it does get me.

The biggest example: if I map an array of objects to what should be an array of transformed versions of those objects, and (unbeknownst to me at map development time) the source array has only one object in it, the output will be a single object (not an array of one), usually invalidating against the intended schema. I think this is similar to what you describe in the spec.

Perhaps some kind of optionality to express the intended behavior?
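For what it's worth, JSONata does already offer one such option: appending empty square brackets to a path step forces that step's result to be an array even when there is only a single match. A hedged sketch with invented field names (the two lines are separate alternative expressions):

```jsonata
/* Input (hypothetical): { "lines": [ { "sku": "A1" } ] } */

lines.{ "item": sku }     /* singleton collapses to { "item": "A1" } */
lines.{ "item": sku }[]   /* [] keeps the result an array: [ { "item": "A1" } ] */
```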

Regex for Filtering

I agree with your thoughts on Regex. It's just so abstract to a business user, it loses its meaning in that context. I wish there was a natural language way to express the same idea.

Index Variables

Your thoughts on this are spot on, IMO. I've used almost exactly that expression a number of times, when I need to add an index to array elements. Example... One system produces an array of order line items without a "line number" but the target system requires a "line number" for each item. (Think eCommerce orders.)
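The order-line scenario can be sketched with JSONata's positional variable binding (`#`), which the expression in the spec resembles; field names here are invented:

```jsonata
/* Input (hypothetical): { "lines": [ { "sku": "A1" }, { "sku": "B2" } ] } */
lines#$i.{
  "lineNumber": $i + 1,   /* $i is the zero-based position bound by # */
  "sku": sku
}
```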

Code Lists

Agree with this as well. There are kind of two versions of this: 1) mapping static code to static code and 2) mapping dynamic business references. In the former, think mapping an ISO2 country code to an ISO3 country code. In the latter, think mapping a SKU in one system to an associated GTIN in another or shipping codes represented differently across two systems. These are mechanically equivalent concepts, but I would imagine describing them should be different.

FWIW we usually use the term "lookup table" to describe these scenarios, and sometimes that table can have more than two columns.
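As a concrete illustration of the two flavors, with made-up data: the static code-to-code case fits JSONata's `$lookup` over an object, while a multi-column lookup table can be modeled as an array of row objects and selected with a predicate. A sketch, not a recommendation:

```jsonata
(
  /* static code-to-code: ISO2 -> ISO3 */
  $iso3 := { "US": "USA", "DE": "DEU", "FR": "FRA" };

  /* multi-column "lookup table": one object per row, filtered on a key column */
  $countries := [
    { "iso2": "US", "iso3": "USA", "numeric": "840" },
    { "iso2": "DE", "iso3": "DEU", "numeric": "276" }
  ];

  {
    "simple": $lookup($iso3, countryCode),
    "tabular": $countries[iso2 = $$.countryCode].iso3
  }
)
```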

Tabular Data

Another common scenario for us, but this transformation from flat "Excel" data is handled via pre-processing before the map. Should this be a concern of a mapping language? Or should you just assume the column names are mapped to the property names for each item in the array?
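If the mapping language did take this on, one possible in-map treatment (a sketch, assuming the flat data arrives as a `columns` array plus an array of `rows` — a hypothetical structure) pairs each row with the column names:

```jsonata
/* Input (hypothetical):
   { "columns": ["sku", "qty"], "rows": [["A1", 2], ["B2", 5]] } */
$map(rows, function($row) {
  $merge(
    $map($zip($$.columns, $row), function($pair) {
      { $pair[0]: $pair[1] }   /* column name becomes the property name */
    })
  )
})
/* intended result: [{ "sku": "A1", "qty": 2 }, { "sku": "B2", "qty": 5 }] */
```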

Other Thoughts

Calculation is something that is underserved by JSONata. For one thing, the functions available to support calculation are limited. For another, the JavaScript implementation of JSONata (which I understand is the most widely used one) suffers from JavaScript's floating-point arithmetic issues. This causes a lot of headaches for us when precisely calculated values are important. JSONata will spit out something like 4.999999998 instead of 5. I don't know how important this fact is to what you're doing, but food for thought.
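The symptom is easy to reproduce in a JSONata expression evaluated by the JavaScript implementation, since it uses JavaScript's IEEE 754 doubles; `$round` is a common workaround when a fixed precision is acceptable (the two lines are separate expressions):

```jsonata
0.1 + 0.2              /* evaluates to 0.30000000000000004, not 0.3 */
$round(0.1 + 0.2, 2)   /* rounds back to 0.3 */
```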

Common language for defining string manipulation, combining values into one, break values down into multiple, etc. would be very useful to us.
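For comparison, the building blocks that exist in JSONata today look like this (the function names are real JSONata functions; the field names are invented):

```jsonata
{
  "combined": firstName & " " & lastName,    /* & concatenates strings */
  "parts": $split(fullName, " "),            /* break one value into many */
  "normalized": $uppercase($trim(rawCode))   /* clean up a single value */
}
```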

We define complex transformations using pseudocode. We'll map "field to field" but then write a small paragraph that describes in fairly plain English how that value should be manipulated between fields. We have no consistency in how this is done, and having that would be quite helpful. This may start to look more procedural than functional, though.

JSONata forces me to use the $lookup function when I really want to just do something like $$.someArray[key=value] to filter something out of the array. It feels heavy-handed to me, but then again, maybe I'm not seeing something important. Again, no idea if that's helpful for you, but food for thought.
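For reference, the two styles being compared, with invented data shapes; which one applies depends on whether the source is an object keyed by code or an array of objects (the two lines are separate expressions):

```jsonata
/* object keyed by code, e.g. { "codes": { "US": "USA" } } */
$lookup($$.codes, "US")

/* array of objects, e.g. { "someArray": [ { "key": "value", "name": "x" } ] } */
$$.someArray[key = "value"]
```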

You can chain processing in JSONata using firstFunction()~>secondFunction()~>... It always feels like a little bit of a violation of the functional nature to me--like there should be a way to do it that's less procedural. But, practically I end up using it sometimes. Is this something that should be encouraged and if so, how would you define it in the spec?
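One way a spec might pin this down is that the chain operator desugars to ordinary nested application: `a ~> f` is just `f(a)`, applied left to right. A small hedged illustration:

```jsonata
"  widget-42  " ~> $trim ~> $uppercase
/* equivalent to $uppercase($trim("  widget-42  ")), i.e. "WIDGET-42" */
```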

No idea if any of this is helpful, but I look forward to seeing updates and continuing to collaborate if I can provide value.

pdenno commented 2 years ago

Thanks for this, Ryan!

I'll take your advice on "Nulls From Arrays" and leave things as they are. In fact, generally speaking I am reluctant to violate JSONata behavior without a strong counter-argument. My thinking on that one was that the behavior loses the correspondence to the input sequence. Of course, that gets lost by $filter and such things anyway. Just last night I noticed the behavior you speak of (when the result of mapping is one item, the item is returned, not a vector containing one element). I think I'll write an issue against JSONata on that one and see where it leads. (Or you can, if you'd like.)

Regarding "Code Lists" and "Tabular Data" maybe we could do better than things like the function I suggest. For example, simply making a reference to a correspondence table published elsewhere.

Regarding the pre-processing of "Excel" tables, I suppose the challenge is in anticipating what sort of fancy things might be encountered. Some spreadsheet authors can be quite creative, viewing the spreadsheet as a blank canvas for their art! At the other extreme, were we provided with normalized relational form and knowledge of primary and foreign keys, we could reconstruct object-oriented or similar "networked" data...towards some end. We'll need to refine our understanding of the use cases for this language too. Perhaps our next step (maybe with the OAGi) should be to write something about the envisioned use cases of the language generally and see what people think. Work with tables seems to be on everyone's radar.

Lots of good thoughts in your "Other Thoughts" too. Perhaps at some point we can break this down into a number of separately tracked issues.

Regarding firstFunction()~>secondFunction(), personally, that sort of thing appealed to me. f()~>g()~>h() is just h(g(f())). I suppose it isn't functional in the Haskell sense, but I doubt that would appeal to most users anyway. I agree, however, that functional can still feel procedural; and functional doesn't mean declarative. I think a declarative feel would be good from a usability viewpoint. But insofar as people can't guess the execution model behind the declarative statements, they might not be able to translate those into results, or into their "commercial mapping language" (one use case). The hard part, as we start thinking about data more complex than hierarchical JSON or XML, is making the trade-offs between declarative semantics, functional features, complexity of the execution model, and the goal of interoperable exchange of mapping specs.

Lots of food for thought here. Thanks again!