mavoweb / treecle

WIP. A toolbox for hierarchical JS objects.
https://treecle.mavo.io
MIT License

Get and set object values by path #22

Open LeaVerou opened 8 months ago

LeaVerou commented 8 months ago

Background

This came up in #18 as an implementation detail, but I think it's incredibly useful in its own right.

Sample use cases and prior art:

I pushed two util functions a few days ago (getByPath() and setByPath()), but I think we should clean up the code, make it more flexible, and expose them at the top level.

Some design decisions, requirements, questions below.

Signature

Function names

I’m leaning towards the simple get() and set() that are already established in prior work. Is there anything else we may want to save the names get() and set() for?

Arguments

Strawman:

It might be worth adding overloads later that allow specifying value and path as part of the options object, but I don't see a compelling motivation to include that in the MVP.
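To make the discussion concrete, here is a minimal sketch of literal-path get() and set() with the argument order get(obj, path) and set(obj, path, value). It is an illustration only, assuming plain objects and arrays and no wildcards; it is not the strawman signature above or treecle's actual API.

```js
// Hypothetical sketch: literal paths only (no wildcards), plain objects and arrays.
function get(obj, path) {
  let current = obj;
  for (const key of path) {
    // Stop as soon as we can no longer descend.
    if (current === null || typeof current !== "object" || !(key in current)) {
      return undefined;
    }
    current = current[key];
  }
  return current;
}

function set(obj, path, value) {
  if (path.length === 0) {
    throw new TypeError("Cannot set an empty path");
  }
  // Everything up to the last key is just a get().
  const parent = get(obj, path.slice(0, -1));
  if (parent === null || typeof parent !== "object") {
    return false; // Missing parent: this sketch does not create intermediate nodes.
  }
  parent[path[path.length - 1]] = value;
  return true;
}

const data = {a: {b: [10, 20]}};
get(data, ["a", "b", 1]); // 20
set(data, ["a", "b", 0], 99); // true; data.a.b is now [99, 20]
```

Note how set() reuses get() for every segment but the last, which is the same observation made further down about predicates and set().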

Should there be a way to set the value via a function that takes the path and current value as parameters? Or does that encroach too far into transform() territory?
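One option that sidesteps the ambiguity (a function passed to set() could be either the new value or a way to compute it) is a separate function. A minimal sketch, using the hypothetical name update() and assuming literal paths over plain objects:

```js
// Hypothetical: update(obj, path, fn) stores fn(currentValue, path) at path.
// Kept separate from set() so set() can still store function values as-is.
function update(obj, path, fn) {
  if (path.length === 0) {
    throw new TypeError("Cannot update an empty path");
  }
  let parent = obj;
  for (const key of path.slice(0, -1)) {
    if (parent === null || typeof parent !== "object" || !(key in parent)) {
      return false; // Missing intermediate node: nothing to update.
    }
    parent = parent[key];
  }
  if (parent === null || typeof parent !== "object") {
    return false;
  }
  const last = path[path.length - 1];
  parent[last] = fn(parent[last], path);
  return true;
}

const counters = {page: {views: 41}};
update(counters, ["page", "views"], (n) => n + 1); // counters.page.views is now 42
```

Once the callback also needs to visit many nodes, this starts to overlap with transform(), which is the concern raised above.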

Path structure

Data type

Paths should be provided as arrays; we don’t want to deal with string parsing or with trying to distinguish paths from property names. Strings and numbers should be accepted as well, but they’re just a path of length 1.

We may also want to support objects to provide additional metadata (see below).
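Normalizing the path argument could be as simple as the following sketch (normalizePath is a hypothetical helper name). Note that "a.b" is never split: it names a single property.

```js
// Hypothetical helper: accept a single key or an array of keys, always return an array.
function normalizePath(path) {
  if (Array.isArray(path)) {
    return path;
  }
  if (typeof path === "string" || typeof path === "number") {
    // A bare key is a path of length 1; strings are never parsed.
    return [path];
  }
  // Non-array objects are rejected here; object segments with metadata are left open.
  throw new TypeError(`Unsupported path: ${String(path)}`);
}

normalizePath("a.b"); // ["a.b"]: one property literally named "a.b"
normalizePath(0); // [0]
normalizePath(["a", 0, "b"]); // ["a", 0, "b"]
```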

Predicates

It seems obvious that entirely literal paths will not suffice (at the very least, we need wildcards). Should we just use JSON Path? Hell no! First, it's overkill for these use cases; second, once you go beyond literal property names and wildcards, the syntax becomes cryptic AF. And despite its complexity, it doesn’t seem to support some pretty common use cases, like case insensitivity.

So since we can’t just use JSON Path, what do we use? What predicates do we want to support? Examples:

  1. Wildcards (any property at this level)
  2. Case-insensitive property names?
  3. Alternatives? (e.g. "foo or bar")
  4. Ranges of numbers? (e.g. "top 3 items")
  5. Property queries (e.g. "get items with id=foo") — essentially the path version of CSS :has(), so we'd probably want to frame it that way, i.e. "children that match this path", so I’ll call them child queries from now on
  6. Property names that start/end with a given string?
  7. Property name regex?

We generally want to keep the MVP simple until use cases emerge, but it helps to take these things into account at the design stage so that the API has room to expand.

As mentioned above, wildcards are certainly needed. Case-insensitive matching might be worth including in the MVP, since at least the Mavo use cases need it. The rest we can probably ship without and add as needed.
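For the two predicates proposed for the MVP, matching a single path segment against a property name could look like this sketch. It assumes "*" as the wildcard and a hypothetical whole-path caseInsensitive option; neither name is decided.

```js
// Hypothetical sketch: "*" is a wildcard; caseInsensitive is an assumed
// whole-path option, not an agreed-upon name.
function keyMatches(segment, key, {caseInsensitive = false} = {}) {
  if (segment === "*") {
    return true;
  }
  const a = String(segment);
  const b = String(key);
  return caseInsensitive ? a.toLowerCase() === b.toLowerCase() : a === b;
}

// All own property names of obj selected by one path segment.
function matchingKeys(obj, segment, options) {
  return Object.keys(obj).filter((key) => keyMatches(segment, key, options));
}

const record = {Title: "A", title: "B", body: "C"};
matchingKeys(record, "TITLE", {caseInsensitive: true}); // ["Title", "title"]
matchingKeys(record, "title"); // ["title"]
matchingKeys(record, "*"); // ["Title", "title", "body"]
```

Even a literal segment can match more than one key once case insensitivity is on, which feeds into the return value question below.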

Syntax for predicates

So that raises the question: how do we express these predicates?

Special syntax. This works decently for some of them:

However, there is no obvious fit for any of the others. Also, inventing a new microsyntax has several drawbacks:

So instead, I think we should use plain strings for literal property names, with wildcards as the only exception, since these are very common and have a very obvious syntax. Anything else would require making that part of the path an object literal.

This means that even if wildcards are the only predicate we ship, we still need to support object literals, at least as an escape hatch for specifying that something is a literal property name (e.g. a property actually named "*"). With that escape hatch in place, we could later explore shortcut syntax for cases where a readable option is obvious (e.g. "foo|bar" for alternatives).
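A minimal sketch of the strings-plus-escape-hatch idea, turning each raw segment into a key matcher. The {name} shape is an assumption here (it matches the shape proposed later in the thread), and toMatcher is a hypothetical name.

```js
// Hypothetical sketch: turn one raw path segment into a key matcher.
// Strings and numbers are literal names, except "*", which is a wildcard.
// An object literal {name} is always literal, so {name: "*"} escapes the wildcard.
function toMatcher(segment) {
  if (segment === "*") {
    return () => true;
  }
  if (typeof segment === "string" || typeof segment === "number") {
    return (key) => String(key) === String(segment);
  }
  if (segment !== null && typeof segment === "object" && "name" in segment) {
    return (key) => String(key) === String(segment.name);
  }
  throw new TypeError("Unsupported path segment");
}

const odd = {"*": "star", a: "letter"};
Object.keys(odd).filter(toMatcher("*")); // ["*", "a"]
Object.keys(odd).filter(toMatcher({name: "*"})); // ["*"]
```

Because object segments are already a separate branch, future predicates (alternatives, ranges, child queries) could become extra fields on that object without changing how plain strings behave.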

Predicate schema

Strawman for all of the above predicates (even though we don't plan to implement them all):

Notes:

How do predicates work with set()?

Setting is only an issue for the last part of the path; until then, it's still a getting task.
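The split between "get everything but the last segment" and "set the last segment" can be sketched like this, with "*" as the only predicate. setAll and resolveAll are hypothetical names; this sketch assigns a literal last key even if it doesn't exist yet, which is one of the open questions.

```js
// Hypothetical sketch over plain objects/arrays, with "*" as the only predicate.
const isObject = (v) => v !== null && typeof v === "object";

// Resolve a path to every value it matches (the "getting" part).
function resolveAll(roots, path) {
  let current = roots;
  for (const segment of path) {
    const next = [];
    for (const node of current) {
      if (!isObject(node)) continue;
      const keys = segment === "*" ? Object.keys(node) : (segment in node ? [segment] : []);
      for (const key of keys) next.push(node[key]);
    }
    current = next;
  }
  return current;
}

// Only the last segment is a "setting" step; returns how many properties were assigned.
function setAll(obj, path, value) {
  if (path.length === 0) throw new TypeError("Cannot set an empty path");
  const parents = resolveAll([obj], path.slice(0, -1));
  const last = path[path.length - 1];
  let count = 0;
  for (const parent of parents) {
    if (!isObject(parent)) continue;
    // A trailing wildcard assigns every existing key; a literal key is assigned as-is.
    const keys = last === "*" ? Object.keys(parent) : [last];
    for (const key of keys) {
      parent[key] = value;
      count++;
    }
  }
  return count;
}

const doc = {a: {x: 1}, b: {x: 2}};
setAll(doc, ["*", "x"], 0); // 2; doc is now {a: {x: 0}, b: {x: 0}}
```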

So if the last part of the path is a…

Return value

Following the design principle that function return values should not vary wildly based on the options passed, perhaps we actually need more than just a single get() function:

Or perhaps get() for one value and getAll() for multiple?
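One way to keep return types stable is sketched below: getAll() always returns an array, and get() always returns a single value by refusing wildcard paths. Rejecting wildcards in get() is an assumption for illustration; returning the first match would be another option.

```js
const isObject = (v) => v !== null && typeof v === "object";

// Always returns an array, even for a literal path with a single match.
function getAll(obj, path) {
  let current = [obj];
  for (const segment of path) {
    current = current.flatMap((node) => {
      if (!isObject(node)) return [];
      if (segment === "*") return Object.values(node);
      return segment in node ? [node[segment]] : [];
    });
  }
  return current;
}

// Always returns a single value (or undefined); wildcards are rejected
// so the return type never depends on what the path contains.
function get(obj, path) {
  if (path.includes("*")) {
    throw new TypeError("get() takes literal paths; use getAll() for wildcards");
  }
  return getAll(obj, path)[0];
}

const pair = {a: {x: 1}, b: {x: 2}};
getAll(pair, ["*", "x"]); // [1, 2]
getAll(pair, ["a", "x"]); // [1]
get(pair, ["a", "x"]); // 1
```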

Options for the whole path

These will be passed to the functions as part of the options dictionary.

adamjanicki2 commented 7 months ago

Here are some of my thoughts on everything above:

Function names

I also think these are the simplest applications of getting and setting (just getting and setting arbitrary nodes), so I'm on board with get and set as names. I'm not sure there are any other applications where those names would fit better.

Or perhaps get() for one value and getAll() for multiple?

I like this, so that we're not returning a single node in the generic case and an array when wildcards or other more complex operations are involved.

How do predicates work with set()?

I think by default set should only set paths that exist. When a wildcard or other expression comes into play, it should set all matching, existing paths. We can then allow an optional param, something along the lines of o.setNonexistent, which would let an author tell us to set a path even if it does not exist.

But now that I'm thinking about it, setting a non-existent path is challenging because we don't know for sure how to add nodes to the author's tree structure. For example, what if their nodes are instances of a custom class? We couldn't simply create objects/properties to build the path. So we'll have to think about this case more. Maybe this is a sign that we should wait until we've verified that we need to support this use case?

Predicate schema

I like the idea of having this, since it provides flexibility to add new operations and features in the future, and would allow us to start with simple and common use cases first and add new ones as they arise.

LeaVerou commented 7 months ago

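Let’s start simple. Paths are arrays with values:

  • string | number: wildcard or property name
  • {} (empty object): same as wildcard
  • {name: string | number}: literal name (i.e. {name: "*"} is not a wildcard).

Thoughts?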

LeaVerou commented 7 months ago

Regarding setting, the idea is that we'd use {} as a default, but users can customize it.
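Combining this with the setNonexistent idea above, set() might look like the following sketch. setNonexistent is the option name suggested earlier in the thread; createNode is a hypothetical name for the customizable default, which lets authors with custom node classes create the right kind of node.

```js
// Hypothetical sketch. setNonexistent was suggested earlier in the thread;
// createNode is an assumed name for the customizable default ({}).
function set(obj, path, value, {setNonexistent = false, createNode = () => ({})} = {}) {
  if (path.length === 0) throw new TypeError("Cannot set an empty path");
  let parent = obj; // Assumed to be an object or array.
  for (let i = 0; i < path.length - 1; i++) {
    const key = path[i];
    if (!(key in parent)) {
      if (!setNonexistent) return false;
      // createNode receives the path of the node it is creating.
      parent[key] = createNode(path.slice(0, i + 1));
    }
    const child = parent[key];
    if (child === null || typeof child !== "object") return false; // Can't descend into primitives.
    parent = child;
  }
  const last = path[path.length - 1];
  if (!(last in parent) && !setNonexistent) return false;
  parent[last] = value;
  return true;
}

class TreeNode {}
const config = {};
set(config, ["a", "b"], 1); // false: "a" does not exist
set(config, ["a", "b"], 1, {setNonexistent: true}); // true; config is now {a: {b: 1}}
set(config, ["c", "d"], 2, {setNonexistent: true, createNode: () => new TreeNode()}); // config.c is a TreeNode
```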

adamjanicki2 commented 7 months ago

Let’s start simple. Paths are arrays with values:

  • string | number: wildcard or property name
  • {} (empty object): same as wildcard
  • {name: string | number}: literal name (i.e. {name: "*"} is not a wildcard).

Thoughts?

I like it; it's simple and easy to understand.

adamjanicki2 commented 7 months ago

@LeaVerou A few more clarifying points on get before implementing:

  1. Should this function be calling context.getChildProperties in the case of a wildcard?
  2. Should this function be checking that a key in the path is actually a valid key of that node (i.e. checking that context.isNode(node[key])) before exploring further?
  3. In the case where node is something like {left: {name: "leaf1"}, right: {name: "leaf2"}}, and path is ["*", "nonexistentKey"], should it return [] since after the wildcard nothing matched nonexistentKey (meaning the path was not valid)?
  4. In the case where there are no wildcards in the path, and the path does not exist, should it return undefined or null?

Just wanted to get your opinion on these things. For 1-3, my answer would be yes, it should do those things, and for 4, I would lean toward returning undefined.
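Under the "yes" answers to 2-4, get could behave roughly like this sketch. isNode is passed in as a plain function standing in for context.isNode(), and wildcards use Object.keys() where question 1 suggests context.getChildProperties(); the real signatures of those context methods are not assumed.

```js
// Hypothetical sketch of answers 2-4. isNode stands in for context.isNode();
// Object.keys() stands in for context.getChildProperties() (question 1).
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

function getAll(node, path, isNode) {
  let current = [node];
  path.forEach((key, i) => {
    const isLast = i === path.length - 1;
    const next = [];
    for (const n of current) {
      const keys = key === "*" ? Object.keys(n) : [key];
      for (const k of keys) {
        if (!has(n, k)) continue;
        // Only descend through values that are nodes (question 2); the final value can be anything.
        if (isLast || isNode(n[k])) next.push(n[k]);
      }
    }
    current = next;
  });
  return current; // [] when nothing matched, e.g. after a wildcard (question 3)
}

function get(node, path, isNode) {
  return getAll(node, path, isNode)[0]; // undefined when a literal path doesn't exist (question 4)
}

const isPlainNode = (v) => v !== null && typeof v === "object";
const tree = {left: {name: "leaf1"}, right: {name: "leaf2"}};
getAll(tree, ["*", "name"], isPlainNode); // ["leaf1", "leaf2"]
getAll(tree, ["*", "nonexistentKey"], isPlainNode); // []
get(tree, ["left", "missing"], isPlainNode); // undefined
```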

LeaVerou commented 7 months ago

After thinking about this some more, I wonder if we could get rid of all this complexity and just have an array of properties that point to one or more children. Nodes with a single children property (or whatever it's called) are basically a special case of how children work in ASTs, where a node can point to a single child OR to an array of children. The only wart is how to figure out whether node[childProperty] points to a single child node or to a data structure containing many children, but that's what isNode() is for!
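A minimal sketch of that idea: a list of child properties, each holding either one child or a collection of children, with isNode() telling the two apart. getChildren is a hypothetical name, and the isNode below (objects with a string type, as in many ASTs) is only an example.

```js
// Hypothetical sketch: a property may hold a single child node or a collection
// (array/object) of them; isNode() is what tells the two apart.
function getChildren(node, childProperties, isNode) {
  const children = [];
  for (const property of childProperties) {
    const value = node[property];
    if (value === undefined || value === null) continue;
    if (isNode(value)) {
      children.push(value); // Single child, e.g. node.left
    } else if (typeof value === "object") {
      // Collection of children, e.g. node.arguments = [...]
      children.push(...Object.values(value).filter(isNode));
    }
  }
  return children;
}

// Example isNode for an AST-like shape where nodes carry a string "type".
const isNode = (v) => v !== null && typeof v === "object" && typeof v.type === "string";
const ast = {
  type: "BinaryExpression",
  left: {type: "Identifier", name: "a"},
  right: {type: "Literal", value: 1},
};
getChildren(ast, ["left", "right", "arguments"], isNode); // [ast.left, ast.right]
```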

adamjanicki2 commented 7 months ago

I like this idea much better than having a wildcard operator and all the complex syntax for defining it versus "*" as a standard key.

adamjanicki2 commented 7 months ago

So if I'm understanding your idea correctly, get would look like function get(node, path), where path is Array<string | number>. Among those properties could be something like children, which is not itself a node but contains nodes (since it's either an object or an array), in which case we'd return all of them.

For set(node, path, value), it would be similar. One question I have is what to do if the path ends with a property that's not a node but contains nodes, for example, path = ["children"]. In this case, should it set all nodes inside children to value?

LeaVerou commented 7 months ago

It means get() and set() are not on the critical path any more.

LeaVerou commented 6 months ago

@adamjanicki2 what happened with this? Being able to set how to get from parent to children in a more generic way is pretty essential.

adamjanicki2 commented 6 months ago

What does this mean? Are you referring to general set/get functions or something else?