Array merge operator - Githubissues

bcamper commented 7 years ago

Adds a special "array merge operator", ..., to the scene merge process. This provides more flexibility in scene merging by allowing the scene author to choose how source and destination array values are merged.

Functionality Object merge works by merging source nodes (e.g. those from a scene being imported with import) into destination nodes (the scene being merged into, which includes the import statement). In the existing merge behavior, any time a source node value is an array, it overwrites the destination node value (regardless of the destination node's type).

This PR modifies the behavior such that when the source node value is an array: If the string ... appears as one of the source node's array values, the destination node values are inserted into the source node array, at the index of the ... operator (the ... values are replaced). If the existing destination node value is a scalar, that single value is inserted into the source (as if it were a single-element array).

This syntax can be used to append, prepend, or insert values within the destination:

Example 1: append values

  dest: [1, 2, 3]
  source: [..., 4, 5, 6]
  -> merge result: [1, 2, 3, 4, 5, 6]

Example 2: insert in middle

  dest: [1, 2, 3]
  source: [a, ..., b]
  -> merge result: [a, 1, 2, 3, b]

Example 3: scalar value in source is treated as a single-element array

  dest: x
  source: [1, 2, 3, ...]
  -> merge result: [1, 2, 3, x]

The operator syntax choice is loosely inspired by the JS rest/spread operator, and C variadic arguments, though the context and usage is a bit different here.

Motivation In many cases, such as color values, it is desirable to simply replace an array value. However, there are some cases where it can be awkward to "lose" the existing array, rather then being able to extend it. We accepted this limitation because we wanted to keep the merge behavior as generic as possible, but as scene imports have been used in many more contexts, the limitation is more apparent.

The array merge operator is an attempt to keep the behavior backwards compatible, while extending the capability in a generic fashion.

It can be used to better control the order of style mixing (using mix) and/or shader blocks.

Additionally, to support more data visualization options, we are considering introducing a scene-level scripts option for importing external JS libraries (which would remain backwards compatible with the existing scripts parameter in sources -- which does not lend itself to source-agnostic data visualization scene fragments to be imported). Since the scripts parameter (both existing and proposed) is an array, relying on the current overwrite-only behavior is both error-prone and couples the source and destination scenes in an impractical way.

bcamper commented 7 years ago

@matteblair can you check this out please

matteblair commented 7 years ago

An aspect of this that's a little strange to me is that, as I understand it, the "main"/"parent" scene file doesn't control how elements are merged from it's subordinate scene files. This seems contrary to the rest of the merging behavior, where the "parent" scene has the final say. For example, with this feature I think it would be impossible to overwrite an array value that includes the merge operator (though you could delete it with a null).

bcamper commented 7 years ago

@matteblair

This seems to imply that '...' cannot be safely used as an actual value in an array. Does this cause problems for any parts of the scene syntax? text_source maybe?

Yes, it does make '...' a reserved keyword/syntax inside of arrays (meant to make note of that in the PR description). I couldn't think of a place where this would likely be a problem currently - for text_source, array strings can be used but they indicate feature property names. The string '...' would be a valid GeoJSON feature property name, though it is unlikely. You could get around this by writing your text_source as a JS function returning that property value instead though. I'm not able to think of any other scene properties that would accept freeform strings except for url_subdomains, where '...' wouldn't be valid.

If a scene file uses '...' to specify import order, this seems to imply that it can't be used alone because the '...' would remain in the final scene file as a (likely) invalid value. Should '...' array values be automatically purged after merging? Is that too destructive to existing scene files?

I hadn't even thought of using this operator on import! And in fact I don't think it would be applied in the current JS implementation (the import array is read and a new scene is imported, with the import property then removed before the scenes are merged). So while it's not documented here (and should be), I think that import has special behavior and this concept doesn't even make sense as applied to it?

bcamper commented 7 years ago

@matteblair you're right about the child/parent relationship being a bit "backward" here. Can you think of another way to do this that would be more consistent?

matteblair commented 7 years ago

If a scene file uses '...' to specify import order

What I meant here is "if a scene uses the array merge operator somewhere", I didn't mean to specifically refer to the import section of the scene.

bcamper commented 7 years ago

@matteblair I guess the source/dest rules could just be flipped here - and that would work OK?

bcamper commented 7 years ago

Oh I see what you mean about "leftover" '...' values now. I suppose we would want to scrub them if they were not merged... yes?

bcamper commented 7 years ago

I think that the order would make more sense reversed, but that I ended up doing it this way and not noticing because I had a sort of contrived example of extending a dash pattern in a sub-layer (which is a much less compelling need than the import scenario).

matteblair commented 7 years ago

I agree, it seems better to check for a merge operator in the "destination" array.

Re: scrubbing leftover merge operators - that does seem necessary for practical use. I can imagine ... being plausibly used as a data value in global (maybe in an array of string constants used by a function), in which case the value would be erroneously removed in this process. On the other hand, I'm not sure that there's an identifier we can use for this syntax that isn't a plausible data value.

bcamper commented 7 years ago

@matteblair OK I managed to thoroughly confuse myself.

I think that the behavior as implemented is what we actually want, but that I described some of it backwards in the description -- the merging does happen from "source" to "destination" as described (whether these terms are the right ones), but what I stated incorrectly was the order of merging / which scene is parent/child.

Consider an example where scene A imports scene B, which in turn imports scene C.

Scene A: import: B.yaml Scene B: import: C.yaml Scene C: ...

While scene A is usually referred to as the "parent" scene, it is actually the last scene that is merged (the last "source"), such that its values "win" over the previous scenes. The merge order is: C, B, A. In this context, you want scene A to be able to use the ... operator to specify how the values in scene B (and/or C) are merged in.

Similarly, with sub-layers, as in my dash example:

layers:
  ...
  paths:
    filter: { kind: path }
    draw:
      lines:
        dash: [2, 1]

    parks:
      filter: { landuse_kind: park }
      draw:
        lines:
          dash: [..., 4] # extend the dash pattern

The parks sub-layer is merged in last, so the merge order is: paths, parks. The parks layer can use ... to extend the dash pattern from the parent, rather than overwriting it (again this example may be contrived but demonstrates another merge context where this could work and produce rational results).

Does this make sense?

matteblair commented 7 years ago

@bcamper Yes I see what you mean about "source" and "destination" now. Understanding "source" as the "parent" scene and "destination" as the "child" scene (or the result of merging child scenes recursively), then this agrees with the desired behavior in our earlier comments.

However I think I failed to understand another aspect of this change. The PR description discusses this behavior in terms of merging scenes, but your example with the line dash pattern suggests that the behavior of evaluating draw rules would also be changed. Is that correct? I understand that these two processes might be implemented with the same function in Tangram JS, but in Tangram ES they are implemented very differently and I think in general these processes have different requirements.

bcamper commented 7 years ago

Yes the merging is generic for a few of these operations in JS, tried to sneak that by you ;) I agree it makes sense to limit it to scene merging (and we can evaluate other cases in the future if needed). I'll update it to be an option in the merge code.

bcamper commented 7 years ago

OK I updated this to make the array merge only applied to scene merges (and optionally available for other features if we later decide we want it), and to remove any unmatched or extraneous (if more than one was found) merge operators from arrays.

The only caveat to scrubbing the leftover merge operators is that this is only applied to scenes that import others. Meaning that if you put a ... value in an array but then never import other scenes, it will not be scrubbed. This was to save an expensive additional (recursive) parsing pass over the scene object in cases where it's not otherwise being merged into. Do we think that's acceptable?

meetar commented 7 years ago

if you put a ... value in an array but then ever import other scenes

…Just to clarify, this should be "never import other scenes", correct?

bcamper commented 7 years ago

@meetar yes, thanks, fixed :)

matteblair commented 7 years ago

Thanks for understanding my concern about over-generality :)

Scrubbing the leftover merge operators seems like generally a good thing. From a user's perspective I think it would be distinctly more useful to do the scrubbing in all cases, even when no scenes are imported. In contexts like Mapzen.js and the Android and iOS SDKs, imports are often specified programmatically at run-time, so I worry that it will be difficult to reason about the behavior of the scene file if it includes merge operators and a variable set of imports.

bcamper commented 7 years ago

I agree it would be better to scrub in all cases, but I'm concerned it's going to add a noticeable performance hit to do a complete recursive pass (on the Mapzen base maps, which are absurdly large). I'll try to benchmark.

bcamper commented 7 years ago

Alternatively, we could introduce this syntax for a subset of scene nodes (the scripts parameter I want to add, styles, etc.). This adds more complexity to limit the replacement logic based on scope, but avoids a recursive process over deep structures like layers.

matteblair commented 7 years ago

The motivations for this syntax seem very focused on this scripts feature. Is it really best to incorporate large amounts of Javascript through a YAML file? This strikes me as something more appropriate to do through a Javascript API to the library itself. I'm far from an expert JS developer but maybe it's easier to use an existing build/debug pipeline with your libraries instead of letting Tangram pull it in at runtime?

bcamper commented 7 years ago

It's actually necessary for Tangram to pull in any external script dependencies, because the code needs to be loaded into the web worker threads (but workers have a native importScripts() method, so Tangram doesn't need to actually do much).

The scripts feature (which exists for data sources now) just specifies the URL for the external JS, not the actual JS itself. It's an advanced feature, but I think it's actually worked surprisingly well so far.

Beyond that, I think there's some argument to configuring these via YAML so that the scene captures all of its dependencies. There's otherwise no way for these scenes to be self-contained (in the way that the rest of our scene logic and resources are).

matteblair commented 7 years ago

I see, the web workers do complicate the issue :\

Checking all of the scene structure for leftover merge operators may not be overly slow since relatively few leaf nodes are arrays. I think it's worth benchmarking.

Another approach would be to use a mapping for scripts, which would allow merging using the current behavior. I suppose the trade-off would be potential key collisions, but if keys are named methodically then I think it would be a rare problem.

bcamper commented 6 years ago

Closing in favor of the more targeted, scripts syntax-specific change in #634.

tangrams / tangram

Array merge operator #608