Add support to parse some or all MVT feature properties as JSON

bcamper commented 5 years ago

Sometimes source data includes properties with a richer object format than just strings or numbers -- e.g. arrays, nested objects, etc. The MVT spec doesn't prescribe explicit behavior for these cases, but notes that common tools such as Tippecanoe and Mapnik will encode these properties as stringified JSON.

This requires client-side support for parsing these fields. This PR adds support for this on MVT data sources through a new parse_json property:

If parse_json is true, then each property will be checked to see if it "looks like" stringified JSON (defined as a string with first character being { or [); if so, it is parsed as JSON (with JSON.parse).
If parse_json is an array of property names, only those specific properties are checked for JSON parsing. This is preferred to the above, because it limits the parsing impact to only the fields that need it.
If parse_json is undefined/null/false, no special parsing of properties is done (e.g. the current behavior).

Example usage:

sources:
  tiles:
    type: MVT
    url: https://...
    parse_json: [prop_a, prop_b] # treat feature properties 'prop_a' and 'prop_b' as stringified JSON

These parsed properties will now be properly returned as full JS objects through functions such as scene.getFeatureAt() or scene.queryFeatures().

To make use of these properties in filtering and styling, you will need to write a JS function to access the complex properties, e.g. if prop_a is an array of values, you could filter on it containing the value x like so:

filter: function() { return feature.prop_a.indexOf('x') > -1; }

bcamper commented 5 years ago

@nvkelso I think we also had some values similar to this in TIlezen for awhile (for multiple shields?), but removed them (for unrelated reasons).

nvkelso commented 5 years ago

Tilezen still exports the array properties in GeoJSON and TopoJSON, but currently strips out in MVT. We can start exporting them in MVT when Tangram grows support. Primarily these are multishields (road, transit, walk, bike).

bcamper commented 5 years ago

Tilezen still exports the array properties in GeoJSON and TopoJSON, but currently strips out in MVT. We can start exporting them in MVT when Tangram grows support. Primarily these are multishields (road, transit, walk, bike).

Oh great! Bring it on :)

bcamper commented 5 years ago

@matteblair thoughts on this on the ES side?

matteblair commented 5 years ago

This would be a bigger change for Tangram ES since we currently model feature properties explicitly as a mapping of strings to scalar values. However I can see the value in supporting more structured data for properties.

This is also a topic of discussion for the next version of the MVT spec (https://github.com/mapbox/vector-tile-spec/issues/75). It does not seem like the proposed solution in MVT would be mutually exclusive with the one here, but it would make this one redundant. The MVT proposal also has the advantage that it's a much more compact representation of structured data: values can be de-duplicated in MVT but in stringified JSON they must be duplicated.

bcamper commented 5 years ago

This would be a bigger change for Tangram ES since we currently model feature properties explicitly as a mapping of strings to scalar values. However I can see the value in supporting more structured data for properties.

This is also a topic of discussion for the next version of the MVT spec (mapbox/vector-tile-spec#75). It does not seem like the proposed solution in MVT would be mutually exclusive with the one here, but it would make this one redundant. The MVT proposal also has the advantage that it's a much more compact representation of structured data: values can be de-duplicated in MVT but in stringified JSON they must be duplicated.

I agree the eventual MVT v3 solution will be better, however after some time (that ticket was opened in 2016...) I decided to finally move ahead with this because: 1) there are popular tools that implement this JSON stringify behavior, particularly Tippecanoe, so it's a reality for a lot of existing data, and 2) MVT v3 is going to be a substantial upgrade (tons of other things in the spec) that will require widespread updates to both encoding and decoding libs and I think a lot of MVT v2 data will persist.

matteblair commented 5 years ago

That makes sense. We probably shouldn't count on the MVT v3 spec being finished any time soon, and even with that approach we would have to update the feature data model in Tangram ES. This feature sounds good to me.

matteblair commented 5 years ago

I'm also interested in finding ways to make these properties available to filters in a "native" way, since using a JS function in a filter makes it an order of magnitude slower in Tangram ES. Having built-in accesors for structured properties could also implicitly define which ones need to be parsed, and allow us to do it lazily.

bcamper commented 5 years ago

Agree on finding adding support for more native syntax. I'm mostly concerned about how to do it in a way that doesn't impact performance, since the assumption of a 1:1 mapping from YAML string to feature property key is fast. But I'm sure we can use some clues from data source definition and/or caching strategy to mitigate this.

bcamper commented 5 years ago

OK @matteblair inspired me to take this further. I've added support both for accessing nested properties with native syntax in filters, and for doing array-to-array filter operations. I'm not 100% sure of the syntax/naming decisions here, so feedback/alternatives welcome.

Nested properties in filters

Dot notation with . can be used to access nested feature properties (these properties could have been encoded in a GeoJSON/TopoJSON source, or parsed from MVT stringified JSON properties as covered by this PR):

Given a feature property a: { b: { c: 'test' } }, this filter will match:

filter: { a.b.c: test }

Feature property names that include a . can be escaped with \., e.g. a feature property named 'd.e.f': 'escaped' will match with:

filter: { d\.e\.f: escaped }

These could be mixed, e.g. a property { 'a.b': { c: 'mixed' } would match with:

filter: { a\.b.c: mixed }

Array-to-array filters

We have existing syntax for checking if a single value matches one of a list of values, e.g.:

Does value x match a, b, or c?

filter: { x: [a, b, c] }

As part of this new support for "complex" property values that may be arrays, we need to expand this syntax. This branch adds two keywords for this, by extending the same filter object pattern we use for min/max range filter syntax:

To check if an array a contains one or more of the values p, q, r:

filter: { a: { includes: [p, q, r] } }

Note: while it would be possible to check the type value of a to see if it's an array, and apply this logic without additional syntax, I was weary of introducing an additional run-time check that can occur on many many features (vs. the other additions in this branch, which only occur once per filter, lazily when the filter is first parsed/built), especially for a case that is only needed for a small subset of feature properties. On the other hand, jsperf says my browser can perform 500 million array type checks per second, so this may be unfounded.

There are also cases where you'd like to test that an array contains all of a list of values. An includes_all variant was added for this. For instance, to check if an array a contains the values p, q, AND r:

filter: { a: { includes_all: [p, q, r] } }

One could check for multiple array values with multiple filters, but supporting direct syntax for them would be much more efficient than this, which will execute 3 separate array tests, e.g.:

filter:
  all:
    - a: { includes: p }
    - a: { includes: q }
    - a: { includes: r }

Note: includes (implicit "or"/"one of", like existing single-value-to-array filters) vs. includes_all doesn't feel entirely right, but I couldn't come up with something I liked better, suggestions welcome!

matteblair commented 5 years ago

I like the property accesors! Should be both more performant and easier to use for authoring filters. I'd suggest includes_any instead of includes just to make the meaning more obvious.

matteblair commented 5 years ago

An alternative way of accessing nested properties in YAML would be to use arrays for the filter keys, like:

filter: { [a, b, c]: test }

But I don't think this works in the JS YAML model - plus it's not very natural to read.

The \. escaping just seems a little awkward so I'm trying to think of other syntax options. Hopefully it should be very rare that we encounter data with property keys including ., so it's probably not worth worrying about.

nvkelso commented 5 years ago

I dig it, and @matteblair suggestion for includes_any.

filter: { d.e.f: escaped }

For the complicated property names, could we allow that be be quoted like "d.e.f", too?

nvkelso commented 5 years ago

@bcamper can you add an example on how to access the multishields in the all_shields properties in Tilezen's GeoJSON output, which are arrays (so first element in the array, or for each element in the array)?

Will Tangram abstract that GeoJSON and MVT store it in different ways, but I think Tangram will expand the MVT shorthand to the GeoJSON internally?

bcamper commented 5 years ago

Agree on includes_any, will update.

Yep, property names can be quoted or unquoted (e.g. d.e.f or "d.e.f"), as per standard YAML syntax (it's just personal preference).

@matteblair you're following my thinking too -- I considered an array syntax, but you're correct JS YAML parser doesn't support it, and then it seemed just as complicated or more to parse it out manually from a string key. And this type of dot notation would be valid JS. Agree \. is a bit awkward, but I concluded it was common enough as a string escaping pattern in other cases. It could also use JS-style bracket notation like a[b][c], but this similarly felt a little more involved to parse and still requires some form of escaping (to support [ and ]) if you're really trying to be complete.

bcamper commented 5 years ago

@nvkelso can you provide an example tile/location where that property shows up? Not finding it in a quick search.

nvkelso commented 5 years ago

"all_networks":["US:US","US:CA"],"all_shield_texts":["101","128"]

12-648-1568.topojson.json.zip

bcamper commented 5 years ago

@nvkelso

"all_networks":["US:US","US:CA"],"all_shield_texts":["101","128"]

Can you elaborate on how you think we would want to filter on and use these array fields? For example, are the index positions 0, 1, etc. going to reliably reflect a fixed network "level" (national highway, state highway, etc.)? It seems like we couldn't necessarily guarantee that for all data?

It's easy enough to match on the values, like "does all_networks contain US:CA", but I don't have an immediate answer for how we would then match that index to all_shield_texts to get the desired shield text out (this could be done in a JS function, but we are first seeing what/if we can create native syntax support). We could put the whole object into one array, e.g. something like:

[{ network: 'US:US', text: '101' }, { network: 'US:CA', text: '128' }]

But that also doesn't fit simple filter matching syntax (it's more like the complex destructuring matching offered JS and other languages... which is cool but likely overkill here!); so we'd probably want a JS filter function for it as currently stored.

Let's figure out how we want to style them, then determine what additional scene syntax and/or data format changes are needed (since they aren't in use anyway, right?).

@matteblair other thoughts?

nvkelso commented 5 years ago

Mostly I'm interested in multishield support, for roads, walking paths, bicycle, and transit. As long as this gets us part way there, and it's possible to step all the way there with an extension then I'm good.

matteblair commented 5 years ago

I don't know the full range of data for shields, but for the case above it we might be able to use an object property like:

"networks": { "US:US": "101", "US:CA": "128" }

Then do filtering and styling like:

filter: { networks.US:US : true }
draw:
  text:
    text_source: networks.US:US

This implies that property accessors work in styling expressions as well as filtering expressions - not sure if that was planned or not. Encoding this way is also not very compact if we're stringifying the whole JSON object :\

nvkelso commented 5 years ago

😂 I think this is taking a complex turn when I'm just interested in multi-shields ;)

bcamper commented 5 years ago

It's the tangram way. I feel comfortable we can prototype multi-shields with this data and some of the new syntax, and then go from there.

On Tue, May 7, 2019 at 5:24 PM Nathaniel V. KELSO notifications@github.com wrote:

😂 I think this is taking a complex turn when I'm just interested in multi-shields ;)

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/tangrams/tangram/pull/715#issuecomment-490260874, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAECXMUZACAUKRZ7CUAPVDPUHXQZANCNFSM4HHMPEXA .

matteblair commented 5 years ago

@nvkelso I don't think I understand what you mean when you say "multi-shields". Can you explain?

nvkelso commented 5 years ago

The section of road given as an example above has data to support 2 shields, a US route and a California route. On many maps both routes would have a shield drawn, so 2 shields total in this example. Often this is zoom 13+ as mid zooms can only accommodate 1 shield for density issues. But both routes will be signed in real life, on the ground.

On Tue, May 7, 2019 at 19:15 Matt Blair notifications@github.com wrote:

@nvkelso https://github.com/nvkelso I don't think I understand what you mean when you say "multi-shields". Can you explain?

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/tangrams/tangram/pull/715#issuecomment-490320632, or mute the thread https://github.com/notifications/unsubscribe-auth/AAGQIO4U55TSXTY5JEAXJBDPUIZSVANCNFSM4HHMPEXA .

tangrams / tangram

Add support to parse some or all MVT feature properties as JSON #715

Nested properties in filters

Array-to-array filters