Tile Continuity with Updating Tiles

flippmoke commented 6 years ago

Problem

Tilesets are often viewed as static datasets, but over time it could be very possible to treat them as a dynamic datastore. This is already being done in some cases but there is always the possibility of clients downloading tiles at the time that they change. This can cause discontinuities between the same feature as it might have been edited or changed. In the past this was not as big of an issue as we were wanting to consider each tile as its own dataset, however, with the introduction of the possibility of tiles being required to know if they match their neighbor via possible additions such as https://github.com/mapbox/vector-tile-spec/issues/104 this may no longer be the case.

An example line from tile that was different from the other two tiles containing the same feature.

Possible Solution 1

The simplest solution would be to simply have all features or layers to have a hash or version to make sure that all other tiles match that same version. However, this could make there be a requirement to update all tiles at once even if only one tile changed a feature. This is not ideal for high update rates of tilesets.

Possible Solution 2

It would be possible to version the edges between tiles and encode this information within each tile. This could be encoded in each layer to know if the layer version matches the edge version of the tile next to it. This would be encoded in the layer as a set of 4 repeated integers, with each integer representing the edge's version in the order of Top, Right, Bottom, Left edges of the tile. This would be optional and if edge versioning was not required this would assume that no check is required or the tileset is not needing to update.

required uint64 version = 12;

The edge version would be incremented each time an edit is made to a feature, and would require that surrounding tiles also be updated with a new edge version to match (even if data was not changed in that tile). This means only the areas changed plus a 1 tile buffer would require version changes.

The version 0 is reserved in this model as it would signify that there is no layer's data in the tile next to it. This would be useful if a feature is dropped from a tile or if the edge is on a tile boundary.

Conclusion

I would support the second solution much more then the first and there are other likely solutions to this problem. Thoughts?

anandthakker commented 6 years ago

Another possibility would be to have each tile to have a single version. Compared to the edge-versioning approach, this would be simpler for encoding and shift a bit more burden onto clients: if I see that a given tile has a new version, I then have to request the four (eight?) adjacent tiles to check whether their versions have changed, too.

flippmoke commented 6 years ago

@anandthakker single versions for each tile would not solve the problem completely because you might not know that you need to change the tile you have as you pan into new areas. You need to have a connection of versions across the tiles in some manner so that one tile knows if it works with the other.

anandthakker commented 6 years ago

single versions for each tile would not solve the problem completely because you might not know that you need to change the tile you have as you pan into new areas.

I think it would be possible, but would require making more requests than having each edge versioned independently, since if, say, you saw a new version of tile 2/3/3 as compared to your cache, then for any member of {1/3/3, 3/3/3, 2/2/3, 2/4/3} that is in your cache, you'd have to request it again to see if its version, too, had changed. If not, then this indicates that whatever change was made to 2/3/3 was only local to that tile.

e-n-f commented 6 years ago

If you don't want to force a reload of the entire feature when its hash changes, isn't it sufficient to check whether all the edge transitions of the feature in your tile connect to corresponding edge transitions in the adjacent tiles, and vice versa? If they all connect, you are guaranteed to have continuous topology, without requiring complete coherency.

flippmoke commented 6 years ago

@ericfischer I will provide an example when that does not hold:

For example tile 1/0/0 is downloaded and stored, later we request 1/1/0. However, between that time we have updated 1/0/0 and 1/1/0. In the update we changed a feature's property that crosses the tile boundary such that a key now has a new value. At this point, which feature having the same id is correct?

Setting a version on the edge would signify which version is incorrect and which tile you would need to refresh for a new tile quickly.

e-n-f commented 6 years ago

OK, I see your point, and will revise mine:

all the edge transitions of the feature in your tile connect to corresponding edge transitions in the adjacent tiles, and vice versa, and the features in each tile have identical properties

I'm not necessarily opposed to edge IDs, but it seems like if you use them, you always have to invalidate two tiles instead of possibly just one. You also can't update a tile without looking up the previous version to find which edges have changed. This also makes it more difficult to do atomic updates, since you would need a lock to keep two writers from potentially giving two revisions of the edge in quick succession the same ID.

flippmoke commented 6 years ago

@ericfischer checking that two features have the same properties is an intensive check where we would have to perhaps decode all of the properties or do big string compares.

Additionally, does this cover all the cases? I am not sure. I will have to brainstorm on this further.

e-n-f commented 6 years ago

True, it might be an expensive comparison. Maybe we could drag an attribute hash along as part of the feature to allow quicker acceptance or rejection without having to inspect each attribute separately.

(I'm not sure either whether it covers all the cases.)

e-n-f commented 6 years ago

Just realized: if gridded data is an attribute, the same feature in adjacent tiles will not have identical attributes.

So if an attribute check is part of it, and gridded data is in the attributes, only the row and column of the grid that directly intersect the tile boundary can be part of the check for that attribute.

mapbox / vector-tile-spec