mapbox / vector-tile-spec

Mapbox Vector Tile specification
https://www.mapbox.com/vector-tiles/specification/

Support 'correction-of-prediction' encoding for linestrings #137

Open Jakobovski opened 5 years ago

Jakobovski commented 5 years ago

I am suggesting a new encoding type for linestrings which is smaller for objects that have minimal curvature, such as roads which generally continue in their previous direction.

The current approach is to encode the first point and then the delta for the following points. I tested a slightly different approach that is ~17% smaller (at least on the data I used). The idea is as follows. The first point is encoded in full, the second point is encoded as the delta of the first, and then all subsequent points are encoded as a 'correction-of-prediction'.

The first two points are identical to the current approach. To calculate the 3rd point (and all subsequent points), a vector is drawn from the first point to the second, that vector is added to the second point (this is the prediction of where the 3rd point should be), and the encoded value is used to correct the prediction. Mathematically:

p1: encoded as (x1, y1), calculated as (x1, y1)
p2: encoded as (dx1, dy1), calculated as (x1 + dx1, y1 + dy1)
p3: encoded as (dx2, dy2), calculated as p2 + (p2 - p1) + (dx2, dy2)
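The scheme above can be sketched in a few lines. This is an illustrative implementation, not part of any spec: the function names and the sample coordinates are my own, and I use plain integer tuples rather than the spec's zigzag/varint wire format.

```python
def encode_cop(points):
    """Encode a linestring: absolute first point, plain delta for the
    second point, prediction corrections for all later points."""
    out = [points[0]]
    if len(points) > 1:
        out.append((points[1][0] - points[0][0],
                    points[1][1] - points[0][1]))
    for prev2, prev, cur in zip(points, points[1:], points[2:]):
        # Predict: continue in the previous direction (prev + (prev - prev2)).
        pred = (2 * prev[0] - prev2[0], 2 * prev[1] - prev2[1])
        out.append((cur[0] - pred[0], cur[1] - pred[1]))
    return out

def decode_cop(encoded):
    pts = [encoded[0]]
    if len(encoded) > 1:
        pts.append((pts[0][0] + encoded[1][0], pts[0][1] + encoded[1][1]))
    for dx, dy in encoded[2:]:
        p1, p2 = pts[-2], pts[-1]
        pred = (2 * p2[0] - p1[0], 2 * p2[1] - p1[1])
        pts.append((pred[0] + dx, pred[1] + dy))
    return pts

# A gently curving "road": corrections (0, 1) and (0, -1) are much
# smaller than the plain deltas (10, 3) and (10, 2) would be.
road = [(10, 10), (20, 12), (30, 15), (40, 17)]
enc = encode_cop(road)
assert decode_cop(enc) == road
```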

This works best for roads because they generally continue in their previous direction.

Next step: Test this encoding on a larger dataset and compare results to current encoding.

joto commented 5 years ago

It would be interesting to see how this approach works with other types of data. For instance, I would assume that data for buildings (4 corners, right angles) becomes worse, because the prediction will be quite bad. It might also depend on how large objects are to begin with compared to the (typically 4096) extent. After all, the new encoding only matters if the deltas become so much smaller that the varint encoding actually uses fewer bytes.
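The "fewer bytes" condition can be checked concretely. A rough sketch, assuming the spec's zigzag-then-varint coordinate encoding (32-bit zigzag here; the threshold values are illustrative): a delta only gets cheaper when it drops below a varint byte boundary, e.g. a zigzagged value under 128 fits in one byte.

```python
def zigzag(n):
    # Map signed to unsigned: 0,-1,1,-2,2,... -> 0,1,2,3,4,...
    return (n << 1) ^ (n >> 31)

def varint_bytes(u):
    # Number of bytes a varint needs: 7 payload bits per byte.
    b = 1
    while u >= 0x80:
        u >>= 7
        b += 1
    return b

def cost(values):
    return sum(varint_bytes(zigzag(v)) for v in values)

# A plain delta of (100, 30): zigzag(100) = 200 needs 2 bytes.
assert cost([100, 30]) == 3
# A small prediction correction of (2, -1): 1 byte each.
assert cost([2, -1]) == 2
```

So a long, nearly straight road segment (large deltas, tiny corrections) saves bytes, while a small building (deltas already one byte each) saves nothing even if the corrections shrink.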

I would want to see more data from diverse datasets before this backwards-incompatible change would be made.

Of course we could make this optional, which would help a bit with compatibility, but many options can lead to problems with interoperability between different implementations.

If we make it optional so the user can decide which one to use and maybe use the new one for roads and the old one for buildings etc., we put the burden on the user to decide which one is best for their data. In this case we would need some good guidelines and/or software that does the right thing.

Jakobovski commented 5 years ago

@joto

In my opinion the encoding type should be optional, as there are many cases where it is inferior to the current encoding. I think it should be left to the user to decide which encoding type is best for the given data.