mourner opened this issue 9 years ago
I made a proof-of-concept encoding to compare the size wins. A sample z15 tile containing only the poi_label and housenumber_label layers weighs 44.36K gzipped with the current approach, and 31.88K gzipped with the proposed one. That's a 28% cut, which is pretty significant considering the data is gzipped. Non-gzipped sizes are 104.87K and 59.94K respectively.
I got the previous result with z-curve ordering. If I sort by id instead, and delta-encode ids, this brings the sample down to 26.04K. This is a 41% cut.
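For reference, the id part of that is just this (a minimal sketch; the function names are illustrative, not taken from the gist):

```ts
// Minimal sketch of the id trick: sort the features by id, then store only
// the differences between consecutive ids instead of the ids themselves.
function deltaEncodeIds(ids: number[]): number[] {
  const sorted = [...ids].sort((a, b) => a - b);
  const deltas: number[] = [];
  let prev = 0;
  for (const id of sorted) {
    deltas.push(id - prev); // small positive numbers instead of huge ids
    prev = id;
  }
  return deltas;
}

function deltaDecodeIds(deltas: number[]): number[] {
  const ids: number[] = [];
  let prev = 0;
  for (const d of deltas) {
    prev += d; // running sum restores the original sorted ids
    ids.push(prev);
  }
  return ids;
}
```

Sorted ids yield small, similar deltas that varint-encode compactly and gzip far better than the raw ids.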
Note that this proposal is totally blue-sky, just meant to start a discussion; it's not something we necessarily have to do in the next spec version.
Here's the proof-of-concept conversion code: https://gist.github.com/mourner/a7c6ea76f450bb92e316. I'll have to repeat the tests after fixing the big OSM ids, which contribute too much to the size; that will produce fairer results.
This approach also has some advantages apart from the size reduction. When working with WebGL, for example, you work with array buffers, so if the data is stored in the right way it would be possible to just memory-map the XMLHttpRequest response to a typed array in JavaScript (the data should be padded to sizeof(typed array type)).
That would require changing the varlen encoding to a fixed-byte encoding, but I think it's much better to let the compressor do its job than to be clever about moving bits here and there, and it would beat having to decode byte by byte. I'm not sure whether protocol buffers support that kind of thing.
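As a rough sketch of what that could look like on the client side (the header size and layout here are purely hypothetical, just to show the zero-copy view):

```ts
// Sketch of the "memory-map the response" idea: request the tile as an
// ArrayBuffer and view it directly as a typed array, with no per-byte
// varint decoding. The 8-byte header below is hypothetical; the real
// requirement is only that the offset is a multiple of the element size,
// which is why the data would need to be padded to sizeof(typed array type).
function loadPositions(url: string): Promise<Int32Array> {
  return new Promise((resolve, reject) => {
    const xhr = new XMLHttpRequest();
    xhr.open('GET', url);
    xhr.responseType = 'arraybuffer';
    xhr.onload = () => {
      const buf = xhr.response as ArrayBuffer;
      const headerBytes = 8; // hypothetical fixed-size header
      // Zero-copy view over the response body.
      resolve(new Int32Array(buf, headerBytes, (buf.byteLength - headerBytes) / 4));
    };
    xhr.onerror = reject;
    xhr.send();
  });
}
```

The view itself involves no decoding at all, and the resulting typed array can be handed straight to gl.bufferData.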
I understood from a conversation with @springmeyer that MVT was designed the way it was because you wanted to have compressed features and decompress them on demand.
I thought I'd share a benchmark I ran while comparing a point cloud encoded with Draco vs. MVT.
The MVT was 171K uncompressed, 68K with gzip and 45K with brotli. The Draco-encoded data was 14K. I didn't test any of the ideas in this thread, but my test showed Draco to be a 79% cut over the gzipped MVT.
Results would vary a lot depending on the type of point cloud data being encoded, but it would still be interesting to explore what techniques Draco uses and whether they could be applied to further improve point encoding in MVTs.
UPDATE: I think Draco might be applying lossier compression. I can't tell by looking at the outputs by eye, so I'll need to do further benchmarking to make sure I'm comparing at equivalent detail.
Encoding a huge number of points in a VT is currently pretty inefficient in terms of output size. This is an especially big problem at the highest zoom levels, where POIs and address points are encoded. Let's discuss what we could do about it in a future major spec revision to make the encoding optimal.
This is how points are currently encoded in a layer:
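For a handful of labeled points, the decoded layer looks roughly like this (an illustrative sketch with made-up values; in the real protobuf the values are typed value messages and the geometries are zigzag-encoded command integers):

```ts
// Illustrative sketch of a decoded layer containing three labeled points.
// Every feature repeats the same wrapper: a full id, a tags array of
// key/value index pairs, and a geometry array that starts with its own
// MoveTo command integer (9) followed by zigzag-encoded coordinates.
const layer = {
  version: 2,
  name: 'poi_label',
  extent: 4096,
  keys: ['name', 'type'],
  values: ['Cafe A', 'cafe', 'Bar B', 'bar', 'Cafe C'],
  features: [
    { id: 3100152971, type: 1, tags: [0, 0, 1, 1], geometry: [9, 50, 34] },
    { id: 3100152974, type: 1, tags: [0, 2, 1, 3], geometry: [9, 6, 12] },
    { id: 3100153018, type: 1, tags: [0, 4, 1, 1], geometry: [9, 8, 2] },
  ],
};
```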
Problems with the current approach: every point is wrapped in its own feature, so the full id, the tags array of key/value index pairs, and a geometry array with its own command integer are repeated for every single point, and big OSM-style ids take many varint bytes each. Some of the repetition gets compressed away by gzip, but there's still a lot of data we could eliminate with a smarter encoding. Here's a rough idea of how the sample above could be encoded optimally:
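For example, a column-oriented layout along these lines (an illustrative sketch of the idea, not a concrete wire format; it reuses the sample data from the sketch above):

```ts
// Illustrative column-oriented layout for the same three points. The
// per-feature wrappers are gone: there are no geometry command integers,
// ids are delta-encoded over the sorted order, and each attribute key
// becomes a single parallel array of value indexes.
const pointLayer = {
  name: 'poi_label',
  extent: 4096,
  count: 3,
  idDeltas: [3100152971, 3, 44], // 3100152971, 3100152974, 3100153018
  x: [25, 3, 4],                 // decoded point coordinates as parallel arrays
  y: [17, 6, 1],                 // (these could be delta-encoded along a z-curve too)
  keys: ['name', 'type'],
  values: ['Cafe A', 'cafe', 'Bar B', 'bar', 'Cafe C'],
  columns: {
    name: [0, 2, 4],             // one value index per point, per key
    type: [1, 3, 1],
  },
};
```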
To understand how much of a win this would be in terms of compression, I'll put together a quick proof-of-concept encoding of sample z15 point data and post back here.
cc @yhahn @springmeyer @flippmoke @kkaefer @tmcw @sgillies @ericfischer