systemapic / pile

PostGIS tile server

Vector tiles are too large #34

Open knutole opened 8 years ago

knutole commented 8 years ago

Goal

The goal here is to create animations with snow-rasters. We have GeoTIFFs of snow coverage, one raster per day, going back fifteen years. We want to be able to play an animation of this coverage - preferably for any given period.


Vector tile size

Working on the "snow-raster" datasets, vector tiles come out at up to several megabytes (which is kind of insane, given that the original raster is a mere 1.6MB). They should be optimized as much as possible, preferably down to a few kB per tile, although this might be but a dream...

The vectorized dataset is much larger than the original raster once the pixels are converted to polygons: a single vector tile can be as much as 10MB, derived from this 1.6MB raster file.

Having second thoughts about the whole approach.

Possible solutions

1. Points instead of polygons

Vectorizing the raster into points instead of polygons should save us some space. However, the points would have to be rendered as squares, which seems messy.

2. GIF-tiles

Crazy idea: creating 256x256 GIFs on demand on the server, and playing them as tiled images in the browser. Problems: playback is hard to control (if possible at all); a stupid solution in general; no dynamic control of style, and no querying of data.

3. Playing raster tiles with transition

Perhaps the easiest solution after all, but hard to say whether it will work without having tried it. It seems the size of raster tiles might be smaller than that of vector tiles (contrary to intuition), so at least an improvement in that regard. Also, no dependency on GL.

4. Heavy simplification of polygons

Currently, simplification of polygons is not really working, perhaps because all polygons are perfect squares, so there is not much to simplify without dropping the polygon altogether. Another way to simplify would be to combine four neighbouring polygons into one larger polygon (with the average value). This loses a lot of resolution, however - but something's gotta give!
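
A minimal sketch of the 2x2-combination idea, assuming the vectorized raster is available as a flat list of pixel cells with integer column/row indices and a value (the `Cell`, `col`, `row` and `val` names are illustrative, not taken from pile):

```typescript
// Hypothetical cell representation: one square polygon per raster pixel.
interface Cell { col: number; row: number; val: number; }

// Merge each 2x2 block of cells into one larger cell carrying the average
// value of the block; this quarters the polygon count at the cost of
// spatial resolution.
function aggregate2x2(cells: Cell[]): Cell[] {
  const blocks = new Map<string, { sum: number; count: number }>();
  for (const c of cells) {
    const key = `${Math.floor(c.col / 2)}:${Math.floor(c.row / 2)}`;
    const b = blocks.get(key) ?? { sum: 0, count: 0 };
    b.sum += c.val;
    b.count += 1;
    blocks.set(key, b);
  }
  return [...blocks.entries()].map(([key, b]) => {
    const [col, row] = key.split(":").map(Number);
    return { col, row, val: b.sum / b.count };
  });
}
```

Applying this once quarters the number of polygons; applying it repeatedly would give a crude resolution pyramid.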

5. TopoJSON?

Instead of storing all four corners of each polygon, perhaps store only one corner (similar in spirit to what TopoJSON does with shared arcs). This is feasible for vectorized rasters, and would cut the size - in theory - by a factor of four. (Which is not enough, however.) It seems there's a lot to gain from simplification in general, though.
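
A sketch of the one-corner idea, assuming the cell size is constant and known to the client, so each cell can be shipped as a single corner plus value and only expanded back into a square when drawing (the names below are illustrative):

```typescript
// Compact cell: only the lower-left corner and the value are transmitted.
type PackedCell = [x: number, y: number, val: number];

// Expand a packed cell back into a closed square ring, given the raster's
// constant cell size (known to the client, so never transmitted).
function toSquare([x, y]: PackedCell, cellSize: number): [number, number][] {
  return [
    [x, y],
    [x + cellSize, y],
    [x + cellSize, y + cellSize],
    [x, y + cellSize],
    [x, y], // close the ring
  ];
}
```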

6. D3.js

Eureka: the polygons are always the same - only the value val changes through the timeseries! Thus, there is no need for several representations of the polygons; we only need the val and id for each slice. This saves massive amounts of bytes.

The geometry of the datacube (2D datasets over time) could be represented as a single TopoJSON created from the vectorized raster - then the values (val) of each polygon (id) for each slice of the timeseries could be queried at will and colored in by d3.js.
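
A rough sketch of that split, assuming the static geometry is served once as TopoJSON with an `id` per cell, and each timeslice is fetched as a tiny `id -> val` map (the endpoint paths and object names are assumptions, not the actual pile API):

```typescript
import * as d3 from "d3";
import * as topojson from "topojson-client";

const color = d3.scaleSequential(d3.interpolateBlues).domain([0, 100]);
const path = d3.geoPath(); // identity projection: cells assumed pre-projected

// Load the static geometry exactly once and draw one path per cell.
async function initCells() {
  const topology = (await d3.json("/geometry/snow-cells.topojson")) as any;
  const cells = topojson.feature(topology, topology.objects.cells) as any;
  return d3.select("svg#map")
    .selectAll("path")
    .data(cells.features)
    .enter()
    .append("path")
    .attr("d", path as any);
}

// Per timeslice: fetch only { id: val } and recolour the existing paths.
async function showSlice(paths: d3.Selection<SVGPathElement, any, any, any>, date: string) {
  const values = (await d3.json(`/values/${date}.json`)) as Record<string, number>;
  paths.attr("fill", (d: any) => color(values[d.id] ?? 0));
}
```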

Tests and debug tools

knutole commented 8 years ago

D3.js solution

References:

knutole commented 8 years ago

Still having problems with exporting the geometry in a compact enough form.

First order of business:

  1. Get a representation of the vectorized (snow) raster that is as small as possible. Need to find the limits of both TopoJSON and vector tiles.
  2. What about a grid? Simply representing the pixels as a grid - horizontal + vertical lines, with each pixel as a grid-box? The geometry would be much smaller, but perhaps too much of a workaround to be feasible? (D3.js could probably handle it.)
knutole commented 8 years ago

@strk Would be great if you could help me solve this problem on Friday!

strk commented 8 years ago

I'm not surprised that a vectorized version of a raster is bigger than the raster. In a raster every pixel can be a single byte (depending on the amount of information stored per pixel), while the vectorized version takes at least 93 bytes per polygon in WKB form (and 14 in TWKB form). So how much vectorization costs in size depends highly on how many polygons are needed to represent how many pixels. Remember also that adjacent polygons have duplicated boundaries, which makes things even worse in terms of size.
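
For reference, the 93-byte figure follows directly from the WKB layout of a single-ring polygon; a small sketch of that arithmetic (the 14-byte TWKB figure depends on coordinate precision and delta encoding, so it is not reproduced here):

```typescript
// WKB size of a polygon with one ring of `points` vertices (double precision):
// 1 byte byte-order + 4 bytes geometry type + 4 bytes ring count
// + 4 bytes point count for the ring + 16 bytes (two doubles) per vertex.
function wkbPolygonBytes(points: number): number {
  return 1 + 4 + 4 + 4 + points * 16;
}

// A pixel square is a closed ring of 5 vertices (first vertex repeated):
console.log(wkbPolygonBytes(5)); // 93 bytes, vs. as little as 1 byte per raster pixel
```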

So, going with a raster representation is IMHO a good idea. What you call "a grid" is effectively no different from a raster.

The whole dataset is composed of roughly ~5,500 values per pixel (one value per day for 15 years) and ~1,182,276 pixels (801x1476 resolution), which means 6,502,518,000 values to potentially represent.

At any given time the user would probably only be looking at a lower-resolution version of the raster, and at a coarser time resolution too, so we probably don't want to transfer the whole dataset to the client at once for client-side rendering.

What is being served right now is not clear to me. I understand that protobuf-encoded Mapnik vector tiles containing a vectorized version of the raster are being sent, but I don't yet know how much data they contain (which resolution, how many timeslots). I will be setting up some debugging tools to figure that out and will add more info here as I have it.

knutole commented 8 years ago

Possible solutions (cont'd):

As per the discussion with @strk, there are several ways to do this, all with their own pros and cons. This is an overview of those:

Raster tiles animated with CSS transitions

Simply generating raster tiles for each slice. They need to be served continuously and transitioned with CSS.

Problems: Frames per second: if we're doing 1fps and the area is, say, 25 tiles, then we need to serve 25 tiles per second (and transition them). Each tile is approx. 25kB, i.e. 25*25kB = 625kB per frame. 30 days' worth of animation is then 30*625kB = 18.75MB. With 50% optimization we're still at around 10MB. This is perhaps an acceptable size.

Tiles will have to be fetched over websockets (at 625kB/s), due to the number of requests.
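
A rough sketch of the CSS-transition playback for a single tile position, assuming the frames can be fetched as ordinary raster tile URLs supplied by the caller (nothing here is pile's actual API): two stacked `<img>` elements are alternated, with opacity animated by a CSS transition.

```typescript
// Two stacked <img> elements for one tile position; the hidden one is loaded
// with the next frame and faded in via a CSS opacity transition.
function animateTile(container: HTMLElement, frameUrls: string[], fps = 1) {
  const layers = [document.createElement("img"), document.createElement("img")];
  layers.forEach((img, i) => {
    img.style.position = "absolute";
    img.style.opacity = i === 0 ? "1" : "0";
    img.style.transition = `opacity ${1 / fps}s linear`;
    container.appendChild(img);
  });
  let frame = 0;
  let front = 0;
  setInterval(() => {
    const back = 1 - front;
    layers[back].src = frameUrls[frame % frameUrls.length];
    layers[back].onload = () => {
      layers[back].style.opacity = "1";  // fade the new frame in
      layers[front].style.opacity = "0"; // fade the old frame out
      front = back;
    };
    frame += 1;
  }, 1000 / fps);
}
```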

Video tiles

Generating a 256x256 video (per tile) for an arbitrary time range on demand, and adding these to the tile grid.

Problems: The video format must be investigated - GIF, Ogg or MP4. It is possibly heavy for the browser to play 25 tile-videos at the same time.

Single video

Generating a video with the same pixel dimensions as the original raster, and adding it to the map as a georeferenced frame.

Problems: The video could become quite big (roughly 800x1400 pixels). What about streaming, instead of a contained file?
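
If the client can use a Leaflet version with video support, placing a georeferenced video frame on the map is straightforward; a sketch assuming Leaflet's `L.videoOverlay`, with placeholder URL and bounds:

```typescript
import * as L from "leaflet";

declare const map: L.Map; // the existing map instance (assumed)

// Placeholder bounds covering the raster extent, and a placeholder video URL.
const bounds = L.latLngBounds([[58.0, 4.0], [71.5, 31.5]]);
L.videoOverlay("/video/snow-coverage.webm", bounds, { opacity: 0.8 }).addTo(map);
```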

Vector tiles with geometry once and changed values only