systemapic / wu

Systemapic web server and API
https://systemapic.com

Discussion: Import series of raster data #336

Open knutole opened 8 years ago

knutole commented 8 years ago

Regarding version 1.6 (snow raster):

We need to support a workflow that takes in data and puts it into a "set", from which new layers can be created as updated data arrives.

todo: expand.


knutole commented 8 years ago

Processing with PostGIS:

PG functions for:

http://postgis.net/docs/manual-2.1/RT_ST_FromGDALRaster.html

knutole commented 8 years ago

@strk The next main issue I'd like us to get into is creating PG functions for dealing with rasters. I can explain the use case, and perhaps you have ideas on how best to do this:

We have a client who will upload a GeoTIFF each day. These rasters are images of snow coverage over Scandinavia. Each raster needs to be vectorized, as we'll ultimately create vector tiles and play the data as a timeline animation in the client.

What we need to create is a robust way to import this data into a dataset (or dataseries = many datasets).

This is a flow I think could work:

  1. User registers a new dataseries table with the API
  2. User pushes a raster to the dataseries table, with type=raster, action=vectorize
  3. The dataseries has hooks which vectorize the raster into another table, and keep track of such tables
  4. The uploaded rasters themselves are also kept in a table, with each row being a separate raster
  5. Creating vector tiles from the dataseries table pulls data from vectorized_table.
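The five steps above can be sketched as a small in-memory model. This is only an illustration of the flow, not the wu API: `DataSeries`, `push`, `featuresFor` and the `pixelsToFeatures` hook are hypothetical names, and a real version would write to PostGIS tables instead of arrays.

```javascript
// Hypothetical in-memory model of the dataseries flow; none of these names exist in wu.
class DataSeries {
  constructor(name, vectorize) {
    this.name = name;               // step 1: the registered dataseries
    this.vectorize = vectorize;     // step 3: hook turning a raster into features
    this.rasters = [];              // step 4: one entry ("row") per uploaded raster
    this.vectorTables = new Map();  // step 3: date -> vectorized features
  }

  // Step 2: push a raster; with action "vectorize" the hook runs immediately.
  push({ date, pixels, action }) {
    this.rasters.push({ date, pixels });
    if (action === 'vectorize') {
      this.vectorTables.set(date, this.vectorize(pixels));
    }
  }

  // Step 5: tile creation reads the vectorized data, not the raw raster.
  featuresFor(date) {
    return this.vectorTables.get(date) || [];
  }
}

// Trivial stand-in hook: one feature per pixel, carrying its row, column and value.
function pixelsToFeatures(pixels) {
  const features = [];
  pixels.forEach((row, r) =>
    row.forEach((value, c) => features.push({ r, c, value })));
  return features;
}

const series = new DataSeries('snow', pixelsToFeatures);
series.push({ date: '2016-01-01', pixels: [[0, 255], [128, 64]], action: 'vectorize' });
console.log(series.featuresFor('2016-01-01').length); // 4
```

Keeping the raw rasters next to the vectorized output means the hook can be re-run later, for example with a different simplification.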

So we'll have

What would be nice is a PostGIS function for dealing with the vectorizing and inserting, as well as pulling data.

Question no. 1: This is early discussion, I'm curious as to how much of this work you think could be done by PostGIS functions/hooks. I'll be working on planning this today, so any input is very welcome! @strk

knutole commented 8 years ago

A question regarding raster vectorization: since we'll be creating timelines (and doing other queries, like averages, min, max) on rasters, I was thinking of vectorizing each raster and running the queries on those. However, is it equally fast (or faster) to query a set of rasters directly?

For example, we will deal with rasters (GeoTIFF) that are optical satellite imagery of Scandinavia, containing a single value 0-255 for each pixel (500x500m). We'll have one GeoTIFF per day, going up to 16 years back in time (365x16 = 5840 rasters). There are several things we need to do with these rasters:

  1. create vector tiles on request for any arbitrary period (e.g. vector tiles from rasters between 03.03.2013 - 01.01.2016),
  2. which means querying each raster for pixel values and vectorizing them into pixel-sized polygons in the vector tile
  3. calculate the average, min, max value of all pixels in a raster, for multiple rasters.
  4. calculate the avg, min, max of pixels inside a polygon-mask.
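Items 1 and 3 above (pick the rasters for a period, then compute avg/min/max per raster) can be illustrated in plain JavaScript. This is a sketch under assumptions: the `rasters` array is a made-up stand-in for the raster table, dates are stored as ISO strings, and the real work would presumably happen in PostGIS.

```javascript
// Hypothetical stand-in for the raster table: one entry per day.
const rasters = [
  { date: '2013-03-02', pixels: [[10, 20], [30, 40]] },
  { date: '2013-03-03', pixels: [[0, 100], [200, 255]] },
  { date: '2016-01-01', pixels: [[50, 50], [50, 50]] },
  { date: '2016-01-02', pixels: [[1, 2], [3, 4]] },
];

// ISO dates (YYYY-MM-DD) sort lexicographically, so string comparison is enough.
function rastersInPeriod(all, from, to) {
  return all.filter((r) => r.date >= from && r.date <= to);
}

// Average, min and max over every pixel of one raster.
// A loop is used instead of Math.min(...values) so large rasters do not overflow the call stack.
function rasterStats(pixels) {
  let sum = 0;
  let count = 0;
  let min = Infinity;
  let max = -Infinity;
  for (const row of pixels) {
    for (const value of row) {
      sum += value;
      count += 1;
      if (value < min) min = value;
      if (value > max) max = value;
    }
  }
  return { avg: sum / count, min, max };
}

// 03.03.2013 - 01.01.2016, inclusive on both ends.
const period = rastersInPeriod(rasters, '2013-03-03', '2016-01-01');
const stats = period.map((r) => ({ date: r.date, ...rasterStats(r.pixels) }));
console.log(stats);
```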

Question no. 2: So the question is, to do this kind of work on rasters, is it better to vectorize them first and do all queries on vectorized versions, or is it better to do operations directly on rasters? Ideally, this should be efficient enough to do on-request (without pre-rendering), but that's perhaps hoping for too much.

knutole commented 8 years ago

Raster vs. vectorized benchmarks: http://geeohspatial.blogspot.no/2013/04/testing-postgis-raster-performance.html

It seems that running processing/queries directly on rasters might be faster? @strk Do you have any thoughts on this?

knutole commented 8 years ago

Needed functions in PostGIS:

// explained here in js, but should of course be pgsql!

// avg
function raster_get_average_by_clipping_mask(raster_id, mask_polygon) {
    // - we have a dataset with one raster per row, and a clipping mask polygon
    // - we want to get average of values in pixels inside the clipping mask
    return int;
}

// max
function raster_get_max_by_clipping_mask(raster_id, mask) {};

// min
function raster_get_min_by_clipping_mask(raster_id, mask) {};
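One possible way to back the three functions above with PostGIS is a single query built on `ST_Clip` and `ST_SummaryStats`, which returns mean, min and max together. The sketch below only builds the parameterized query text and has not been run against a database. The table name `rasters`, the columns `id` and `rast`, band 1, and a GeoJSON mask in EPSG:4326 are all assumptions.

```javascript
// Builds a parameterized query ({ text, values }) for avg/min/max inside a mask.
// Assumed schema (hypothetical): table "rasters" with columns "id" and "rast", one raster per row.
function maskStatsQuery(rasterId, maskGeoJSON) {
  const text = `
    SELECT (stats).mean AS avg, (stats).min AS min, (stats).max AS max
    FROM (
      SELECT ST_SummaryStats(
               ST_Clip(
                 rast,
                 -- assumption: the mask arrives as EPSG:4326 GeoJSON; reproject to the raster's SRID
                 ST_Transform(ST_SetSRID(ST_GeomFromGeoJSON($2), 4326), ST_SRID(rast))
               ),
               1,     -- band number (assumed single-band rasters)
               true   -- exclude nodata pixels
             ) AS stats
      FROM rasters
      WHERE id = $1
    ) AS s`;
  return { text, values: [rasterId, JSON.stringify(maskGeoJSON)] };
}

const query = maskStatsQuery(42, {
  type: 'Polygon',
  coordinates: [[[10, 60], [11, 60], [11, 61], [10, 61], [10, 60]]],
});
console.log(query.values[0]); // 42
```

The returned object has the `{ text, values }` shape that parameterized-query clients such as node-postgres accept, so the mask is never concatenated into the SQL string.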

knutole commented 8 years ago

// js -> pgsql

// get vector from raster with tile bounds (need to look at how this should be done, particularly for creating vector tiles with mapnik)
function raster_get_pixel_to_polygon_vector_tile(raster_id, z, x, y) {
    // should return geometry for tile bounds
    // should simplify geometry (with calculated avg values for the combined polygons)
    return vector_tile_geom; // in whatever format mapnik needs
}
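Whatever format mapnik ends up needing, the `z, x, y` arguments first have to be turned into tile bounds. A minimal sketch of that step, assuming standard XYZ tiles in Web Mercator (EPSG:3857) with y = 0 at the top; `tileBounds` is a hypothetical helper name:

```javascript
// Half the width of the Web Mercator world in metres (pi * earth radius).
const HALF_WORLD = Math.PI * 6378137;

// Bounds of an XYZ tile in EPSG:3857. Assumes y grows southwards (XYZ, not TMS).
function tileBounds(z, x, y) {
  const size = (2 * HALF_WORLD) / Math.pow(2, z); // tile edge length at zoom z
  const minX = -HALF_WORLD + x * size;
  const maxY = HALF_WORLD - y * size;
  return { minX, minY: maxY - size, maxX: minX + size, maxY };
}

console.log(tileBounds(1, 1, 1)); // the south-east quadrant of the world
```

On the PostGIS side these bounds could be passed to `ST_MakeEnvelope` to select and clip the raster, with `ST_PixelAsPolygons` producing one polygon per pixel; that part is not shown here.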

strk commented 8 years ago

Generally, raster-oriented operations are likely to be faster when done directly on rasters. Vectorizing a raster can probably only help as a generalization of the data, that is, when you don't really need the whole raster resolution (but even then you might reduce the resolution of the raster instead).

I guess this is best handled by drafting requirements in a wiki page, so we can see how best to organize the data to serve those needs.
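To illustrate the last point, reducing the raster's resolution can be as simple as averaging blocks of pixels. The sketch below is plain JavaScript on a 2D array, with `downsample` as a hypothetical name; in PostGIS the same idea would go through its raster resampling functions such as `ST_Rescale`.

```javascript
// Reduce resolution by averaging factor x factor blocks of pixels.
// Rows and columns that do not fill a whole block are dropped.
function downsample(pixels, factor) {
  const rows = Math.floor(pixels.length / factor);
  const cols = Math.floor(pixels[0].length / factor);
  const out = [];
  for (let r = 0; r < rows; r++) {
    const row = [];
    for (let c = 0; c < cols; c++) {
      let sum = 0;
      for (let dr = 0; dr < factor; dr++) {
        for (let dc = 0; dc < factor; dc++) {
          sum += pixels[r * factor + dr][c * factor + dc];
        }
      }
      row.push(sum / (factor * factor));
    }
    out.push(row);
  }
  return out;
}

const coarse = downsample(
  [
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [20, 20, 30, 30],
    [20, 20, 30, 30],
  ],
  2
);
console.log(coarse); // [[0, 10], [20, 30]]
```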

strk commented 8 years ago

https://github.com/systemapic/wu/wiki/Raster-processing-roadmap

knutole commented 8 years ago

Will be organized with https://github.com/systemapic/wu/issues/411