opengeospatial / geopackage-tiled-gridded-coverage

A GeoPackage extension for tiled, gridded coverage data
Apache License 2.0

Consider adding scale and offset fields to the gpkg_2d_gridded_tile_ancillary table #23

Closed thomasneirynck closed 8 years ago

thomasneirynck commented 8 years ago

1. Context: storing elevation values

(a) For each elevation tile_matrix_set, you can optionally specify a scale and offset value in the gpkg_2d_gridded_coverage_ancillary table. These parameters apply globally to all tiles corresponding to this tile_matrix_set. The purpose of these values is to make optimal use of the 16-bit (~65,000) value range of a PNG color channel. The idea is that if you have highly detailed measurements in a constrained range, you can project them onto that 16-bit range (e.g. 0->65k), preserving as much detail as possible.

In this case, the real height value for a given pixel i is then computed as:

elevationInUnitOfMeasure = SomeElevationCoverage.tile_data->pngpixels[i] * gpkg_2d_gridded_coverage_ancillary.scale + gpkg_2d_gridded_coverage_ancillary.offset;

(*note: the unit of measure is implied by the spatial reference of the data. Commonly, this will be either ft or m. How to store uom is still a separate discussion, and is related to the ongoing 3D spatial reference issue, cf. #19)
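For illustration only, a minimal sketch of this coverage-level decode in Python; the function name and the example numbers are mine, not from the extension text:

```python
# Hedged sketch of the formula above: map a raw 16-bit PNG sample to an
# elevation using the coverage-level scale/offset from
# gpkg_2d_gridded_coverage_ancillary. Names are illustrative.

def decode_elevation(png_value: int, scale: float = 1.0, offset: float = 0.0) -> float:
    return png_value * scale + offset

# e.g. with scale=0.1 and offset=-100.0, a stored sample of 12345 decodes to 1134.5
```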

(b) Data producers can also use the gpkg_2d_gridded_coverage_ancillary.precision value to indicate the resolution of the elevation values. This value is more a function of the remote sensing and data collection process and is important to end users of the data; (1a) is only relevant for implementers.

2. Limitations

There are two limitations to this approach.

(a) It is difficult to express highly detailed elevation data in PNG. For example, you could not use PNG to encode elevation values with centimeter precision, because the 16-bit range is not large enough to accommodate real-world fluctuations in height: above sea level, height varies roughly from zero to 10,000 m (1,000,000 cm), while 16 bits only provide ~65,536 distinct values.

(b) In order to choose optimal scale and offset values, one must know the min/max values of a given elevation dataset up front. This is impractical when transcoding from elevation data where this metadata is lacking.
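To make (a) and (b) concrete, a hedged sketch of how a producer would derive a single coverage-level scale/offset from a known min/max; the function name is mine:

```python
# Map the full elevation range of the coverage onto the 16-bit PNG range
# (0..65535). This requires knowing min/max up front, which is limitation (b).

def coverage_scale_offset(min_elev: float, max_elev: float) -> tuple[float, float]:
    scale = (max_elev - min_elev) / 65535.0   # elevation units per PNG step
    offset = min_elev                         # elevation represented by PNG value 0
    return scale, offset

# For min=0 m and max=10,000 m this yields a scale of ~0.15 m per step,
# which is why centimeter precision cannot be preserved with one global range.
```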

3. Proposal

We add gpkg_2d_gridded_tile_ancillary.scale and gpkg_2d_gridded_tile_ancillary.offset columns, with default values of 1 and 0 respectively. These values do not apply globally to the entire coverage; they apply to a single tile only.

Mapping the pixel value i of a PNG tile to an elevation then becomes:

elevationInUnitOfMeasure = (SomeElevationCoverage.tile_data->pngpixels[i] * gpkg_2d_gridded_tile_ancillary.scale + gpkg_2d_gridded_tile_ancillary.offset) * gpkg_2d_gridded_coverage_ancillary.scale + gpkg_2d_gridded_coverage_ancillary.offset;
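A minimal sketch of this two-stage decode, mirroring the formula above (the per-tile transform first, then the coverage-level one); parameter names are illustrative:

```python
# Hedged sketch of the proposed decode: apply the per-tile scale/offset from
# gpkg_2d_gridded_tile_ancillary first, then the coverage-level scale/offset
# from gpkg_2d_gridded_coverage_ancillary.

def decode_tile_elevation(png_value: int,
                          tile_scale: float = 1.0, tile_offset: float = 0.0,
                          cov_scale: float = 1.0, cov_offset: float = 0.0) -> float:
    return (png_value * tile_scale + tile_offset) * cov_scale + cov_offset
```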

We believe this would address several issues:

-> (2a) You can now reasonably use PNG to store detailed elevation data. This works because a tile_matrix_set consists of multiple levels of detail. For real-world data, the height within a tile at the most detailed zoom level is unlikely to fluctuate wildly, and it is precisely these high-resolution tiles for which we want high-resolution elevation data. Large fluctuations in elevation within a tile are possible at the lower, less detailed levels, but because those levels have a coarser horizontal resolution, a high vertical resolution is also less relevant there.

-> (2b) We can scale the height values of each tile optimally, with minimal upfront processing, when transcoding to GeoPackage (see the sketch below).
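A hedged sketch of that per-tile quantization step during transcoding, assuming the source tile is already available as a float array (numpy assumed; names are illustrative):

```python
import numpy as np

# Quantize one tile to 16-bit using its own min/max, so no global statistics
# are needed before processing starts. The returned scale/offset would be
# written to that tile's row in gpkg_2d_gridded_tile_ancillary.

def quantize_tile(elevations: np.ndarray) -> tuple[np.ndarray, float, float]:
    lo, hi = float(elevations.min()), float(elevations.max())
    scale = (hi - lo) / 65535.0 if hi > lo else 1.0   # avoid divide-by-zero on flat tiles
    offset = lo
    png_values = np.round((elevations - offset) / scale).astype(np.uint16)
    return png_values, scale, offset
```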

Remarks:

jyutzler commented 8 years ago

Sounds like we're on board with this. @trgn please make a PR for the change.

bradh commented 8 years ago

This got applied, and so could be closed. However, there is an unresolved issue about it on the mailing list (essentially, I'm opposed to the complexity).

thomasneirynck commented 8 years ago

Some short personal comments, given the reservations from [~bradh] about this feature:

1) It simplifies data production, especially for source data that is already tiled, or for crawling scenarios where we cannot determine the elevation min/max beforehand.

2) I don't see a considerable negative impact on client software. The elevation values need to be read from the PNG anyway; applying the linear transformation to them seems like a minor extra step (see the sketch below).

3) A more fundamental question is whether we need scale & offset in GeoPackage to begin with; could we not leave that to the format completely? I think it does have a place here, despite the added complexity. PNG is an appropriate default format for GeoPackage (small size, many clients, a value range appropriate for "real world" elevation data), and using scale/offset to expand its applicability makes sense in this context.

With regards to the added complexity this introduces, I agree about halfway. By adding scale & offset parameters to both the tile and the coverage table, reading the data is indeed more complex, since we are transforming twice. This could perhaps be reduced by moving scale/offset to the tile table only.
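To make the client-side cost of point 2 concrete, a hedged sketch of the extra read step using Python's built-in sqlite3. The join columns (tpudt_name, tpudt_id) follow the tile ancillary table discussed in this thread; check them against the current spec text before relying on this:

```python
import sqlite3

# Fetch one tile's PNG blob together with its per-tile scale/offset.
# `tiles_table` is the tile pyramid user data table of the coverage.

def read_tile_with_scaling(gpkg_path: str, tiles_table: str,
                           zoom: int, col: int, row: int):
    conn = sqlite3.connect(gpkg_path)
    sql = f"""
        SELECT t.tile_data, a."scale", a."offset"
        FROM "{tiles_table}" AS t
        JOIN gpkg_2d_gridded_tile_ancillary AS a
          ON a.tpudt_name = ? AND a.tpudt_id = t.id
        WHERE t.zoom_level = ? AND t.tile_column = ? AND t.tile_row = ?
    """
    return conn.execute(sql, (tiles_table, zoom, col, row)).fetchone()
```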

bradh commented 8 years ago

If we do agree to keep it for PNG, we need to split the spec such that TIFF does not inherit those requirements (since they aren't really applicable for a floating point format).

As a counter-example to the need for custom support, ossim would have supported an (at least relative) DEM from GeoPackage without additional work: there is a generic "image_directory" elevation database that can read from any file ossim can open (http://trac.osgeo.org/ossim/doxygen/classossimImageElevationDatabase.html). However, with per-tile scaling, it's a whole new reader implementation.

I still think it is too much implementation complexity for a mass-market format. That is an opinion, based on some implementation experience. I still haven't seen any results from anyone trying to read the sample I provided. Perhaps that indicates excess complexity.

thomasneirynck commented 8 years ago

Do you mean the Alaska dataset you produced earlier this year? We visualized it in Luciad software. We had some initial issues, but they were related to tiling/extent configuration, not the scaling/offset. Compusult demonstrated that data in their viewer at the TC, if I recall correctly.

As for separating out those columns: it is correct that TIFF does not need those requirements, but it is also not in conflict with them. If you do not need the values, you can just leave the defaults.

From an implementation perspective, it would be valuable to balance multiple use-cases. "Relative DEM" is one of them, but easy transcoding from other tiled elevation datasets is another, as is easy parallelization of producing tiles from "cloud" data sources (of course, all this under the assumption that it is inherently valuable that we can use PNG optimally).

bradh commented 8 years ago

I mean the revised (2016-03-28) OS terrain set, which had per-tile scaling and offsets.

TIFF inherits the requirements, and all TIFF consumers therefore need to support them. That is part of the complexity argument: I don't want to add complexity for consumers (especially on limited/small platforms), since we need lots more of those. I think producers are already in a position to manage the complexity, since they typically run in larger server/desktop environments.

jyutzler commented 8 years ago

> If we do agree to keep it for PNG, we need to split the spec such that TIFF does not inherit those requirements (since they aren't really applicable for a floating point format).

I thought we already agreed to split PNG and TIFF so that they do not inherit from one another. Has anyone signed up to actually make the change?

tseval commented 8 years ago

Just a quick question: how should the scale and offset values relate to the data_missing and data_null values? Should these nodata values be stored unscaled or scaled in the tile tables? I.e., should any checks for nodata be performed before or after applying scale/offset?

It might be easier to identify fill values if they are unscaled, and it could reduce precision issues, although since the nodata values are defined as REAL anyway, it may not matter that much.

I think it should be specified in the standard documents though.

jyutzler commented 8 years ago

The GPKG-EE IE recommended to have two separate extensions and that the PNG extension would be the one to have scale and offset. The SWG voted unanimously to accept this change. We'll keep this open until the change is made @ThomasNeirynck

tseval commented 8 years ago

Will this PNG/elevation extension specify how to encode nodata/missing values in 16-bit unsigned? You probably don't want to mess up the numerical range by using large negative values for nodata, and since it is a REAL in gpkg_2d_gridded_coverage_ancillary, some kind of mapping between the actual nodata value and the stored value might be useful, for example saying that 0xffff always maps to nodata, with the actual nodata value given by gpkg_2d_gridded_coverage_ancillary. No scale/offset should be applied to the nodata values.

jyutzler commented 8 years ago

@tseval I am not sure I understand the question. The data_null column in the gpkg_2d_gridded_coverage_ancillary table (https://github.com/opengeospatial/geopackage-elevation/blob/master/spec/1_tiled_gridded_elevation_data_png.adoc#coverage-ancillary) indicates what value corresponds to NULL. Isn't that everything you need?

jyutzler commented 8 years ago

We have split the extensions as previously described. #31

tseval commented 8 years ago

@jyutzler The problem here is that you need to represent the NULL value with a value that can't be misinterpreted as a real elevation value. If, for example, you say that the NULL value is 65535 and you operate with a 0.1 scale and zero offset, you get into trouble in mountainous areas with elevations of 6553.5 meters. Alternatively, you could say that the NULL value is -32767, but to represent this value you would need a very large negative offset, and you would lose most of your precision for the actual elevation values.

I might be confused here, and there might be something obvious that I don't see, but in my implementation of this I have had to assume that nodata values should not be modified by scale and offset, and this again implies that the NULL value must be a value that can be represented by an unsigned short if it is to be stored as PNG data.
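For what it's worth, a minimal sketch of the check order described here (compare the raw 16-bit sample against data_null before applying any scale/offset); names are illustrative and this is not spec text:

```python
# Treat data_null as a raw (unscaled) sentinel: check it before the linear
# transform so the sentinel never has to survive scale/offset.

def decode_with_nodata(png_value: int, data_null: float | None,
                       scale: float = 1.0, offset: float = 0.0) -> float | None:
    if data_null is not None and png_value == data_null:
        return None                      # nodata: do not apply scale/offset
    return png_value * scale + offset
```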