Which pixels to select when subset does not match pixel boundaries

jratike80 commented 4 years ago

Imagine to have some raster data with pixel size 10 by 10 units and bbox starting at 0,0. Let's put the anchor point at the centre of the pixel. Now user requests a subset as

&SUBSET=E(572,612)&SUBSET=N(477,518)

Which pixels should be selected? A) All pixels that intersect with the subset

B) Only pixels which all totally contained by the subset

C) Only pixels whose anchor points are within the subset

With plain GetCoverage no resampling nor scaling should happen and if I understand right, in all three cases the envelope of the result is different from the envelope of the subsetting request. Also, if user sends a new request with subset envelope that is adjacent to the previous one, interpretation A would give duplicate pixels, interpretation B) would lead to rows of missing pixels, and only alternative C) would give a resultset that is adjacent to the first resultset.

These sections in the WCS 2.0 Core deal somehow with the case but I feel that they do not answer my question.

The WCS Core standard defines the domain subsetting operation which delivers all data from a coverage inside a specified request envelope (“bounding box”), relative to the coverage’s envelope – more precisely, the intersection of the request envelope with the coverage envelope.

Requirement 32 /req/core/getCoverage-request-trim-within-extent: Let the extent of the coverage’s gml:Envelope along the dimension specified in the trim request range from L to H. Then, for the trim bounds trimLow and trimHigh the following shall hold: L <= trimLow <= trimHigh <= H.

Let further c be the OfferedCoverage of the server addressed; low = tLow if specified in the request, otherwise low is set to the coverage’s lower bound in dimension dname; high = tHigh if specified in the request, otherwise high is set to the coverage’s upper bound in dimension dname; B be an envelope equal to the domain of c, except that in dimension dname the extent is given by the closed interval [low,high]; Then, the following requirement holds:

Requirement 38 /req/core/getCoverage-response-trimming: The response to a successful GetCoverage request on coverage identifier id of admissible type containing no slicing and exactly one trimming operation with dimension name dname, lower bound parameter evaluating to low, and upper bound parameter evaluating to high shall be a coverage identical to c, but containing all points of c with location inside B, and no other points. NOTE This requirement does not specify the actual extent of the coverage returned. Possible options include: the minimal bounding box of the coverage returned, or the request bounding box. Servers are strongly encouraged to deliver the minimal bounding box.

percivall commented 4 years ago

This existing OGC document may have material you can reuse: http://docs.opengeospatial.org/is/10-140r2/10-140r2.html http://docs.opengeospatial.org/is/10-140r2/10-140r2.html

On Aug 7, 2020, at 9:09 AM, jratike80 notifications@github.com wrote:

Imagine to have some raster data with pixel size 10 by 10 units and bbox starting at 0,0. Let's put the anchor point at the centre of the pixel. Now user requests a subset as

&SUBSET=E(572,612)&SUBSET=N(477,518)

https://user-images.githubusercontent.com/1751612/89645168-ecb82400-d8c1-11ea-9a72-d7712e2478a0.png Which pixels should be selected? A) All pixels that intersect with the subset

https://user-images.githubusercontent.com/1751612/89646239-14a88700-d8c4-11ea-9ca7-afae44e1e75a.png B) Only pixels which all totally contained by the subset

https://user-images.githubusercontent.com/1751612/89646370-53d6d800-d8c4-11ea-9553-6577270e9c07.png C) Only pixels whose anchor points are within the subset

https://user-images.githubusercontent.com/1751612/89646455-79fc7800-d8c4-11ea-82aa-6cfc525de111.png With plain GetCoverage no resampling nor scaling should happen and if I understand right, in all three cases the envelope of the result is different from the envelope of the subsetting request. Also, if user sends a new request with subset envelope that is adjacent to the previous one, interpretation A would give duplicate pixels, interpretation B) would lead to rows of missing pixels, and only alternative C) would give a resultset that is adjacent to the first resultset.

These sections in the WCS 2.0 Core deal somehow with the case but I feel that they do not answer my question.

The WCS Core standard defines the domain subsetting operation which delivers all data from a coverage inside a specified request envelope (“bounding box”), relative to the coverage’s envelope – more precisely, the intersection of the request envelope with the coverage envelope.

Requirement 32 /req/core/getCoverage-request-trim-within-extent: Let the extent of the coverage’s gml:Envelope along the dimension specified in the trim request range from L to H. Then, for the trim bounds trimLow and trimHigh the following shall hold: L <= trimLow <= trimHigh <= H.

Let further c be the OfferedCoverage of the server addressed; low = tLow if specified in the request, otherwise low is set to the coverage’s lower bound in dimension dname; high = tHigh if specified in the request, otherwise high is set to the coverage’s upper bound in dimension dname; B be an envelope equal to the domain of c, except that in dimension dname the extent is given by the closed interval [low,high]; Then, the following requirement holds:

Requirement 38 /req/core/getCoverage-response-trimming: The response to a successful GetCoverage request on coverage identifier id of admissible type containing no slicing and exactly one trimming operation with dimension name dname, lower bound parameter evaluating to low, and upper bound parameter evaluating to high shall be a coverage identical to c, but containing all points of c with location inside B, and no other points. NOTE This requirement does not specify the actual extent of the coverage returned. Possible options include: the minimal bounding box of the coverage returned, or the request bounding box. Servers are strongly encouraged to deliver the minimal bounding box.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/opengeospatial/OGC-API-Sprint-August-2020/issues/6, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA5EV7DUYIWGDMLLSW2A7LLR7P4JLANCNFSM4PXTVW6A.

-- https://www.ogc.org/webinars

pomakis commented 4 years ago

I believe the correct answer is C. At least that's the way CubeWerx has implemented it. The way I see it, coverages typically deal with points, not pixels. When a coverage is returned in a pixel-based representation such as GeoTIFF, it represents each sample point by a pixel, where the sample point is at the centre of each pixel. If a sample point intersects the requested subset, it should be returned in the GeoTIFF response as a pixel; otherwise it shouldn't.

CubeWerx's implementation of the various OGC web services distinguishes between what we call a "cell-based" extent (used, for example, by WMS) and a "grid-based" extent (used, for example, by WCS). A grid-based extent is always one pixel smaller in each dimension (0.5 pixels per side) than its corresponding cell-based extent.

So if there's a sample point at each integral unit and a coverage subset of E(572,612) N(477,518) is requested, the resulting GeoTIFF will contain 41x42 pixels (representing 41x42 sample points). Its grid-based extent is exactly what was requested (572,477 to 612,518) but its cell-based extent (that is, corner to corner) is 571.5,476.5 to 612.5,518.5. It you wanted to make a map request of this coverage subset, that'd be the extent you'd specify.

If a coverage subset of E(571.9,612.1) N(476.9,518.1) is requested, you'd get exactly the same response.

However, if a coverage subset of E(572.1,611.9) N(477.1,517.9) is requested, you'd get a response that's only 39x40 pixels in size, representing the subset E(573,611) N(478,517). It would have a grid-based extent of 573,478 to 611,517 and a cell-based extent of 572.5,478.5 to 610.5,517.5.

jerstlouis commented 4 years ago

@pomakis I am not very familiar with CIS, but aren't both pixel-based and cell-based valid coverages?

GeoTIFF for example supports both PixelIsArea and PixelIsPoint, as described here.

The choice of origin for raster space is not entirely arbitrary, and depends upon the nature of the data collected. Raster space coordinates shall be referred to by their pixel types, i.e., as "PixelIsArea" or "PixelIsPoint".

Does WCS really imply values represent point, and does not support values representing areas?

pebau commented 4 years ago

AFAIK this is a specialty of GeoTIFF (cf CIS GeoTIFF encoding) and not supported by GRIB, NetCDF, etc.

pebau commented 4 years ago

re subsetting behavior, here how rasdaman does it: https://doc.rasdaman.org/05_geo-services-guide.html#subsetting-behavior (I'd guess that all tools do it the same way).

jratike80 commented 4 years ago

The way I see it, coverages typically deal with points, not pixels.

We deliver orthophotos and DEM through WCS and I believe that majority of our users believe that they are dealing with pixels. They may be wrong with DEM but anyway, they consider WCS as a more flexible alternative for downloading GeoTIFFs because they are not tied to fixed size 6 x 6 km tiles.

jerstlouis commented 4 years ago

@jratike80 I would argue that both "Value is for point" and "Value is for area" are perfectly valid use cases for a coverage of data values... Based on what it is they are representing / how those values were obtained.

It would be nice if OGC API - Coverages could better address this.

pomakis commented 4 years ago

The behaviour implemented by CubeWerx is based on the Web Coverage Service Implementation Standard, version 1.1.2 (OGC 07-067r5), section 7.6.3, which states:

7.6.3 Treatment of edge grid points

The spatial extent of a grid coverage extends only as far as the outermost grid points contained in the bounding box. It does NOT include any area (partial or whole grid cells or sample spaces) beyond those grid points.

NOTE: This bounding box is NOT the extent sometimes considered, which also includes rectangular sample spaces (pixels) centered on the outermost grid points – as indicated in Subclause 7.3.3 6 of WMS 1.3 [OGC 04- 042]. Such pixel extents are often poor approximations of the sensor physics / grid data collection process.

If the more recent standards are more flexible than this and can support either model, then this certainly needs to be made more clear (especially in the way it affects how subset extents are to be interpreted).

jerstlouis commented 4 years ago

@pomakis Agreed.

Such pixel extents are often poor approximations of the sensor physics / grid data collection process.

If the client (or the maps renderer) does not know whether a value represents a sample collected exactly at that position, or an average, then it cannot know how to interpret or render the data and greatly contribute to further reduce the quality of the data (lossy operation), and this is a much worst problem than this being an average (which might be intended, and disagree with the point about this being a poor approximation).

pebau commented 4 years ago

@pomakis Ha, that dates back to Arliss Whiteside who had very clear opinions about remote sensing.

What would you phrase it like instead? Would we want to discuss it in the Coverages.SWG? (need to get used to the new name!)

pomakis commented 4 years ago

@pebau , I'm not a coverages expert, so I really don't know how it aught to be worded. I only know that it aught to be spelled out clearly in "OGC API - Coverages". If individual coverages are allowed to be either value-is-point or value-is-area, it aught to be clear to the client how to determine this, and it aught to be clear to the client how that affects the interpretation of the subsetting parameters. And if possible, "OGC API - Coverages" aught to be compatible with WCS in this respect so that the same subsetting request issued to either service will result in the same samples being selected.

cmheazel commented 4 years ago

"Value is for point" and "Value is for area" are properties of the coverage. Unless the API knows how to get this information from the coverage, it cannot pass it on to a client.

Perhaps we need to specify a minimum set of coverage metadata for the collection information. Understanding that nil values may occur (see gml:nilReason).

jerstlouis commented 4 years ago

@pomakis @pvretano An interesting fact is right now the Daraa DTED

shows as AREA_OR_POINT=Area in QGIS. I am guessing there would be a tag or metadata to set so it could properly report Point?

Band 1
STATISTICS_APPROXIMATE=YES
STATISTICS_MAXIMUM=1579
STATISTICS_MEAN=531.70923223104
STATISTICS_MINIMUM=-371
STATISTICS_STDDEV=481.86566683912
STATISTICS_VALID_PERCENT=100

More information 
AREA_OR_POINT=Area
TIFFTAG_RESOLUTIONUNIT=1 (unitless)
TIFFTAG_SOFTWARE=CubeWerx Suite 9.3.14
TIFFTAG_XRESOLUTION=1
TIFFTAG_YRESOLUTION=1

Schpidi commented 4 years ago

I agree with @cmheazel, that this is a property of the coverage. In the OGC® GML Application Schema - Coverages - GeoTIFF Coverage Encoding Profile we tried to make this explicit, e.g. with:

Note GMLCOV does not store the raster type i.e. PixelIsArea or PixelIsPoint information in its current version. However, Requirement 9 above specifies that the gml:boundedBy element shall respect the raster type i.e. it shall include the half pixel border in case of PixelIsArea (see also Figure 1) but not in case of PixelIsPoint. In other words, in case of PixelIsArea the gml:boundedBy element is decreased by the half of both gml:offsetVector elements in the gml:lowerCorner coordinate and increased by the half of both gml:offsetVector elements in the gml:upperCorner coordinate compared to the case of PixelIsPoint. The gml:origin element stays the same independently of the raster space.

jerstlouis commented 4 years ago

@pomakis

GTIFKeySet(gtif, GTRasterTypeGeoKey, TYPE_SHORT, 1, RasterPixelIsPoint);

with libGeoTIFF.

ghobona commented 3 years ago

From GML 3.2.2 "When a grid point is used to represent a sample space (e.g. image pixel), the grid point represents the center of the sample space (see ISO 19123:2005, 8.2.2)."

Note that the extract references ISO 19123 - Geographic information — Schema for coverage geometry and functions

chris-little commented 3 years ago

To add my perspective to the confusion, it seems common practice for ‘Pixel is Point’/’Pixel is Area’ to be combined with the grid definition to give a grid with offsets. This ‘simplification’ strikes me as unnecessarily complicated.

I think grid definition and pixel/grid-point representativeness are orthogonal and any combination could allowed. This is certainly the practice in NetCDF and GRIB formats used in meteorology.

E.g. either define the grid (1-D for simplicity) to be, say, [0, 1, 2, 3, …] then the ‘Pixel is Point’ represents data at [0, 1, 2, 3, …] and ‘Pixel is Area’ could represent data at [(-½ to +½), (+½ to 1½), (1½ to 2½), (2½ to 3½), … ]

Of course, one could define a grid to be [½, 1½, 2½, 3½, … ] then the ‘Pixel is Point’ represents data at [½, 1½, 2½, 3½, … ] and ‘Pixel is Area’ could represent data at [(0 to 1), (1 to 2), (2 to 3), …]

I suspect the fundamental ISO19123 definition allows both options but is also misinterpreted. Which is the most convenient approach to present to a user, or the most convenient for a service to implement?

jerstlouis commented 3 years ago

@chris-little In my opinion it is fundamentally wrong to move the geo-referenced position of sample data, whether they represent an area (average, minimum, maximum or otherwise) or a point by an offset based on what it is. It could well be a misinterpretation of ISO19123 -- by all means please help me if you find any evidence of it :) And my belief is that a CIS 1.2 could add this (optional, to be backwards compatible) explicit distinction between point and area, and one would not have to offset anything to express this.

Searching through CIS 1.1, the only reference I see to "offset vector" is this sentence:

Such a General Grid does not contain global offset vectors because these are available with the axis subtypes where appropriate. It does not contain a rotation vector as this can be modelled by concatenating the CRS with an appropriate Engineering CRS for general affine transformations.

And from the UML diagram of GeneralGrid, it describes exactly the simplistic model I was discussing today (no origin or direction or offset vector):

Lower bound and upper bound in index axes
lower bound and upper bound in geo space (regular axes)
resolution and unit of measurement and that does the mapping -- except for missing the point vs. area notion!!

So I am also of the opinion that the "pixel is area" or "pixel is point" is entirely orthogonal, or worded otherwise (and to support n-dimensions, e.g. voxel is volume), along each axis, the "span" in units of measurement (or fraction of the resolution) to which the data values correspond.

pomakis commented 3 years ago

Chris Little wrote:

[...] I think grid definition and pixel/grid-point representativeness are orthogonal and any combination could allowed. This is certainly the practice in NetCDF and GRIB formats used in meteorology.

E.g. either define the grid (1-D for simplicity) to be, say, [0, 1, 2, 3, …] then the ‘Pixel is Point’ represents data at [0, 1, 2, 3, …] and ‘Pixel is Area’ represents data at [(-½ to +½), (+½ to 1½), (1½ to 2½), (2½ to 3½), … ]

My question at this point is which way bounding boxes are interpreted. In this example, would the lower coordinate of the bounding box of the coverage be -½ or 0? And what about the bounding box of the map layer for the coverage? Are they the same, or is the bounding box of the map layer larger by half a pixel on each side (because the map world is always PixelIsArea)?

-- Keith Pomakis pomakis@cubewerx.com Senior Software Developer, CubeWerx Inc.

jerstlouis commented 3 years ago

@pomakis I would understand both Bounding Boxes / Subset to always be specifying exactly the same geographic extent. This way subset=Lat(10:15),Lon(20:25) and bbox=20,10,25,15 always consistently refers to exactly the same thing.

For example, for a PixelIsPoint DEM, requesting exactly its geographic extent/envelope (where the corners match precisely to the corner elevation samples), returns you all points.

The exact same subset or bounding box coordinates for a map of that DEM, also returns you a full map of that DEM, but a map renderer to fill that "area" between 4 samples would then shade the area in between the 4 DEM samples with an interpolated gradient between those 4 points).

If the coverage is PixelIsArea satellite imagery, requesting exactly its geographic extent/envelope (where the corners match precisely to the corner elevation samples), returns you all 'pixels'.

The exact same subset or bounding box coordinates for a map of that imagery, also returns you a full map of that imagery (i.e. exactly the same thing if the resolution is the same).

pomakis commented 3 years ago

Okay, so suppose you have a PixelIsPoint DEM that represents 4x4 samples in a Lat/Lon grid where the points are at Latitudes 20 to 23 and Longitudes 40 to 43. You're saying that the bounding box for this coverage would be (20,40) to (23,43) and that making a coverage request for bbox=20,40,23,43 will return a 4x4 grid containing all 16 points. And that makes sense.

But if you wanted a map of this coverage at its native resolution (one sample per pixel, so 4x4 pixels where each pixel represents exactly one sample), you couldn't do it by making a map request for bbox=20,40,23,43. Map bounding boxes are always interpreted as representing the outside edges of the pixels, so you'd get back a 3x3 map where each pixel represents the average of the four samples at its corners. The only way I can see around this is to advertise the bounding box of the map to be (19.5,39.5) to (23.5,43.5).

Am I missing something?

jerstlouis commented 3 years ago

@pomakis Glad we agree that the DEM coverage request makes sense :) To clarify, when doing a subset of a coverage in its native resolution, whether the point represents an Area or a Point makes no difference to what is being returned -- you get exactly the same cells back in both cases. The distinction is only relevant for knowing "what to do" with those grid points/values.

Short answer for rendering a map of that 4x4 PixelIsPoint DEM: the native resolution to render a map out of it is NOT 4x4, it's 3x3, because a map pixel is inherently an average of the area, and those grid sample values are not averages. Detailed explanation below :)

Assuming your map renderer is capable of interpolating between the original sample values, my approach is essentially identical to your (19.5,39.5)-(23.5,43.5) bbox, except for the map advertising the same extent as the coverage, but it is more accurate, as that 0.5 offset is stretching it beyond the geospatial extent actually covered by the data, since the values are spot values and not an average. It is much more convenient as well, considering that maps and coverage now share the same collection description document where the extent/envelope is specified.

Now let's consider the use case of rendering a map from a DEM, and let's consider both a PixelIsArea DEM (average height) and a PixelIsPoint DEM (theoretically precise height at the sample position). In both cases, the geographic extent of the DEM will be the same Lat(20:23),Lon(40:43), and are the CIS GeneralGrid lowerBound and upperBound of the regular axes.

For the PixelIsPoint DEM, (20,40) is the bottom-left corner of the DEM for which you have an exact value, and you have a 4x4 grid of values, so the GeneralGrid index bounds are (0,0)-(3,3).

For the PixelIsArea DEM, (20,40) is still the bottom-left corner of the DEM, but the value that you have is valid for a whole cell -- so that DEM grid is 3x3, and the GeneralGrid index bounds are (0,0)-(2,2).

Now let's render maps, with same bbox or subset request of the whole extent which is the same in both cases: Lat(20:23),Lon(40:43), and let's first consider rendering a map with a lot more pixels than cells in the coverage,-- i.e. still using the native resolution DEM cells values (not interpolating those), but rendering hillshading and an elevation color map with smooth interpolation.

For the PixelIsPoint DEM, you will want to use the heights for exactly where they were measured, or the maps will be off, and multiple tiles rendered the same way will show a seam -- and you can interpolate the hillshading and color map in between a fair bit: e.g. larger map vs. coverage resolution map.

For the PixelIsArea, if rendering a larger map, you may want to assume it represents the center position of a cell to be able to calculate hillshading and interpolate colors etc.

Now about retrieving a map of the coverage native resolution... In our PixelIsPoint DEM with a 4x4 cells grid, the map should actually be a 3x3 pixels map, as you have no information about the height outside that extent: the corner values does not carry any average information for anything extending beyond the geospatial extent of the coverage. So best way to render the map (whose pixels represent an area) would be to calculate the average between 4 (corners of the pixels) samples of the coverage (that's exactly how we implement rendering maps of elevation coverages in our software).

In the PixelIsArea DEM with a 3x3 cells grid, the map pixels can directly represent those values (same as if the map was imagery, also representing the average color of an area).

So in both cases, you would get a map with 3x3 pixels, of the exact same area.

No pesky 0.5 offset involved anywhere in any of these scenarios! :)

(Except perhaps the maps renderer turning the 4x4 PixelIsPoint DEM into a 3x3 map which will multiply the sample values by 0.5 as it averages between the 4 corner values, but that's just a special case of 'rendering' the coverage at that resolution -- you would use different factors if you were generating more than a single pixel in between those corners).

Knowing that a value represents an area or a (theoretically) infinitely small point is much more useful than an offset, which still doesn't carry that information.

I really believe CIS needs to be extended to add this notion, so we can stop messing up the geo-referencing of coverages, without actually solving the fundamental problem of needing to know whether a value spans the grid resolution, or indicates a value at a precise point on the grid.

@chris-little could confirm but I believe this logic can be applied not only to elevation values, but just as well for retrieving values or rendering maps of values where samples can represent anything else, at least for scenarios mapping to precise location or area of a cell (regardless of the meaning of the value: average, minimum, maximum as well, I believe...).

jerstlouis commented 3 years ago

@Schpidi Do you know what those GMLCOV constructs you mentioned, e.g. gml:boundedBy, gml:offsetVector, gml:lowerCorner, gml:upperCorner, gml:origin map to in CIS 1.1?

I see a mention in passing of "global offset vector", and "gml:boundedBy", but those notes seem to be floating in the air, and those elements are not defined in the CIS GeneralGrid class.

I really like the simplicity of this GeneralGrid class (which corresponds directly to the DomainSet), and as I undertand it there should not be any offset in there based on the raster type.

But I do think we need to add this information about the span along axes of a grid value (either simply "point or area", or as a fraction of the resolution where 0 = point .. 1 = area), in a 1.2 update to CIS (while also tackling the other important CIS issues: https://github.com/opengeospatial/ogcapi-coverages/issues/104 and https://github.com/opengeospatial/ogcapi-coverages/issues/102).

jratike80 commented 3 years ago

I would like to remind that my original question was about how to select pixels when the bounding box limits do not match with the pixel limits (or with the centre points). See the images there at the top. The context is core WCS/OGC API Coverages without resampling.

But naturally there must be a consensus about the special case with pixel aligned bboxes first.

jerstlouis commented 3 years ago

@jratike80 my opinion on how this should work: With PixelIsPoint, the cells get included if the point is within the inclusive subset limits. With PixelIsArea, the cells get included if any part inside the area (excluding the area limit) is within the inclusive subset limits.

Regardless of whether the data is PixelIsPoint or PixelIsArea, if you do a subset for the GeneralGrid lower & upper bounds / the envelope (assuming those are the same, with no 0.5 offsets between these as discussed above), in both cases you would get all cells by passing that exact extent.

Would that make sense?

jratike80 commented 3 years ago

@jerstlouis It does make sense but the PixelIsArea case would have a side effect: If user makes requests with adjacent bounding boxes, then the pixels from the boundary that is common for the both BBOXes and which are only intersected by the bounding box, but are not completely within, would be included in both adjacent result sets. In this context pixels behave like they were rectangular polygons and user makes ST_Intersects query with another geometry.

It may be the best and most logical way to handle the case but it should be defined and documented. Users who are building mosaics from the individually downloaded pieces of the coverage should be aware that duplicated rows of pixels may exist at the seams of the tiles and avoid some workflows in further processing. For example more exotic blend modes https://en.wikipedia.org/wiki/Blend_modes than "normal" might lead to surprising results.

With pure GetCoverage / ?Items requests the duplicated pixels in the result sets should be perfectly aligned with the same pixel values and user can simply select one or another, or the average, and the result is always the same as in the source data. If any resampling or processing is involved then the overlapping pixels are not necessarily equal.

pebau commented 3 years ago

a few more issues, just to underline the complexity of the topic:

discussion seems to think of regular grids only. What would it mean in case of irregular grids?
discussion seems to focus on Lat/Long. What about pixel-in-X for time etc?
why do we assume that pixels have clearly delineated polygons and do not overlap? From sensor physics I'd rather think that generally a pixel is obtained through a weighted integral, conceptually going from the CCD element's center to infinity, but at least involving some neighbourhood possibly larger than the distance to the next element.
why do we assume that the value of a pixel is spatially extended and constant over some neighbourhood? Interpolation will yield different values even close to a pixel. The pixel areas discussed seem to apply only to nearest-neighbour interpolation where "nearest" already is interesting in itself (cf. irregular grids, krieging details, etc.).
pixel-in-corner...in what corner? 2D as 4 corners, 3D has 8 corners, etc. And, as we are going: why in some corner, and not at 75%? Supporting just one corner seems random.

This may motivate why I am quite hesitant about quick shots.

chris-little commented 3 years ago

@jerstlouis I believe this logic can be applied not only to elevation values, but just as well for retrieving values or rendering maps of values where samples can represent anything else, at least for scenarios mapping to precise location or area of a cell (regardless of the meaning of the value: average, minimum, maximum as well, I believe...). Well, yes, but I take issue with the assumption made earlier: ... best way to render the map (whose pixels represent an area) would be to calculate the average between 4 (corners of the pixels) samples of the coverage... This may not be the best thing to do with values which may or may not be rendered as a map. If PixelIsArea values occur near or on the edge of a specified boundary, the values do represent something that occurs outside that boundary. The choice of what to do has to be determined by what information is available about the PixelIsArea and the cells. Interpolation, for example, may not be the best choice, such as for rainfall. Many meteorological parameters can be interpolated but need several hours on a supercomputer to get sensible results.

As @pebau says above, the 'cell areas' may change across the grid. The only difficult issue is when the boundary cannot be physically extended such as north of the North Pole on a Lat/long grid. And that begs the question why did someone choose such a grid with a value 'representativeness' that extends into imaginary space?

chris-little commented 3 years ago

To summarise my position:

Grid definition is independent from the 'representativeness' of a value at a grid point.
Values that are not point-like and are representative of an area around a grid point, can convey useful information outside any boundary that passes through that grid point.
What to do with that information is very use case dependent and needs more information to do so.

chris-little commented 3 years ago

@jerstlouis I agree offsets are pesky. I certainly did not intend that the words 'vector' or 'offset' implied data would be moved. My reading of the above thread is that we all agree that defining grids and representativeness of points are separate, and that there is an issue with how to interpret a PixelIsArea cell that falls partially outside of a subsetting box.

pomakis commented 3 years ago

I know that in the end we need to figure all of this out for the general case, but I'd like to get a handle on the simple grid case first (because if we can't get that straight then we're in trouble).

So lets get back to the example of a DEM that represents 4x4 samples in a Lat/Lon grid where the points are at Latitudes 20 to 23 and Longitudes 40 to 43. Let's say the 8-bit sample values are:

255 000 255 000
000 255 000 255
255 000 255 000
000 255 000 255

And let's say that this coverage has a corresponding map layer whose style is simply a grayscale (0..255) representation of the (possibly interpolated) value.

@jerstlouis, if I understand you correctly, you're saying this:

If the DEM is PixelIsPoint, the advertised BBOX (full extent) of DEM would be:

(20,40) to (23,43)

the response of full-extent coverage request at native resolution would be:

255 000 255 000
000 255 000 255
255 000 255 000
000 255 000 255

the advertised BBOX (full extent) of corresponding map layer would be:

(20,40) to (23,43)

and the response of full-extent map-layer request at native resolution would be:

128 128 128
128 128 128
128 128 128

This would mean it's impossible to get a one-sample-per-pixel map layer of the entire coverage. In fact, for the client to request a one-sample-per-pixel map layer of even a subset of the coverage, the calculations for what bounding box to request would have to include a half-pixel expansion on each edge.

Now lets move on to the PixelIsArea case. You're saying that if the DEM is PixelIsArea, the the advertised BBOX (full extent) of DEM would still be:

(20,40) to (23,43)

the response of full-extent coverage request at native resolution would still be:

255 000 255 000 255
000 255 000 255 000
255 000 255 000 255
000 255 000 255 000

the advertised BBOX (full extent) of corresponding map layer would still be:

(20,40) to (23,43)

and the response of full-extent map-layer request at native resolution would still be:

128 128 128
128 128 128
128 128 128

Is this right? So there's no difference at all between the PixelIsPoint and the PixelIsArea cases? And in neither case is it possible to get a one-sample-per-pixel map layer of the entire coverage?

I dunno, maybe I shouldn't be stuck on trying to allow the retrieval of a one-sample-per-pixel map layer of an entire simple-grid coverage. But it just seems to me that this is what the client would expect back when making a collections/{coverageId}/map?width={coverageNSamplesWide}&height={coverageNSamplesHigh} request.

(In fact, the above request would return a weirder result still, since it's requesting a 4x4 interpolation of a 3x3 space.)

So you can see that I'm still confused about all of this. Forgive me if I'm being dense, but I figure that if I'm struggling with this, others probably are too.

chris-little commented 3 years ago

@jerstlouis @pomakis The coverage responses are what I would expect, as a non-expert.

I do not know why any mapping software has to turn 4x4 meaningful data points into something misleadingly 3x3 with the data smoothed away to uselessness. :-{

i accept that is what some software might do. Does that matter for a coverage API?

jerstlouis commented 3 years ago

@chris-little

This may not be the best thing to do with values which may or may not be rendered as a map. If PixelIsArea values occur near or on the edge of a specified boundary, the values do represent something that occurs outside that boundary.

I think you may have misunderstood what I tried to say in one of my statements earlier... I was talking about PixelIsPoint coverage data, out of which you specifically want to render a map. For a map, on which you may render multiple things (e.g. all kinds of coverages, representing both points & areas, and features alike), a 'pixel' always represents an area.

@pomakis In my opinion, turning a 4x4 grid of PixelIsPoint elevation values to render a map of 3x3 pixels is the only thing that makes sense.

For example, to render a 256x256 map tile of a PixelIsPoint elevation coverage, you would need to sample 257x257 elevation values, and average 4 of them to generate each pixel. You also need those same 257x257 points to generate a 3D mesh of the terrain on which to drape that map, and those 256x256 area pixels is what will look correct when draped onto that terrain. But you could render a 4096x4096 map of the same area with a much more gradual gradient between the data values (with good interpolation you can get a nice looking map with a much coarser coverage).

The right-most of those 257 source elevation values horizontally, is exactly the same value as the first value of the tile to its right. When rendering a map of each of these two tiles, the last pixel of the left tile is a completely different area to the first pixel of the right tile (adjacent, but non-overlapping).

jerstlouis commented 3 years ago

@pomakis In your example with byte values, you got the first (PixelIsPoint) case correct.

However, that's not what I intended to say for PixelIsArea. Also you mentioned "still have the same" but in fact you added a column for a 5x4 PixelIsArea DEM. I will assume you meant to still have the same 4x4 values:

255 000 255 000
000 255 000 255
255 000 255 000
000 255 000 255

So yes, you would still get the exact same value with a coverage subset.

However, what I was saying is in this PixelIsArea case, the full extent map request would return you exactly the same 4x4 values as the coverage request:

255 000 255 000
000 255 000 255
255 000 255 000
000 255 000 255

because the map pixels are also an area that correspond directly to your coverage.

And I will argue that in both cases, you are getting back a one-(area)sample-per-pixel map layer of the entire coverage. It just so happens that in the PixelIsPoint DEM case, an averaging of the point coverage samples is necessary to calculate the values for each area (because your coverage values do not represent an area, and the map pixels do represent an area).

jerstlouis commented 3 years ago

@jratike80 I understand your concerns about the adjacent bounding box requests, which is why I was suggesting to avoid the area limit, in order to be able to have clean non-duplicating adjacent requests of the raw coverage data:

With PixelIsArea, the cells get included if any part inside the area (excluding the area limit) is within the inclusive subset limits

The case where the bounding box request cuts through the middle of an area however cannot really be handled easily, which is why it could be documented that in order to avoid any duplication, requests should be made at intervals which are a multiple of the resolution, e.g. lowerLeft.lon + degPerCell * k. This would apply to both PixelIsPoint and PixelIsArea, except that if you don't want duplication with PixelIsPoint, you would request e.g. lowerLeft.lon + degPerCell * k .. lowerLeft.lon + degPerCell * (k + numCells-1).

However sometimes duplication is actually desired (e.g. in our GNOSIS Map Tiles we use a 257x257 grid for coverages, which allows you to generate a triangular mesh of the whole area without depending on the adjacent tiles). So this behavior actually makes a lot of sense to me: you get all the data within the extent -- that's what subset is about!

jerstlouis commented 3 years ago

@pebau

discussion seems to think of regular grids only. What would it mean in case of irregular grids?

I still struggle to understand irregular grids, so focusing on regular grids for now :) We can see whether and how this applies to irregular grids after we can agree on how to handle regular grids.

discussion seems to focus on Lat/Long. What about pixel-in-X for time etc?

I see no reason why this logic couldn't apply equally well to instant time vs. average over a period of time (or other dimensions).

why do we assume that pixels have clearly delineated polygons and do not overlap? From sensor physics I'd rather think that generally a pixel is obtained through a weighted integral, conceptually going from the CCD element's center to infinity, but at least involving some neighbourhood possibly larger than the distance to the next element. pixel-in-corner...in what corner? 2D as 4 corners, 3D has 8 corners, etc. And, as we are going: why in some corner, and not at 75%? Supporting just one corner seems random.

I find the area vs. point clearer than talking about 'corners' -- there's no 'corner' if we're talking about a value representing a single point? (See also my comments on GeoPackage Tiled Gridded extension grid-value-is-corner vs. grid-value-is-center at https://github.com/opengeospatial/ogcapi-coverages/issues/92#issuecomment-770170261 -- it's an offset in disguise, with no additional information about the span of the measurement) However as suggested earlier, independently for each axis, area vs. point could be represented by a span factor ranging from 0 for point, to 1 for area, with values in between. 1 would indicate that the value is representative of a whole 'resolution' unit around the value's corresponding position. Handling values in-between is a bit tricker while keeping the concept of the envelope/extent correctly mapping to the exact area represented, as the extent along one axis would not be a multiple of the resolution. e.g. if the span factor is 0.3, then the extent would be ((numCells-1) + 0.3) * distanceBetweenCells. For PixelIsPoint (span=0), it's (numCells-1) * distanceBetweenCells. For PixelIsArea, (span=1) it's numCells * distanceBetweenCells.

why do we assume that the value of a pixel is spatially extended and constant over some neighbourhood? Interpolation will yield different values even close to a pixel. The pixel areas discussed seem to apply only to nearest-neighbour interpolation where "nearest" already is interesting in itself (cf. irregular grids, krieging details, etc.).

We are not making this assumption in terms of interpolating an existing coverage which has precise point values (I was talking about super-smooth interpolation in examples above). But we are trying to have awareness of when the coverage native data values already are averages of a certain length/area/volume/hypervolume.

jerstlouis commented 3 years ago

I have prepared this presentation to better illustrate my suggestions and concerns:

Grid Coverage Cell Span.pdf

An interesting observation on Slide 5 is that:

span = (CRSAxis::upperBound - CRSAxis::lowerBound) / resolution - (IndexAxis::upperBound - IndexAxis::lowerBound) e.g. Point (span = 0): (33 - 32) / 0.25 - (4-0) = 0 Area (span = 1): (33 - 32) / 0.25 - (3-0) = 1

although I wonder if in practice current CIS definitions and WCS services would be compatible with this approach to determine the span?

jerstlouis commented 3 years ago

SWG 2021-04-21: Motion to adopt a clarification on how to determine the information whether coverage cells represent a point or area (or overlapping cells with a gap) for CIS 1.1 GeneralGridCoverage by comparing the lower and upper bound of the CRS axis against the lower and upper bound of the matching index (gridspace) axis.

We will also discuss how the traditional GMLCov coverage definition of CIS 1.0 maps to the GeneralGridCoverage, and to what extent this point vs. area distinction can be transposed to that form, and which cases can be expressed losslessly as a GeneralGridCoverage (likely not possible in the opposite direction, to confirm). This information and guidance as examples is important for users to fully grasp what can be expected from a CIS 1.1 document.

Motion: Jerome Second: Stephan Discussions: NOTUC.

jerstlouis commented 2 years ago

Regarding the above note on GMLCOV / CIS 1.0, it provides no mechanism to distinguish between ValueIsPoint or ValueIsArea. However, the current GeoTIFF Coverage Encoding Profile (OGC 12-100r1) discusses the topic, since GeoTIFF does (also clarifying that lack of distinction by GMLCOV):

Note: GMLCOV does not store the raster type i.e. PixelIsArea or PixelIsPoint information in its current version. However, Requirement 9 above specifies that the gml:boundedBy element shall respect the raster type i.e. it shall include the half pixel border in case of PixelIsArea (see also Figure 1) but not in case of PixelIsPoint. In other words, in case of PixelIsArea the gml:boundedBy element is decreased by the half of both gml:offsetVector elements in the gml:lowerCorner coordinate and increased by the half of both gml:offsetVector elements in the gml:upperCorner coordinate compared to the case of PixelIsPoint. The gml:origin element stays the same independently of the raster space.

This means that for the same number of data cells along one axis, the gml:RectifiedGrid inside gml:domainSet will be exactly the same, regardless of whether the cells represent a single point or a sample area. The gml:Envelope inside gml:boundedBy however will encompass the half-resolution padding in the case of PixelIsArea, but not for PixelIsPoint. This provides a mechanism to store the area vs. point for gml:RectifiedGrid information directly in the GML, although this requires the presence of the gml:Envelope, and this requirement might be specific to the GeoTIFF Coverage Encoding Profile.

The main problems with this half-pixel approach for ValueIsArea are:

The DomainSet of the coverage does not cover the full range of direct positions for which one would expect to be able to retrieve values, e.g. the DomainSet spanning -179.5..179.5 degrees longitude for a world-wide coverage at 1.0 degrees/pixel resolution does not cover -179.75 which should really be part of the coverage's domain.
The bounds of the DomainSet differ from the Envelope
It relies on the presence of the Envelope to infer whether the cells are measurements at a particular point or for a sample space
It should be possible to retrieve values at a direct position not falling exactly on the center of sample cells without a need to involve interpolation -- the values are already well defined for the extent of the cell. This assumes a square cell of course, but that is actually the case for many imagery products which consist of square sample spaces produced at much lower resolution than one at which the optical characteristics of the sensors result in more complex shapes coming into play.
For ValueIsArea coverages spanning the exact same extent, the DomainSet spatial axes bounds oddly change based on the resolution, including if the same coverage is downsampled or upsampled, which is particularly confusing for services, viewer and processing tools able to perform such interpolation.

Therefore I suggest that a new GML GeoTIFF encoding profile should support GeneralGridCoverage (not currently one of the supported coverage types), and allow for carrying the distinction directly in the DomainSet which would always correspond to the Envelope regardless of whether values represent points or areas, without relying on the presence of that envelope.

Representing the ValueIsArea / ValueIsPoint distinction in legacy CIS 1.0 / GMLCOV can still be done by encoding and comparing the gml:boundedBy / gml:Envelope against the gml:domainSet / gml:RectifiedGrid (much like the new CIS 1.1 GeneralGridCoverage approach above comparing the spatial axes with the grid axes), as laid out in the current GeoTIFF Coverage Encoding Profile.

In summary, the clarification about distinct DomainSets for spatial / temporal axes for ValueIsArea vs. ValueIsPoint should apply only to the new CIS 1.1 GeneralGridCoverage domainsets, and not to those inherited from CIS 1.0 such as gml:RectifiedGrid to ensure compatibility.

@Schpidi

pebau commented 2 years ago

Therefore I suggest that a new GML GeoTIFF encoding profile should support

hm, what is "GML GeoTIFF"?

Friendly amendment: enhance the GeoTIFF Coverage encoding spec, maybe by adding a new conformance class.

@Schpidi has written the GeoTIFF extension originally, so he will know best how to update.

Schpidi commented 2 years ago

Thanks @jerstlouis for the thorough analysis. I agree with your assessment and suggestions. Whom would we need to ask to initialize a GitHub repo with the current version in Asciidoc to start working on an update?

jerstlouis commented 2 years ago

@pebau

hm, what is "GML GeoTIFF"?

By "the new GML GeoTIFF encoding profile" I meant the successor to the current OGC® GML Application Schema - Coverages - GeoTIFF Coverage Encoding Profile.

Friendly amendment: enhance the GeoTIFF Coverage encoding spec, maybe by adding a new conformance class.

As I suggested before, perhaps this could be defined by specifying GeoTIFF tags encoding the CIS 1.1 GeneralGrid DomainSet/RangeType information directly inside the GeoTIFF (and making a better association with TIFF bands for temporal axes/range value fields), instead of relying on a separate GML files, and it could then be called something like "OGC GeoTIFF CIS 1.1 Profile" or something similar.

@Schpidi @gbuehler might be able to help us to set up an ASCIIDoc-populated GitHub repo for this new GeoTIFF coverage specification.

pebau commented 2 years ago

ok, looking forward to an "upgrade" of "CIS GeoTIFF"! Should we include GeoTIFF activists in the task force?

jerstlouis commented 2 years ago

@pebau Exciting! For sure :)

opengeospatial / coverage-implementation-schema

Which pixels to select when subset does not match pixel boundaries #6