stac-extensions / raster

Describes raster assets at band level (one or multiple) with specific information such as data type, unit, number of bits used, nodata.
Apache License 2.0

Multiple no-data values #33

Closed m-mohr closed 1 year ago

m-mohr commented 2 years ago

I have GRIB files from NOAA MRMS QPE, which can have the following values:

No-data values: -1 (missing value), -3 (no coverage)

Valid values: >= 0 (in mm)

I'm trying to describe this with STAC extensions, but am somewhat failing:

Relying on an old extension version that has no upgrade path seems unfortunate. What do the extension authors think? How/where could we address this? @drwelby @pjhartzell @emmanuelmathot To me, allowing an array of no-data values in the raster extension, as an upgrade path from the old file extension, seems like the best fit.

pjhartzell commented 2 years ago

This is a tough one. My initial reaction is that I don't like allowing multiple nodata values in the raster extension for a few reasons:

  1. It opens the door for a disconnect between the actual raster files and the raster extension. For example, GeoTIFFs/COGs only allow one nodata value to be set. So if you specify several nodata values in the raster extension, what value do you set for the TIFF nodata? Or do you not set the TIFF nodata value in this case?
  2. If multiple nodata values are specified, the natural question is why? So some type of description would be valuable. This starts to feel like classification.

Another use-case example is the VIIRS snow product, VNP10A1, which has a valid data range from 0-100. In addition, there are 8 different flags for nodata, i.e., 8 unique pixel values (all greater than 200) which designate why there is no data in a pixel: no_decision, night, lake, ocean, cloud, missing_data, bowtie_trim, and L1B_fill. It looks kinda like classification in this (extreme) case, except that the extension does not allow for a range of values to be used for the valid data (range 0-100).

Short of splitting the data into two files (one for the valid data, another with the multiple nodata flags/values), a change to the raster or classification extension seems necessary.

drwelby commented 2 years ago

To confirm: is -3 (no coverage) equivalent to the nodata pixels that fill the rectangular pixel extent outside the coverage area? And is -1 (missing value) within the coverage area?

m-mohr commented 2 years ago

That's how I'd understand it, yes. @drwelby

I'm not as critical regarding the no-data values array in raster. Raster should not be GeoTIFF-specific, so if there's a format out there that can have multiple no-data values, then it should be covered by the extension. In the end, you need to make good decisions anyway, and as such the metadata should just reflect what is in the file. And you also don't always want to change the original data, as STAC is often just an addition to what is already out there.

An alternative would indeed be to introduce something like a "role(s)" string or array in the classification extension, where we could pre-define a no-data role.

pjhartzell commented 2 years ago

And you also don't always want to change the original data, as STAC is often just an addition to what is already out there.

Yeah, agreed on this. There should be a way to describe (with STAC) the data in its native format (GRIB, HDF, etc).

drwelby commented 2 years ago

An alternative would indeed be to introduce something like a "role(s)" string or array in the classification extension, where we could pre-define a no-data role.

This sounds promising.

pjhartzell commented 2 years ago

An alternative would indeed be to introduce something like a "role(s)" string or array in the classification extension, where we could pre-define a no-data role.

I'm not clear on this. I guess I need to see a concrete example.

m-mohr commented 2 years ago

Here's an example for the usecase above:

      "raster:bands": [
        {
          "spatial_resolution": 1000,
          "data_type": "float64",
          "unit": "mm",
          "classification:classes": [
            {
              "value": -1,
              "roles": ["nodata"],
              "description": "Missing value"
            },
            {
              "value": -3,
              "roles": ["nodata"],
              "description": "No coverage"
            },
            {
              "value": [0, null], # >= 0
              "roles": ["data"],
              "description": "Precipitation in mm"
            }
          ]
        }
      ],

I'm not sure what other roles it could have. If it's just these two roles, it could also be a simple boolean flag "nodata": true instead of "roles": ["nodata"], but experience has shown that other use cases usually turn up, so it's good to keep things flexible.
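To make the proposal concrete, here is a minimal Python sketch of how a client might interpret such metadata. It assumes the hypothetical "roles" field and the [min, max] range form with null as an open end, exactly as written in the example above; neither is part of a published extension version, and the helper names `matches` and `roles_for` are made up for illustration.

```python
import json

# Hypothetical band metadata following the proposal above. The "roles" field
# and the [min, max] range form are assumptions, not published extension fields.
BAND = json.loads("""
{
  "classification:classes": [
    {"value": -1, "roles": ["nodata"], "description": "Missing value"},
    {"value": -3, "roles": ["nodata"], "description": "No coverage"},
    {"value": [0, null], "roles": ["data"], "description": "Precipitation in mm"}
  ]
}
""")


def matches(class_value, pixel):
    """Return True if pixel equals a scalar value or lies in a [min, max] range."""
    if isinstance(class_value, list):
        low, high = class_value
        # None (JSON null) means the range is open on that side.
        return (low is None or pixel >= low) and (high is None or pixel <= high)
    return pixel == class_value


def roles_for(pixel, band):
    """Collect the roles of every class entry that matches the pixel value."""
    roles = []
    for cls in band.get("classification:classes", []):
        if matches(cls["value"], pixel):
            roles.extend(cls.get("roles", []))
    return roles


print(roles_for(-3, BAND))    # ['nodata']
print(roles_for(12.5, BAND))  # ['data']
print(roles_for(-2, BAND))    # []
```

A value such as -2 matches no entry and gets no role, which shows that a roles-based scheme still leaves undeclared values undefined.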

drwelby commented 2 years ago

Since only nodata values are classified, maybe something like:

      "nodata:classes": [
        {
          "value": -1,
          "description": "Missing value"
        },
        {
          "value": -3,
          "description": "No coverage"
        }
      ]

or

      "nodata": -3, # classic interpretation of no data coming into the system, i.e. fill
      "nodata:classes": [
        {
          "value": -1, # no data coming out of the system
          "description": "Missing value"
        }
      ]

m-mohr commented 2 years ago

Why not the more general-purpose approach? I don't see the benefit of your proposal.

Also, this is the really expressive way. I'd also be happy with a simple "nodata": [-1, -3] in raster and then use classification:classes in addition.

pjhartzell commented 2 years ago

I'd also be happy with a simple "nodata": [-1, -3] in raster and then use classification:classes in addition.

I like this option. It keeps nodata explicitly defined in the Raster Band Object.

      "raster:bands": [
        {
          "spatial_resolution": 1000,
          "data_type": "float64",
          "nodata": [-1, -3],
          "unit": "mm",
          "classification:classes": [
            {
              "value": -1,
              "description": "Missing value"
            },
            {
              "value": -3,
              "description": "No coverage"
            },
            {
              "value": [0, null], # >= 0
              "description": "Precipitation in mm"
            }
          ]
        }
      ],

m-mohr commented 2 years ago

Yes. The last element in classification:classes would need to be removed right now (or needs a change in classification).
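For illustration, a reader of the array form could stay compatible with the existing single-value form by normalising both to a set. This is only a sketch: the array form of "nodata" is the proposal under discussion here, and `nodata_values` and `mask` are hypothetical helper names.

```python
def nodata_values(band):
    """Return the band's no-data values as a set, for scalar or array input."""
    nodata = band.get("nodata")
    if nodata is None:
        return set()
    if isinstance(nodata, list):
        return set(nodata)  # proposed array form, e.g. [-1, -3]
    return {nodata}         # existing single-value form


def mask(values, band):
    """Replace no-data pixels with None and leave valid values untouched."""
    skip = nodata_values(band)
    return [None if v in skip else v for v in values]


# Band metadata reduced to the fields used in the example above.
band = {"data_type": "float64", "unit": "mm", "nodata": [-1, -3]}
print(mask([0.0, 2.5, -1, -3, 7.0], band))  # [0.0, 2.5, None, None, 7.0]
```

Set membership does not work for NaN, so a real reader would need separate handling if a band declares NaN as its no-data value.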

pjhartzell commented 2 years ago

Is the [0, null] form standard for >= 0? Or would we use a "range object"?

m-mohr commented 2 years ago

This is inspired by the open-ended ranges used for the extents in collections. I'm open to any of those options though. And we don't necessarily need to have it now.

drwelby commented 2 years ago

I really don't like calling a continuous data range a "class", but we keep running into range-type objects. So maybe we need a more generic value(s)-to-concepts extension: a values:mapping that's a superset of classification:classes.

pjhartzell commented 2 years ago

I really don't like calling a continuous data range a "class"

Agreed. But it was a topic when the classification extension was being put together a few months ago. I think the general thought was to kick the topic down the road until a use-case presented itself. Seems like we are at that moment.

Other than preventing the "abuse" of classification with range values, what other use cases do you see for a generic values:mapping extension?

drwelby commented 2 years ago

Other than preventing the "abuse" of classification with range values, what other use cases do you see for a generic values:mapping extension?

Multiple keys that map to one value is the first one that comes to mind, like [3, 7, 21]: "vegetation"
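As a rough sketch of that idea, the Python below expands many-to-one entries into a flat per-value lookup. The entry layout with "values" and "concept" keys is invented for illustration, since no values:mapping schema exists, and the "water" entry is made-up sample data.

```python
# Hypothetical many-to-one entries. The "values"/"concept" keys and the
# "water" entry are assumptions made up for this sketch.
mapping = [
    {"values": [3, 7, 21], "concept": "vegetation"},
    {"values": [4], "concept": "water"},
]


def build_lookup(entries):
    """Expand many-to-one entries into a flat value -> concept dictionary."""
    lookup = {}
    for entry in entries:
        for value in entry["values"]:
            if value in lookup:
                # A pixel value should belong to at most one concept.
                raise ValueError(f"value {value} is mapped more than once")
            lookup[value] = entry["concept"]
    return lookup


lookup = build_lookup(mapping)
print(lookup[7])      # vegetation
print(lookup.get(5))  # None
```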

drwelby commented 2 years ago

Returning to this, do we need something like "flag values", for lack of a better term?

IDRISI files have a metadata field for a "flag value" and a "flag definition" so it made sense to someone.

      "raster:bands": [
        {
          "spatial_resolution": 1000,
          "data_type": "float64",
          "unit": "mm",
          "flag_values": [
            {
              "value": -1,
              "roles": ["nodata"],
              "description": "Missing value"
            },
            {
              "value": -3,
              "roles": ["nodata"],
              "description": "No coverage"
            }
          ]
        }
      ]

emmanuelmathot commented 2 years ago

Returning to this, do we need something like "flag values", for lack of a better term?

Unfortunately, I have not had the time to follow the whole discussion carefully, but isn't that the purpose of the classification extension?

m-mohr commented 1 year ago

The classification extension now has the nodata flag for the Class Object, so this seems to be the way forward.
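A minimal Python sketch of how the use case from this issue might look with that flag, and how a client could recover the reason behind each no-data value. It assumes the Class Object carries a boolean "nodata" field next to "value", "name" and "description"; the class names are invented here, so check the classification extension's specification for the exact fields and constraints.

```python
# Class Objects for the MRMS use case, written as Python dicts. The boolean
# "nodata" field is assumed from the comment above; the "name" values are
# made up for this sketch.
CLASSES = [
    {"value": -1, "name": "missing_value", "description": "Missing value", "nodata": True},
    {"value": -3, "name": "no_coverage", "description": "No coverage", "nodata": True},
]


def nodata_reasons(classes):
    """Map each flagged no-data value to its human-readable description."""
    return {
        cls["value"]: cls.get("description", cls["name"])
        for cls in classes
        if cls.get("nodata", False)  # an absent flag is treated as valid data
    }


reasons = nodata_reasons(CLASSES)
print(reasons)  # {-1: 'Missing value', -3: 'No coverage'}
```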