stac-extensions / classification

Describes categorical values and bitfields to give values in a file a certain meaning (classification).
Apache License 2.0

Classification Extension Specification

This document explains the Classification Extension to the SpatioTemporal Asset Catalog (STAC) specification.

Note that Classification in this context is about providing semantic information on the pixel content. It does not relate in any way to security classification or other confidentiality labeling.

Classification Types

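| Field Name               | Type        | Description                                   |
| ------------------------ | ----------- | --------------------------------------------- |
| classification:classes   | [Class]     | Classes stored in raster or bands             |
| classification:bitfields | [Bit Field] | Classes stored in bit fields in the raster    |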

classification:classes is for when one or more unique coded integer values are present within a raster asset or band therein. These coded values translate to classes of data with verbose descriptions.

An example would be a cloud mask raster that stores values that represent image conditions in each pixel.

classification:bitfields is for classes that are stored in fields of contiguous bits within the pixel's value. Files using this strategy are commonly called a 'bit mask' or 'bit index'. The value stored in a field is the integer obtained by reading the field's bits on their own, isolated from the rest of the pixel value. Bits are always read right to left. The position of the first bit in a field is given by its offset, so the first (rightmost) bit is at offset zero.

These classification objects can be used in the following places:

Bit Field Object

Describes multiple classes stored in a field of a contiguous range of bits

| Field Name  | Type     | Description |
| ----------- | -------- | ----------- |
| offset      | integer  | REQUIRED. Offset to first bit in the field |
| length      | integer  | REQUIRED. Number of bits in the field |
| classes     | [Class]  | REQUIRED. Classes represented by the field values |
| name        | string   | Short name of the class for machine readability. Must consist only of letters, numbers, -, and _ characters. |
| description | string   | A short description of the classification. CommonMark 0.29 syntax MAY be used for rich text representation. |
| roles       | [string] | see Asset Roles |

A Bit Field stores classes within a range of bits in a data value. The range is described by the offset of the first bit from the rightmost position and by the number of bits (the length) used to store the class values.

Since bit fields are often used to store data masks, they can also use optional STAC roles to identify their purpose to clients.

Following is a simplified example of a bit field scheme for cloud data using 4 bits. The bits are broken into 3 bit fields.

3210
||||
...X   - 1 here means "no data", 0 means "valid data"
..X.   - 1 here means "cloud pixel", 0 means "clear pixel"
XX..   - these two bits are "cloud confidence" and give 4 classes (binary 00, 01, 10, and 11)

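Working right to left, these four bits represent 3 bit fields:

1. Offset 0, length 1: the "no data" / "valid data" flag
2. Offset 1, length 1: the "cloud pixel" / "clear pixel" flag
3. Offset 2, length 2: the "cloud confidence" field with 4 classes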
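As a rough sketch, the three fields in the diagram could be described with Bit Field Objects like the ones below, written here as a Python structure and serialized to JSON. The field and class names (`no_data`, `cloud`, `cloud_confidence`, `low`, and so on) are illustrative assumptions and are not defined by this specification.

```python
import json

# Hypothetical description of the 4-bit cloud scheme shown above.
# All "name" values are made up for illustration.
bitfields = [
    {
        "name": "no_data",
        "offset": 0,  # rightmost bit
        "length": 1,
        "classes": [
            {"value": 0, "name": "valid"},
            {"value": 1, "name": "no_data"},
        ],
    },
    {
        "name": "cloud",
        "offset": 1,
        "length": 1,
        "classes": [
            {"value": 0, "name": "clear"},
            {"value": 1, "name": "cloud"},
        ],
    },
    {
        "name": "cloud_confidence",
        "offset": 2,
        "length": 2,  # two bits give four possible values
        "classes": [
            {"value": 0, "name": "none"},
            {"value": 1, "name": "low"},
            {"value": 2, "name": "medium"},
            {"value": 3, "name": "high"},
        ],
    },
]

# Total number of bits covered by the scheme (4 in this example).
total_bits = sum(field["length"] for field in bitfields)

print(json.dumps({"classification:bitfields": bitfields}, indent=2))
```

Each field here lists one class per possible value, so a field of `length` bits carries `2 ** length` classes.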

To extract the values in a bit field from a value called data, you typically would use the expression:

(data >> offset) & ((1 << length) - 1)

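This does the following:

1. Right-shifts (>>) data by offset bits, so that the field starts at the rightmost bit.
2. Builds a mask of length one-bits with (1<<length) - 1.
3. Bitwise ANDs (&) the shifted value with the mask, discarding any bits above the field.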

An example of finding the cloud confidence class value from the 4 bit example above for the data value of integer 6 (binary 0110): with offset 2 and length 2, (6 >> 2) & ((1 << 2) - 1) evaluates to 1 (binary 01).
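A minimal Python sketch of this calculation follows. The helper name `extract_bitfield` is an assumption introduced for illustration; the offsets and lengths are the ones from the 4 bit example above.

```python
def extract_bitfield(data: int, offset: int, length: int) -> int:
    """Return the integer stored in the bit field starting at `offset`
    (counted from the rightmost bit) and spanning `length` bits."""
    mask = (1 << length) - 1        # `length` one-bits, e.g. 0b11 for length 2
    return (data >> offset) & mask  # shift the field to the right, then mask


data = 6  # binary 0110

cloud_confidence = extract_bitfield(data, offset=2, length=2)
cloud = extract_bitfield(data, offset=1, length=1)
no_data = extract_bitfield(data, offset=0, length=1)

print(cloud_confidence, cloud, no_data)  # 1 1 0
```

For the value 6, the result is cloud confidence class 1, a cloud pixel, and valid data.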

The key distinction between bit fields and other types of bit masks is that the bits in a field are read as a standalone number. Therefore the 01.. cloud confidence class uses the value of 1, not 4 (binary 0100).
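The difference can be made concrete with a short Python sketch that compares masking the bits in place with reading them as a bit field. The variable names are illustrative only.

```python
data = 0b0110          # the integer 6 from the example above
offset, length = 2, 2  # the cloud confidence field

# Masking in place keeps the bits at their original positions.
in_place_mask = ((1 << length) - 1) << offset  # 0b1100
masked_in_place = data & in_place_mask         # 0b0100

# Reading as a bit field shifts the bits down first.
field_value = (data >> offset) & ((1 << length) - 1)  # 0b01

print(masked_in_place, field_value)  # 4 1
```

The class value to look up in the Bit Field's classes is the second number, 1.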

For a real world example, see Landsat 8's Quality raster.

Class Object

Describes a data class

| Field Name  | Type    | Description |
| ----------- | ------- | ----------- |
| value       | integer | REQUIRED. Value of the class |
| name        | string  | REQUIRED. Short name of the class for machine readability. Must consist only of letters, numbers, -, and _ characters. |
| title       | string  | Human-readable name for use in, e.g., a map legend. |
| description | string  | Description of the class. CommonMark 0.29 syntax MAY be used for rich text representation. |
| color_hint  | string  | Suggested color for rendering (Hex RGB code in upper-case without leading #) |
| nodata      | boolean | If set to true classifies a value as a no-data value, defaults to false |
| percentage  | number  | The percentage of data values that belong to this class in comparison to all data values, in percent (0-100). |
| count       | integer | The number of data values that belong to this class. |

Class objects enumerate data values and their corresponding classes. A cloud mask raster could contain the following four classes:
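As an illustration, four such classes might look like the following Python structure. The values, names, and titles are hypothetical and not prescribed by this specification, and the pixel list is made-up sample data. The sketch also fills in the optional `count` and `percentage` fields, assuming that all values, including no-data values, count toward the total.

```python
from collections import Counter

# Hypothetical classes for a cloud mask raster.
classes = [
    {"value": 0, "name": "no_data", "title": "No data", "nodata": True},
    {"value": 1, "name": "clear", "title": "Clear"},
    {"value": 2, "name": "cloud", "title": "Cloud"},
    {"value": 3, "name": "cloud_shadow", "title": "Cloud shadow"},
]

# Made-up pixel values standing in for a raster band.
pixels = [1, 1, 2, 0, 3, 1, 2, 1]

# Fill in the optional statistics fields for each class.
counts = Counter(pixels)
for cls in classes:
    cls["count"] = counts.get(cls["value"], 0)
    cls["percentage"] = 100 * cls["count"] / len(pixels)

# Look up the class name for each pixel value.
name_by_value = {cls["value"]: cls["name"] for cls in classes}
print([name_by_value[p] for p in pixels[:4]])
# ['clear', 'clear', 'cloud', 'no_data']
```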

color_hint is intended only to suggest a reasonable color for clients to use and is not intended to define styling. For example, the ESA landcover datasets use "color_hint":"006400" to suggest using a green color for a class of "Tree cover".
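A client could check and decode a color_hint along the following lines. This is a sketch based only on the format stated in the table above (six upper-case hex digits, no leading #); the function name `parse_color_hint` is an assumption.

```python
import re
from typing import Tuple

# Six upper-case hexadecimal digits, without a leading '#'.
COLOR_HINT_RE = re.compile(r"[0-9A-F]{6}")


def parse_color_hint(color_hint: str) -> Tuple[int, int, int]:
    """Return the (red, green, blue) components of a color_hint string."""
    if not COLOR_HINT_RE.fullmatch(color_hint):
        raise ValueError(f"not a valid color_hint: {color_hint!r}")
    # Split into three two-digit hex pairs and convert each to an integer.
    red, green, blue = (int(color_hint[i:i + 2], 16) for i in (0, 2, 4))
    return red, green, blue


print(parse_color_hint("006400"))  # (0, 100, 0)
```

Values such as "#006400" or "00ff00" are rejected by this check because of the leading # and the lower-case digits.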

For conveying styling see the Raster Extension and cpt2json for discussion on passing styling as an item asset instead.

Contributing

All contributions are subject to the STAC Specification Code of Conduct. For contributions, please follow the STAC specification contributing guide. Instructions for running tests are copied here for convenience.

Running tests

The same checks that run on PRs are part of the repository and can be run locally to verify that changes are valid. To run tests locally, you'll need npm, which is a standard part of any Node.js installation.

First you'll need to install everything with npm once. Just navigate to the root of this repository and on your command line run:

npm install

Then to check markdown formatting and test the examples against the JSON schema, you can run:

npm test

This will spit out the same texts that you see online, and you can then go and fix your markdown or examples.

If the tests reveal formatting problems with the examples, you can fix them with:

npm run format-examples