stac-extensions / classification

Describes categorical values and bitfields to give values in a file a certain meaning (classification).
Apache License 2.0

How to define `value` in the Class Object? #9

Closed: m-mohr closed this issue 2 years ago

m-mohr commented 2 years ago

Right now it's defined as `any` to potentially allow ranges or lists.

The basic use case is scalars of course.

If we allow arrays, what should that mean? A list of values? A range?

If we want to allow ranges, would it be better to use what Collection summaries uses? i.e. an object with minimum and maximum properties?

pjhartzell commented 2 years ago

> If we allow arrays, what should that mean? A list of values? A range?

I believe the original intent was to allow both a list of values and a range object similar to that used in Collection summaries, in addition to the basic use case of a single value.
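To make the three forms concrete, here is a minimal Python sketch of how a consumer might interpret `value` as a scalar, a list of values, or a range object. It assumes inclusive `minimum`/`maximum` bounds as in Collection summaries and reads an array as a list of values, which is only one of the two readings under discussion. The `matches` and `classify` helpers, the class names, and the numbers are all hypothetical.

```python
def matches(class_value, pixel):
    """Return True if `pixel` falls under a class whose `value` is `class_value`."""
    # Range object, borrowing the minimum/maximum keys of Collection summaries
    # (assumed inclusive on both ends).
    if isinstance(class_value, dict):
        return class_value["minimum"] <= pixel <= class_value["maximum"]
    # Array, read here as an explicit list of values (an assumption, not a range).
    if isinstance(class_value, list):
        return pixel in class_value
    # Basic use case: a single scalar.
    return pixel == class_value


# Illustrative classes only; names and values are made up for this sketch.
classes = [
    {"value": 0, "name": "nodata"},
    {"value": [1, 2, 3], "name": "water"},
    {"value": {"minimum": 0.2, "maximum": 0.8}, "name": "vegetation"},
]


def classify(pixel):
    """Return the names of all classes that the pixel value matches."""
    return [c["name"] for c in classes if matches(c["value"], pixel)]
```

With these classes, `classify(2)` matches only the list-valued class and `classify(0.5)` matches only the range-valued one.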

m-mohr commented 2 years ago

Yeah, classifying ranges for floats would be useful. I guess the easiest would be to simply adopt the syntax of Collection summaries...

pjhartzell commented 2 years ago

Agreed.

drwelby commented 2 years ago

To me, ranges and arrays seem like an anti-pattern - the point of classifying data is usually to summarize a range of values into unique groups. Using this extension to define interpretations of data ranges doesn't seem like a descriptive use of metadata.

drwelby commented 2 years ago

At some point I could see a similar use for data dictionaries in tabular data, for example a road dataset with a surface type field that maps short one- or two-letter string codes to categories. So should we also allow strings for class values, or should we expect a tabular field classification extension to copy the Class Object and its fields but have different requirements for allowed types?
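As a rough illustration of the tabular case, a data dictionary with string class values could look like the Python sketch below. The surface codes, descriptions, and the `describe` helper are hypothetical and not taken from any real dataset or from the extension itself.

```python
# Hypothetical road-surface codes; string values instead of integers.
surface_classes = [
    {"value": "A", "description": "Asphalt"},
    {"value": "G", "description": "Gravel"},
    {"value": "UN", "description": "Unpaved"},
]

# Build a lookup from code to human-readable category.
lookup = {c["value"]: c["description"] for c in surface_classes}


def describe(code):
    """Return the category for a surface code, or 'unknown' if it is not listed."""
    return lookup.get(code, "unknown")
```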

matthewhanson commented 2 years ago

Are there any examples of data that uses a range of values to describe a single class? It could be useful to describe hierarchical landcover data. For instance, values 1-10 are vegetation and 11-15 are urban, with each individual value having a specific land type. This could be modeled as two different assets that have the same href, but different classification values.

Then again, I don't have a specific example of this being used. @pgadomski was working with multiple class values for MODIS, I wonder if this extension would have utility there?
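The two-asset idea for hierarchical landcover could be sketched roughly as follows in Python. It assumes range objects with inclusive `minimum`/`maximum` keys are allowed for `value`, which is exactly what this issue is still deciding. The href, asset keys, class names, and the `label` helper are hypothetical.

```python
# Two hypothetical assets sharing one href: coarse groups as ranges,
# fine land types as single values. All names and values are illustrative.
HREF = "https://example.com/landcover.tif"

assets = {
    "landcover_groups": {
        "href": HREF,
        "classification:classes": [
            {"value": {"minimum": 1, "maximum": 10}, "name": "vegetation"},
            {"value": {"minimum": 11, "maximum": 15}, "name": "urban"},
        ],
    },
    "landcover_types": {
        "href": HREF,
        "classification:classes": [
            {"value": 3, "name": "deciduous_forest"},
            {"value": 12, "name": "residential"},
        ],
    },
}


def label(asset_key, pixel):
    """Return the first class name matching `pixel` in the given asset, else None."""
    for cls in assets[asset_key]["classification:classes"]:
        value = cls["value"]
        if isinstance(value, dict):
            # Range object: assumed inclusive on both ends.
            if value["minimum"] <= pixel <= value["maximum"]:
                return cls["name"]
        elif value == pixel:
            # Scalar: exact match.
            return cls["name"]
    return None
```

Here the same pixel value 3 resolves to `vegetation` through the coarse asset and to `deciduous_forest` through the fine one.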

drwelby commented 2 years ago

For land cover it seems like allowing a range is more "defining a classification scheme" than "describing how data was classified". But if useful, I don't feel like it should be disallowed.

pjhartzell commented 2 years ago

I feel like I keep going in the same circle on this (use of arrays and range objects). I don't like it, for the reason that @drwelby notes above, but I can also see how it could be useful or desired. My opinion is to allow it and put this to rest.