Closed m-mohr closed 2 years ago
I'm good with this.
@gadomski can you expand on your comments in https://github.com/stac-extensions/classification/issues/28#issuecomment-1151639912?
Sure, one use case would be an STAC-backed API that calculates statistics using a land cover data set. If you only want to calculate statistics for raster cells classified as "urban", you might send a query like GET /api/stats?class=urban
. By making name
required, we ensure that all classes have a "machine readable" name to use as a query parameter.
Well, I mean it doesn't say what machine-readable actually means. So these are valid names so the question is more whether we want to actually define this and restrict the allowed characters in the schema for the name? @gadomski
Something like ^[\w\d-\.]+$
?
Well, I mean it doesn't say what machine-readable actually means
True, but if we don't restrict the text field in the schema, then at least the README should reflect what we think it should look like?
I don't think we should restrict the name
field, since there's no technical requirement that (e.g.) query parameters can't contain spaces. However, it might be good to provide some text suggesting best practices practices for names, maybe something like this:
The
name
field should be picked to be short and easily understandable without domain-specific knowledge. For example, prefer "urban" instead of "urban areas". Because thename
field could be used to filter data (for example, in a HTTP query parameter), prefer hyphens to spaces, e.g. use "deciduous-forest" instead of "deciduous forest". Avoid the use of acronyms. Use thedescription
field to provide more verbose information about the class.
But I could be talked into a regex-based check, curious what others think.
Some additional thoughts: I'm right now thinking about other use cases and a human-readable title like thing would also be good for legends. We could argue that the name would also be a valid use case for that. You probably don't want to use the description for legens on maps etc, right?
I'm happy to go without a regex and just adopt your proposal, too. I have no strong preference.
If we want to cover more use cases we could consider making classes an object with machine readable keys and name/description in an object as done with assets.
Or perhaps we add "title" as an optional short name for legends, etc.
If we want to cover more use cases we could consider making classes an object with machine readable keys and name/description in an object as done with assets.
IMO this is more intuitive, since classes are unordered (they are often ordered by value
but that's convention, not structural). I did just read https://github.com/stac-extensions/classification/issues/13 to get background on why a list was picked. I personally like how assets work in STAC (they're a dictionary), since it allows for direct, unambiguous fetching of the asset you want (provided you know the key), e.g.
data = item.assets["data"]
I looked at how the Planetary Computer is using the classification extension, and they're used as legend text (aligning well with @m-mohr's thinking):
Do we have other real-world usages of the extension we can use to inform this discussion?
In the Maxar ARD Python SDK we have tile objects that are abstractions of STAC Items that represents an acquisition and files generated from it. So you can do something like:
for tile in my_s3_storage:
if not tile.cloud_mask.intersects(my_geometry):
display(tile)
We have some accessor magic that loads the assets, but what we'd like to be able to do is extend the magic to also be able to read classes to be able to test against them too . In the case of the cloud mask, we use the classification extension to describe its two classes: clouds and cloud shadows. So we'd want the machine-readable name to be able to do:
if not tile.cloud_mask.clouds.intersects(my_geometry):
So we'd want the machine-readable name to be able to do:
if not tile.cloud_mask.clouds.intersects(my_geometry):
In this case, you need more than "machine-readable", you need "valid attribute name in Python", yes? I think this would be an argument for @m-mohr's use-a-regex-to-validate-name approach, so you could s/-/_/
to get your attribute name.
We already handle making names valid in Python, we just wouldn't want someone else to get stuck having to build a similar solution that has to take a long description and make it sane to write code around.
if not tile.land_cover.herbaceous_shrubbery_not_including_wetland__and_other_broadleaf_grasses.intersects(geom)
We discussed this issue in the STAC call today:
@m-mohr 👍 works for me, merge whenever you are ready...
Implements #28