stac-extensions / classification

Describes categorical values and bitfields to give values in a file a certain meaning (classification).
Apache License 2.0
11 stars 3 forks source link

`name` is required rather than `description` #28 #35

Closed m-mohr closed 2 years ago

m-mohr commented 2 years ago

Implements #28

pjhartzell commented 2 years ago

I'm good with this.

drwelby commented 2 years ago

@gadomski can you expand on your comments in https://github.com/stac-extensions/classification/issues/28#issuecomment-1151639912?

gadomski commented 2 years ago

Sure, one use case would be an STAC-backed API that calculates statistics using a land cover data set. If you only want to calculate statistics for raster cells classified as "urban", you might send a query like GET /api/stats?class=urban. By making name required, we ensure that all classes have a "machine readable" name to use as a query parameter.

m-mohr commented 2 years ago

Well, I mean it doesn't say what machine-readable actually means. So these are valid names so the question is more whether we want to actually define this and restrict the allowed characters in the schema for the name? @gadomski Something like ^[\w\d-\.]+$?

gadomski commented 2 years ago

Well, I mean it doesn't say what machine-readable actually means

True, but if we don't restrict the text field in the schema, then at least the README should reflect what we think it should look like?

I don't think we should restrict the name field, since there's no technical requirement that (e.g.) query parameters can't contain spaces. However, it might be good to provide some text suggesting best practices practices for names, maybe something like this:

The name field should be picked to be short and easily understandable without domain-specific knowledge. For example, prefer "urban" instead of "urban areas". Because the name field could be used to filter data (for example, in a HTTP query parameter), prefer hyphens to spaces, e.g. use "deciduous-forest" instead of "deciduous forest". Avoid the use of acronyms. Use the description field to provide more verbose information about the class.

But I could be talked into a regex-based check, curious what others think.

m-mohr commented 2 years ago

Some additional thoughts: I'm right now thinking about other use cases and a human-readable title like thing would also be good for legends. We could argue that the name would also be a valid use case for that. You probably don't want to use the description for legens on maps etc, right?

I'm happy to go without a regex and just adopt your proposal, too. I have no strong preference.

drwelby commented 2 years ago

If we want to cover more use cases we could consider making classes an object with machine readable keys and name/description in an object as done with assets.

Or perhaps we add "title" as an optional short name for legends, etc.

gadomski commented 2 years ago

If we want to cover more use cases we could consider making classes an object with machine readable keys and name/description in an object as done with assets.

IMO this is more intuitive, since classes are unordered (they are often ordered by value but that's convention, not structural). I did just read https://github.com/stac-extensions/classification/issues/13 to get background on why a list was picked. I personally like how assets work in STAC (they're a dictionary), since it allows for direct, unambiguous fetching of the asset you want (provided you know the key), e.g.

data = item.assets["data"]

I looked at how the Planetary Computer is using the classification extension, and they're used as legend text (aligning well with @m-mohr's thinking):

image

Do we have other real-world usages of the extension we can use to inform this discussion?

drwelby commented 2 years ago

In the Maxar ARD Python SDK we have tile objects that are abstractions of STAC Items that represents an acquisition and files generated from it. So you can do something like:

for tile in my_s3_storage:
    if not tile.cloud_mask.intersects(my_geometry):
       display(tile)

We have some accessor magic that loads the assets, but what we'd like to be able to do is extend the magic to also be able to read classes to be able to test against them too . In the case of the cloud mask, we use the classification extension to describe its two classes: clouds and cloud shadows. So we'd want the machine-readable name to be able to do:

    if not tile.cloud_mask.clouds.intersects(my_geometry):
gadomski commented 2 years ago

So we'd want the machine-readable name to be able to do:

if not tile.cloud_mask.clouds.intersects(my_geometry):

In this case, you need more than "machine-readable", you need "valid attribute name in Python", yes? I think this would be an argument for @m-mohr's use-a-regex-to-validate-name approach, so you could s/-/_/ to get your attribute name.

drwelby commented 2 years ago

We already handle making names valid in Python, we just wouldn't want someone else to get stuck having to build a similar solution that has to take a long description and make it sane to write code around.

if not tile.land_cover.herbaceous_shrubbery_not_including_wetland__and_other_broadleaf_grasses.intersects(geom)
m-mohr commented 2 years ago

We discussed this issue in the STAC call today:

drwelby commented 2 years ago

@m-mohr 👍 works for me, merge whenever you are ready...