radiantearth / stac-spec

SpatioTemporal Asset Catalog specification - making geospatial assets openly searchable and crawlable
https://stacspec.org
Apache License 2.0

best practices: consider increasing the constraints on field/id naming? #1108

Closed fredliporace closed 3 years ago

fredliporace commented 3 years ago

From the gitter channel:

@fredliporace

The best practices suggest using lowercase chars, - and _ for fields and IDs. If the idea is to limit the possible combinations for searching, why not recommend only one of - or _?

@m-mohr

I don't think it was considered, but it sounds useful. I suggest opening an issue for that extension?

The best practices suggest to use lowercase chars, - and _ for fields and IDs.

I must admit I don't fully understand the choice of characters either. For fields I'd actually not recommend the - char; we usually only use alphanumeric characters plus _, plus : for prefixes. - seems impractical for code generation and such...
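To illustrate the code-generation concern, here is a minimal Python sketch. It assumes a generator that turns each ":"-separated part of a field name into an attribute or variable name. The helper `is_codegen_friendly` and the field name `view:off-nadir` are hypothetical, made up for this example.

```python
import keyword


def is_codegen_friendly(field_name: str) -> bool:
    """Return True if every ':'-separated part is a usable Python identifier.

    Hypothetical helper: it only models one generator strategy (mapping each
    part of a prefixed field name to an identifier).
    """
    parts = field_name.split(":")
    return all(p.isidentifier() and not keyword.iskeyword(p) for p in parts)


print(is_codegen_friendly("eo:cloud_cover"))   # underscore: fine
print(is_codegen_friendly("view:off-nadir"))   # hyphen: parsed as subtraction in code
```

A name with - would need renaming or quoting in most languages, whereas _ maps directly onto an identifier. Note that identifiers also cannot start with a digit, so "alphanumeric" is not quite sufficient on its own.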

For IDs I guess the idea was to allow characters that work in file paths, too, as the recommendation is to use the ID as file name for items. Then - and _ seem reasonable though.

m-mohr commented 3 years ago

Actually, I think your understanding of the spec is wrong, @fredliporace.

The spec says:

When defining unique fields for search, like constellation or platform, it is recommended that the value consist of only lowercase characters, numbers, _, and -. Examples include sentinel-1a (Sentinel-1), landsat-8 (Landsat-8) and envisat (Envisat). This is to provide consistency for search across Collections, so that people can just search for 'landsat-8', instead of thinking through all the ways providers might have chosen to name it.

That just means the values for some fields should be restricted to a specific set of values. This doesn't imply anything directly towards field names or ID values. It seems confusing though that these things are listed in the chapter "### Field and ID formatting". We should make better headings.
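As a rough illustration of what the quoted recommendation allows for values, the following sketch checks a string against the character set it names. It assumes "lowercase characters" means ASCII a-z, which the spec text does not state explicitly; `follows_recommendation` is a hypothetical name.

```python
import re

# Assumption: "lowercase characters" is read as ASCII a-z only.
_RECOMMENDED_VALUE = re.compile(r"[a-z0-9_-]+")


def follows_recommendation(value: str) -> bool:
    """Return True if the value uses only lowercase letters, digits, '_' and '-'."""
    return _RECOMMENDED_VALUE.fullmatch(value) is not None


for candidate in ["sentinel-1a", "landsat-8", "envisat", "Landsat-8", "landsat 8"]:
    print(candidate, follows_recommendation(candidate))
```

The check applies to values of fields such as platform or constellation, not to field names or IDs.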

m-mohr commented 3 years ago

Made a PR: #1110

fredliporace commented 3 years ago

@m-mohr my point is that, for example:

This is to provide consistency for search across Collections, so that people can just search for landsat-8, instead of thinking through all the ways providers might have chosen to name it.

In fact, the search would have to be for landsat-8, landsat_8 and landsat8. Fixing the best practices to either _ or - would reduce the number of possibilities without sacrificing too much. Of course we would still have ls8 but that is another problem.
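The reduction can be sketched by enumerating the spellings a searcher would have to try. This is only an illustration under the assumption that a provider may put any allowed separator, or none, between the parts of a name; `spelling_variants` is a hypothetical helper and expects a non-empty list of parts.

```python
from itertools import product


def spelling_variants(parts, separators):
    """Enumerate every spelling obtained by joining parts with any allowed separator.

    The empty string in `separators` stands for "no separator at all".
    """
    gaps = len(parts) - 1
    variants = set()
    for choice in product(separators, repeat=gaps):
        pieces = [parts[0]]
        for sep, part in zip(choice, parts[1:]):
            pieces.append(sep + part)
        variants.add("".join(pieces))
    return sorted(variants)


# Current recommendation: both '-' and '_' are allowed.
print(spelling_variants(["landsat", "8"], ["-", "_", ""]))
# Only '-' allowed: one fewer spelling to search for.
print(spelling_variants(["landsat", "8"], ["-", ""]))
```

With one gap in the name the count drops from three to two. For names with more gaps the difference grows, since each gap multiplies the number of candidates by the number of allowed separators. Abbreviations such as ls8 are not covered by this enumeration.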

m-mohr commented 3 years ago

Yes, I agree.