radiantearth / stac-spec

SpatioTemporal Asset Catalog specification - making geospatial assets openly searchable and crawlable
https://stacspec.org
Apache License 2.0

Collection summaries including Asset-only fields #1153

Open lossyrob opened 3 years ago

lossyrob commented 3 years ago

The collection spec states:

> Collections are strongly recommended to provide summaries of the values of fields that they can expect from the properties of STAC Items contained in this Collection.

One interpretation of this is that summaries should only include properties of STAC Items, which would exclude properties that might only exist on the Asset objects of the Item. However, it would be useful to include Asset-only properties in the summaries - for example, in the file extension, there is a file:values property that contains useful information about the classification values of a raster (though this will likely move to another extension). Having a summary at the Collection level for this asset-only property would be useful to allow users to know the classmap without having to dig into an Item.

Should asset-only properties be allowed in Collection summaries? If so, should they be treated the same as Item properties - with the property existing as a top level property in summaries (e.g. "summaries": { "file:values": [ ... ] })? What about if multiple assets have the same property value; should there be something which differentiates between the assets whose properties are being summarized?

I'd vote for treating Asset-only properties the same as Item properties; if there are multiple Assets that implement a property, then both could be summarized without differentiating, and the summary would remain valid IMO.
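As a rough illustration of this "no differentiation" approach, the sketch below gathers the distinct values of one field from the Item properties and from every Asset into a single top-level summary. The helper name `summarize_values`, the sample Items, and the choice of `gsd` as the example field are assumptions made for illustration; nothing here is defined by the spec.

```python
from typing import Any


def summarize_values(items: list[dict[str, Any]], field: str) -> list[Any]:
    """Collect distinct values of `field` from Item properties and all Assets."""
    values: list[Any] = []
    for item in items:
        # Treat the Item properties and each Asset object as equal sources.
        holders = [item.get("properties", {}), *item.get("assets", {}).values()]
        for holder in holders:
            if field in holder and holder[field] not in values:
                values.append(holder[field])
    return values


# Hypothetical Items in which `gsd` only appears on the Assets.
items = [
    {
        "properties": {"platform": "example-sat"},
        "assets": {
            "pan": {"href": "pan.tif", "gsd": 15},
            "red": {"href": "red.tif", "gsd": 30},
        },
    },
    {
        "properties": {"platform": "example-sat"},
        "assets": {"red": {"href": "red2.tif", "gsd": 30}},
    },
]

summaries = {"gsd": summarize_values(items, "gsd")}
print(summaries)  # {'gsd': [15, 30]}
```

The resulting summary stays valid as a plain set of values, but it no longer says which Asset a value came from.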

duckontheweb commented 3 years ago

Summaries of asset properties (and possibly link properties) would be useful for us in the Radiant MLHub API.

We have heard from a few users that it would be nice to have a collection-level summary of the file formats of the source imagery so that users could search for collections based on formats that fit into their existing workflows. We utilize the Label Extension and typically have separate source imagery and label collections. In the source imagery collection, we would probably want to summarize the Asset media type (type property) of any data assets in the collection. One problem we might run into here is distinguishing between Item- and Asset-level properties of the same name (e.g. type). There is obviously no reason to summarize the Item-level type property (since it will always be "Feature"), but we may still want a mechanism to make it clear in these kinds of cases.

> What about if multiple assets have the same property value; should there be something which differentiates between the assets whose properties are being summarized?

Not sure if this is the same issue, but in our case we would only be summarizing the media type for "data" assets (i.e. not thumbnails and other assets), so it might be good to indicate this somehow in the summaries. I'm not sure what the best/clearest way is to do this, though.
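A minimal sketch of that filtering step, assuming the Assets carry the standard `roles` and `type` fields: only Assets whose roles include "data" contribute to the list of media types. The function name `summarize_asset_types` and the sample Item are hypothetical, and the sketch deliberately leaves open where in the Collection the result would be stored.

```python
from typing import Any


def summarize_asset_types(items: list[dict[str, Any]], role: str = "data") -> list[str]:
    """Return the sorted, distinct media types of Assets that have `role`."""
    media_types: set[str] = set()
    for item in items:
        for asset in item.get("assets", {}).values():
            # Skip thumbnails, metadata and any other Asset without the role.
            if role in asset.get("roles", []) and "type" in asset:
                media_types.add(asset["type"])
    return sorted(media_types)


# Hypothetical source imagery Item with one data Asset and one thumbnail.
items = [
    {
        "type": "Feature",
        "assets": {
            "image": {
                "href": "scene.tif",
                "type": "image/tiff; application=geotiff",
                "roles": ["data"],
            },
            "preview": {
                "href": "scene.png",
                "type": "image/png",
                "roles": ["thumbnail"],
            },
        },
    }
]

print(summarize_asset_types(items))  # ['image/tiff; application=geotiff']
```

Note that the Item-level `type` ("Feature") never enters the result, which is the name clash described above.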

This probably goes beyond the scope of what this issue is discussing, but ideally we would have a way of summarizing the Asset media type (type property) associated with the assets listed in the label:assets property of Links with "rel": "source" as defined in the Label Extension "Links: source imagery" section. This is a bit tricky since it would require getting the label:assets property from links in a label collection and then summarizing the media type of the assets associated with the Items that those Links point to (which could be in a totally different collection).

m-mohr commented 3 years ago

Isn't this (partially?) what the Item Asset Definition Extension is about? https://github.com/stac-extensions/item-assets

dwsilk commented 2 years ago

> One problem we might run into here is distinguishing between Item- and Asset-level properties of the same name (e.g. type). There is obviously no reason to summarize the Item-level type property (since it will always be "Feature"), but we may still want a mechanism to make it clear in these kinds of cases.

Yes, I think this mechanism is necessary. For example, the Common Metadata notes that created and updated can be used on both Items and Assets, and there may be a need to summarise both. Raised via https://github.com/radiantearth/stac-spec/discussions/1156.

billgeo commented 2 years ago

We are thinking of progressing this in our custom STAC extensions, but just wanted to check that we are doing something that aligns with future STAC core changes as much as possible. Can I get some feedback on this way of summarising asset metadata?

Asset created and updated summary could look like this in the collection.json.

  "summaries": {
    "assets": {
      "created": {
        "minimum": "1901-01-01T00:00:00Z",
        "maximum": "1920-01-01T00:00:00Z"
      },
      "updated": {
        "minimum": "1901-01-02T00:00:00Z",
        "maximum": "1920-01-02T00:00:00Z"
      }
    }
  }

And with some other item properties it would look like this.

  "summaries": {
    "assets": {
      "created": {
        "minimum": "1901-01-01T00:00:00Z",
        "maximum": "1920-01-01T00:00:00Z"
      },
      "updated": {
        "minimum": "1901-01-02T00:00:00Z",
        "maximum": "1920-01-02T00:00:00Z"
      }
    },
    "platform": ["Fixed-wing Aircraft"],
    "instruments": ["EAGLE IV"],
    "created": {
      "minimum": "1999-01-01T00:00:00Z",
      "maximum": "2010-01-01T00:00:00Z"
    },
    "updated": {
      "minimum": "1999-01-02T00:00:00Z",
      "maximum": "2010-01-02T00:00:00Z"
    }
  }

m-mohr commented 2 years ago

So, I think the assets key with summaries one level below would "break" a lot of tooling that would expect JSON Schema in there. It would certainly not work well in STAC Browser at least, and I think implementations would have a hard time differentiating between the new extension and what is allowed right now. I'd recommend putting the asset summaries into a new field. Or maybe it would fit better with the Item Asset Definition extension? @matthewhanson In the extension you can only set a specific value and not summarize, but maybe it would be worth extending that instead of meddling with the summaries that were meant for item properties?

The item properties as you show them are already supported, that's no problem.
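To make the tooling concern concrete, here is a small sketch of how a hypothetical client might classify each entry in summaries. STAC 1.0.0 allows a summary to be a set of values (array), a Range Object with minimum and maximum, or a JSON Schema object. The function `summary_form` and its heuristics are assumptions for illustration and are not taken from STAC Browser or any other real tool.

```python
from typing import Any


def summary_form(value: Any) -> str:
    """Guess which of the three allowed summary forms `value` uses."""
    if isinstance(value, list):
        return "values"
    if isinstance(value, dict):
        # A Range Object consists of exactly `minimum` and `maximum`.
        if set(value) == {"minimum", "maximum"}:
            return "range"
        # Any other object can only be read as a JSON Schema.
        return "json-schema"
    raise TypeError(f"not a valid summary: {value!r}")


# Summaries using the nested "assets" key proposed earlier in the thread.
summaries = {
    "assets": {
        "created": {
            "minimum": "1901-01-01T00:00:00Z",
            "maximum": "1920-01-01T00:00:00Z",
        }
    },
    "platform": ["Fixed-wing Aircraft"],
    "created": {
        "minimum": "1999-01-01T00:00:00Z",
        "maximum": "2010-01-01T00:00:00Z",
    },
}

for name, value in summaries.items():
    print(name, "->", summary_form(value))
# assets -> json-schema
# platform -> values
# created -> range
```

Under these assumptions the nested assets object is indistinguishable from a JSON Schema for an Item property called assets, which is the ambiguity a separate field would avoid.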

billgeo commented 2 years ago

Thanks @m-mohr. How would this work as a new field? Happy to suggest it as a pull request on the Item Asset extension if that will help?

  "asset_summaries": {
    "created": {
      "minimum": "1901-01-01T00:00:00Z",
      "maximum": "1920-01-01T00:00:00Z"
    },
    "updated": {
      "minimum": "1901-01-02T00:00:00Z",
      "maximum": "1920-01-02T00:00:00Z"
    }
  },
  "summaries": {
    "platform": ["Fixed-wing Aircraft"],
    "instruments": ["EAGLE IV"],
    "created": {
      "minimum": "1999-01-01T00:00:00Z",
      "maximum": "2010-01-01T00:00:00Z"
    },
    "updated": {
      "minimum": "1999-01-02T00:00:00Z",
      "maximum": "2010-01-02T00:00:00Z"
    }
  }

m-mohr commented 2 years ago

We'd likely need a larger discussion about whether it makes sense to have it in the Item Asset Definition Extension, e.g. on one of the Monday calls. Feel free to join if you can, although I guess time zone differences make it hard for you. I can put that on the agenda and discuss it for you if you can't join.

Otherwise, you could likely come up with a new extension that looks similar to what you've shown above.
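For anyone prototyping such an extension, the sketch below shows one way the two fields could be computed side by side. It assumes the field name `asset_summaries` from the example above (a proposal in this thread, not part of any published spec) and RFC 3339 timestamps with a trailing "Z"; the helper names and sample Items are hypothetical.

```python
from datetime import datetime
from typing import Any


def parse_utc(timestamp: str) -> datetime:
    """Parse a timestamp such as 1999-01-01T00:00:00Z."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def range_of(timestamps: list[str]) -> dict[str, str]:
    """Build a Range Object (minimum/maximum) from a non-empty list."""
    ordered = sorted(timestamps, key=parse_utc)
    return {"minimum": ordered[0], "maximum": ordered[-1]}


def build_summaries(
    items: list[dict[str, Any]], fields: tuple[str, ...] = ("created", "updated")
) -> dict[str, dict[str, Any]]:
    """Summarize Item-level and Asset-level fields into separate objects."""
    item_summaries: dict[str, Any] = {}
    asset_summaries: dict[str, Any] = {}
    for field in fields:
        item_values = [
            i["properties"][field] for i in items if field in i.get("properties", {})
        ]
        asset_values = [
            a[field] for i in items for a in i.get("assets", {}).values() if field in a
        ]
        if item_values:
            item_summaries[field] = range_of(item_values)
        if asset_values:
            asset_summaries[field] = range_of(asset_values)
    # `asset_summaries` is the hypothetical new field, kept apart from `summaries`.
    return {"summaries": item_summaries, "asset_summaries": asset_summaries}


# Hypothetical Items: metadata created recently, scanned imagery much older.
items = [
    {
        "properties": {"created": "1999-01-01T00:00:00Z"},
        "assets": {"image": {"href": "a.tif", "created": "1901-01-01T00:00:00Z"}},
    },
    {
        "properties": {"created": "2010-01-01T00:00:00Z"},
        "assets": {"image": {"href": "b.tif", "created": "1920-01-01T00:00:00Z"}},
    },
]

result = build_summaries(items)
print(result["summaries"]["created"])
print(result["asset_summaries"]["created"])
```

Keeping the Asset ranges in their own object means existing clients can keep reading summaries unchanged.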

billgeo commented 2 years ago

If I can make it, I will. I think my email address is public if you want to send an invite there? Otherwise, if I'm not there, please put it on the agenda for me. Thanks.

m-mohr commented 2 years ago

Okay, @matthewhanson can you invite @billgeo, please?

m-mohr commented 2 years ago

Participation in the call was low this week so we postponed it to the next meeting, but in general people agreed on one of the approaches mentioned above. We identified that having it as part of the Item Asset Definition extension could lead to validation issues in JSON Schema.

m-mohr commented 4 months ago

Discussed in the STAC call today. We think it shouldn't be in summaries or the Item Asset Definition extension due to the ambiguities that would result. It should probably be a new extension and not be in core yet.