radiantearth / stac-spec

SpatioTemporal Asset Catalog specification - making geospatial assets openly searchable and crawlable
https://stacspec.org
Apache License 2.0

Protocol Buffer version of STAC for use with gRPC #575

Open davidraleigh opened 5 years ago

davidraleigh commented 5 years ago

Submitting an issue to ask for input on the included Protocol Buffer definitions that attempt to match JSON STAC. If some of you could review the tables below and give input as to whether this is an acceptable STAC-like implementation, that would be great. I'd love to eventually fold Protobuf STAC into the stac-spec or have it be a community-accepted project. Please let me know what I can do to make that happen.

Why gRPC and Protobufs? gRPC is a high-performance microservice RPC framework that allows bidirectional streaming and uses a compact data format. Protobuf is the standard compact message data format for gRPC. Protobuf and gRPC are open source Cloud Native Computing Foundation projects. They were originally open sourced by Google and have been in use since 2003. At this time Google executes tens of billions of RPC messages a second with gRPC and Protobuf, so you can rest assured they are stable.

The repo that holds the proto IDL files and their generated code is here: https://github.com/geo-grpc/api

Some documentation generated from the proto files can be found here: https://geo-grpc.github.io/api/#epl%2fprotobuf%2fstac.proto

A Python client can be found here: https://github.com/nearspacelabs/stac-client-python

There are some limitations in how you define Protocol Buffers that prevent a one-to-one match of STAC. Please look at the tables for differences and the lists of explanations beneath each table. The most significant departure is that Properties would be reserved for user-defined data that is outside of the STAC specification, and the data defined by the STAC specification would exist directly on the StacItem definition. Other differences include use of a GeometryData protobuf and a preference for enums wherever possible.
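
To make that departure concrete, here is a minimal sketch in plain Python. The `Eo` and `StacItem` dataclasses are hypothetical stand-ins for the generated protobuf classes (only a few fields are shown), and `from_json_item` is an illustrative helper, not part of any published client. It lifts spec-defined fields out of the JSON `properties` object and leaves only user-defined keys behind.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, Optional


# Hypothetical stand-ins for the generated protobuf classes. Field names
# follow the tables in this issue; this is not the generated code.
@dataclass
class Eo:
    cloud_cover: Optional[float] = None
    sun_azimuth: Optional[float] = None


@dataclass
class StacItem:
    id: str
    collection: str = ""
    datetime: Optional[str] = None
    eo: Optional[Eo] = None
    # Only user-defined data that falls outside the STAC specification.
    properties: Dict[str, Any] = field(default_factory=dict)


def from_json_item(item: Dict[str, Any]) -> StacItem:
    """Lift spec-defined fields out of JSON `properties` onto the item."""
    props = dict(item.get("properties", {}))
    eo_names = {f.name for f in fields(Eo)}
    eo_values = {}
    for key in list(props):
        if key.startswith("eo:") and key[len("eo:"):] in eo_names:
            eo_values[key[len("eo:"):]] = props.pop(key)
    return StacItem(
        id=item["id"],
        collection=item.get("collection", ""),
        datetime=props.pop("datetime", None),
        eo=Eo(**eo_values) if eo_values else None,
        properties=props,  # whatever remains is user-defined
    )


example = {
    "id": "scene-001",
    "collection": "demo",
    "properties": {
        "datetime": "2019-08-01T12:00:00Z",
        "eo:cloud_cover": 12.5,
        "acme:internal_id": 42,  # made-up vendor key for illustration
    },
}
item = from_json_item(example)
print(item.eo, item.properties)
```

In the real definitions `properties` is a `google.protobuf.Any`, so the leftover user data would be packed into a user-supplied message type rather than kept as a dict.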

STAC Item Comparison

For comparison, here is the JSON STAC item field summary and the Protobuf STAC item field summary. Below is a table comparing the two:

| Field Name | STAC Protobuf Type | STAC JSON Type |
| --- | --- | --- |
| id | string | string |
| type | NA | string |
| geometry | GeometryData | GeoJSON Geometry Object |
| bbox | EnvelopeData | [number] |
| properties | google.protobuf.Any | Properties Object |
| links | NA | [Link Object] |
| assets | StacItem.AssetsEntry | Map |
| collection | string | string |
| title | string | Inside Properties |
| datetime | google.protobuf.Timestamp | Inside Properties |
| observation | google.protobuf.Timestamp | Inside Properties |
| processed | google.protobuf.Timestamp | Inside Properties |
| updated | google.protobuf.Timestamp | Inside Properties |
| duration | google.protobuf.Duration | Inside Properties |
| eo | Eo | Inside Properties |
| sar | Sar | Inside Properties |
| landsat | Landsat | Inside Properties |
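
As an illustration of the `datetime` row, a JSON STAC timestamp string has to become the `seconds` and `nanos` fields of a `google.protobuf.Timestamp`. The sketch below uses only the standard library; `rfc3339_to_timestamp` is a hypothetical helper name, and treating a value without a UTC offset as UTC is an assumption made here.

```python
from datetime import datetime, timezone
from typing import Tuple


def rfc3339_to_timestamp(value: str) -> Tuple[int, int]:
    """Return (seconds, nanos) since the Unix epoch, i.e. the two fields
    carried by google.protobuf.Timestamp."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11,
    # so normalize it to an explicit UTC offset first.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: a value without an offset is interpreted as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    delta = parsed - datetime(1970, 1, 1, tzinfo=timezone.utc)
    # timedelta keeps seconds and microseconds non-negative, which matches
    # the Timestamp rule that nanos is never negative.
    seconds = delta.days * 86400 + delta.seconds
    nanos = delta.microseconds * 1000
    return seconds, nanos


print(rfc3339_to_timestamp("2019-08-01T12:00:00Z"))
```

With the protobuf Python package installed, `Timestamp.FromJsonString` performs the same conversion directly on the message.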

List of Item Spec differences and explanations:

Eo Comparison

For comparison, here is the JSON STAC Electro Optical field summary and the Protobuf STAC Electro Optical field summary. Below is a table comparing the two:

| JSON Field Name | JSON Data Type | Protobuf Field Name | Protobuf Data Type |
| --- | --- | --- | --- |
| eo:gsd | number | gsd | google.protobuf.wrappers.FloatValue |
| eo:platform | string | platform | Eo.Platform |
| eo:instrument | string | instrument | Eo.Instrument |
| eo:constellation | string | constellation | Eo.Constellation |
| eo:bands | [Band Object] | bands | Eo.Band |
| eo:epsg | integer | epsg | uint32 |
| eo:cloud_cover | number | cloud_cover | google.protobuf.wrappers.FloatValue |
| eo:off_nadir | number | off_nadir | google.protobuf.wrappers.FloatValue |
| eo:azimuth | number | azimuth | google.protobuf.wrappers.FloatValue |
| eo:sun_azimuth | number | sun_azimuth | google.protobuf.wrappers.FloatValue |
| eo:sun_elevation | number | sun_elevation | google.protobuf.wrappers.FloatValue |
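
One practical consequence of the `FloatValue` wrapper rows: a wrapper message can be absent, so a reader can tell "no value recorded" apart from a genuine `0.0`, which a plain proto3 scalar cannot do. The sketch below models that with `Optional[float]` in plain Python. `EoSketch` and `eo_from_properties` are hypothetical names, and whether this was the motivation for choosing wrappers here is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class EoSketch:
    # Optional[float] plays the role of a FloatValue wrapper: None = unset.
    gsd: Optional[float] = None
    cloud_cover: Optional[float] = None
    off_nadir: Optional[float] = None
    # Plain uint32 in the table: an unset proto3 scalar reads back as 0.
    epsg: int = 0


def eo_from_properties(props: Dict[str, Any]) -> EoSketch:
    """Map `eo:`-prefixed JSON properties onto unprefixed field names."""

    def number(key: str) -> Optional[float]:
        value = props.get(key)
        return None if value is None else float(value)

    return EoSketch(
        gsd=number("eo:gsd"),
        cloud_cover=number("eo:cloud_cover"),
        off_nadir=number("eo:off_nadir"),
        epsg=int(props.get("eo:epsg", 0)),
    )


clear_sky = eo_from_properties({"eo:cloud_cover": 0, "eo:epsg": 32610})
unknown = eo_from_properties({"eo:epsg": 32610})
print(clear_sky.cloud_cover, unknown.cloud_cover)
```

A query such as "cloud cover below 10 percent" can then skip items whose cloud cover was never measured instead of treating them as perfectly clear.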

List of Eo Spec differences and explanations:

Asset Comparison

| Field Name | JSON Data Type | Protobuf Data Type |
| --- | --- | --- |
| href | string | string |
| title | string | NA |
| type | string | string |
| eo_bands | NA | Eo.Band |
| asset_type | NA | AssetType |
| cloud_platform | NA | CloudPlatform |
| bucket_manager | NA | string |
| bucket_region | NA | string |
| bucket | NA | string |
| object_path | NA | string |
| requester_pays | NA | bool |

Catalogs and some other features of STAC have not been implemented. The query language for gRPC can be seen in the StacRequest overview: https://geo-grpc.github.io/api/#epl.protobuf.StacRequest

Examples of queries can be seen in the Python client: https://github.com/nearspacelabs/stac-client-python#queries

Thank you for reading some or all of this!

m-mohr commented 5 years ago

This seems like great work, although I'm not very familiar with gRPC. First thing I'd think you could do to promote it is to add it (via PR) to the Third Party Vendor Extensions table in the Extensions README file. You could also add it (via PR) to the implementations page.

Then I have a question regarding the Protobuf Field Names of the extensions: We have an EO and a SAR extension. Both have a field bands with a different definition. The Protobuf Field Name is bands. Would it be a problem if its name is also bands for SAR? Or would it make more sense to rename the Protobuf Field Names to include the prefix? For example, eo_bands and sar_bands? Bands is just an example here; I think there's more overlap (although the definitions don't diverge so much).

davidraleigh commented 5 years ago

Thanks @m-mohr!

After FOSS4G I'll make a PR to add it to the Third Party Vendor Extensions and another PR to add it to the implementations page.

With the current protobuf definition I've submitted, we have the eo and sar fields directly on the StacItem, and that helps mimic the JSON-LD name definitions. Accessing bands, platform, instrument, and constellation would look like the following for eo and for sar:

c_eo = stac_item.eo.constellation
c_sar = stac_item.sar.constellation
b_eo = stac_item.eo.bands
b_sar = stac_item.sar.bands
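
The scoping can be reproduced with plain Python classes. This is only a sketch: `Eo`, `Sar`, and `StacItem` below are simplified, hypothetical stand-ins for the generated messages (the real `constellation` is an enum and `bands` uses `Eo.Band`), and the sample values are made up. The point is that two nested types can each own a field called `bands` without colliding.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Eo:
    constellation: str = ""
    bands: List[str] = field(default_factory=list)


@dataclass
class Sar:
    constellation: str = ""
    bands: List[str] = field(default_factory=list)


@dataclass
class StacItem:
    # Each extension lives in its own nested message, so the field names
    # inside it are scoped by `eo` or `sar`.
    eo: Eo = field(default_factory=Eo)
    sar: Sar = field(default_factory=Sar)


stac_item = StacItem(
    eo=Eo(constellation="optical-demo", bands=["red", "nir"]),
    sar=Sar(constellation="radar-demo", bands=["VV", "VH"]),
)

# Same field name, two independent values.
print(stac_item.eo.bands, stac_item.sar.bands)
```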

If we changed the names to the underscore it would look a bit redundant as you can see below:

c_eo = stac_item.eo.eo_constellation
c_sar = stac_item.sar.sar_constellation
b_eo = stac_item.eo.eo_bands
b_sar = stac_item.sar.sar_bands

m-mohr commented 5 years ago

Thank you for clarifying, I didn't catch that there's scoping between the two dots. Now your approach makes a lot of sense.

cholmes commented 5 years ago

+1 - great work @davidraleigh. And definitely agree that adding it to Extensions makes good sense, starting as a third party extension and could evolve to be included in the STAC extension repo (we have some work to figure out exactly how we sort out where we put and group extensions).

simonff commented 4 years ago

/sub