stac-extensions / storage

Provides additional fields relating to how the asset is stored in the cloud
Apache License 2.0
6 stars 0 forks source link

Proposal for version 2 #24

Closed m-mohr closed 2 weeks ago

m-mohr commented 6 months ago

This PR proposes version 2 of this extension and fixes most issues that are open. It is meant to be a better, more universal and more lightweight approach based on the learnings from v1. It also reworks the extension so that less duplication happens in assets based on the archtiecture in the authentication extension. This is heavily breaking but I think this is required to make this a better extension that can fullfil all the various needs.

Ideally we would define the general approach in this PR and then add the actual best practices per provider in separate PRs. The examples that I have right now are more to show how it's meant to work, not to define actual best practices for S3/AWS/Azure. Otherwise we'll probably never finish the PR itself...

Please review the changes carefully:

See the changelog below:

Added

Changed

Removed

matthewhanson commented 6 months ago

I like the use of references, which is a slightly different approach that we have done in the past and would probably be a good way to define spectral bands. But that's not relevant here.

I'm not sure why storage:refs should be a list though. Each asset has a single URL which would be for a single storage platform. Alternate assets handle the case of the same asset on different storage providers. Is there a reason this isn't just a string?

m-mohr commented 6 months ago

Could be a string, if there's no way something could be hosted with the same URL on different storage providers? Not sure, would something like hosted on AWS US and AWS EU work?

emmanuelmathot commented 3 months ago

Out of discussions with @m-mohr and @matthewhanson :

  1. The platform property should be renamed with the protocol used. For instance "type": "S3"
  2. a new property for extra parameters specific to the protocol (e.g. endpoint_url, requester_pays)
m-mohr commented 3 months ago
  1. The platform property should be renamed with the protocol used. For instance "type": "S3"

That was your proposal, which I didn't agree with yet. Neither did Matt. We need to clarify the details. Just a type seems not enough to transmit information about the storage provider. I just don't understand how this should work with just a protocol. I'd need a more concrete proposal to understand how this may work.

  1. a new property for extra parameters specific to the protocol (e.g. endpoint_url, requester_pays)

Yes, we talked about the parameters, but we did not agree yet to move the existing properties all there. I understood it more as a free-form object for additional properties, i.e. in addition to what is in this PR. Also, this might need more standardization per provider to avoid that it's actually diverging between implementations.

Additionally, we discussed whether storage:refs should be an array or string and that the endpoints discussed above should resolve to something useful, i.e. they should usually be the actual API endpoint, not a generic service webpage or so.

fmigneault commented 3 months ago

Just a type seems not enough to transmit information about the storage provider.

I second this. The type: S3, for example, could be relevant for AWS, Minio, or any other S3-compatible technologies using the same protocol. It doesn't help identify the actual provider.

m-mohr commented 2 months ago

I completely reworked the PR to make a new proposal. This is more a high-level framework now for various providers based on URI templates and other free-form variables. It is meant that the extension itself is just a framework and collections best-practices around various providers, but the providers are not actually enforced or so. Everyone can contribute additional documents for providers easily by just providing short documents with platform URI and variables. It feels like otherwise we'll always go through the hell of aligning, standardizing and are always outdated with every iteration in cloud provider evolutions. Happy to discuss and looking foward to feedback and additional providers. Kept the providers intentionally small as I don't have much experience with others and want to keep this to experienced users.

Ideally we would define the general approach in this PR and then add the actual best practices per provider in separate PRs. The examples that I have right now are more to show how it's meant to work, not to define actual best practices for S3/AWS/Azure. Otherwise we'll probably never finish the PR itself...

@matthewhanson @emmanuelmathot @fmigneault @philvarner

PS: This is my last attempt to make something useful out of this extension. If we can't agree on something in the next weeks I consider this extension to be non-functional for many cases and I'll ditch it for my use cases in favor of a custom extensions per provider.

j-musial commented 4 weeks ago

@philvarner @matthewhanson any updates regarding the reviews?

m-mohr commented 2 weeks ago

I just added a type field for validation / discriminator purposes.

@fmigneault @emmanuelmathot and everyone else: Any last minute comments? I think we can merge soon once @matthewhanson made his review.

matthewhanson commented 2 weeks ago

This looks great, @m-mohr went over it with me this morning. I like the changes, big improvement over 1.0

m-mohr commented 2 weeks ago

If there are still comments, please open issues. I'll release next week. Thanks everyone.

gadomski commented 2 weeks ago

@m-mohr coming in late ... feels pretty complex but since this is aimed at data providers and implementors, we can expect a decent level of technical competency. 👍🏼