Closed m-mohr closed 2 weeks ago
I like the use of references, which is a slightly different approach that we have done in the past and would probably be a good way to define spectral bands. But that's not relevant here.
I'm not sure why storage:refs
should be a list though. Each asset has a single URL which would be for a single storage platform. Alternate assets handle the case of the same asset on different storage providers.
Is there a reason this isn't just a string?
Could be a string, if there's no way something could be hosted with the same URL on different storage providers? Not sure, would something like hosted on AWS US and AWS EU work?
Out of discussions with @m-mohr and @matthewhanson :
"type": "S3"
endpoint_url
, requester_pays
)
- The platform property should be renamed with the protocol used. For instance
"type": "S3"
That was your proposal, which I didn't agree with yet. Neither did Matt. We need to clarify the details. Just a type seems not enough to transmit information about the storage provider. I just don't understand how this should work with just a protocol. I'd need a more concrete proposal to understand how this may work.
- a new property for extra parameters specific to the protocol (e.g.
endpoint_url
,requester_pays
)
Yes, we talked about the parameters, but we did not agree yet to move the existing properties all there. I understood it more as a free-form object for additional properties, i.e. in addition to what is in this PR. Also, this might need more standardization per provider to avoid that it's actually diverging between implementations.
Additionally, we discussed whether storage:refs should be an array or string and that the endpoints discussed above should resolve to something useful, i.e. they should usually be the actual API endpoint, not a generic service webpage or so.
Just a type seems not enough to transmit information about the storage provider.
I second this. The type: S3
, for example, could be relevant for AWS, Minio, or any other S3-compatible technologies using the same protocol. It doesn't help identify the actual provider.
I completely reworked the PR to make a new proposal. This is more a high-level framework now for various providers based on URI templates and other free-form variables. It is meant that the extension itself is just a framework and collections best-practices around various providers, but the providers are not actually enforced or so. Everyone can contribute additional documents for providers easily by just providing short documents with platform URI and variables. It feels like otherwise we'll always go through the hell of aligning, standardizing and are always outdated with every iteration in cloud provider evolutions. Happy to discuss and looking foward to feedback and additional providers. Kept the providers intentionally small as I don't have much experience with others and want to keep this to experienced users.
Ideally we would define the general approach in this PR and then add the actual best practices per provider in separate PRs. The examples that I have right now are more to show how it's meant to work, not to define actual best practices for S3/AWS/Azure. Otherwise we'll probably never finish the PR itself...
@matthewhanson @emmanuelmathot @fmigneault @philvarner
PS: This is my last attempt to make something useful out of this extension. If we can't agree on something in the next weeks I consider this extension to be non-functional for many cases and I'll ditch it for my use cases in favor of a custom extensions per provider.
@philvarner @matthewhanson any updates regarding the reviews?
I just added a type field for validation / discriminator purposes.
@fmigneault @emmanuelmathot and everyone else: Any last minute comments? I think we can merge soon once @matthewhanson made his review.
This looks great, @m-mohr went over it with me this morning. I like the changes, big improvement over 1.0
If there are still comments, please open issues. I'll release next week. Thanks everyone.
@m-mohr coming in late ... feels pretty complex but since this is aimed at data providers and implementors, we can expect a decent level of technical competency. 👍🏼
This PR proposes version 2 of this extension and fixes most issues that are open. It is meant to be a better, more universal and more lightweight approach based on the learnings from v1. It also reworks the extension so that less duplication happens in assets based on the archtiecture in the authentication extension. This is heavily breaking but I think this is required to make this a better extension that can fullfil all the various needs.
Ideally we would define the general approach in this PR and then add the actual best practices per provider in separate PRs. The examples that I have right now are more to show how it's meant to work, not to define actual best practices for S3/AWS/Azure. Otherwise we'll probably never finish the PR itself...
Please review the changes carefully:
See the changelog below:
Added
storage:schemes
,storage:ref
and Storage Scheme ObjectChanged
storage:schemes
and located in the Item Properties, Collections or Catalog metadatastorage:ref
Removed
storage:platform
,storage:region
,storage:requester_pays
andstorage:tier