Closed tschoonj closed 5 years ago
If the bucket works akin to a registry, this is fairly reasonable to do. Could you give me an example of a URI that defines both a bucket and a registry (is that possible)?
Right now, here is how your URI above is parsed; notice that "registry" gets filled with bucket:
```python
In [4]: parse_image_name(remove_uri(image))
Out[4]:
{'collection': 'library',
 'image': 'image',
 'original': 'bucket/library/image:tag',
 'registry': 'bucket',
 'storage': 'bucket/library/image-tag.sif',
 'storage_uri': 'bucket/library/image:tag.sif',
 'storage_version': 'bucket/library/image:tag.sif',
 'tag': 'tag',
 'tag_uri': 'bucket/library/image:tag',
 'uri': 'bucket/library/image-tag',
 'url': 'bucket/library/image',
 'version': None}
```
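For illustration, here is a minimal sketch (not sregistry's actual implementation) of why the first path component of `bucket/library/image:tag` lands in `registry` — a positional split simply cannot tell a bucket name from a registry name:

```python
def parse_image_name(name):
    """Toy parser for URIs of the form [registry/]collection/image[:tag].

    Mimics the positional behavior discussed above: whatever appears as the
    first of three path components is assumed to be the registry.
    """
    parts = name.split('/')
    registry = parts[0] if len(parts) == 3 else None
    collection, image_tag = parts[-2], parts[-1]
    image, _, tag = image_tag.partition(':')
    return {'registry': registry,
            'collection': collection,
            'image': image,
            'tag': tag or 'latest'}

print(parse_image_name('bucket/library/image:tag'))
# {'registry': 'bucket', 'collection': 'library', 'image': 'image', 'tag': 'tag'}
```

A bucket in the first slot is indistinguishable from a registry in the first slot, which is exactly the ambiguity discussed below.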
So we could defer the bucket requirement until later, and then fail if it's not provided during push / pull (or similar). The one concern: if it's possible to have both a registry and a bucket, we wouldn't be able to tell the difference, and that's likely not possible unless we did something like:
```
sregistry push --bucket bucket --name s3://library/image:tag /path/to/image.sif
```
Thoughts?
What would the bucket need to do to function as a registry?
What gets put in 'registry' currently? The library?
> What would the bucket need to do to function as a registry?
It would need to be the case that there is no concept of a registry for Amazon S3. If there can be both a registry and a bucket, we can't share the same parser. If you could give me a list of examples (with and without both registry and library, if that's possible), it would help.
> What gets put in 'registry' currently? The library?
No, if there is no registry, it's just left as None.
I think more thought needs to go into this - there is in fact a difference between a registry (or base) and a bucket, see https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingBucket.html. And in the example docs for s3 here, we connect to a Minio server (so the base/registry is served at localhost). This means that we can't adopt the common parser function, because it can't distinguish between a bucket and a registry. What we could do is provide another avenue for specifying a bucket, although we would need to rework the init sequence to re-parse/connect to it.
Apologies, but I am afraid you lost me...
By registry for amazon s3: what do you mean exactly? As in a deployment of a singularity registry server in a bucket? If so, I don't think that's possible as s3 really is just for file storage...
The registry is the server - so for Minio (which serves S3) it would be something like http://127.0.0.1:9000, and for Amazon the pattern would be something like s3.<region>.amazonaws.com. For example, both of these should technically be valid (although they aren't currently parsed; we require the registry to be set as an environment variable first):
```
sregistry push --name s3://s3.<region>.amazonaws.com/library/image:tag /path/to/image.sif
sregistry push --name s3://127.0.0.1:8000/library/image:tag /path/to/image.sif
```
The first parses `s3` as the endpoint, `s3.<region>.amazonaws.com` as the registry, `library` as the collection namespace, `image` as the image, and `tag` as the tag. Does that make sense? The URI can't accept a bucket there because we wouldn't be able to tell the difference between a bucket and a registry.
We don't currently do any special parsing to derive these from the URI (someone might request this as a feature, and it would make sense).
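To make the ambiguity concrete, here is a small stdlib illustration (using `urllib.parse` rather than sregistry's parser, and `eu-west-1` as an example region): a registry host and a bare bucket name occupy exactly the same position in the URI, so a parser has no way to tell them apart without extra context.

```python
from urllib.parse import urlparse

# Both URIs split identically: the first component after the scheme becomes
# the netloc, whether it is a registry host or a bucket name.
for uri in ('s3://s3.eu-west-1.amazonaws.com/library/image:tag',
            's3://bucket/library/image:tag'):
    p = urlparse(uri)
    print(p.netloc, p.path)
```

In both cases the remainder (`/library/image:tag`) is the same, so only the netloc differs, and nothing in the syntax says which kind of thing it is.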
> By registry for amazon s3: what do you mean exactly? As in a deployment of a singularity registry server in a bucket? If so, I don't think that's possible as s3 really is just for file storage...
I mean the storage server. It's not a Singularity Registry, no, but it is a server with an address we push / pull / interact with.
Ok thanks for the clarification.
I think that things would be simpler if the S3 backend were used exclusively for actual AWS S3 buckets, in which case you would no longer need to worry about the endpoint. There would also be no need to specify the region (as is already the case).
In this case, to continue supporting the Minio server, perhaps it would be best to create a dedicated client for it (which could inherit from the S3 client?)
There will always be a need to specify a different base, even for an S3 user who doesn't need Minio (for example, say the zone changed). We specify this registry (called the base in the code) when we instantiate the client:
```python
self.s3 = boto3.resource('s3',
                         endpoint_url=self.base,  # None uses the s3 default; otherwise a custom base
                         aws_access_key_id=self._id,
                         aws_secret_access_key=self._key,
                         config=boto3.session.Config(signature_version=self._signature))
```
Minio is S3 storage (it has buckets too), so I don't think we would need to introduce a lot of redundant code. I think what we need to discuss is: what would be a better way to designate a bucket?
There are a lot of labs / institutions that host their own S3 storage, or simply use a different zone, so the endpoint specification can't just be removed with everyone forced to use the same one. What about adding optional kwargs to push/pull, so that we could allow a bucket argument?
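That optional-kwarg idea might look roughly like this (a hypothetical `push` signature, not sregistry's actual one; the `SREGISTRY_S3_BUCKET` fallback mirrors the current environment-variable behavior):

```python
import os

def push(path, name, bucket=None):
    """Hypothetical push accepting an explicit bucket kwarg.

    Prefers the kwarg; falls back to the SREGISTRY_S3_BUCKET environment
    variable; errors out if neither is provided.
    """
    bucket = bucket or os.environ.get('SREGISTRY_S3_BUCKET')
    if bucket is None:
        raise ValueError('no bucket: pass bucket= or set SREGISTRY_S3_BUCKET')
    return bucket

print(push('/path/to/image.sif', 'library/image:tag', bucket='mybucket'))
# mybucket
```

This keeps the URI parser untouched: the bucket never enters the URI, so the bucket-vs-registry ambiguity never arises.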
You cannot actually change the zone of an existing bucket. For that you would need to copy your buckets to an intermediate bucket, delete the old one, wait 24 hours, create a new bucket with the old name in the zone of your choice, and copy the contents of the intermediate bucket to the new one.
Since bucket names have to be unique across all AWS S3 zones, I don't think one would ever need to specify the `endpoint_url` argument to use them.
Using a command-line option to specify the bucket name would be slightly cleaner than using an environment variable, but not sure if that really adds much.
@tschoonj is this ok to close?
Is your feature request related to a problem? Please describe.
Currently, pushing and pulling images from an S3 bucket requires the use of an environment variable, `SREGISTRY_S3_BUCKET`, to specify the bucket that will be used for these operations. I am thinking that it may be cleaner to use the bucket name in the URI. I understand that this is not a trivial fix, especially if the current behavior still needs to be supported...
Even though I haven't used them, I guess that the other backends may benefit from a similar approach as well?
Describe the solution you'd like
Something like this would be ideal:
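The original example appears to have been lost in extraction; presumably, judging from the `bucket/library/image:tag` form parsed earlier in the thread, the desired invocation would be something along these lines (hypothetical syntax):

```
sregistry push --name s3://bucket/library/image:tag /path/to/image.sif
sregistry pull s3://bucket/library/image:tag
```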