singularityhub / sregistry

Server for storage and management of Singularity images
https://singularityhub.github.io/sregistry

Use S3 as storage backend #361

Open tschoonj opened 3 years ago

tschoonj commented 3 years ago

Hi @vsoch

We are thinking of deploying our own sregistry instance, and I was wondering if it is currently possible to have the uploaded images stored on an S3 endpoint (Ceph in our case).

Thanks in advance!

vsoch commented 3 years ago

@tschoonj I did write a client for Ceph with sregistry-cli, but here we use Minio (which speaks the S3 multipart upload protocol) to mimic the library:// API, so the Singularity client can interact with the registry directly. Would you be able/open to trying that?
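For context, because that backend is plain S3, any S3-compatible client can exercise it. A minimal sketch with the minio Python package (the endpoint, credentials, and bucket below are placeholders, not sregistry defaults):

```python
from minio import Minio

# Placeholder endpoint and credentials for a local Minio instance.
client = Minio("127.0.0.1:9000", access_key="ACCESS_KEY",
               secret_key="SECRET_KEY", secure=False)

if not client.bucket_exists("sregistry"):
    client.make_bucket("sregistry")

# fput_object switches to the S3 multipart upload protocol for large files.
client.fput_object("sregistry", "collection/container.sif", "container.sif")
```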

tschoonj commented 3 years ago

We would like our users to be able to download and run the containers using a single singularity command (e.g. singularity pull shub://containers.page/collection/container:tag), and not rely on the sregistry tool to make this happen.

I am very much familiar with the S3 plugin of sregistry-cli, and I am sure that it would work nicely with Ceph through boto3, but given that user requirement it is not an option here 😢
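For reference, a boto3 sketch against a Ceph RADOS Gateway only needs a custom endpoint_url (the host, bucket, and credentials below are placeholders):

```python
import boto3
from botocore.client import Config

# Placeholder Ceph RADOS Gateway endpoint and credentials.
s3 = boto3.client(
    "s3",
    endpoint_url="https://ceph-rgw.example.org",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
    config=Config(signature_version="s3v4"),
)
s3.upload_file("container.sif", "sregistry", "collection/container.sif")
```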

It's not a big problem though, as we also have access to CephFS shares that we can mount on the VM running the registry and use for storing the images.

vsoch commented 3 years ago

Oh I’m not suggesting you use that, just that I’m at least familiar with the interactions. My suggestion (and question) is whether you could use the default Minio backend, which is also S3 compliant, so you can just do singularity pull library://

tschoonj commented 3 years ago

Ah, apologies for misunderstanding then.

Is it possible then to configure Minio to use our Ceph endpoint with its credentials? And if so, how would I do that?

Thanks in advance!

vsoch commented 3 years ago

That actually might work - I haven't given it a try, but it's noted as possible here: https://github.com/minio/minio/issues/6157. You should give it a shot!

tschoonj commented 3 years ago

Good morning/evening,

I had a closer look at the code, and it might be as easy as setting MINIO_SERVER and MINIO_EXTERNAL_SERVER to our Ceph endpoint.
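Something like this in the settings, if it works the way I hope (the Ceph host is a placeholder, and anything beyond MINIO_SERVER/MINIO_EXTERNAL_SERVER is my assumption about companion settings, to be checked against the actual config):

```python
# Hypothetical override pointing sregistry's Minio storage at Ceph.
MINIO_SERVER = "ceph-rgw.example.org:443"           # endpoint the registry talks to
MINIO_EXTERNAL_SERVER = "ceph-rgw.example.org:443"  # endpoint clients reach via presigned URLs
MINIO_SSL = True  # assumed toggle for https endpoints
```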

In the docs you mention:

> However for versions 1.1.24 and later, to better handle the Singularity library:// client that uses Amazon S3, we added a Minio Storage backend

So I assume that if this works with AWS S3, it should also work with Ceph. One thing that may be a bit of a problem is the presigned URLs, where you enforce signature version 4 (S3v4), which may not be supported by our (old) Ceph installation...
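An easy way to probe that would be to generate presigned URLs with both signers via boto3 and see which ones the gateway accepts (endpoint, bucket, key, and credentials are placeholders; "s3" is botocore's name for the legacy v2 signer):

```python
import boto3
from botocore.client import Config

endpoint = "https://ceph-rgw.example.org"  # placeholder

# Try the enforced v4 signer first, then the legacy v2 one.
for version in ("s3v4", "s3"):
    s3 = boto3.client(
        "s3",
        endpoint_url=endpoint,
        aws_access_key_id="ACCESS_KEY",
        aws_secret_access_key="SECRET_KEY",
        config=Config(signature_version=version),
    )
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "sregistry", "Key": "collection/container.sif"},
        ExpiresIn=600,
    )
    print(version, url)
```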

Thanks for the help!

vsoch commented 3 years ago

Sure thing! Give it a shot and let me know what issues you encounter - there are likely workarounds for them.

tschoonj commented 3 years ago

Hi @vsoch ,

I was wondering if, given the use of S3 for storage through pre-signed URLs, it is still necessary or advantageous to use your sregistry_nginx image instead of the regular nginx? If not, replacing it with traefik might be interesting, as it can take care of generating the Let's Encrypt certificates.

vsoch commented 3 years ago

@tschoonj the reason the sregistry_nginx image is still there is that it's used for uploads from the web interface, e.g., here https://github.com/singularityhub/sregistry/blob/a076fc15c1322fa2e145067c38825c8575168b56/shub/apps/api/templates/routes/upload.html#L12 (you can verify by manually uploading a container to a collection). If you'd like to test removing that image and replacing the upload with something else, or to figure out how to update the sregistry_nginx image to support traefik, I would definitely be open to trying that!

tschoonj commented 3 years ago

I knew you had a good reason to keep it around, I just didn't see it 😄

Do you think that these traefik config options might replicate the functionality offered by the nginx module you use for multipart uploads?

vsoch commented 3 years ago

I've never used traefik so I can't comment, but conceptually we need something alongside the registry server (that can be bound to it) where we can upload a large stream, and then have a callback that points the server to where the file is (at best needing only a copy). This plugin looks like it serves more as a protection from large requests, which is the opposite of what we want to do.
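To make the pattern concrete, here is a minimal stdlib sketch of that flow - stream the body to disk in chunks, then tell the app server where the file landed (the role the nginx upload module plays for sregistry today). It's an illustration of the idea, not sregistry's actual upload code, and the callback URL is hypothetical:

```python
import http.server
import json
import tempfile
import urllib.request

APP_CALLBACK = "http://localhost:8000/upload/complete"  # hypothetical URL

class UploadHandler(http.server.BaseHTTPRequestHandler):
    def do_POST(self):
        # Stream the large request body to disk in 1 MiB chunks.
        remaining = int(self.headers["Content-Length"])
        with tempfile.NamedTemporaryFile(delete=False, suffix=".sif") as out:
            while remaining > 0:
                chunk = self.rfile.read(min(remaining, 1 << 20))
                if not chunk:
                    break
                out.write(chunk)
                remaining -= len(chunk)
            path = out.name
        # Callback: hand the app server a path instead of the bytes, so a
        # local move/copy is all that's left for it to do.
        body = json.dumps({"path": path}).encode()
        req = urllib.request.Request(
            APP_CALLBACK, data=body,
            headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req)
        self.send_response(201)
        self.end_headers()

if __name__ == "__main__":
    http.server.HTTPServer(("0.0.0.0", 8080), UploadHandler).serve_forever()
```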

My general theory with these things is that if you aren't sure, give it a try and see if it works!

tschoonj commented 3 years ago

Hmmm... I thought that maybe the memRequestBodyBytes setting would be useful here: it sets the threshold (in bytes) above which the request body is buffered on disk instead of in memory...

I hope to give this a try over the next couple of days...