mamba-org / mamba

The Fast Cross-Platform Package Manager
https://mamba.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Request: Add support for s3 channels #394

Open hajapy opened 3 years ago

hajapy commented 3 years ago

Although not widely advertised, conda supports channels with the s3:// scheme when boto3 or boto are available in the conda environment. At present this is not available in mamba, so it prevents me from switching over as we use private s3 channels at my company. Per a discussion on gitter, it was noted there are a few options as to how this could be addressed:

1. Leverage the aws sdk for c++

https://github.com/aws/aws-sdk-cpp. Pros: signing, credentials discovery are handled for us. Cons: extra dependencies, more build requirements, potential licensing details to work through. Example of what would be needed code-wise: https://aws.amazon.com/blogs/developer/using-cmake-exports-with-the-aws-sdk-for-c/

2. Compute headers and use http

This is spelled out here with a python example: https://docs.aws.amazon.com/general/latest/gr/sigv4-signed-request-examples.html, and there is also an example with curl here: https://gist.github.com/drfill/c18308b6d71ee8032efda870b9be348e. Issues at a glance: only credentials via env var are demonstrated, whereas in practice aws credentials can be granted through env vars, a shared config file, or IAM roles. I am unclear what c++ dependencies would be required to accomplish that, but a look at the aws-sdk-cpp signing code suggests a non-trivial effort: https://github.com/aws/aws-sdk-cpp/blob/master/aws-cpp-sdk-core/source/auth/AWSAuthSigner.cpp.
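For a sense of what "compute headers" involves, here is a minimal sketch of SigV4 signing for an unauthenticated-body S3 GET, using only the Python standard library. The function names (`sign_s3_get`, `_signing_key`) are hypothetical, the virtual-hosted-style host name is an assumption about how the bucket is addressed, and it covers only the simplest case (no query string, empty payload). It is not what conda or mamba does, just an illustration of the steps.

```python
import datetime
import hashlib
import hmac
from urllib.parse import quote


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def _signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    # SigV4 key derivation: date -> region -> service -> "aws4_request".
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    return _hmac(k_service, "aws4_request")


def sign_s3_get(bucket, key, region, access_key, secret_key,
                session_token=None, now=None):
    """Return (url, headers) for a signed GET of s3://bucket/key (hypothetical helper)."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")

    # Assumption: virtual-hosted-style addressing.
    host = f"{bucket}.s3.{region}.amazonaws.com"
    payload_hash = hashlib.sha256(b"").hexdigest()  # a GET has an empty body

    headers = {
        "host": host,
        "x-amz-content-sha256": payload_hash,
        "x-amz-date": amz_date,
    }
    if session_token:
        # Temporary credentials (e.g. from an IAM role) also need this header.
        headers["x-amz-security-token"] = session_token

    names = sorted(headers)
    signed_headers = ";".join(names)
    canonical_headers = "".join(f"{n}:{headers[n]}\n" for n in names)
    canonical_uri = "/" + quote(key, safe="/")

    # Method, URI, (empty) query string, headers, signed header list, payload hash.
    canonical_request = "\n".join(
        ["GET", canonical_uri, "", canonical_headers, signed_headers, payload_hash]
    )

    scope = f"{date_stamp}/{region}/s3/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])

    key_bytes = _signing_key(secret_key, date_stamp, region, "s3")
    signature = hmac.new(
        key_bytes, string_to_sign.encode("utf-8"), hashlib.sha256
    ).hexdigest()

    headers["Authorization"] = (
        f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
        f"SignedHeaders={signed_headers}, Signature={signature}"
    )
    return f"https://{host}{canonical_uri}", headers
```

The returned headers could then be passed to any HTTP client. The signing itself is small; most of the effort is in obtaining the credentials that go into it.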

3. Not at all

Declare that such support won't be brought to mamba. (Perhaps a list of unsupported features is in order?)

Thoughts

To me, rolling our own seems like it would be harder than integrating the sdk, given the complexities with signing and credentials. If we use the sdk, we don't necessarily need to use their s3 interface; we can just use the signing and credentials from core, and use http once we've got proper headers. I am also OK with not having this feature, given that quetz may provide a solution that removes the need for private s3 channels in the first place.
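To illustrate the credentials side of that complexity, here is a rough sketch of a two-step lookup: environment variables first, then the shared credentials file. The `Credentials` type and `discover_credentials` function are hypothetical names; the environment variable names and file keys are the standard AWS ones. IAM roles, SSO, and the rest of the chain that the sdk handles are deliberately left out.

```python
import configparser
import os
from pathlib import Path
from typing import Mapping, NamedTuple, Optional


class Credentials(NamedTuple):
    access_key: str
    secret_key: str
    session_token: Optional[str] = None


def discover_credentials(env: Optional[Mapping[str, str]] = None) -> Optional[Credentials]:
    """Hypothetical helper: look in env vars, then in the shared credentials file."""
    env = os.environ if env is None else env

    # 1. Environment variables.
    access = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    if access and secret:
        return Credentials(access, secret, env.get("AWS_SESSION_TOKEN"))

    # 2. Shared credentials file (INI format), default ~/.aws/credentials.
    path = Path(env.get("AWS_SHARED_CREDENTIALS_FILE") or Path.home() / ".aws" / "credentials")
    profile = env.get("AWS_PROFILE", "default")
    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file is silently ignored
    if parser.has_section(profile):
        access = parser.get(profile, "aws_access_key_id", fallback=None)
        secret = parser.get(profile, "aws_secret_access_key", fallback=None)
        if access and secret:
            token = parser.get(profile, "aws_session_token", fallback=None)
            return Credentials(access, secret, token)

    # 3. IAM roles, SSO, etc. are not handled in this sketch.
    return None
```

Even this reduced version has to make choices about precedence and profiles, which is part of why reusing the sdk's credential handling looks attractive.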

Lastly, the question came up about testing: for python you can leverage https://github.com/spulec/moto to mock out s3.

wolfv commented 3 years ago

Hi @hajapy thanks for bringing this up.

Indeed, for all downloading tasks in mamba we're currently using libcurl, which doesn't have a native way of handling S3 URLs.

It is also not high priority for us to develop this feature. As you wrote, it would probably be doable by integrating the C++ SDK -- we could also think about a real plugin infrastructure if we develop such a feature.

We are definitely going to add support for S3 (and other cloud providers) to Quetz -- but then the links will appear as HTTPS links to the quetz server.

If someone is very interested in this feature, I think it would be an excellent candidate for a Pull Request or sponsored development.

wolfv commented 3 years ago

Hi @hajapy I think unless someone wants to sponsor this work, we're not going to work on this.

Quetz is the preferred route for this.

analog-cbarber commented 2 years ago

Another possible implementation option would be to simply make system calls to the aws client app.
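A rough sketch of what that could look like, assuming the `aws` CLI is installed and already configured with credentials. The helper names (`build_aws_cp_command`, `fetch_with_aws_cli`) are hypothetical; `aws s3 cp` is the CLI's own copy command. Signing and credential discovery are delegated entirely to the CLI.

```python
import shutil
import subprocess
from urllib.parse import urlparse


def build_aws_cp_command(s3_url: str, dest: str) -> list:
    """Build the argv for copying one S3 object to a local path (hypothetical helper)."""
    parsed = urlparse(s3_url)
    if parsed.scheme != "s3" or not parsed.netloc or not parsed.path.strip("/"):
        raise ValueError(f"not an s3://bucket/key URL: {s3_url!r}")
    return ["aws", "s3", "cp", s3_url, dest]


def fetch_with_aws_cli(s3_url: str, dest: str) -> None:
    """Download s3_url to dest by shelling out to the aws CLI."""
    if shutil.which("aws") is None:
        raise RuntimeError("aws CLI not found on PATH")
    # check=True raises CalledProcessError if the CLI exits non-zero.
    subprocess.run(build_aws_cp_command(s3_url, dest), check=True)
```

For example, `fetch_with_aws_cli("s3://my-bucket/channel/noarch/repodata.json", "repodata.json")` would fetch a channel index (the bucket and path here are made up). The trade-off is a runtime dependency on an external tool and one process spawn per file.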

Quetz isn't really a solution to this problem. Are you saying I need to install a server just to be able to install packages from an S3-hosted channel using mamba? That seems rather heavyweight, especially given that many people are already using different servers (e.g. Artifactory).

I can understand if you don't want to work on this, but it does mean that mamba will never be a drop-in replacement for conda.

This limitation should at least be documented.

If it were me, I would leave this issue open with a no-plan-to-fix tag or the like.

hajapy commented 2 years ago

Some other approaches might be:

It would be very nice for this to be supported, as mamba is such a time saver on the solve step, and to truly fulfill the drop-in replacement claim.

Otherwise, I agree that at least documenting the limitation would be beneficial.

wolfv commented 2 years ago

Sure, all of these approaches are good. I'd be happy if someone spends the time to implement this. A subprocess call would probably be the easiest/preferred way.

Also happy to merge any PRs for docs. Feel free to send them.

wolfv commented 2 years ago

OCI and S3 mirror support are now implemented in powerloader, which is going to be the future downloader backend for mamba.

https://github.com/wolfv/powerloader

jpedrick commented 1 year ago

@wolfv, are there instructions for how to utilize powerloader with mamba?