nebari-dev / nebari

🪴 Nebari - your open source data science platform
https://nebari.dev
BSD 3-Clause "New" or "Revised" License

Use an endpoint instead of a port for the Minio service #2848

Open: Adam-D-Lewis opened this issue 2 weeks ago

Adam-D-Lewis commented 2 weeks ago

Discussed in https://github.com/orgs/nebari-dev/discussions/2845

Originally posted by **mcg1969** on November 11, 2024

Hey folks! I wanted to ask whether there was a specific reason the Minio service was deployed on a dedicated _port_ (9080) instead of somewhere on the path of the existing HTTPS port (443). In my experience, as long as the bucket name does not conflict with a path used by other applications, the ingress (e.g., Traefik) can route S3 traffic just fine. Reducing the use of custom ports will make it easier to deploy Nebari in situations where an existing cluster's ingress must be re-used.

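A minimal sketch of the routing described above, assuming a prefix-based ingress rule that matches on the first path segment and forwards the request path unchanged. The bucket name `nebari-artifacts`, the backend addresses, and the `pick_backend` helper are hypothetical placeholders, not Nebari's actual configuration.

```python
from typing import Mapping


def pick_backend(path: str, bucket_backends: Mapping[str, str], default: str) -> str:
    """Choose a backend from the first path segment, like a prefix ingress rule.

    If the first segment is a known bucket name, the request goes to MinIO;
    otherwise it falls through to the default application. The path is not
    rewritten, so MinIO still receives /<bucket>/<key>.
    """
    first_segment = path.lstrip("/").split("/", 1)[0]
    return bucket_backends.get(first_segment, default)


# Hypothetical bucket and service names, for illustration only.
routes = {"nebari-artifacts": "minio:9000"}
print(pick_backend("/nebari-artifacts/builds/env.tar.gz", routes, "jupyterhub:8000"))  # minio:9000
print(pick_backend("/hub/login", routes, "jupyterhub:8000"))  # jupyterhub:8000
```

Because the path is forwarded as-is, this only works while no bucket name shadows another application's route.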
Adam-D-Lewis commented 2 weeks ago

We discussed this on an internal call today. The consensus was that this should be possible to change. There may be an issue with conda-store supporting MinIO on a subpath, but we agree that, if so, it can and should be fixed. Thanks for calling this out; I'll transfer this to an issue.

viniciusdc commented 2 weeks ago

For additional context, this issue was initially discussed in #980 (back when the project was still QHub), which contains a bit more information on why that specific port was chosen. conda-store does allow the bucket name and endpoint addresses to be customized, as shown in the configuration file: https://github.com/nebari-dev/nebari/blob/59a65e25aa98b147777c304c73eeb2d80b58fd29/src/_nebari/stages/kubernetes_services/template/modules/kubernetes/services/conda-store/config/conda_store_config.py#L40-L47

Since S3 (and MinIO) do not support subpaths (in the routing sense), moving the storage endpoint to port 443 would create a conflict between conda-store's own endpoints and storage access. We could rename the bucket to avoid the conflict; however, we need to plan how to migrate existing data safely.
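The conflict can be checked mechanically: a bucket served at the root of port 443 collides with any application whose route starts with the same first path segment. The sketch below assumes, for illustration, a bucket named `conda-store` and a conda-store route at `/conda-store`; the helper `conflicting_buckets` and the other route prefixes are hypothetical.

```python
from typing import Iterable, Set


def conflicting_buckets(buckets: Iterable[str], app_prefixes: Iterable[str]) -> Set[str]:
    """Return the bucket names that equal the first segment of an app route."""
    reserved = {prefix.strip("/").split("/", 1)[0] for prefix in app_prefixes}
    return {bucket for bucket in buckets if bucket in reserved}


# Assumed values, for illustration only.
app_routes = ["/conda-store", "/hub", "/gateway"]
print(conflicting_buckets(["conda-store"], app_routes))  # {'conda-store'}
# A renamed bucket (hypothetical name) no longer collides.
print(conflicting_buckets(["conda-store-artifacts"], app_routes))  # set()
```

Renaming the bucket removes the collision, which is why the migration of existing objects becomes the real cost.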

I've tried some remediation in the past, but because of how the URLs are signed internally, I ended up encountering mismatches in the expected request bodies:
https://github.com/conda-incubator/conda-store/blob/aac4f4b03033207566564efdaaa9fcc93c14d2d7/conda-store-server/conda_store_server/storage.py#L208
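Why a subpath breaks signed URLs can be illustrated with a toy signature. This is deliberately simplified and is NOT AWS Signature Version 4; it only mirrors the one property that matters here: the signed canonical string includes the host and the request path. The secret, hostnames, and the `/minio` prefix are hypothetical.

```python
import hashlib
import hmac


def toy_signature(secret: bytes, method: str, host: str, path: str) -> str:
    """Toy stand-in for an S3 request signature (not real SigV4).

    Like SigV4, it signs a canonical string containing the host and the
    path, so changing either one yields a different signature.
    """
    canonical = "\n".join([method, host, path])
    return hmac.new(secret, canonical.encode(), hashlib.sha256).hexdigest()


secret = b"example-secret"  # hypothetical credential
# The client signs the URL it sees: external host, with the subpath prefix.
client_sig = toy_signature(secret, "GET", "nebari.example.com", "/minio/conda-store/env.tar.gz")
# An ingress that strips "/minio" and forwards to an internal host changes
# what the server signs, so the recomputed signature does not match.
server_sig = toy_signature(secret, "GET", "minio.internal:9000", "/conda-store/env.tar.gz")
print(client_sig == server_sig)  # False
```

Routing bucket names at the root without rewriting the path avoids the path half of this mismatch.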

mcg1969 commented 2 weeks ago

Since a standard S3 URL always includes at least the bucket name in the path, I suspect conda-store will be fine.
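For path-style S3 requests, the bucket is the first path segment and the object key is everything after it. The sketch below splits such a URL; the hostname and object key are hypothetical, and note that virtual-hosted-style requests carry the bucket in the hostname instead, so this assumes clients use path-style addressing.

```python
from typing import Tuple
from urllib.parse import urlsplit


def split_path_style(url: str) -> Tuple[str, str]:
    """Split a path-style S3 URL into (bucket, key)."""
    path = urlsplit(url).path.lstrip("/")
    bucket, _, key = path.partition("/")
    return bucket, key


print(split_path_style("https://nebari.example.com/conda-store/builds/env.tar.gz"))
# ('conda-store', 'builds/env.tar.gz')
```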

And if the bucket names do not conflict with other API paths, you can put them directly in the ingress rules so that the MinIO "base" has no subpath.

mcg1969 commented 2 weeks ago

> We could rename the bucket to avoid the conflict; however, we need to plan how to migrate existing data safely.

This is definitely a good and important callout. One thing I've done in a similar project is to differentiate between fresh installs and upgrades. In short, upgrades would preserve the current configuration, leaving Minio on a dedicated port, while fresh installs would adopt the new approach. This buys time until a clean migration approach can be built.
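The fresh-install/upgrade split could be expressed as a small decision function. This is a hypothetical sketch: `choose_minio_exposure`, `MinioExposure`, and detecting an upgrade by the presence of a previously deployed MinIO port are illustrative assumptions, not existing Nebari code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MinioExposure:
    port: int  # port MinIO is reachable on from outside the cluster
    path_routed: bool  # True if buckets are routed by path on the shared ingress


def choose_minio_exposure(existing_port: Optional[int]) -> MinioExposure:
    """Keep the legacy dedicated port on upgrades; share port 443 on fresh installs."""
    if existing_port is not None:
        # Upgrade (hypothetical detection): preserve the deployed port, e.g. 9080,
        # until a safe data-migration path exists.
        return MinioExposure(port=existing_port, path_routed=False)
    # Fresh install: route S3 traffic by bucket path on the HTTPS ingress.
    return MinioExposure(port=443, path_routed=True)


print(choose_minio_exposure(9080))
print(choose_minio_exposure(None))
```

Keeping the decision in one place makes it easy to remove the legacy branch once a migration is available.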