quiltdata / quilt

Quilt is a data mesh for connecting people with actionable data
https://quiltdata.com
Apache License 2.0

[question] Quilt and min.io? #1941

Open matheusmota opened 3 years ago

matheusmota commented 3 years ago

Hi there. Is it or will it be possible to use quilt with on-premises s3-based solutions like min.io? AWS/cloud may be unavailable in some scenarios.

Thanks!

akarve commented 3 years ago

It is possible, and we have min.io on the roadmap. We have yet to formalize support, so we invite you to try Quilt with a min.io endpoint and file bugs for anything you encounter. In theory, min.io's S3-compatible API means that it just works; in practice it's not that simple, due to assumptions in the code and/or missing features in min.io.

matheusmota commented 3 years ago

Glad to hear that. I will definitely try it and let you know the results.

Thanks

matheusmota commented 3 years ago

One suggestion to encourage more people interested in testing it is to provide a small how-to.

akarve commented 3 years ago

Indeed. Just as a heads up we are still on the bleeding edge here and therefore haven't solicited people to try it, but if you are already using min.io and are willing to try then we welcome those data points.


Midnighter commented 3 years ago

@matheusmota I know your issue is only from six days ago but have you already given this a shot? Did you by any chance take some notes that you are willing to share if you started on this?

Either way, I will try to set up MinIO on a local node and address it with quilt in the coming days.

Janus-Xu commented 3 years ago

Waiting for MinIO support. Great job!

zerafachris commented 2 years ago

@matheusmota @Midnighter Any updates on MinIO? Maybe you can share your experience? I am currently considering giving quilt a go, but only have min.io available.

Midnighter commented 2 years ago

I briefly tried and was not successful. I haven't been able to give it a more serious attempt since then.

marcodlk commented 1 year ago

@akarve I am trying to establish Quilt as a core component of the data infrastructure at our research org. AWS is a non-starter for us so I am attempting to slowly fill in the AWS-dependent gaps with MinIO compatibility starting with the quilt3 python package - initially as a standalone that does not rely on a registry server. I quickly hacked together a solution that mainly just involves modifying the S3ClientProvider._build_client method to create a client with endpoint_url specified. Currently I just check an environment variable for the endpoint url and if it exists, create the client with the endpoint url, otherwise the same old way.

quilt3/datatransfer.py

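# Assumes `from botocore.client import Config` at module level.
class S3ClientProvider:
    ...
    def _build_client(self, get_config):
        session = self.get_boto_session()
        # getenv_s3_endpoint_url() is a helper not shown here; it reads the
        # endpoint URL from an environment variable.
        endpoint_url = getenv_s3_endpoint_url()
        if endpoint_url:
            # Custom endpoint (e.g. MinIO): build the client against it.
            return session.client(
                's3',
                config=Config(signature_version='s3v4'),
                endpoint_url=endpoint_url,
            )
        # No endpoint configured: create the client the same old way.
        return session.client('s3', config=get_config(session))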

As for credentials, I currently edit the CREDENTIALS_PATH file with MinIO user credentials and it works fine.

Now this is just a starting implementation and far from optimal, but I'm wondering if this standalone MinIO-compatible mode is something that you're interested in supporting in the quilt3 python package and if you have any ideas as far as things to consider in the design.

Thanks!
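The environment-variable check described above can be sketched outside quilt3 roughly as follows. This is only an illustration, not the patch itself: the variable name `QUILT_S3_ENDPOINT_URL` and both function names are hypothetical, since the comment does not say which names were used.

```python
import os

# Hypothetical variable name; the thread does not say which one was used.
ENDPOINT_ENV_VAR = "QUILT_S3_ENDPOINT_URL"


def s3_client_kwargs(environ=None):
    """Return extra keyword arguments for session.client('s3', ...)."""
    environ = os.environ if environ is None else environ
    endpoint_url = environ.get(ENDPOINT_ENV_VAR, "").strip()
    if endpoint_url:
        # Custom endpoint (e.g. MinIO): pass it through explicitly.
        return {"endpoint_url": endpoint_url}
    # No override: let boto3 resolve the default AWS endpoint.
    return {}


def build_s3_client(environ=None):
    """Create an S3 client, honoring the optional endpoint override."""
    # Imported lazily so the helper above has no third-party dependencies.
    import boto3
    from botocore.client import Config

    kwargs = s3_client_kwargs(environ)
    if kwargs:
        kwargs["config"] = Config(signature_version="s3v4")
    return boto3.Session().client("s3", **kwargs)
```

Keeping the environment lookup separate from client construction makes the override easy to check without a running MinIO server.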

akarve commented 1 year ago

@marcodlk Nice workaround, and directionally correct (sorry for the slow reply). What we're planning to do in the next-gen client (already in the works and will be open source) is to abstract the providers a little bit, so that at first any object-compatible store can be interposed (GCP, Azure, MinIO). That's the long-term solution and we don't have code just yet. Wanna join our Slack so we can discuss further? Thank you.

sir-sigurd commented 1 year ago

@marcodlk

With boto3>=1.28.0 you can use AWS_ENDPOINT_URL_S3 to customize endpoint URL. See https://docs.aws.amazon.com/sdkref/latest/guide/feature-ss-endpoints.html.
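A rough sketch of that approach, assuming the variable is set before the client is created. The function names and the example URL in the usage note are placeholders, not part of quilt3 or boto3.

```python
import os


def supports_endpoint_env(boto3_version):
    """True if this boto3 version is at least 1.28.0."""
    parts = [int(p) for p in boto3_version.split(".")[:3]]
    parts += [0] * (3 - len(parts))  # pad "1.28" to [1, 28, 0]
    return tuple(parts) >= (1, 28, 0)


def make_s3_client(endpoint_url):
    """Point boto3's S3 client at a custom endpoint via the environment."""
    import boto3  # imported lazily; only needed when a client is built

    if not supports_endpoint_env(boto3.__version__):
        raise RuntimeError("AWS_ENDPOINT_URL_S3 needs boto3>=1.28.0")
    # Set the variable before the client is created.
    os.environ["AWS_ENDPOINT_URL_S3"] = endpoint_url
    return boto3.client("s3")
```

For example, `make_s3_client("http://localhost:9000")` would target a local MinIO server, if one is listening on that address.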

link89 commented 1 year ago

Hi @marcodlk, can you share the diff of the change you made? It looks like quilt never accesses the credentials.json file.

marcodlk commented 1 year ago

@link89 I no longer have access to the codebase I was working on, but looking at the code, quilt3.session._load_credentials still uses CREDENTIALS_PATH so that's odd. Are you sure it is the "credentials.json" in the Quilt app directory as specified by BASE_PATH in quilt3.util module? Have you tried @sir-sigurd 's solution?

akarve commented 1 year ago

For min.io support we hopefully don't need to touch credentials.json as that is for the special case where users authenticate to a Quilt stack. But in the more general case quilt3 just falls back onto the boto3 credential chain (and never touches credentials.json) and that is applicable in more cases, especially for pure open source users.
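One way to lean on the boto3 credential chain is to supply the MinIO user's keys through the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables. The sketch below scopes them to a `with` block; `temporary_env` is a hypothetical helper, not part of quilt3 or boto3.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(**overrides):
    """Set environment variables inside a with-block, then restore them."""
    saved = {name: os.environ.get(name) for name in overrides}
    os.environ.update(overrides)
    try:
        yield
    finally:
        for name, old in saved.items():
            if old is None:
                # The variable did not exist before: remove it again.
                os.environ.pop(name, None)
            else:
                os.environ[name] = old
```

Clients would then be created inside a block such as `with temporary_env(AWS_ACCESS_KEY_ID="minio-user", AWS_SECRET_ACCESS_KEY="minio-password"):`, where both values are placeholders.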

kevinemoore commented 12 months ago

Here is a draft PR that allows users to create their own S3 clients (including min.io clients) and map them to specific buckets. https://github.com/quiltdata/quilt/pull/3765

We'd appreciate any feedback on the interface. This isn't necessarily the best way for Quilt to find and access min.io servers. Please let us know how you think Quilt should map min.io endpoints and bucket names.
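As one possible shape for such a mapping (purely illustrative, and not the interface in the draft PR), a lookup from bucket name to endpoint URL could be sketched like this. The function name, bucket name, and URL are all hypothetical.

```python
from urllib.parse import urlparse


def endpoint_for(s3_uri, bucket_endpoints, default=None):
    """Return the endpoint URL registered for the bucket in an s3:// URI."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {s3_uri!r}")
    # In s3://bucket/key, the bucket name is the network-location part.
    return bucket_endpoints.get(parsed.netloc, default)


# Hypothetical mapping: buckets served by an on-premises MinIO server.
BUCKET_ENDPOINTS = {"lab-data": "http://minio.internal:9000"}
```

Buckets missing from the mapping return `default`, which a caller could treat as "use the regular AWS endpoint".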