google / neuroglancer

WebGL-based viewer for volumetric data
Apache License 2.0

Support for datasets in authenticated S3-compatible private buckets #507

Open unidesigner opened 11 months ago

unidesigner commented 11 months ago

I am looking into supporting an authenticated S3-compatible file protocol, where one could specify an access key and a secret key to view data in a private bucket.

I would like to be able to specify a source like this:

zarr://https://s3.us-west-004.amazonaws.com/bucketname/dataset?s3_access_key_id=accessKey&s3_secret_access_key=secretKey

which would initialize an S3Client from the @aws-sdk/client-s3 package and make the appropriate requests to fetch the info file and data chunks.
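A minimal sketch of the parsing step for the proposed source string, assuming path-style addressing (the bucket is the first path component) and the two query parameter names shown above. `parseS3QuerySource` and `S3SourceSpec` are hypothetical names, not Neuroglancer APIs. The result holds what an S3Client would need (endpoint, credentials) and what a GetObjectCommand would need (bucket, key prefix).

```typescript
// Hypothetical shape of what a parser for the proposed URL would extract.
interface S3SourceSpec {
  endpoint: string;        // e.g. "https://s3.us-west-004.amazonaws.com"
  bucket: string;          // first path component (path-style addressing assumed)
  prefix: string;          // remaining path, used as the key prefix for metadata and chunks
  accessKeyId: string;
  secretAccessKey: string;
}

// Parses "zarr://https://host/bucket/dataset?s3_access_key_id=...&s3_secret_access_key=...".
// Assumption: both query parameters are present and the bucket is in the path.
function parseS3QuerySource(source: string): S3SourceSpec {
  const formatPrefix = "zarr://";
  if (!source.startsWith(formatPrefix)) {
    throw new Error(`Expected a ${formatPrefix} source: ${source}`);
  }
  const url = new URL(source.slice(formatPrefix.length));
  // Note: URLSearchParams decodes "+" as a space, so "+" in a secret must be sent as %2B.
  const accessKeyId = url.searchParams.get("s3_access_key_id");
  const secretAccessKey = url.searchParams.get("s3_secret_access_key");
  if (accessKeyId === null || secretAccessKey === null) {
    throw new Error("Missing s3_access_key_id or s3_secret_access_key");
  }
  // Drop leading slashes, then split off the bucket name from the rest of the path.
  const [bucket, ...rest] = url.pathname.replace(/^\/+/, "").split("/");
  if (!bucket) {
    throw new Error("Missing bucket name in path");
  }
  return {
    endpoint: url.origin,
    bucket,
    prefix: rest.join("/"),
    accessKeyId,
    secretAccessKey,
  };
}
```

One pitfall with query-string credentials: AWS secret keys can contain `+`, which form-style query parsing turns into a space, so the secret has to be percent-encoded (`%2B`) when the URL is written.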

Where should I start to implement this in Neuroglancer?

jbms commented 11 months ago

The place to add support would be here: https://github.com/google/neuroglancer/blob/master/src/neuroglancer/util/special_protocol_request.ts

However, there are a few issues to consider:

aaronkanzer commented 9 months ago

Hi @jbms, when I click https://github.com/google/neuroglancer/blob/master/src/neuroglancer/util/special_protocol_request.ts I get a 404. Do you know if the code moved?

Also, nice to meet you 👋 I work at MIT with @kabilar and others on the LINC Project: https://connects.mgh.harvard.edu/. We are hoping to use Neuroglancer, at least in the short term, for viewing private Zarr datasets stored in S3.

jbms commented 9 months ago

Updated URL is here: https://github.com/google/neuroglancer/blob/master/src/util/special_protocol_request.ts

The src/neuroglancer prefix was renamed to src/.

If you decide on the approach you will use for getting the AWS credentials in Neuroglancer I can offer more advice.

aaronkanzer commented 9 months ago

Any advice regarding AWS credentials and authentication would be great, as all of our assets are hosted on S3. Thanks in advance!

Also, just tagging a few others involved in the project here for visibility @ayendiki @balbasty @MikeSchutzman

jbms commented 9 months ago

The options that I've thought of are in my previous comment: https://github.com/google/neuroglancer/issues/507#issuecomment-1844796884

The simplest thing to implement would be to use the syntax:

s3+awskey:<ACCESS_KEY>:<SECRET_KEY>://bucket/path
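A sketch of how the s3+awskey syntax could be split into its parts, assuming the two keys sit between the scheme and the first "://" and that the bucket is the first path component. `parseS3AwsKeyUrl` is a hypothetical helper. Typical AWS keys use only letters, digits, `+` and `/`, so the first `:` after the scheme separates the access key from the secret key.

```typescript
// Hypothetical result of parsing the "s3+awskey" syntax.
interface S3KeyUrl {
  accessKeyId: string;
  secretAccessKey: string;
  bucket: string;
  path: string;
}

function parseS3AwsKeyUrl(url: string): S3KeyUrl {
  const scheme = "s3+awskey:";
  if (!url.startsWith(scheme)) {
    throw new Error(`Not an ${scheme} URL: ${url}`);
  }
  // Everything between the scheme and the first "://" is "<ACCESS_KEY>:<SECRET_KEY>".
  const sep = url.indexOf("://", scheme.length);
  if (sep === -1) {
    throw new Error(`Missing "://" in ${url}`);
  }
  const credentials = url.slice(scheme.length, sep);
  // Assumption: neither key contains ':' (true of typical AWS keys).
  const colon = credentials.indexOf(":");
  if (colon <= 0 || colon === credentials.length - 1) {
    throw new Error("Expected <ACCESS_KEY>:<SECRET_KEY> before ://");
  }
  // The rest is "bucket/path"; the path may be empty.
  const remainder = url.slice(sep + 3);
  const slash = remainder.indexOf("/");
  const bucket = slash === -1 ? remainder : remainder.slice(0, slash);
  const path = slash === -1 ? "" : remainder.slice(slash + 1);
  if (!bucket) {
    throw new Error("Missing bucket name");
  }
  return {
    // Percent-decoding lets users escape characters if they choose to; '+' is left as-is.
    accessKeyId: decodeURIComponent(credentials.slice(0, colon)),
    secretAccessKey: decodeURIComponent(credentials.slice(colon + 1)),
    bucket,
    path,
  };
}
```

As raised further down in the thread, this puts the secret key in plain text in the URL (and therefore in shared links and browser history), which is the main trade-off of its simplicity.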

Additionally, you could support Amazon Cognito for credentials; that would probably be preferable in most cases, but would be a bit more complicated.

You can probably use this library to handle the actual requests to s3:

https://www.npmjs.com/package/@aws-sdk/client-s3

There is also this example of using Amazon Cognito: https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/getting-started-browser.html

aaronkanzer commented 8 months ago

Thanks @jbms -- could you just clarify the s3+awskey:<ACCESS_KEY>:<SECRET_KEY>://bucket/path option? Unless I'm mistaken, I don't think I'd want to present these keys in plaintext -- let me know what you had in mind.

Will look further into Cognito.

@unidesigner did you ever arrive at a solution? I am looking to build a proof of concept, but I'm curious to understand how you approached this.

Thanks all in advance

unidesigner commented 8 months ago

Hi @aaronkanzer - unfortunately, I have not had time to prioritize work on this. Are you working on a proof-of-concept implementation?

Before adding the complexity of Cognito, I'd suggest supporting the scheme proposed by @jbms, which provides the access_key and secret_key in the URL. The secret_key would indeed be provided in plain text, relying on security by obscurity.

aaronkanzer commented 8 months ago

@unidesigner Yes, we are working on a proof of concept. After some research, we are implementing signed cookies via AWS CloudFront.

We have a CloudFront distribution that sits in front of our S3 bucket, and we serve Neuroglancer directly. This allows us to fetch private chunks of data efficiently (in our case, we are working heavily with .ome.zarr).
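To make the signed-cookie approach concrete, here is a sketch of how a backend might mint the three cookies CloudFront checks (CloudFront-Policy, CloudFront-Signature, CloudFront-Key-Pair-Id). It assumes Node.js, a custom policy, and an RSA key pair whose public key is registered with the distribution. `makeSignedCookies` is a hypothetical helper, not the LINC team's code; the AWS SDK's `@aws-sdk/cloudfront-signer` package also offers a `getSignedCookies` helper for the same job.

```typescript
import * as crypto from "node:crypto";

// CloudFront expects a URL-safe base64 variant: '+' -> '-', '=' -> '_', '/' -> '~'.
function cloudFrontSafeBase64(data: Buffer): string {
  return data.toString("base64").replace(/\+/g, "-").replace(/=/g, "_").replace(/\//g, "~");
}

// Builds signed cookies for a custom policy granting access to `resource`
// (e.g. "https://<distribution>.cloudfront.net/*") until `expiresAt` (Unix seconds).
// `keyPairId` is the CloudFront public key ID; `privateKeyPem` is the matching RSA private key.
function makeSignedCookies(
  resource: string,
  expiresAt: number,
  keyPairId: string,
  privateKeyPem: string,
): Record<string, string> {
  // JSON.stringify emits no whitespace, matching the compact policy that gets signed.
  const policy = JSON.stringify({
    Statement: [
      { Resource: resource, Condition: { DateLessThan: { "AWS:EpochTime": expiresAt } } },
    ],
  });
  // RSA signature over the policy using SHA-1.
  const signer = crypto.createSign("SHA1");
  signer.update(policy);
  const signature = signer.sign(privateKeyPem);
  return {
    "CloudFront-Policy": cloudFrontSafeBase64(Buffer.from(policy, "utf8")),
    "CloudFront-Signature": cloudFrontSafeBase64(signature),
    "CloudFront-Key-Pair-Id": keyPairId,
  };
}
```

A custom policy is used here because it allows a wildcard resource, so one set of cookies can cover every chunk under a prefix instead of signing each chunk URL individually; the browser then attaches the cookies to every chunk request sent to the distribution's domain.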

Once we have a cleaned-up end-to-end solution, I'm happy to share some diagrams or example code. I'd also be curious to hear @jbms's thoughts, and to see whether we can eventually build support directly into Neuroglancer.

Cc @kabilar

aaronkanzer commented 3 weeks ago

@unidesigner Realizing I never followed up here: did you arrive at an implementation?

We've been using our CloudFront solution for quite some time now with success -- let me know, happy to transfer any knowledge if helpful.

Cc @kabilar @satra

unidesigner commented 1 week ago

Hi @aaronkanzer, no, not yet unfortunately, but it is still something I'd like to look into when time allows. I'd be interested in understanding your CloudFront solution! Could you post it here or email me at git@unidesign.ch? Thank you!