google / neuroglancer

WebGL-based viewer for volumetric data
Apache License 2.0

Support for datasets in authenticated S3-compatible private buckets #507

Open unidesigner opened 7 months ago

unidesigner commented 7 months ago

I am looking into supporting an authenticated S3-compatible protocol, where one could specify an access key and a secret key to view data in a private bucket.

I imagine being able to specify a source like this:

zarr://https://s3.us-west-004.amazonaws.com/bucketname/dataset?s3_access_key_id=accessKey&s3_secret_access_key=secretKey

which would initialize an S3Client from the @aws-sdk/client-s3 SDK and make the appropriate requests to get the info file and data chunks.
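As a rough illustration of the first step, here is a minimal sketch of splitting such a source URL into the pieces an S3 client would need. The function name `parseS3SourceUrl` and the `S3SourceSpec` shape are hypothetical and not part of Neuroglancer. The sketch assumes path-style addressing (`/<bucket>/<prefix>`) and the two query parameter names from the example URL above.

```typescript
// Hypothetical helper, not part of Neuroglancer: splits the proposed source
// URL into endpoint, bucket, key prefix, and credentials.
interface S3SourceSpec {
  endpoint: string; // scheme + host of the S3-compatible service
  bucket: string; // first path component (path-style addressing assumed)
  prefix: string; // remaining path: the dataset's key prefix in the bucket
  accessKeyId: string;
  secretAccessKey: string;
}

function parseS3SourceUrl(source: string): S3SourceSpec {
  // Strip the "zarr://" format prefix if present, leaving the https URL.
  const formatPrefix = "zarr://";
  const httpUrl = source.startsWith(formatPrefix)
    ? source.slice(formatPrefix.length)
    : source;
  const url = new URL(httpUrl);

  // Note: URLSearchParams decodes "+" as a space, so a secret key that
  // contains "+" would have to be percent-encoded ("%2B") in the URL.
  const accessKeyId = url.searchParams.get("s3_access_key_id");
  const secretAccessKey = url.searchParams.get("s3_secret_access_key");
  if (accessKeyId === null || secretAccessKey === null) {
    throw new Error("missing s3_access_key_id or s3_secret_access_key");
  }

  // Path-style layout assumed: /<bucket>/<key prefix>
  const [bucket, ...rest] = url.pathname
    .split("/")
    .filter((part) => part.length > 0);
  if (!bucket) {
    throw new Error("missing bucket name in URL path");
  }

  return {
    endpoint: url.origin,
    bucket,
    prefix: rest.join("/"),
    accessKeyId,
    secretAccessKey,
  };
}
```

The result could then be handed to an S3Client, for example as `endpoint`, `forcePathStyle: true`, and `credentials: { accessKeyId, secretAccessKey }`. A region would also be needed, and the URL above does not carry one explicitly.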

Where would I get started to implement this in Neuroglancer?

jbms commented 7 months ago

The place to add support would be here: https://github.com/google/neuroglancer/blob/master/src/neuroglancer/util/special_protocol_request.ts

However, there are a few issues to consider:

aaronkanzer commented 4 months ago

Hi @jbms -- it seems that when I click https://github.com/google/neuroglancer/blob/master/src/neuroglancer/util/special_protocol_request.ts -- I get a 404 -- any chance you know if the code moved?

Also, nice to meet you 👋 I work over at MIT with @kabilar and others on the LINC Project: https://connects.mgh.harvard.edu/. We are hoping to leverage neuroglancer in at least the short-term for viewing private zarrs stored in S3

jbms commented 4 months ago

Updated URL is here: https://github.com/google/neuroglancer/blob/master/src/util/special_protocol_request.ts

The src/neuroglancer prefix was renamed to src/.

If you decide on the approach you will use for getting the AWS credentials in Neuroglancer I can offer more advice.

aaronkanzer commented 4 months ago

Any advice regarding AWS credentials and auth would be great, as all our assets are hosted on S3 -- thanks in advance

Also, just tagging a few others involved in the project here for visibility @ayendiki @balbasty @MikeSchutzman

jbms commented 4 months ago

The options that I've thought of are in my previous comment: https://github.com/google/neuroglancer/issues/507#issuecomment-1844796884

The simplest thing to implement would be to use the syntax:

s3+awskey:<ACCESS_KEY>:<SECRET_KEY>://bucket/path
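For illustration, a minimal sketch of how a URL of the form `s3+awskey:<ACCESS_KEY>:<SECRET_KEY>://bucket/path` might be taken apart. The name `parseAwsKeyUrl` and the returned shape are hypothetical, not existing Neuroglancer code. The sketch assumes neither key contains a colon, which holds for typical AWS keys (secret keys may contain `/` and `+`, and both are tolerated here).

```typescript
// Hypothetical parser for the proposed s3+awskey scheme; not existing
// Neuroglancer code.
interface AwsKeyUrl {
  accessKeyId: string;
  secretAccessKey: string;
  bucket: string;
  path: string; // object key or key prefix; empty if the URL names only a bucket
}

// Groups: 1 = access key, 2 = secret key, 3 = bucket, 4 = path (optional).
const AWS_KEY_URL_PATTERN =
  /^s3\+awskey:([^:]+):([^:]+):\/\/([^/]+)(?:\/(.*))?$/;

function parseAwsKeyUrl(url: string): AwsKeyUrl | undefined {
  const match = AWS_KEY_URL_PATTERN.exec(url);
  if (match === null) {
    // Not an s3+awskey URL; the caller can fall back to other schemes.
    return undefined;
  }
  const [, accessKeyId = "", secretAccessKey = "", bucket = "", path = ""] =
    match;
  return { accessKeyId, secretAccessKey, bucket, path };
}
```

Returning `undefined` for non-matching input lets the caller try other schemes first. Anyone who can see the URL, for example through a shared Neuroglancer link, can also see both keys.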

Additionally, you could support Amazon Cognito for credentials; that would probably be preferable in most cases but would be a bit more complicated.

You can probably use this library to handle the actual requests to s3:

https://www.npmjs.com/package/@aws-sdk/client-s3

There is also this example of using Amazon Cognito: https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/getting-started-browser.html

aaronkanzer commented 4 months ago

Thanks @jbms -- could you just clarify the s3+awskey:<ACCESS_KEY>:<SECRET_KEY>://bucket/path option? Unless I'm mistaken, I don't think I'd want to present these keys in plaintext -- let me know what you had in mind.

Will look further into Cognito.

@unidesigner did you ever arrive at a solution? I am looking to build a proof of concept and am curious how you approached this.

Thanks all in advance

unidesigner commented 4 months ago

Hi @aaronkanzer - unfortunately, I have not had time to prioritize work on this. Are you working on a proof-of-concept implementation?

Before adding complexity with Cognito, I'd suggest supporting the scheme proposed by @jbms, which provides the access key and secret key in the URL. The secret key would indeed be provided in plain text, relying on security by obscurity.

aaronkanzer commented 4 months ago

@unidesigner Yes, we are working on a proof-of-concept. After some research, we are implementing signed cookies via AWS CloudFront.

We have a CloudFront distribution that sits in front of our S3 bucket and serves neuroglancer directly. This allows us to fetch a handful of private chunks of data efficiently (in our case, we are working heavily with .ome.zarr).
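To make the browser side of this concrete, here is a minimal sketch of a chunk fetch that relies on signed cookies. It is not the actual implementation described above: `fetchPrivateChunk`, `FetchLike`, and `ChunkResponse` are hypothetical names, and the sketch assumes the cookies were already set for the CloudFront domain by some earlier login step. The fetch function is passed in so the sketch does not depend on a particular runtime; in a browser one would pass the global `fetch`.

```typescript
// Hypothetical sketch; names are illustrative, not Neuroglancer or AWS APIs.
// Minimal structural types covering the parts of fetch() used here.
interface ChunkResponse {
  ok: boolean;
  status: number;
  arrayBuffer(): Promise<ArrayBuffer>;
}

type FetchLike = (
  url: string,
  init: { credentials: "include"; mode: "cors" },
) => Promise<ChunkResponse>;

async function fetchPrivateChunk(
  url: string,
  fetchImpl: FetchLike,
): Promise<ArrayBuffer> {
  // credentials: "include" asks the browser to attach cookies (here, the
  // signed cookies) even when the request is cross-origin. fetch() only
  // sends cookies to the same origin by default.
  const response = await fetchImpl(url, {
    credentials: "include",
    mode: "cors",
  });
  if (!response.ok) {
    throw new Error(`chunk request failed with status ${response.status}: ${url}`);
  }
  return response.arrayBuffer();
}

// In a browser: fetchPrivateChunk(chunkUrl, fetch)
```

For credentialed cross-origin requests, the response must carry `Access-Control-Allow-Credentials: true` and a specific, non-wildcard `Access-Control-Allow-Origin`. If the viewer and the data are served from the same origin, the cookies are sent by default and `credentials: "include"` is not strictly required.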

Once we have a cleaned-up end-to-end solution, I'm happy to share some diagrams or example code -- I would also be curious to get @jbms's thoughts and see if we can eventually extend support directly into neuroglancer.

Cc @kabilar