Open joemoorhouse opened 1 year ago
Hi @redmikhail, Sorry, I really dragged my heels over this issue! This relates to what we discussed a couple of weeks ago or so. We need better control over the OS-C hazard indicator data sets. As discussed, we probably need separate, dedicated 'test' and 'prod' buckets? Not sure about naming conventions. @MichaelTiemannOSC, @MightyNerdEric and @HeatherAck also FYI
Hi @joemoorhouse, from a naming-convention perspective, to separate infrastructure buckets we generally follow osc-physrisk-hazard-indicators for "prod" and have a second bucket named physrisk-hazard-indicators-dev01 (considering that it will potentially be used for both purposes). Regarding public access: I am starting to wonder if we should have a separate bucket that is publicly available for general OS-C use, including physical risk. It seems that we will have some raw data that we may need to share with the community outside the OS-Climate organization. Please let me know what you think.
Hi @redmikhail, We discussed together on Friday an option:
The idea is that even non-members can easily get set up and take a copy of data using the public read-only bucket. Some care needed to ensure that users are taking their own data.
I think the option seems good, but am happy to be guided by you and @MichaelTiemannOSC in this. As long as we have a separate dev bucket to avoid accidents and users have some way to download data then I'm happy!
If you are both happy with the public readonly bucket option then please go ahead and create and then I'll transfer the data across.
Thanks, Joe
Hi @joemoorhouse , I have created public s3 bucket "arn:aws:s3:::redhat-osc-physical-landing-64759867891", let me know if you can move data to this
Thanks Keshav
Hi @keshavnath1; hi @redmikhail, Thanks for this. How do I get the credentials for writing? I tried and these seem to be different credentials from the bucket?: redhat-osc-physical-landing-647521352890
Also, I guess we would want credentials to allow the user of redhat-osc-physical-landing-647521352890 to be able to use PutObject and CopyObject on redhat-osc-physical-landing-64759867891 - for efficient transfer between the two? Or is there another/better way to do this? I saw this for example on the subject: https://stackoverflow.com/questions/65577223/aws-s3-copy-object-from-one-bucket-to-another-with-different-credentials
Also, I thought from the above we were going with naming convention 'physrisk-hazard-indicator-...'? Thanks, Joe
Hi @joemoorhouse, @keshavnath1 is my teammate and asked me to look into this S3 write access. Is it OK if I create an AWS IAM user and send you the credentials so that you can write to that bucket? Once you are done writing to the bucket, I will suspend the user. Let me know if this is OK. If so, can you give me an email address so that I can send the AWS credentials to it?
Hi @joemoorhouse Who are @samanth91 and @keshavnath1 and what is their role? 64759867891 is not an aws account that is managed by OS Climate, so I have no idea where that bucket is.
Hi @ryanaslett - both are from freddiemac.com and are new users who want to test the Physical Risk & Resilience tool - I believe they plan to test it locally in their own environment.
Hi @ryanaslett, Ah, and I assumed they were colleagues of @redmikhail! Thanks for clearing that up @HeatherAck!
So @samanth91 and @keshavnath1, the idea is that OS Climate is creating a public bucket, then you can transfer from there however you like. I believe @redmikhail is working on that. I'll give you the details once that is complete.
Hi @redmikhail, Did you have a chance to create the public bucket? I believe @samanth91 and @keshavnath1 need this to continue their work. I think you were considering creating the buckets:
os-climate-public-data (readonly; public)
physrisk-hazard-indicators-dev01 (private; dev)
physrisk-hazard-indicators (private; prod)
Although I think os-climate-public-data is the most urgent. Thanks, Joe
@ryanaslett and @MightyNerdEric - could you please create the public bucket. I think @redmikhail may be on vacation this week.
@HeatherAck - my apologies ! I was away for July 3rd and 4th for holidays
Hi @joemoorhouse , all tasks should be completed now. Here are details for the configuration:
physrisk-hazard-indicators
- private, prod bucket for the physical risk application (intended to be accessed by the application). Existing setup, no changes.
single user has access to the bucket physrisk-hazard-indicators-s3-user - rwdl -> physrisk-hazard-indicators
physrisk-hazard-indicators-dev01
- private, for development purposes (intended to be accessed by the application). New setup. Access is similar to production: a single service account has read-write privileges.
single user has access to the bucket physrisk-hazard-indicators-dev01-s3-user - rwdl -> physrisk-hazard-indicators-dev01
os-climate-public-data
- publicly accessible bucket for read-only access. Bucket to be used by OS-Climate members to share information with interested parties outside the OS-Climate organization. Examples of the types of information that can be shared this way: small sets of data for demos, testing and prototyping. The bucket is not intended to be used from applications. Currently, list-objects operations are not allowed for unauthenticated users, so direct links to the objects should be provided. A separate user has been created for the physical risk application; it has read-only and list-objects access to the physrisk-hazard buckets and read/write/list/delete access to the public bucket. This user can be used to transfer data to the public bucket. For physical risk data please use the prefix physrisk* for any data, e.g. s3://os-climate-public-data/physrisk/bucket_test2.txt
anonymous/unauthenticated users - ro -> os-climate-public-data
physrisk-public-bucket-s3-user - read, list -> physrisk-hazard-indicators , physrisk-hazard-indicators buckets
physrisk-public-bucket-s3-user - write, read, list -> os-climate-public-data
All credentials have been added to the secrets physrisk-s3-keys (physrisk-hazard-indicators), physrisk-dev-s3-keys (physrisk-hazard-indicators-dev01) and physrisk-public-s3-keys (os-climate-public-data, for rw access).
To copy data to the public bucket you can use aws s3 commands, connecting with the appropriate AWS keys (from the physrisk-public-s3-keys secret), for example:
aws s3 cp s3://physrisk-hazard-indicators/bucket_test2.txt s3://os-climate-public-data/physrisk/bucket_test2.txt
Data can be retrieved by any user with a web browser using the direct URL, for example https://os-climate-public-data.s3.amazonaws.com/physrisk/bucket_test2.txt, or with curl:
curl -L https://os-climate-public-data.s3.amazonaws.com/physrisk/bucket_test2.txt -o ./physrisk/bucket_test2.txt
or with the AWS CLI:
aws s3 cp s3://os-climate-public-data/physrisk/bucket_test2.txt ./physrisk/bucket_test2.txt --region us-east-1 --no-sign-request
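The same unauthenticated download can also be scripted. A minimal Python sketch using only the standard library, assuming the direct-URL form shown above; the helper names `public_url` and `download` are hypothetical and not part of any OS-Climate tooling:

```python
from pathlib import Path
from urllib.parse import quote
from urllib.request import urlretrieve

# Public bucket name as given in this thread.
PUBLIC_BUCKET = "os-climate-public-data"


def public_url(key: str, bucket: str = PUBLIC_BUCKET) -> str:
    """Build the direct HTTPS URL for an object key (no credentials needed)."""
    # Percent-encode the key but keep "/" so the prefix structure survives.
    return f"https://{bucket}.s3.amazonaws.com/{quote(key, safe='/')}"


def download(key: str, dest_dir: str = ".") -> Path:
    """Fetch one public object and store it under dest_dir, mirroring the key."""
    target = Path(dest_dir) / key
    # Create local directories for the key's prefix if they do not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    urlretrieve(public_url(key), str(target))
    return target
```

For example, `download("physrisk/bucket_test2.txt")` would fetch the test object mentioned above into `./physrisk/bucket_test2.txt`.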
Thanks @redmikhail, that's great... I'm on vacation also hence delay in reply! I'll give the copying a go.
Hi @redmikhail, @keshavnath1, @samanth91,
Sorry for the delay - vacations got in the way. I have now copied the hazard data from the old bucket to 'physrisk-hazard-indicators'. Still to do: migrate the sandbox over to point to the new bucket.
I also copied the hazard data into 'os-climate-public-data', which is therefore now publicly accessible.
List operations on os-climate-public-data are not permitted, as @redmikhail mentioned above, which will of course make taking a copy problematic! To get around this, I've added a file hazard/keys.txt with the list of the keys comprising the hazard files. There are about 78,000 keys (45 GB) there currently. The large number is from the chunking of the (zarr) data.
https://os-climate-public-data.s3.amazonaws.com/hazard/keys.txt
The idea is then to copy the keys in this list (e.g. using boto3 copy_object or similar), or a subset for demonstration purposes.
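The copy described above could be sketched roughly as follows. This is only an illustration under several assumptions: that keys.txt holds one object key per line (the thread does not state the format), that boto3 is installed, that the default AWS credentials can write to the destination bucket, and that the public bucket's policy lets those credentials read the source objects. The helper names `parse_keys`, `select_keys`, `fetch_keys` and `copy_keys` are hypothetical.

```python
from typing import Iterable, List, Optional
from urllib.request import urlopen

# URL and bucket name as given in this thread.
KEYS_URL = "https://os-climate-public-data.s3.amazonaws.com/hazard/keys.txt"
SOURCE_BUCKET = "os-climate-public-data"


def parse_keys(text: str) -> List[str]:
    """Split the listing into keys, ASSUMING one key per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def select_keys(keys: Iterable[str], prefix: str = "",
                limit: Optional[int] = None) -> List[str]:
    """Keep keys under a prefix, optionally capped (e.g. a small demo subset)."""
    chosen = [k for k in keys if k.startswith(prefix)]
    return chosen if limit is None else chosen[:limit]


def fetch_keys(url: str = KEYS_URL) -> List[str]:
    """Download the key listing over plain HTTPS (no credentials needed)."""
    with urlopen(url) as resp:
        return parse_keys(resp.read().decode("utf-8"))


def copy_keys(keys: Iterable[str], dest_bucket: str) -> int:
    """Server-side copy each key from the public bucket into dest_bucket."""
    import boto3  # imported lazily so the helpers above work without boto3

    s3 = boto3.client("s3")
    copied = 0
    for key in keys:
        s3.copy_object(
            Bucket=dest_bucket,
            Key=key,
            CopySource={"Bucket": SOURCE_BUCKET, "Key": key},
        )
        copied += 1
    return copied
```

For example, `copy_keys(select_keys(fetch_keys(), limit=100), "my-own-bucket")` would copy the first 100 listed objects, where `my-own-bucket` is a placeholder for a bucket you control. With about 78,000 objects, a full copy done one key at a time may be slow, so running the copies in parallel could be worth considering.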
@joemoorhouse Thanks for the info; we will be using those keys.
Is your feature request related to a problem? Please describe.
Currently there is no dedicated S3 bucket for hazard data. Rather, we use redhat-osc-physical-landing-647521352890. We also recently introduced another bucket for hazard model development (see https://github.com/os-climate/os_c_data_commons/issues/273): physrisk-hazard-indicators. But I think we need separate dedicated buckets for test and 'prod'. 'Prod' here means the store used by the sandbox, but even so I think we need a low risk of accidental overwriting.
Describe the solution you'd like
Do we have:
physrisk-hazard-indicators
physrisk-hazard-indicators-test
for example, and use physrisk-hazard-indicators for 'prod', or physrisk-hazard-indicators-prod? Not sure if there is a convention used across OS-C. Secrets are maintained here, by the way: https://console-openshift-console.apps.odh-cl1.apps.os-climate.org/k8s/ns/sandbox/secrets/physrisk-s3-keys
Separately, there is a need for non-members to be able to download bucket contents. Ideally we don't want to make buckets public, as the model is to federate or for members to host their own data (i.e. OS-C is not to be a data provider). SFTP could facilitate the latter.