pangeo-data / poseidon

Coordination repo for NSF poseidon project

transfer LLC4320 zarr data to SciServer #1

Open rabernat opened 5 years ago

rabernat commented 5 years ago

All of the data is cataloged here, with links to Google Cloud: http://pangeo.io/catalog.html

rabernat commented 5 years ago

Specifically, one will use gsutil with a command like

gsutil -m cp -r gcs://pangeo-data/llc4320_surface/SST .

rabernat commented 5 years ago

Hi @glemson, have you given this a try?

glemson commented 5 years ago

I have not had the time to work on this yet.

glemson commented 5 years ago

Problem with gsutil, most likely due to incorrect installation.

$ filedb02:/srv/data01/ocean/LLC4320:50$ ~/gsutil/gsutil -m cp -r gcs://pangeo-data/llc4320_surface/SST .
InvalidUrlError: Unrecognized scheme "gcs".

rabernat commented 5 years ago

Use gs:// instead of gcs://
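
For anyone hitting the same `InvalidUrlError`: gsutil accepts `gs://` for Cloud Storage URLs, not `gcs://`. Below is a minimal sketch of a helper that rewrites a leading `gcs://` before the URL is handed to gsutil. The function name `to_gs_url` is hypothetical and not part of gsutil.

```shell
# Hypothetical helper (not part of gsutil): rewrite a leading gcs:// to gs://.
# Any other URL is passed through unchanged.
to_gs_url() {
  case "$1" in
    gcs://*) printf 'gs://%s\n' "${1#gcs://}" ;;
    *)       printf '%s\n' "$1" ;;
  esac
}

# Usage (requires gsutil; the copy itself downloads data from the bucket):
# ~/gsutil/gsutil -m cp -r "$(to_gs_url gcs://pangeo-data/llc4320_surface/SST)" .
```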

glemson commented 5 years ago

Thanks, it seems to be working.

glemson commented 5 years ago

The files have been downloaded to /SciServer/filedb02-01/ocean/LLC4320. I asked the sysadmins to create a Linux group 'poseidon' that should have access to this data; all or most of us will be members of that group. The problem is that I currently cannot chown to poseidon, because apparently I am in too many groups. We'll solve that somehow.

Mikejmnez commented 4 years ago

Hello! Just adding to this thread. Only SST has been transferred to filedb02, and we are still missing (on SciServer) the rest of the surface variables available on the Pangeo cloud, along with the GRID data. Now that I have access to the group directory, I will finish downloading the rest of the data.

Mikejmnez commented 4 years ago

@rabernat I tried downloading the rest of the surface variables onto SciServer, but was only able to download SSS and the grid. When I tried SSH, SSU, SSV or grid instead of SSS (or even SST), I got the following error:

-bash-4.2$ ~/gsutil/gsutil -m cp -r gs://pangeo-data/llc4320_surface/SSH .
CommandException: No URLs matched: gs://pangeo-data/llc4320_surface/SSH
CommandException: 1 file/object could not be transferred.

I get the same error for SSU and SSV (I also tried lower case). Do you know what the correct URLs are for these variables?
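
A "No URLs matched" error means nothing exists at that exact path, so one way to narrow it down is to list what actually sits under the prefix. This is a minimal sketch, assuming the bucket allows listing. The function name `list_prefix` and the `GSUTIL` override variable are hypothetical conveniences, not gsutil features.

```shell
# Hypothetical helper: list the entries directly under a bucket prefix.
# GSUTIL can point at a non-default install, e.g. GSUTIL=~/gsutil/gsutil.
list_prefix() {
  gsutil_bin="${GSUTIL:-gsutil}"
  if ! command -v "$gsutil_bin" >/dev/null 2>&1; then
    echo "gsutil not found: $gsutil_bin" >&2
    return 127
  fi
  # Drop any trailing slash from the argument, then add exactly one back so
  # the URL refers to the contents of the prefix.
  "$gsutil_bin" ls "${1%/}/"
}

# Usage:
# list_prefix gs://pangeo-data/llc4320_surface
```

The names printed there can then be compared against the variable names tried above.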

rabernat commented 4 years ago

All the info is cataloged here: https://github.com/pangeo-data/pangeo-datastore/blob/master/intake-catalogs/ocean/llc4320.yaml

However, we need to pause these downloads for a moment. Your transfers have racked up nearly $2000 in egress fees in the past few days.

If I put the bucket into requester-pays mode, can you pay the remaining egress fees from your account? I think you should have lots of money in your CSSI budget for this. (We only have $5K a year.)

-Ryan
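
If the bucket does go into requester-pays mode, each gsutil call has to name a project to bill, which the top-level `-u` option does. Checking the size first with `gsutil du -s` gives a rough idea of the egress bill before committing to a transfer. The following is a sketch under stated assumptions: the project ID `my-billing-project` and the per-GiB rate are placeholders rather than values from this thread, and both function names are hypothetical.

```shell
# Hypothetical helper: estimate egress cost in USD from a byte count.
# The per-GiB rate is an assumption supplied by the caller; check current pricing.
egress_cost_usd() {
  awk -v bytes="$1" -v rate="$2" \
    'BEGIN { printf "%.2f\n", bytes / (1024 * 1024 * 1024) * rate }'
}

# Hypothetical helper: print (not run) a requester-pays copy command so it can
# be reviewed before any billable transfer starts.
requester_pays_cmd() {
  printf 'gsutil -u %s -m cp -r %s %s\n' "$1" "$2" "$3"
}

# Usage (placeholders throughout):
# bytes=$(gsutil -u my-billing-project du -s gs://pangeo-data/llc4320_surface/SST | awk '{print $1}')
# egress_cost_usd "$bytes" 0.12
# requester_pays_cmd my-billing-project gs://pangeo-data/llc4320_surface/SST .
```

Printing the command instead of running it is deliberate here, since an unreviewed `cp -r` is what produced the fees in the first place.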

Mikejmnez commented 4 years ago

OK, I was not aware of these fees. Another alternative, which I am not very familiar with, is Globus Connect. Tom mentioned that we have access to an account from JHU to finish transferring the files. I will look into this and confirm with you once I have it set up.

-Miguel


rabernat commented 4 years ago

Globus will not solve the problem. But I thought that this deal was supposed to help with the egress situation: https://www.internet2.edu/blogs/detail/14984

I'll ping our contacts at Google for more info.

cc @lila