leap-stc / data-management

Collection of code to manually populate the persistent cloud bucket with data
https://catalog.leap.columbia.edu/
Apache License 2.0

New Dataset [E33OMA90D] #126

Open smhassanerfani opened 5 months ago

smhassanerfani commented 5 months ago

Dataset Name

E33OMA90D

Dataset URL

No response

Description

This dataset represents aerosol compositions simulated by the NASA GISS ModelE ESM for the first three months of 1950, covering three different aerosol species: Sea Salt, Black Carbon, and Clay.

Size

The dataset is a NetCDF file with a total size of 79GB.

License

Unknown

Data Format

NetCDF

Data Format (other)

No response

Access protocol

scp

Source File Organization

There are 48 files per day for the velocity fields (u, v, and w), precipitation, emissions, and the concentrations of each aerosol species. The velocity fields and concentrations are vertically resolved over 60 pressure levels.
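For planning purposes, 48 files per day over January–March 1950 works out to 90 days × 48 = 4,320 files per variable. A minimal sketch of enumerating such a file list, assuming a half-hourly cadence and a hypothetical naming pattern (the actual file names on Simurgh may differ):

```python
from datetime import datetime, timedelta

def source_file_list(variable, start="1950-01-01", end="1950-04-01",
                     step_minutes=30):
    """Enumerate hypothetical half-hourly source files (48 per day).

    The file-name pattern is an assumption for illustration only;
    the real naming convention on Simurgh may differ.
    """
    t = datetime.fromisoformat(start)
    stop = datetime.fromisoformat(end)
    files = []
    while t < stop:
        files.append(f"{variable}_{t:%Y%m%d_%H%M}.nc")
        t += timedelta(minutes=step_minutes)
    return files

files = source_file_list("seasalt_conc")
print(len(files))  # 90 days x 48 files/day = 4320
```

A list like this is also what a pangeo-forge recipe would typically take as its input file pattern.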

Example URLs

No response

Authorization

None

Transformation / Processing

No response

Target Format

Zarr

Comments

The dataset is currently located on Simurgh, the Marcus supercomputer.

jbusecke commented 5 months ago

Hey @smhassanerfani. Thanks for raising this.

The dataset is currently located on Simurgh, the Marcus supercomputer.

Can we somehow get access to that computer via HTTP, FTP or globus?

smhassanerfani commented 5 months ago

Hi Julius, I checked with Marcus about our options for transferring data from Simurgh. We normally use scp and rsync. Do you think these options could help?

jbusecke commented 5 months ago

We do not currently support that with pangeo-forge unfortunately. I have raised https://github.com/pangeo-forge/pangeo-forge-recipes/issues/753 to discuss.

What is your timeline here? If you have a tight deadline, we can see if we can hack around this for now, and try to implement a cleaner solution later.

smhassanerfani commented 5 months ago

We need it as soon as possible. We could begin with the smaller version, which is around 70GB, and delay the larger version until a clean solution is available. By the way, can we utilize the solution you mentioned in the technical document?

[screenshot of the technical document attached]

jbusecke commented 5 months ago

@smhassanerfani yes you could, and should, particularly if this is time sensitive!

But maybe before resorting to 'pushing' the data we try https://github.com/pangeo-forge/pangeo-forge-recipes/issues/753#issuecomment-2159223296 out first? I could make some time for this tomorrow or Thu?
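If it does come to "pushing", a sketch of uploading files from the HPC side with fsspec; the target URL is a placeholder (a real `gs://` destination would need gcsfs installed plus write credentials from the LEAP team):

```python
import fsspec

def push_files(local_paths, target_dir):
    """Copy local files into a target store addressed by URL.

    A target_dir like 'gs://some-leap-bucket/E33OMA90D/' is a
    placeholder, not the actual LEAP bucket path; with a gs:// URL
    this requires gcsfs and valid write credentials.
    """
    fs, _, (root,) = fsspec.get_fs_token_paths(target_dir)
    for p in local_paths:
        fs.put(p, root.rstrip("/") + "/" + p.split("/")[-1])
```

Because fsspec resolves the filesystem from the URL, the same function works against a local directory for testing and a cloud bucket in production.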

smhassanerfani commented 5 months ago

Awesome! Let me know whatever time works best for you. I can stop by LEAP or we can meet over Zoom if needed.

jbusecke commented 5 months ago

Quick summary of my meeting with @smhassanerfani.