ldeo-glaciology / LEAP-Cryo-planning

A repo for planning and tracking progress on the LEAP-Cryo project: Learning ice-sheet flow with physics-based and machine learning models.

data storage for topographies (and eventually model output) #17

Open hoffmaao opened 1 year ago

hoffmaao commented 1 year ago

In order to train the autoencoder, we need topography data. There are a growing number of swath datasets that we could choose to include in the training architecture, but we need to decide on a place to store these data (~10 GB), perhaps with an eye toward the data volumes and output that we will generate as part of the model simulations (~1 TB). Any suggestions?

jkingslake commented 1 year ago

Our google bucket could be a good place. I would need to add you as an authenticated user. Then you could follow the instructions here (changing the directory used there to whatever directory you want to create for the data).
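For what it's worth, the upload itself can be quite short. Here is a minimal sketch using gcsfs, assuming you have already been added as an authenticated user and have default credentials available (e.g. via `gcloud auth application-default login`); the project, bucket, and directory names are placeholders, not our actual paths:

```python
import gcsfs

# Connect to Google Cloud Storage with your default credentials.
# The project and bucket names below are placeholders.
fs = gcsfs.GCSFileSystem(project="ldeo-glaciology", token="google_default")

# Copy a local file into a directory of your choosing in the bucket.
fs.put("topography_local.nc",
       "gs://ldeo-glaciology/leap-cryo/topography/topography_local.nc")
```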

Importantly, to get the full value of storing data in the cloud, we should also move the compute to the cloud. This is straightforward with LEAP because we can use LEAP-Pangeo, a cloud-based JupyterHub paid for by LEAP.

@Templar129, you should sign up as a tier 2 LEAP member here if you haven't already. Then you will have access to LEAP-Pangeo here.

Templar129 commented 1 year ago

Hey all, I just signed up as a tier 2 LEAP member this morning, although it says it might take some time to approve. I will let you all know as soon as I hear from them.

jkingslake commented 1 year ago

Here is some documentation on LEAP-Pangeo that could be useful: https://leap-stc.github.io/leap-pangeo/jupyterhub.html#files-and-data
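As a quick orientation, once you are on the hub the buckets described in those docs can be browsed directly from Python. A minimal sketch, assuming the hub provides credentials automatically and using the scratch bucket named in that documentation:

```python
import gcsfs

# On the LEAP-Pangeo JupyterHub, credentials should be picked up automatically.
fs = gcsfs.GCSFileSystem()

# List the contents of the scratch bucket described in the documentation.
fs.ls("leap-scratch")
```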

Templar129 commented 1 year ago

Hi all, I just got approved for the Pangeo JupyterHub. I will read the documentation and learn how to use it.

jkingslake commented 1 year ago

@hoffmaao where is this data at the moment? Are you able to access it and put it in our google bucket?

hoffmaao commented 1 year ago

@jkingslake @Templar129 I was working up something that would download these data via a separate script, runnable from the command line, that we could include in this repo, along with a "scrubbing" script that could be run to remove the large files before making new commits (see the sketch below). Using the google bucket seems fine too, though. I'll need to read more on how to upload and then download these kinds of data from a notebook.
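The "scrubbing" part could be very simple. A sketch of the idea, where the data directory and size threshold are placeholders:

```python
from pathlib import Path

# Delete any files in the data directory larger than the threshold,
# so they are never accidentally committed. Both values are placeholders.
THRESHOLD_BYTES = 100 * 1024**2  # 100 MB

for path in Path("data").rglob("*"):
    if path.is_file() and path.stat().st_size > THRESHOLD_BYTES:
        print(f"removing {path}")
        path.unlink()
```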

hoffmaao commented 1 year ago

The data can be downloaded here: https://n5eil01u.ecs.nsidc.org/ICEBRIDGE/IRTIT3.002/
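Note that NSIDC downloads require a (free) NASA Earthdata login. A minimal sketch of a scripted download, assuming credentials are stored in ~/.netrc for urs.earthdata.nasa.gov; the granule filename below is a placeholder, not a real file in that archive:

```python
import requests

# Requests picks up Earthdata credentials from ~/.netrc, e.g.
#   machine urs.earthdata.nasa.gov login <user> password <pass>
url = ("https://n5eil01u.ecs.nsidc.org/ICEBRIDGE/IRTIT3.002/"
       "2009.10.16/example_granule.nc")

with requests.Session() as session:
    # The request is redirected through the Earthdata login before the download.
    response = session.get(url)
    response.raise_for_status()
    with open("example_granule.nc", "wb") as f:
        f.write(response.content)
```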

jkingslake commented 1 year ago

The documentation linked above has some good instructions on how to upload data to the LEAP google bucket. I think in this case that makes sense for us (rather than our own LDEO-glaciology google bucket).

To write to the bucket, the data need to be in one or a few xarray.Datasets. Then you write each one to a Zarr store using the .to_zarr method on the xarray.Dataset.
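Putting the two steps together, a minimal sketch (the bucket path is a placeholder, and the topography files are assumed to concatenate cleanly by coordinates):

```python
import gcsfs
import xarray as xr

# Combine the downloaded topography files into a single Dataset.
ds = xr.open_mfdataset("topography/*.nc", combine="by_coords")

# Write the Dataset to a Zarr store in the bucket (placeholder path).
fs = gcsfs.GCSFileSystem(project="ldeo-glaciology", token="google_default")
store = fs.get_mapper("gs://ldeo-glaciology/leap-cryo/topography.zarr")
ds.to_zarr(store, mode="w", consolidated=True)
```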