hoffmaao opened this issue 1 year ago
Our Google bucket could be a good place. I would need to add you as an authenticated user. Then you could follow the instructions here (changing the directory you upload to from the one used there to whatever directory you want to create for the data).
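For reference, a minimal sketch of what an upload from Python could look like with gcsfs, assuming you have been added as an authenticated user and have run `gcloud auth application-default login` locally; the bucket and directory names below are placeholders, not the real paths.

```python
# Minimal sketch of uploading one local file to a GCS bucket with gcsfs.
# The bucket name and directory are placeholders for whatever we create.
import gcsfs

fs = gcsfs.GCSFileSystem(token="google_default")  # use your authenticated credentials
fs.put(
    "local_data/swath_topography.nc",                                   # hypothetical local file
    "gs://ldeo-glaciology/placeholder-directory/swath_topography.nc",   # placeholder bucket path
)
```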
Importantly, to get the full value of storing data in the cloud, we should also move the compute to the cloud. This is straightforward with LEAP because we can use LEAP-Pangeo - a cloud-based JupyterHub paid for by LEAP.
@Templar129, you should sign up as a tier 2 LEAP member here if you haven't already. Then you will have access to LEAP-Pangeo here.
Hey all, I just signed up as a tier 2 LEAP member this morning, although it says it might take some time to approve. I will let you all know as soon as I get news from them.
Here is some documentation on LEAP-Pangeo that could be useful: https://leap-stc.github.io/leap-pangeo/jupyterhub.html#files-and-data
Hi all, I just got approved for the Pangeo JupyterHub. I will read some of the documentation and learn how to use it.
@hoffmaao where are these data at the moment? Are you able to access them and put them in our Google bucket?
@jkingslake @Templar129 I was working something up that would download these data as part of a separate script that could be run from the command line, which we could include in this repo, along with a "scrubbing" script that could be run to remove the large files before making new commits (a rough sketch of the download script is below). Using the Google bucket seems fine too, though. I'll need to read more on how to upload and then download these kinds of data from a notebook.
The data can be downloaded here: https://n5eil01u.ecs.nsidc.org/ICEBRIDGE/IRTIT3.002/
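Something like the following could be a starting point for that stand-alone download script. It assumes an Earthdata Login with an entry for urs.earthdata.nasa.gov in ~/.netrc (NSIDC requires authentication), and the granule path used at the bottom is purely illustrative; this is a sketch, not the final script.

```python
# Rough sketch of a command-line download script for the IRTIT3 granules.
import requests

BASE_URL = "https://n5eil01u.ecs.nsidc.org/ICEBRIDGE/IRTIT3.002/"

def download_granule(relative_path, out_path):
    """Stream one granule to disk, letting requests follow the Earthdata login redirects."""
    with requests.Session() as session:
        response = session.get(BASE_URL + relative_path, stream=True)
        response.raise_for_status()
        with open(out_path, "wb") as f:
            for chunk in response.iter_content(chunk_size=1 << 20):
                f.write(chunk)

if __name__ == "__main__":
    # Hypothetical granule name, just to show the calling convention.
    download_granule("2017.11.04/example_granule.nc", "data/example_granule.nc")
```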
The link above has some good instructions on how to upload data to the LEAP Google bucket. I think in this case that makes sense for us (rather than our own LDEO-glaciology Google bucket).
To write to the bucket, the data need to be in one or a few xarray.Datasets. Then you write each one to a zarr store using the .to_zarr method of the xarray.Dataset.
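For example, a minimal sketch of that workflow, assuming the granules have already been combined into a single xarray.Dataset and that the bucket path is a placeholder for whatever directory we create:

```python
# Combine the downloaded files into one Dataset, then write it to the bucket as zarr.
import gcsfs
import xarray as xr

ds = xr.open_mfdataset("data/*.nc", combine="by_coords")   # hypothetical local files

fs = gcsfs.GCSFileSystem(token="google_default")           # your authenticated credentials
store = fs.get_mapper("gs://leap-persistent/placeholder-directory/swath_topography.zarr")
ds.to_zarr(store, mode="w")                                # write the whole Dataset as zarr
```

Note that whatever chunking the Dataset has when you call .to_zarr becomes the zarr chunking on disk, so it is worth rechunking first with the read patterns on LEAP-Pangeo in mind.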
In order to train the autoencoder, we need topography data. There are a growing number of swath datasets that we could choose to include in the training architecture, but we need to decide on a place to store these data (~10 GB), perhaps with an eye toward the data volumes and output that we will generate as part of the model simulations (~1 TB). Any suggestions?