leap-stc / data-management

Collection of code to manually populate the persistent cloud bucket with data
https://catalog.leap.columbia.edu/
Apache License 2.0

[Tracking] Transfer data to OSN -> Virtualize OSN dataset with beam -> View dataset __repr__ in catalog #157

Open norlandrhagen opened 1 week ago

norlandrhagen commented 1 week ago

Meta tracking issue to check whether we can:

  1. Move a set of archival files from NCAR (or another cluster) into LEAP's OSN bucket.
  2. Create a VirtualiZarr reference from the files on OSN.
  3. Add the virtual dataset reference to the LEAP catalog, with a repr that includes engine='kerchunk'. See https://github.com/leap-stc/data-management/issues/150
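
For orientation, steps 2 and 3 can be sketched with the standard library alone. This is a rough illustration of what a kerchunk version 1 reference looks like and of the kind of snippet the catalog repr might show; it is not the VirtualiZarr or beam pipeline itself, which would generate the references for us. The helpers `build_reference` and `catalog_snippet`, the bucket paths, and the variable name are all hypothetical, and the repr line assumes the reference is opened with xarray through the 'kerchunk' engine.

```python
import json


def build_reference(object_url, variable, shape, chunks, dtype, dims, chunk_byte_ranges):
    """Build a minimal kerchunk version-1 reference dict for one variable.

    Hypothetical helper for illustration only. `chunk_byte_ranges` maps a
    Zarr chunk key such as "0.0" to an (offset, length) pair inside the
    file stored at `object_url`.
    """
    refs = {
        # Zarr v2 metadata is stored inline as JSON strings.
        ".zgroup": json.dumps({"zarr_format": 2}),
        f"{variable}/.zarray": json.dumps(
            {
                "zarr_format": 2,
                "shape": shape,
                "chunks": chunks,
                "dtype": dtype,
                "compressor": None,
                "fill_value": None,
                "filters": None,
                "order": "C",
            }
        ),
        # xarray reads dimension names from this attribute.
        f"{variable}/.zattrs": json.dumps({"_ARRAY_DIMENSIONS": dims}),
    }
    # Each chunk is a pointer [url, offset, length] into the original file,
    # so no data is copied out of the archival file on OSN.
    for chunk_key, (offset, length) in chunk_byte_ranges.items():
        refs[f"{variable}/{chunk_key}"] = [object_url, offset, length]
    return {"version": 1, "refs": refs}


def catalog_snippet(reference_url):
    """Format the open call a catalog entry could display (assumed form)."""
    return f"xr.open_dataset({reference_url!r}, engine='kerchunk')"


# Placeholder values: the bucket, file, and byte ranges are made up.
example = build_reference(
    object_url="s3://example-osn-bucket/archive/file_0001.nc",
    variable="tas",
    shape=[2, 4],
    chunks=[1, 4],
    dtype="<f4",
    dims=["time", "x"],
    chunk_byte_ranges={"0.0": (100, 16), "1.0": (116, 16)},
)
reference_json = json.dumps(example, indent=2)
```

The point of the layout is that the reference file is small JSON sitting next to the data, so the catalog only needs the reference URL and the engine name to describe how to open the dataset.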

cc @leap-stc/data-and-compute

jbusecke commented 1 week ago

@norlandrhagen this is the dataset I mentioned yesterday. We should figure this out with something smaller first, but I think this should be the first 'real world' target!

jbusecke commented 1 week ago

Another candidate could be ClimSim.