leap-stc / data-management

Collection of code to manually populate the persistent cloud bucket with data
https://catalog.leap.columbia.edu/
Apache License 2.0

New Dataset [MetaFlux] #16

Closed AlexandreRebiere closed 3 months ago

AlexandreRebiere commented 1 year ago

Dataset Name

MetaFlux: Meta-learning global carbon fluxes from sparse spatiotemporal observations

Dataset URL

https://zenodo.org/record/7761881#.ZFj3LC3pNQI

Description

MetaFlux is a global, long-term carbon flux dataset of gross primary production and ecosystem respiration, generated using meta-learning. This dataset will be added to the existing rodeo forecast model to improve its performance.

Size

The dataset consists of 200 MB files, totaling 53.3 GB.

License

Unknown

Data Format

NetCDF

Data Format (other)

.nc

Access protocol

HTTP(S)

Source File Organization

Data are organized at daily resolution (one file per month from 2001 to 2021) and at monthly resolution (one file per year from 2001 to 2021).
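As a rough sanity check on this layout, the sketch below enumerates the time keys that the two file sets would cover and counts them. It assumes both ranges include 2021 and that the 53.3 GB total from the Size field covers both resolutions; it generates only (year, month) keys and does not assume the actual file names on Zenodo.

```python
# Enumerate the time keys implied by the source file organization.
# Assumption: 2001 to 2021 inclusive for both resolutions.
YEARS = range(2001, 2022)

# Daily resolution: one file per month.
daily_keys = [(year, month) for year in YEARS for month in range(1, 13)]

# Monthly resolution: one file per year.
monthly_keys = list(YEARS)

n_files = len(daily_keys) + len(monthly_keys)

# Total size quoted in the Size field, converted with decimal units.
TOTAL_GB = 53.3
avg_mb = TOTAL_GB * 1000 / n_files  # rough average; yearly files are likely smaller

print(f"daily files: {len(daily_keys)}, monthly files: {len(monthly_keys)}")
print(f"total files: {n_files}, average size: {avg_mb:.0f} MB")
```

Under these assumptions there are 252 + 21 = 273 files, and the average of roughly 195 MB per file is consistent with the 200 MB figure given under Size.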

Example URLs

https://zenodo.org/record/7761881#.ZFj3LC3pNQI

Authorization

No; data are fully public

Transformation / Processing

The data appear to be well organized already. We will work with them without any modifications.

Target Format

Zarr

Comments

No response

jbusecke commented 1 year ago

Working on formalizing the dataset into a Pangeo Forge Recipe.

Temporary locations (subject to change later)

METAFlux monthly: gs://leap-persistent/jbusecke/data/library/test/dataflow/METAFLUX_GPP_RECO_monthly.zarr

@AlexandreRebiere: Note this one includes 2021, which was not included in the one I gave you earlier!

METAFlux daily: gs://leap-persistent/jbusecke/data/library/test/dataflow/METAFLUX_GPP_RECO_monthly.zarr

This one is still processing. Just wanted to put the url out here, as I think it should be done soon.

jbusecke commented 1 year ago

Finalized monthly dataset can be found here:

import xarray as xr
ds = xr.open_dataset('gs://leap-persistent/data-library/metaflux-gpp-reco-monthly-595733423-4998601040-1/METAFLUX_GPP_RECO_monthly.zarr', engine='zarr')

jbusecke commented 1 year ago

The daily dataset still does not run reliably. I will investigate tomorrow.

jbusecke commented 1 year ago

Got the daily dataset to run:

import xarray as xr
url = "gs://leap-persistent/data-library/metaflux-gpp-reco-daily-595733423-4998601040-1/METAFLUX_GPP_RECO_daily.zarr"
ds = xr.open_dataset(url, engine='zarr', chunks={})

jbusecke commented 1 year ago

@AlexandreRebiere @alexx-frcs please comment here and on #17 if there are any issues with the data!

FYI: I will be on vacation starting tomorrow until June 6, and may check GitHub only very infrequently.

jbusecke commented 1 year ago

I think I posted the monthly link twice above. Fixed that in the previous post.

jbusecke commented 3 months ago

I have moved everything relevant to this issue into https://github.com/leap-stc/metaflux_feedstock and this will be part of the new LEAP catalog.