leap-stc / data-management

Collection of code to manually populate the persistent cloud bucket with data
https://catalog.leap.columbia.edu/
Apache License 2.0

New Dataset [Subseasonal Rodeo Dataset] #14

Closed AlexandreRebiere closed 8 months ago

AlexandreRebiere commented 1 year ago

Dataset Name

The SubseasonalRodeo Dataset

Dataset URL

https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/IHBANG

Description

A benchmark dataset for training and evaluating subseasonal forecasting systems (systems predicting temperature or precipitation 2-6 weeks in advance) in the western contiguous United States. It gathers various precipitation and temperature data used in the Rodeo forecasting model.

Size

The dataset consists of 160 MB files, for a total of 10 GB.

License

Unknown

Data Format

HDF

Data Format (other)

.h5

Access protocol

HTTP(S)

Source File Organization

The source files are not organized in any particular structure. The dataset gathers data for different parameters (precipitation, wind, temperature...) from different time periods.

Example URLs

https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/IHBANG
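Since the only example URL is the dataset landing page, a transfer script would first need the persistent ID it carries. Here is a minimal sketch that extracts it with the standard library and builds a file-listing URL. The `/api/datasets/:persistentId/` path is an assumption based on the Dataverse native API and has not been checked against this particular dataset; the function names are hypothetical.

```python
from urllib.parse import urlparse, parse_qs

# Landing page URL taken from this issue.
DATASET_URL = (
    "https://dataverse.harvard.edu/dataset.xhtml"
    "?persistentId=doi:10.7910/DVN/IHBANG"
)


def persistent_id(dataset_url: str) -> str:
    """Return the persistentId query parameter (the DOI) of a landing page URL."""
    query = parse_qs(urlparse(dataset_url).query)
    return query["persistentId"][0]


def file_listing_url(dataset_url: str) -> str:
    """Build the URL that should return dataset metadata, including its files.

    Assumption: the host exposes the Dataverse native API at
    /api/datasets/:persistentId/ on the same scheme and host.
    """
    parts = urlparse(dataset_url)
    pid = persistent_id(dataset_url)
    return f"{parts.scheme}://{parts.netloc}/api/datasets/:persistentId/?persistentId={pid}"


if __name__ == "__main__":
    print(persistent_id(DATASET_URL))
    print(file_listing_url(DATASET_URL))
```

No request is made here; the sketch only builds the URL, so the individual file links still have to be read from the response.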

Authorization

No; data are fully public

Transformation / Processing

Nothing in this dataset needs to be modified. The forecasting model can work with the files as they are, so no further processing is needed.

Target Format

Other

Comments

The files have to keep the same names and the same format (.h5).

jbusecke commented 1 year ago

Is it possible to combine some of these datasets? Or should each file be mapped to a single zarr store?

AlexandreRebiere commented 1 year ago

Actually no... each file is named for a specific reason and is used at a different point in the algorithm. Each file may have to exist in its own zarr store...
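To make the one-store-per-file option concrete, here is a minimal sketch of the naming scheme it implies: each source `.h5` file maps to its own zarr store, and the original file name is kept as the store name. The bucket prefix and the file names in the usage example are hypothetical placeholders, not the real layout, and no data is read or written.

```python
from pathlib import PurePosixPath


def store_paths(file_names, prefix="example-bucket/subseasonal_rodeo"):
    """Map each source file to its own zarr store path, keeping the file stem.

    `prefix` is a hypothetical bucket location, not the actual LEAP bucket.
    """
    mapping = {}
    for name in file_names:
        stem = PurePosixPath(name).stem  # "some_file.h5" -> "some_file"
        target = f"{prefix}/{stem}.zarr"
        if target in mapping.values():
            # Two source files with the same stem would overwrite each other.
            raise ValueError(f"duplicate store name for {name!r}: {target}")
        mapping[name] = target
    return mapping


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for source, store in store_paths(["file_a.h5", "dir/file_b.h5"]).items():
        print(source, "->", store)
```

Keeping the stem unchanged means code that looks files up by name only needs a new suffix and prefix, though it would still have to read zarr instead of `.h5`.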

jbusecke commented 8 months ago

Closing this as wontfix due to inactivity.