leap-stc / data-management

Collection of code to manually populate the persistent cloud bucket with data
https://catalog.leap.columbia.edu/
Apache License 2.0

New Dataset [Ifremer WW3 global wave hindcasts] #56

Open jiarong-wu opened 11 months ago

jiarong-wu commented 11 months ago

Dataset Name

Ifremer WW3 global wave hindcasts

Dataset URL

https://data-dataref.ifremer.fr/ww3/GLOBMULTI_ERA5_GLOBCUR_01/GLOB-30M/

Description

This dataset is a 20-year global ocean surface wave hindcast provided by Ifremer (French Research Institute for Exploitation of the Sea). The documentation is here: https://www.umr-lops.fr/en/Donnees/Vagues/sextant#/metadata/857a3337-f59a-481a-bf98-5561e8b61e7b

Size

About 2.5 GB per file × 12 months × 20 years.
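
As a rough check of the total volume, the estimate above can be multiplied out. This is only a back-of-the-envelope sketch: it assumes every monthly file is about 2.5 GB and that all 20 years have 12 files each, neither of which has been verified against the server.

```python
# Rough total-size estimate (assumption: uniform file size, complete years).
GB_PER_FILE = 2.5      # approximate size of one monthly file
MONTHS_PER_YEAR = 12   # one file per month
YEARS = 20             # length of the hindcast

n_files = MONTHS_PER_YEAR * YEARS
total_gb = GB_PER_FILE * n_files

print(f"{n_files} files, about {total_gb:.0f} GB in total")
```

So the full dataset would be on the order of several hundred GB, which matters for deciding whether a single concatenated store is practical.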

License

https://creativecommons.org/licenses/?lang=en

Data Format

NetCDF

Data Format (other)

No response

Access protocol

HTTP(S)

Source File Organization

There is one subfolder for each year under /ww3/GLOBMULTI_ERA5_GLOBCUR_01/GLOB-30M/. Within each subfolder, the files of interest are located in FIELD_NC/, with each file containing one month of data. For example, /ww3/GLOBMULTI_ERA5_GLOBCUR_01/GLOB-30M/2022/FIELD_NC/LOPS_WW3-GLOB-30M_202201.nc is the January 2022 data.

Example URLs

https://data-dataref.ifremer.fr/ww3/GLOBMULTI_ERA5_GLOBCUR_01/GLOB-30M/2022/FIELD_NC/LOPS_WW3-GLOB-30M_202201.nc
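
The year/month layout described above could be turned into a list of file URLs along these lines. This is a minimal sketch, not the recipe itself: the helper names `monthly_url` and `urls_for_years` are hypothetical, and the year range passed in is an assumption, since only the 2022 example has been confirmed here.

```python
# Build file URLs from the <year>/FIELD_NC/LOPS_WW3-GLOB-30M_<YYYYMM>.nc pattern.
BASE = "https://data-dataref.ifremer.fr/ww3/GLOBMULTI_ERA5_GLOBCUR_01/GLOB-30M"


def monthly_url(year: int, month: int) -> str:
    """Return the URL of the monthly file for the given year and month (1-12)."""
    return f"{BASE}/{year}/FIELD_NC/LOPS_WW3-GLOB-30M_{year}{month:02d}.nc"


def urls_for_years(first_year: int, last_year: int) -> list[str]:
    """Return all monthly URLs from first_year to last_year, inclusive."""
    return [
        monthly_url(year, month)
        for year in range(first_year, last_year + 1)
        for month in range(1, 13)
    ]


# Only 2022 is used here because it is the one year shown in the example URL;
# the actual first and last years of the hindcast would need to be checked.
urls = urls_for_years(2022, 2022)
print(urls[0])
```

The first URL produced for 2022 matches the example above, which is a quick sanity check that the pattern is right before pointing it at the full range of years.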

Authorization

No; data are fully public

Transformation / Processing

I'm not sure... maybe lumping them all together (concatenated along the time axis)? If that is not too big to manipulate.

Target Format

Zarr

Comments

I was given the URL to the dataset by the authors directly. The dataset seems fully public, with no need for authentication, but I'm not sure whether downloading in large quantities will be fast enough or whether it will trigger some warning. (I'm not sure about the Access protocol option.)

jbusecke commented 11 months ago

Thanks for adding this issue, @jiarong-wu! I am working on the recipe in #57. Could you clarify the following points?

License

https://creativecommons.org/licenses/?lang=en

This points to several licenses. Do you know exactly which one is used for the WW3 product?

https://www.umr-lops.fr/en/Donnees/Vagues/sextant#/metadata/857a3337-f59a-481a-bf98-5561e8b61e7b

This link does not seem to work for me. Could you double-check both of these?

jiarong-wu commented 11 months ago

After double checking, the link to the documentation should work (https://www.umr-lops.fr/en/Donnees/Vagues/sextant#/metadata/857a3337-f59a-481a-bf98-5561e8b61e7b) and the license is Attribution-ShareAlike CC BY-SA.

jbusecke commented 5 months ago

Hi @jiarong-wu, I have restarted these efforts over at https://github.com/leap-stc/wavewatch3_feedstock. This dataset is still posing a problem, but I am chipping away at it. Sorry for the long wait.