openclimatefix / Satip

Satip contains the code necessary for retrieving, transforming and storing EUMETSAT data
https://satip.readthedocs.io/
MIT License
41 stars, 29 forks

India IODC satellite data #223

Open peterdudfield opened 5 months ago

peterdudfield commented 5 months ago

The long-range forecast for RUVNL requires satellite data for India to be collected, in addition to extended ECMWF data (https://github.com/openclimatefix/nwp-consumer/issues/138) and a new Meteomatics archive.

[Part of RUVNL]

jacobbieker commented 5 months ago

Processed data is currently being stored here: https://huggingface.co/datasets/openclimatefix/eumetsat-iodc, although it will eventually be collated and pushed to the GCP Public Dataset.

peterdudfield commented 5 months ago

Do we know what years of data have been collected?

jacobbieker commented 5 months ago

No, it's currently filling in random timesteps from the whole archive, 2017 to now. Once Dagster has its external assets tracker thing, it'll be able to know what's on HF and we can fill in the dataset more systematically. It has some data from all years though.
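For illustration only, a minimal sketch of what "systematically filling in" could look like: list every expected timestep in a range and subtract the ones already stored. This is not the Dagster setup; the function names, the 15-minute cadence, and the example timestamps are all assumptions.

```python
from datetime import datetime, timedelta


def expected_timesteps(start, end, step):
    """Yield every timestep from start (inclusive) to end (exclusive)."""
    current = start
    while current < end:
        yield current
        current += step


def missing_timesteps(start, end, step, already_stored):
    """Return the expected timesteps that are not in already_stored, in order."""
    stored = set(already_stored)
    return [t for t in expected_timesteps(start, end, step) if t not in stored]


# Hypothetical example: one hour at an assumed 15-minute cadence,
# with two timesteps already present in the dataset.
start = datetime(2019, 1, 1, 0, 0)
end = datetime(2019, 1, 1, 1, 0)
stored = [datetime(2019, 1, 1, 0, 0), datetime(2019, 1, 1, 0, 30)]
todo = missing_timesteps(start, end, timedelta(minutes=15), stored)
```

Here `todo` holds the timesteps still to download, so the fill can proceed in order rather than at random.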

devsjc commented 4 months ago

Currently at 16,000/~175,000.

peterdudfield commented 4 months ago

We could speed it up by removing the Europe 15-minute data. It currently covers 2017 to 2024; we could shrink this down to 2019 to 2024.

devsjc commented 2 months ago

It seems the vast majority of this is now available on Hugging Face: https://huggingface.co/datasets/openclimatefix/eumetsat-iodc. This might be enough to mark the task as done, @peterdudfield?

Also, for future reference: as part of the handover from Jacob, I've been instructed to include this in the dataset hosted on Google Public Datasets from now on, rather than updating the dataset hosted on Hugging Face. This is because GCP has faster reads for Zarr and can handle non-zipped yearly Zarr folders, as opposed to zipped Zarrs per timestep. See https://github.com/openclimatefix/Satip/issues/223
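As a rough sketch of the layout change described above (per-timestep archives versus one store per year), the snippet below groups timesteps by year and derives a store name for each. The helper names and the `{year}_iodc.zarr` naming scheme are hypothetical, not the actual layout on GCP.

```python
from collections import defaultdict
from datetime import datetime


def group_by_year(timesteps):
    """Map each year to the sorted list of timesteps that fall within it."""
    groups = defaultdict(list)
    for t in sorted(timesteps):
        groups[t.year].append(t)
    return dict(groups)


def yearly_store_name(year):
    """Hypothetical naming scheme for a non-zipped yearly Zarr folder."""
    return f"{year}_iodc.zarr"


# Hypothetical timesteps that would previously have been one zipped Zarr each.
timesteps = [
    datetime(2020, 6, 1, 12, 0),
    datetime(2019, 1, 1, 0, 15),
    datetime(2019, 1, 1, 0, 0),
]
plan = {yearly_store_name(y): ts for y, ts in group_by_year(timesteps).items()}
```

Each key in `plan` is one yearly store, and its value lists the timesteps that would be written into it.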