pySTEPS / pysteps

Python framework for short-term ensemble prediction systems.
https://pysteps.github.io/
BSD 3-Clause "New" or "Revised" License

Create ML-based nowcast plugin for pysteps #394

Open ladc opened 1 month ago

ladc commented 1 month ago

Include a deep learning based nowcast function that has the same / comparable interface as the other nowcast functions of pysteps.

ladc commented 1 month ago

For this, the pysteps/nowcasts/interface.py needs to include a new function to discover the new nowcast methods from plugins - similar to importers currently, and postprocessors (cfr ongoing work on pysteps-precip-type).

Loickemajou commented 1 month ago

Good afternoon, everyone,

My name is Loic, and I am currently working on integrating the Deep Generative Model of Radar (DGMR) into the pySTEPS framework. I am facing an issue regarding where to implement the preprocessing function for the input frames of the DGMR.

This preprocessing function takes an HDF5 filename as input, converts the data into a NumPy array, and crops it to an image of 256 by 256 pixels, which is the format required by the model.
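The cropping step could be sketched as follows. This is only an illustration: the function name `center_crop` is hypothetical, and cropping around the centre of the domain is an assumption, since the thread does not say which 256 by 256 window is kept. Reading the HDF5 file is left out.

```python
import numpy as np


def center_crop(field, size=256):
    """Crop the last two axes of ``field`` to ``size`` x ``size`` around the centre."""
    rows, cols = field.shape[-2:]
    if rows < size or cols < size:
        raise ValueError(
            f"Cannot crop a {rows}x{cols} field to {size}x{size}: input is too small."
        )
    # Assumption: keep the central window of the domain.
    top = (rows - size) // 2
    left = (cols - size) // 2
    return field[..., top : top + size, left : left + size]
```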

I would greatly appreciate any guidance or suggestions on where to add this preprocessing function within the pySTEPS structure.

Thank you for your help!

dnerini commented 1 month ago

Hi @Loickemajou, very exciting stuff!

> I would greatly appreciate any guidance or suggestions on where to add this preprocessing function within the pySTEPS structure.

In my opinion (but feel free to disagree), the preprocessing you described should not be part of your plugin: the function will require the input in the correct format, and it will be up to the user to load and prepare the data. You just need to make sure that the requirements are well documented (in the docstrings and the type hints), that the input is validated and, if necessary, that a helpful error message is raised.
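A minimal sketch of such a check, with type hints, a docstring and explicit error messages. The function name `check_input_frames` and the `(time, y, x)` array layout are assumptions made for illustration; only the 256 by 256 frame size comes from this thread.

```python
import numpy as np

EXPECTED_SHAPE = (256, 256)  # spatial size quoted in this thread


def check_input_frames(precip: np.ndarray) -> np.ndarray:
    """Validate a (time, 256, 256) stack of radar frames and return it unchanged."""
    if not isinstance(precip, np.ndarray):
        raise TypeError(f"precip must be a numpy.ndarray, got {type(precip).__name__}.")
    if precip.ndim != 3:
        raise ValueError(f"precip must have 3 dimensions (time, y, x), got {precip.ndim}.")
    if precip.shape[1:] != EXPECTED_SHAPE:
        raise ValueError(
            f"Each frame must be {EXPECTED_SHAPE[0]}x{EXPECTED_SHAPE[1]} pixels, "
            f"got {precip.shape[1]}x{precip.shape[2]}. "
            "Please crop or regrid the data before calling the nowcast."
        )
    return precip
```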

Loickemajou commented 1 month ago

Thank you very much @dnerini for your explanation.

I will try to make it as clear as possible.

Loickemajou commented 1 month ago

Good morning everyone,

I hope you are all doing well.

I have now completed the DGMR plugin. I would greatly appreciate your guidance on the steps required to register this plugin with pysteps in order to test its functionality.

Additionally, the DGMR model weights are currently not publicly available and must be downloaded using the Google Cloud Shell terminal. You can do this with the following command:

gsutil -m cp -r "gs://dm-nowcasting-example-data/tfhub_snapshots" .

Thank you in advance for your assistance!

dnerini commented 1 month ago

Hi @Loickemajou, very nicely done!

> I have now completed the DGMR plugin. I would greatly appreciate your guidance on the steps required to register this plugin with pysteps in order to test its functionality.

Plugins use the entry-points specification, which is worth reading about in depth before implementing it.

I suggest that you open a PR on the pysteps repo following the same structure used by @joeycasey87 in #405. This follows the pattern for io plugins, i.e. a function that discovers plugins and is then called in the package init. We also need to document the plugin (note, though, that we don't need to provide a cookiecutter in a first phase).

In your case, the plugin will extend the methods in the nowcasts module: https://github.com/pySTEPS/pysteps/blob/master/pysteps/nowcasts/interface.py

Your actual plugin should become a new repo in the pySTEPS organization and should be named something like pysteps-dgmr-nowcast or similar. I'm open to suggestions here, from others too: @ladc @aitaten @RubenImhoff.

I also suggest that you and @joeycasey87 work together to implement the new plugins to make sure that you do it consistently.

> Additionally, the DGMR model weights are currently not publicly available and must be downloaded using the Google Cloud Shell terminal. You can do this with the following command:
>
> gsutil -m cp -r "gs://dm-nowcasting-example-data/tfhub_snapshots" .

What do you mean by "not publicly available"? In any case, I wonder whether your plugin should ship the DGMR weights packaged directly with it.

Loickemajou commented 1 month ago

Thanks @dnerini for the detailed procedure.

I will work with @joeycasey87 for a smooth and consistent implementation of the new plugins.

Concerning the DGMR weights, I will look into it and see what the options are, though the folder is quite heavy (1 GB).

dnerini commented 1 month ago

> Concerning the DGMR weights, I will look into it and see what the options are, though the folder is quite heavy (1 GB).

Uh right, good point! Then we need to find an alternative solution.

What about an integration with Hugging Face? We could upload the DGMR weights to a new pysteps account on Hugging Face (or reuse an existing one like https://huggingface.co/openclimatefix/dgmr), then use the hf_hub_download function to download the files when needed.

We should also automatically download the model weights the first time the package is launched and then cache the file for future use.

For the cache directory, we could either follow the same logic as for the pysteps config file (that is, $HOME/.pysteps/pystepscache on Unix/macOS or %USERPROFILE%\pysteps\pystepscache on Windows) or directly use https://github.com/platformdirs/platformdirs.

edit: for reference, ECMWF's ai-models needs to deal with a similar challenge, see https://github.com/ecmwf-lab/ai-models?tab=readme-ov-file#assets. It might be worth having a look at their approach too.
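The download-and-cache idea above could be sketched roughly as follows. The helper names `default_cache_dir` and `fetch_weights` are hypothetical, and the repository id and file name passed in are up to the caller, since nothing has been decided in this thread. The sketch relies on `hf_hub_download` from the `huggingface_hub` package, which reuses a file already present in its cache directory instead of downloading it again. The `downloader` argument is an illustrative hook so that the function can be tried without network access.

```python
import sys
from pathlib import Path


def default_cache_dir(home=None, platform=None):
    """Return the cache directory proposed in this thread for the given platform."""
    home = Path(home) if home is not None else Path.home()
    platform = platform if platform is not None else sys.platform
    if platform.startswith("win"):
        return home / "pysteps" / "pystepscache"
    return home / ".pysteps" / "pystepscache"


def fetch_weights(repo_id, filename, cache_dir=None, downloader=None):
    """Download ``filename`` from ``repo_id`` into the cache and return its local path."""
    cache_dir = Path(cache_dir) if cache_dir is not None else default_cache_dir()
    cache_dir.mkdir(parents=True, exist_ok=True)
    if downloader is None:
        # Imported lazily so that huggingface_hub is only needed when downloading.
        from huggingface_hub import hf_hub_download

        downloader = hf_hub_download
    return Path(downloader(repo_id=repo_id, filename=filename, cache_dir=str(cache_dir)))
```

Using platformdirs instead would only change `default_cache_dir`; the rest of the sketch would stay the same.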

Loickemajou commented 1 month ago

Thank you very much @dnerini for your suggestions! I am currently exploring the different options you provided and am in the process of implementing them for testing.

I appreciate your guidance and support, and I will keep you updated on my progress.

Loickemajou commented 1 month ago

Good afternoon

Good afternoon @dnerini,

I hope this message finds you well.

I wanted to share that I have successfully tested the integration of the model weights with Hugging Face by creating a personal account. The model is downloaded on first use and then cached for subsequent runs. It is stored in the cache directory, which is located at $HOME/.pysteps/pystepscache on Unix/macOS or %USERPROFILE%\pysteps\pystepscache on Windows (in my own case).

Regarding the new PySTEPS account, would it be advisable to transfer ownership to PySTEPS?