mllam / neural-lam

Research Software for Neural Weather Prediction for Limited Area Modeling
https://join.slack.com/t/ml-lam/shared_invite/zt-2t112zvm8-Vt6aBvhX7nYa6Kbj_LkCBQ
MIT License

Multiple Zarr to Rule them All #54

Closed · sadamov closed this 2 months ago

sadamov commented 5 months ago

Describe your changes

This PR is massive and was heavily discussed in the dev team. It introduces a shift towards .zarr-based data storage and .yaml-based interaction with the data through config files. The model itself should now be fully domain-agnostic. All user input is defined either in the aforementioned YAML files or via the train_model.py flags.
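Roughly, the idea is a config along these lines (a simplified, illustrative sketch rather than the exact schema in this PR; apart from the per-category levels keyword discussed further down, the key and variable names below are placeholders):

```yaml
# Illustrative sketch only; see the actual data_config.yaml in the PR for the real schema
zarrs:
  state:
    path: /path/to/analysis.zarr      # any zarr store that xarray can open
state:
  surface_vars: [t2m, u10, v10]       # placeholder variable names
  atmosphere_vars: [t, u, v]
  levels: [850, 500]                  # levels defined per category (hPa here)
```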

Please have a look at #24 to learn more about the main reasoning behind these changes. In short:

New dependencies: xarray, dask, zarr, pandas

BREAKING While this PR tries to implement all functionality present in main, there is one major setup that is no longer supported. Training is now based on (re)analysis data; the reforecast-based training previously used with e.g. MEPS is not supported. In particular, the subsample_step logic has been removed from the code. This functionality can be re-introduced in the "zarr workflow" with a later PR. This will also break the tests!

MISSING The handling of the boundary data is only implemented in the data loading part and is not actually used in the model yet; this will be added in a later PR. Parameter weights are currently not used in the model since @joeloskarsson added probabilistic loss functions.

Issue Links

Solves #24

File by file changes

Remarks

I did some thorough testing of this code, but given the sheer size of the PR it was not easy. I am open to in-depth reviews and/or splitting the PR up into multiple smaller ones.

@SimonKamuk the tests don't work anymore and I tried my best to re-implement them for DANRA, but pooch does not seem to work with DANRA. I always get access-rights issues unless I access the data directly with xr.open_zarr("..."). Could you help me fix them? Currently, your tests are disabled, as is my last test.

That being said, I am pretty proud to say that it is now possible to fully train Neural-LAM on DANRA data (and on COSMO, by the way), and here are some technical results as proof:

This is the graph created by create_mesh.py and the new get_xy() logic: [screenshot of the mesh graph]

This is the validation error_map after very few training steps, for a subset of the variables defined in data_config.yaml: [image]

This is the resulting map of a 10-step * 3-hour forecast using the model in --eval "test" mode: [example forecast plot]

Please test this code with your own data over your own domain; this should help confirm that the model is truly domain-agnostic now.

Type of change

Checklist before requesting a review

Checklist for reviewers

Each PR comes with its own improvements and flaws. The reviewer should check the following:

Author checklist after completed review

Checklist for assignee

joeloskarsson commented 5 months ago

This is amazing stuff Simon! :smile: Looking forward to sinking my teeth into all of this. I don't mind this being one big PR, as these changes are so interconnected.

I'll do a proper review of this later, but here's a short answer to the question about variable weights for now: I think we really want to keep the option of a fixed variable weighting. In many cases there are good reasons to specify such weights manually rather than letting the model handle the balancing. I liked how it was handled before, with everything converted to standard deviations that were either output by the model or derived from manual weights. That being said, I don't mind if we get rid of it in this PR and just re-introduce it later. As you rightly point out, the logic for how to create and use parameter weights has to be reworked anyhow. We can make a new issue about that, but my initial thought would be to have such weights specified in the config yaml, either per variable or using some generic strategy names (e.g. variable_weighting: uniform or variable_weighting: graphcast/pressure_proportional).
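Just to sketch the idea (not a final design; names and values below are only illustrative):

```yaml
# Option A: a generic weighting strategy
variable_weighting: uniform            # or e.g. graphcast/pressure_proportional
---
# Option B: explicit per-variable weights (variable names are placeholders)
variable_weighting:
  t2m: 1.0
  u10: 0.5
  v10: 0.5
```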

leifdenby commented 5 months ago

Yes, amazing work @sadamov! I will read through this over the next days and then maybe we can schedule a call to discuss in more detail? There is some of the reasoning in the structure that I don't quite follow, but that is probably just because I have thought about some things slightly differently :)

mpvginde commented 5 months ago

Hi @sadamov, I was wondering whether, with the current implementation here, it is possible to specify different levels and/or level types for different upper-air variables? For example, if I want u and v at 850 and 500 hPa but t at 925, 500 and 250 hPa.
Or, more extreme, if I wanted u and v at 500 hPa and 100 m, and t at 925 hPa and 150 m.

It seems that in your example yaml file the levels keyword is defined at the category level, but maybe it is possible and I just need to read the code in more detail. Something along the lines of the sketch below is what I have in mind.
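Purely illustrative syntax, since I don't know the actual schema; it is just to convey the idea:

```yaml
# As I read the example yaml: levels are shared by the whole category
atmosphere_vars: [u, v, t]
levels: [850, 500]
---
# What I am asking about: levels specified per variable
atmosphere_vars:
  u: {levels: [850, 500]}
  v: {levels: [850, 500]}
  t: {levels: [925, 500, 250]}
```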

Kind regards, Michiel

sadamov commented 5 months ago

@mpvginde That is currently not possible. I just discussed it with @leifdenby and this functionality will be added back before the PR is merged. Good point, thanks!

mpvginde commented 5 months ago

Thanks for the response. Let me know if I can help!

sadamov commented 2 months ago

This PR is superseded by #66.