mllam / neural-lam

Neural Weather Prediction for Limited Area Modeling
MIT License
64 stars 24 forks source link

Pytorch Dataset Class that Reads From Zarr Archive #24

Open sadamov opened 1 month ago

sadamov commented 1 month ago

Summary Since the weather community and especially ECMWF moved towards a single zarr archive that contains all the data in the state (domain), and one that contains all the data in the boundary, this project should follow the same approach. Zarr has many advantages like parallel computing with dask, lazy loading with xarray, efficient compression with different algorithms and chunking.

Specifics There are three main data-processing steps happening in the current pipeline. This is a proposal how the work would be split between the three:

Interfaces

Implementation One example for such a pytorch dataset and dataloader can be found here for inpisration: https://github.com/MeteoSwiss/neural-lam/blob/main/neural_lam/weather_dataset.py It needs however quite a bit of work:

Draw IO dataset

joeloskarsson commented 1 month ago

Could very nicely use https://lightning.ai/docs/pytorch/stable/common/lightning_module.html#on-after-batch-transfer to normalize once data is on GPU. Makes sure that you never forget about it (all batches on GPU are normalized).