Open JackKelly opened 2 years ago
Just skim-read this paper; it seems really relevant and shares some of the same insights as the graph paper: finer-grained data and more variables allow the model to learn more of the physics and give better results. The code doesn't seem to be public, unfortunately, although the building blocks of the model are. ERA5 seems to be the dataset most of these papers use, since it offers more years of reanalysis data and better temporal resolution, although both this paper and the graph paper only select every 3rd or 6th hour of data.
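For what it's worth, a minimal sketch of that temporal subsampling with xarray (the store path and layout here are hypothetical; the papers only describe the selection at a high level):

```python
import xarray as xr

# Hypothetical hourly ERA5 store with a "time" coordinate (path is a placeholder).
ds = xr.open_zarr("era5_hourly.zarr")

# Keep only the 00/06/12/18 UTC analysis times, i.e. every 6th hour,
# which is roughly what both papers appear to do for training.
six_hourly = ds["time"].dt.hour.isin([0, 6, 12, 18])
ds_6h = ds.sel(time=ds["time"][six_hourly])
```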
Training precipitation as a separate model on the output of the other variables seems like a neat way to deal with the problem that most precipitation values are zero. They also apply a log transform to help with that issue, as sketched below.
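As a rough illustration of the log-scaling idea (the exact formula and epsilon below are my assumptions, not taken from their code), something along these lines keeps zeros at zero while compressing the heavy tail:

```python
import numpy as np

def log_transform_precip(tp: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Compress the heavy-tailed, mostly-zero total-precipitation field.

    Zeros map to zero and large accumulations are squashed, which should make
    the target easier for a separate precipitation model to fit.
    """
    return np.log1p(tp / eps)

def inverse_log_transform_precip(tp_log: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Invert the transform to recover precipitation in physical units."""
    return eps * np.expm1(tp_log)
```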
The large-scale ensembles track exactly with what we are hoping to do with this model, and I think they validate that idea for our use case too.
Overall, a really cool paper! It might be worth trying to reimplement it and seeing how it compares to the graph model; this model seems slightly slower and potentially larger? In either case, we have 6-hourly analysis files we could train on, at 0.25 degrees with around 288 variables, which is much larger than the inputs to either of these models, as well as the forecasts, which have fewer variables but more timesteps (rough size estimate below).
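Rough arithmetic on the size of a single timestep of that analysis data (the 721 × 1440 grid is the standard 0.25-degree ERA5 grid; the variable count is the ~288 mentioned above), just to make "much larger" concrete:

```python
# One global 0.25-degree timestep with ~288 variables, stored as float32.
n_lat, n_lon, n_vars = 721, 1440, 288
bytes_per_timestep = n_lat * n_lon * n_vars * 4
print(f"~{bytes_per_timestep / 1e9:.1f} GB per timestep")  # ~1.2 GB
```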
This paper has a preliminary release of code here: https://github.com/NVlabs/FourCastNet
Personally I find this paper much less relevant than the authors claim it to be, owing largely to:
Massive computational requirements that make it near impossible to verify the results: the authors report a 16-hour training time across a cluster of 64 A100s. The only way to get access to that kind of hardware would be to be granted some kind of special project access to an HPC centre, and why would anybody waste precious HPC time trying to get code that is (in its current form) poorly maintained to run?
Relatively small out-of-sample testing period. The authors claim:
"The training dataset consists of data from the year 1979 to 2015 (both included). The validation dataset contains data from the years 2016 and 2017. The out-of-sample testing dataset consists of the years 2018 and beyond."
As far as I can see, it is not made explicit exactly which years are included, but I would assume it runs to 2021 inclusive, as that is the last complete year. In my opinion, this is too small a period to make any kind of generalised performance claim (a sketch of the split is below).
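For reference, a minimal sketch of the split being described (the store path is a placeholder, and the 2021 end year of the test set is my guess, as noted above):

```python
import xarray as xr

# Hypothetical 6-hourly ERA5 store; the test end year is assumed, not stated.
ds = xr.open_zarr("era5_6hourly.zarr")

train = ds.sel(time=slice("1979", "2015"))  # 37 years
val   = ds.sel(time=slice("2016", "2017"))  # 2 years
test  = ds.sel(time=slice("2018", "2021"))  # assumed to be at most 4 years
```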
I am not entirely against research that uses computational resources beyond the means of most researchers in the field, but I am against tiny out-of-sample validation periods, biased performance reporting, and the lack of a proper public code release!
Another interesting-looking paper!
https://arxiv.org/abs/2202.11214
To quote the abstract: