Re-define RMSE metric to take sqrt after sample averaging

joeloskarsson commented 7 months ago

The current definition of RMSE averages only spatially over the grid points, then takes the square root and averages these per-sample RMSEs to get the final value. This definition was chosen for consistency with WeatherBench (1 and 2). However, the latest version of WeatherBench 2 has changed this, and RMSE is now defined by averaging over both spatial points and samples before taking the square root. Having researched this a bit myself, I agree that this is the better definition and the one we should use. It is also better suited for future computations of spread-skill ratios.
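To make the difference concrete, here is a minimal pure-Python sketch of the two definitions. It is not the neural-lam implementation. The function names `rmse_old` and `rmse_new` are hypothetical, and it assumes every sample has the same number of grid points with no per-point weighting.

```python
import math

# errors[s][g]: forecast error (prediction - target) for sample s at grid point g.
# Hypothetical helpers for illustration; assumes equal grid size per sample, no weights.

def rmse_old(errors):
    """Old definition: spatial mean of squared error, sqrt per sample, then mean over samples."""
    per_sample_rmse = [math.sqrt(sum(e * e for e in row) / len(row)) for row in errors]
    return sum(per_sample_rmse) / len(per_sample_rmse)


def rmse_new(errors):
    """New definition: mean squared error over grid points and samples, then a single sqrt."""
    per_sample_mse = [sum(e * e for e in row) / len(row) for row in errors]
    return math.sqrt(sum(per_sample_mse) / len(per_sample_mse))


errors = [[1.0, 1.0], [3.0, 3.0]]
print(rmse_old(errors))  # (1 + 3) / 2 = 2.0
print(rmse_new(errors))  # sqrt((1 + 9) / 2) = sqrt(5) ≈ 2.236
```

Because the square root is concave, the mean of per-sample RMSEs can never exceed the square root of the mean MSE, so the new definition gives values greater than or equal to the old one. The two definitions coincide only when every sample has the same MSE.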

This PR changes the RMSE implementation to match this new definition. Note that to achieve this, RMSE cannot be computed as a metric alongside the others in metrics.py. Instead, the MSE has to be stored in the validation and test steps so that it can later be averaged and the square root taken.
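A minimal sketch of the store-then-reduce pattern described above follows. The class `RMSEFromStoredMSE` and its `step`/`epoch_end` methods are hypothetical stand-ins for the validation/test step and the end-of-epoch hook; they are not neural-lam's actual code.

```python
import math


class RMSEFromStoredMSE:
    """Hypothetical helper: collect per-sample MSE during evaluation steps,
    then average over all samples and take the square root once at the end."""

    def __init__(self):
        self.sample_mses = []  # one spatially averaged MSE per sample

    def step(self, batch_errors):
        # batch_errors[s][g]: error for sample s at grid point g in this batch.
        # Store MSE (not RMSE), since sqrt does not commute with averaging.
        for row in batch_errors:
            self.sample_mses.append(sum(e * e for e in row) / len(row))

    def epoch_end(self):
        # Average the stored MSEs over all samples, then take the square root.
        mean_mse = sum(self.sample_mses) / len(self.sample_mses)
        self.sample_mses.clear()  # reset for the next evaluation run
        return math.sqrt(mean_mse)


acc = RMSEFromStoredMSE()
acc.step([[1.0, 1.0]])              # batch of 1 sample, MSE = 1
acc.step([[3.0, 3.0], [3.0, 3.0]])  # batch of 2 samples, MSE = 9 each
print(acc.epoch_end())              # sqrt((1 + 9 + 9) / 3) ≈ 2.517
```

Storing one MSE per sample, rather than one RMSE per batch, keeps samples equally weighted even when batches differ in size. This is why the square root has to wait until all steps have finished, instead of being applied by a per-batch metric function.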

joeloskarsson commented 7 months ago

@sadamov, can you review this change? I'll assign you as reviewer, but I first had to invite you to the repository.

joeloskarsson commented 7 months ago

Thanks for looking at it @sadamov. I'll merge now.

mllam / neural-lam

Re-define RMSE metric to take sqrt after sample averaging #10