The current definition of RMSE averages the squared errors only spatially, over the grid points, then takes the square root and averages these per-sample RMSEs to get the final value. This definition was chosen to be consistent with WeatherBench (1 and 2). However, in the latest version of WeatherBench 2 this has changed: the RMSE is now defined by averaging over both spatial points and samples before taking the square root. Having looked into this a bit myself, I agree that this is the better definition and think it is the one we should use. It is also more suitable for future use in computing spread-skill ratios.
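For reference, the two definitions can be written out roughly as follows. This is only a sketch: the symbols are introduced here for illustration ($N$ samples, $G$ grid points per sample, prediction $\hat{y}_{n,g}$, target $y_{n,g}$), and any area or latitude weighting is left out.

$$
\mathrm{RMSE}_{\text{old}} = \frac{1}{N} \sum_{n=1}^{N} \sqrt{\frac{1}{G} \sum_{g=1}^{G} \left(\hat{y}_{n,g} - y_{n,g}\right)^2}
\qquad
\mathrm{RMSE}_{\text{new}} = \sqrt{\frac{1}{N G} \sum_{n=1}^{N} \sum_{g=1}^{G} \left(\hat{y}_{n,g} - y_{n,g}\right)^2}
$$

Since the square root is concave, Jensen's inequality gives $\mathrm{RMSE}_{\text{old}} \le \mathrm{RMSE}_{\text{new}}$, with equality only when every sample has the same spatial MSE.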
This PR changes the RMSE implementation to match the new definition. Note that to achieve this we cannot keep RMSE as a metric alongside the others in `metrics.py`; instead we have to store the MSE in the validation and test steps, then average and take the square root afterwards.
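The difference between the two aggregation orders can be illustrated with a small, self-contained sketch. It uses plain Python lists rather than the project's tensors, and the function names (`per_sample_mse`, `rmse_old`, `rmse_new`) are hypothetical, not the ones used in this PR.

```python
import math


def per_sample_mse(pred, target):
    """Spatial MSE of each sample; inputs are lists of flattened grids."""
    return [
        sum((p - t) ** 2 for p, t in zip(p_grid, t_grid)) / len(p_grid)
        for p_grid, t_grid in zip(pred, target)
    ]


def rmse_old(mse_per_sample):
    # Previous definition: square root per sample, then average over samples.
    return sum(math.sqrt(m) for m in mse_per_sample) / len(mse_per_sample)


def rmse_new(mse_per_sample):
    # New definition: average the stored MSEs over samples, then square root.
    # This equals averaging over all points and samples jointly as long as
    # every sample has the same number of grid points.
    return math.sqrt(sum(mse_per_sample) / len(mse_per_sample))


# Toy data: two samples with two grid points each (values are made up).
pred = [[1.0, 1.0], [3.0, 3.0]]
target = [[0.0, 0.0], [0.0, 0.0]]

mses = per_sample_mse(pred, target)  # what would be stored in each step
print(rmse_old(mses))  # 2.0
print(rmse_new(mses))  # sqrt(5), roughly 2.236
```

Storing the MSE rather than the RMSE is what makes the second form possible: once the square root has been taken per sample, the joint average can no longer be recovered.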