xarray-contrib / xskillscore

Metrics for verifying forecasts
https://xskillscore.readthedocs.io/en/stable/
Apache License 2.0

Result of `halfwidth_ci_test` may not be correct. #413

Open Tianjie-Wu opened 8 months ago

Tianjie-Wu commented 8 months ago

In the xskillscore documentation, the result of `halfwidth_ci_test` is described as follows: difference in scores (score(f2) - score(f1)).

However, I discovered that this holds only for some metrics. For example, when the metric is RMSE, this function returns the difference score(f2) - score(f1), but when the metric is ME, it returns score(f1) - score(f2).

Code Sample, a copy-pastable example if possible

import numpy
import xarray
import xskillscore

length = 100
obs_1d = xarray.DataArray(
    numpy.random.rand(length),
    coords=[numpy.arange(length)],
    dims=["time"],
    name="var",
)
fct_1d = obs_1d.copy()
fct_1d.values = numpy.random.rand(length)

# create a worse forecast that is highly, but not perfectly, correlated
# with the original, and choose RMSE as the distance metric
fct_1d_worse = fct_1d.copy()
step = 3
fct_1d_worse[::step] = fct_1d[::step].values + 0.3

# half-width of the confidence interval at level alpha is larger than the
# RMSE difference, therefore not significant
metric = "rmse"
alpha = 0.05
significantly_different, diff, hwci = xskillscore.halfwidth_ci_test(
    fct_1d, fct_1d_worse, obs_1d, metric, time_dim="time", dim=[], alpha=alpha
)
print("RMSE DIFF:", diff.values)
print("RMSE HWCI:", hwci.values)
print(
    f"RMSEs significantly different at level {alpha} : {bool(significantly_different)}"
)

metric = "me"
alpha = 0.05
significantly_different, diff, hwci = xskillscore.halfwidth_ci_test(
    fct_1d, fct_1d_worse, obs_1d, metric, time_dim="time", dim=[], alpha=alpha
)
print("ME DIFF:", diff.values)
print("ME HWCI:", hwci.values)
print(
    f"MEs significantly different at level {alpha} : {bool(significantly_different)}"
)

print("RMSE 2,1:", xskillscore.rmse(fct_1d_worse, obs_1d, dim=[]).values.mean(), xskillscore.rmse(fct_1d, obs_1d, dim=[]).values.mean())
print("RMSE DIFF 2-1:", (xskillscore.rmse(fct_1d_worse, obs_1d, dim=[]).values - xskillscore.rmse(fct_1d, obs_1d, dim=[]).values).mean())
print("ME 2,1:", (fct_1d_worse - obs_1d).mean().values, (fct_1d - obs_1d).mean().values)
print("ME DIFF 2-1:", ((fct_1d_worse - obs_1d).values - (fct_1d - obs_1d).values).mean())

print("xskillscore version:", xskillscore.__version__)

The result is as follows:

RMSE DIFF: 0.002145380792967111
RMSE HWCI: 0.03084942672029623
RMSEs significantly different at level 0.05 : False
ME DIFF: -0.10199999999999998
ME HWCI: 0.027772820244038265
MEs significantly different at level 0.05 : True
RMSE 2,1:  0.35575854107274524 0.3536131602797781
RMSE DIFF 2-1: 0.0021453807929670793
ME 2,1:  0.02998835594288012 -0.07201164405711986
ME DIFF 2-1:  0.102
xskillscore version: 0.0.24
aaronspring commented 8 months ago

Hi @Tianjie-Wu, thanks for reporting this. Could you specify a bit more what you think should change? Just the documentation or do you think the computation is wrong somewhere?

Tianjie-Wu commented 8 months ago

> Hi @Tianjie-Wu, thanks for reporting this. Could you specify a bit more what you think should change? Just the documentation or do you think the computation is wrong somewhere?

Hello @aaronspring, thank you for your response. I believe that just mentioning it in the manual should be sufficient, considering the problem is simple to solve (just multiply by -1).
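For reference, the documented convention score(f2) - score(f1) can also be reproduced on the user side with plain numpy while the docs/code are being clarified. A minimal sketch mirroring the example above (the helper name `me_diff` is hypothetical, and it computes only the mean-error difference, not the confidence interval):

```python
import numpy as np

def me_diff(fct_1, fct_2, obs):
    # Mean error (forecast minus observation) for each forecast,
    # returned in the documented convention: score(f2) - score(f1).
    return np.mean(fct_2 - obs) - np.mean(fct_1 - obs)

rng = np.random.default_rng(0)
length = 100
obs = rng.random(length)
fct_1 = rng.random(length)
fct_2 = fct_1.copy()
fct_2[::3] = fct_1[::3] + 0.3  # same biased "worse" forecast as above

# 34 of the 100 time steps are shifted by +0.3, so the ME difference is
# approximately 34 * 0.3 / 100 = 0.102, positive under the documented
# score(f2) - score(f1) convention.
print(me_diff(fct_1, fct_2, obs))
```

With this sign in hand, a positive value for the worse forecast matches the documentation, whereas the library reports -0.102 for ME in the example above.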

aaronspring commented 8 months ago

Feel free to open a PR

Tianjie-Wu commented 7 months ago

> Feel free to open a PR

thanks