The current implementation of rank histograms is not optimal when the threshold X_min is set to higher values (e.g. 10 mm/h).
In such cases, the condition for ignoring pairs of observations and forecasts is not restrictive enough.
This is especially visible when all but one of the M ensemble members are equal to 0. If the observation is 0, it is randomly assigned to one of the first M-1 bins. If the observation is larger than the only nonzero ensemble member (which happens often), it is added to the (M+1)th bin. The probability of falling into the Mth bin is therefore very low. In addition, the histogram is flat for all bins up to M-1 (due to the random assignment), which is misleading.
I am wondering how this random-assignment effect also impacts the rank histograms for lower values of the X_min threshold.
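To make the effect concrete, here is a minimal simulation sketch of the degenerate case described above. This is not the pysteps code: the ensemble size, the intensity distributions, and the noise-based tie-breaking scheme are all assumptions made for illustration, and the library's exact tie-breaking may differ in detail.

```python
import numpy as np

rng = np.random.default_rng(42)

M = 10           # assumed ensemble size
n_trials = 20000
hist = np.zeros(M + 1, dtype=int)  # the M+1 rank bins

for _ in range(n_trials):
    # All members dry except one: the degenerate case from the issue.
    ens = np.zeros(M)
    ens[0] = rng.uniform(0.1, 1.0)       # the single "wet" member (mm/h)

    if rng.random() < 0.5:
        obs = 0.0                        # dry observation, tied with M-1 members
    else:
        obs = rng.uniform(1.0, 20.0)     # wet observation, exceeding the lone member

    # Break ties among the zeros with tiny random noise (one common scheme).
    noise = rng.uniform(0.0, 1e-9, M + 1)
    values = np.concatenate(([obs], ens)) + noise
    rank = int(np.sum(values[1:] < values[0]))  # number of members below the obs
    hist[rank] += 1

print(hist)
```

In this sketch, the dry observations are spread uniformly over the low rank bins by the random tie-breaking, while the wet observations pile up in the last bin, reproducing the flat-plus-spike shape described above.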
https://github.com/pySTEPS/pysteps/blob/40572e2465675c95fda689f9107739a4771967a8/pysteps/verification/ensscores.py#L167