Closed by RubenImhoff 3 months ago.
Attention: Patch coverage is 98.33333% with 1 line in your changes missing coverage. Please review. Project coverage is 83.81%. Comparing base (d77fe73) to head (a46e62a). Report is 1 commit behind head on master.
| Files | Patch % | Lines |
|---|---|---|
| pysteps/postprocessing/probmatching.py | 92.30% | 1 Missing :warning: |
:umbrella: View full report in Codecov by Sentry.
Apologies for the ugly plots, but this was the quick and dirty way to test three versions:
This is a blended forecast up to 12 hours ahead (1 member) for 16 July 2024 in the Netherlands. The left column shows a forecast without the resampling and also without the smoothing introduced by @sidekock yesterday. The middle column does both the smoothing and the resampling prior to the probability matching, and the right column is a check with only smoothing and no resampling. Overall, I would say that the smoothing already makes a difference at the edges of the radar domain (@sidekock gave a better example in his PR), but the resampling clearly results in less suppression of the extremes in the NWP (you can see a kind of dissipation at the longer lead times in the left and right columns, but not in the middle one). I think this is what we are looking for, although one example is of course a bit limited.
I think the PR is ready to go, but definitely feel free to give it a last thorough review. :)
Looks good like this to me, @dnerini, thanks for the help!
@ladc @RubenImhoff @sidekock the effect of this new approach seems to be fairly strong, see for example the example in the gallery:
| v1.10 | latest |
|---|---|
Isn't that what we were looking for? I think the problem in the left example (60, 90 minutes) is that the mismatched precipitation brings lower precipitation values than the actual distributions from either NWC or NWP, and this resampling guarantees a similar distribution. In the right example, the values look similar at all lead times, whereas in the left it is clear that the smoothing (loss of high values) happens at the middle lead times, where I assume the weight is closer to 0.5. Just let me know what you think :smile:
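To make the point about suppressed extremes concrete, here is a small toy demonstration (synthetic gamma-distributed intensities; all names and parameters are illustrative, not pysteps code): a weighted average damps the upper tail of the blend, while per-sample resampling keeps the heavy NWP tail intact.

```python
import numpy as np

rng = np.random.default_rng(0)
nwc = rng.gamma(2.0, 2.0, 10_000)  # toy nowcast intensities (lighter tail)
nwp = rng.gamma(2.0, 6.0, 10_000)  # toy NWP intensities (heavier tail)
w = 0.5                            # blending weight near the middle lead times

# Weighted averaging mixes each sample with one from the other source,
# which pulls the high quantiles of the blend down
avg = w * nwc + (1 - w) * nwp

# Resampling draws each value whole from one source or the other, so the
# heavy NWP tail survives in the blended target distribution
res = np.where(rng.random(nwc.shape) < w, nwc, nwp)

print(np.quantile(avg, 0.99), np.quantile(res, 0.99))
```

On this toy data the 99th percentile of the resampled blend sits well above that of the averaged blend, which matches the "less suppression of the extremes" seen in the middle column.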
totally agree with you, just wanted to make it clear that this has a significant impact on the output.
Thanks for sharing, @dnerini! Based on the input radar data (see below), I'd agree that the changes seem to improve the distribution. Nevertheless, would be nice to see some more examples at some point (I'm sure some will come up from both KNMI and RMI soon).
Yes, I agree that this is actually better since you don't lose the extremes as much. Thanks for the examples. It might also fix a problem I had with the control member but I still need to run some tests, sorry for the delay.
Applying it on the sample dataset for the hackathon I get a strange result, but maybe I'm using a bad combination of parameter settings. Maybe because the NWP intensities are so high, the nowcast intensity increases immediately at timestep 1 (ens 0 and mean are the images on the top right and bottom left, respectively). Any idea what's going on?
Top left is the "control run" by the way.
Hi @ladc, thanks for sharing this example! Did you also turn on the smoothing by @sidekock? I think the top right and bottom left do, in principle, look quite good, but I agree that it is weird to have such an intensity jump. It almost gives the feeling that the weight for NWP compared to the nowcast is too high, but that should go well, I think.
Something else that could play a role is the following in `blending/steps.py`:

```python
if probmatching_method is not None and resample_distribution:
    # deal with missing values
    arr1 = R_pm_ep[t_index]
    arr2 = precip_models_pm_temp[j]
    arr2 = np.where(np.isnan(arr2), np.nanmin(arr2), arr2)
    arr1 = np.where(np.isnan(arr1), arr2, arr1)
    # resample weights based on cascade level 2
    R_pm_resampled = probmatching.resample_distributions(
        first_array=arr1,
        second_array=arr2,
        probability_first_array=weights_pm_normalized[0],
    )
```
We fill up the 'outside radar domain' extrapolation values with the NWP values to have no nans (which I initially thought was a good idea). But maybe that skews the distribution in the wrong direction? So maybe we should change that back to actually allowing for nans in the extrapolation component?
@ladc, would the above be worth the test for your case?
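If we go that route, one possible sketch (illustrative only, not the actual pysteps change; the function name and behaviour are assumptions) is to leave the out-of-domain extrapolation pixels as NaN and simply never draw from them during the resampling, instead of pre-filling them with NWP values and skewing the mix:

```python
import numpy as np

def resample_ignoring_nans(extrap, nwp, prob_extrap, seed=None):
    """Illustrative sketch: per-pixel resampling that never draws from
    NaN extrapolation pixels, so the NWP fill no longer skews the
    target distribution (names and behaviour are assumptions)."""
    rng = np.random.default_rng(seed)
    take_extrap = rng.random(extrap.shape) < prob_extrap
    # outside the radar domain the extrapolation is NaN: fall back to NWP
    take_extrap &= np.isfinite(extrap)
    return np.where(take_extrap, extrap, nwp)
```

This keeps the output free of NaNs while only sampling the radar distribution where it actually exists.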
Thanks @RubenImhoff. I didn't turn on the smoothing; this was on purpose. The main issue for me is indeed the intensity jump. Could this be because the NWP has very high intensities, but as these are not present in the blended forecast, the latter is unrealistically scaled up overall? Do you suggest not filling anything outside the radar mask (so we have nans in the final forecast)? Or just for the resampling?
I think just the resampling for now. This would at least keep the radar distribution lower in a case like this (because you push the distribution up by filling the nans with the values of the NWP forecast outside the radar domain).
Hi! Sorry to jump in late to this discussion. I have a small question: how does it look before doing the resampling/cdf? Because I am not sure whether the problem is coming from the resampling or from the merging of the cascades with different weights in different areas. @ladc, I did run your example data; maybe you could update or send me the snippet for recreating the figure so I can have a more detailed look. :)
Hi @aitaten you can find the plot_nwc.py and plot_radar.py script here - apologies for the messy code: https://github.com/ladc/rmi-scripts
I tried not filling the nan values (or rather not including them in the pmatching) - but I was a bit rushed so it's possible that the code contains errors:
Warning: I will be fully offline in the coming week and still on vacation until 14 August.
The code is in my branch mentioned in #370
For me it is not working either... I had to change a couple of things in the merged code (#370) because `noise = None` was throwing an error and not creating any output. After making sure the noise was 0, I obtained (as expected) the smoothing of the unpredictable scales. Not sure how you are making it work without losing the small scales, @ladc :disappointed:. Here is my example:
@ladc, your second attempt looks already better to me! Should I change the code (back) to accepting nans outside the radar domain? Then we're good to go, I hope.
And seeing the vacation of @ladc, what do the others think?
one remaining issue might be that using the weights from level 2 gives too much credit to NWP already from the first lead time, while you would expect a more progressive transition from the radar distribution...
Based on the figure we made for the QJRMS paper, that should not be too much of an issue at that scale level:
(the lead time is in hours going from 0 to 12 hr)
@dnerini, any idea on how to nicely fix this? Right now the nans outside the radar domain are filled with the NWP values, which changes the distribution of the extrapolation cascade data. We could go back to our earlier version where nans were allowed, or maybe we can think of a more elegant solution to handle the nans outside the radar domain without changing the distribution?
I wonder if we could simply "blend in" the NWP values outside the radar domain, also improving the seamlessness at t0. In practice we could try replacing the radar mask with zeros and see if that is enough. Alternatively, one could treat the part outside the radar domain separately and use a simple blending function to introduce the NWP values. Both approaches should avoid the jump at t0 while the resampling approach can stay the same. What do you think?
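A minimal sketch of the second option (treating the area outside the radar domain separately and introducing the NWP values with a simple blending function; the linear ramp and all names here are assumptions, not the actual implementation):

```python
import numpy as np

def blend_outside_radar(extrap, nwp, radar_mask, timestep, ramp_steps=6):
    """Keep the extrapolation field inside the radar domain; outside it,
    ramp the NWP values in linearly over the first few timesteps so
    there is no jump at t0 (purely illustrative)."""
    w_nwp = min(timestep / ramp_steps, 1.0)
    return np.where(radar_mask, extrap, w_nwp * nwp)
```

At `timestep=0` the area outside the radar domain starts from zero (seamless with the radar-only analysis), and by `ramp_steps` it carries the full NWP values, while the resampling approach inside the domain can stay unchanged.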
That would be worth a try. Would we introduce any statistical problems in the resampling if we introduce zeros (or `np.nanmin(precip_radar)`) in the extrapolation cascade outside the radar domain? If not, that might be an easy solution while keeping the code nearly the same.
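A quick sanity check of that concern on synthetic data (all values illustrative): filling with zeros or the radar minimum cannot introduce values above the radar maximum, unlike filling with high NWP values, but it does drag the lower quantiles of the distribution down.

```python
import numpy as np

rng = np.random.default_rng(1)
radar = rng.gamma(2.0, 2.0, (100, 100))
radar[:, 60:] = np.nan  # pixels outside the radar domain

# Two candidate fills for the out-of-domain pixels
fill_zero = np.where(np.isnan(radar), 0.0, radar)
fill_min = np.where(np.isnan(radar), np.nanmin(radar), radar)

# The maximum of the field (the extremes) is unchanged by either fill...
print(np.nanmax(radar), fill_zero.max(), fill_min.max())
# ...but the median of the filled field sits below the radar-only median
print(np.nanmedian(radar), np.median(fill_zero))
```

So the extremes are safe, but the bulk of the target distribution shifts toward low intensities in proportion to the out-of-domain area, which may or may not matter for the cdf matching.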
First attempt to add a resample distributions function for probability matching, see also issue #353.

- Add a `resample_distributions` function to `postprocessing/probmatching.py`.
- Apply it to the `R_pm_blended` variable in `blending/steps.py`.
- When `probmatching_method` is `cdf` or `mean` in `blending/steps.py`, add resampling prior to using the probability matching as an optional functionality.
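For readers following along, a minimal self-contained sketch of what such a resampling step does conceptually (this is an illustration, not the actual `probmatching.resample_distributions` implementation; the keyword names below simply mirror those used in the snippet earlier in the thread):

```python
import numpy as np

def resample_distributions(first_array, second_array,
                           probability_first_array, seed=None):
    """Illustrative stand-in: build the target distribution for the
    probability matching by drawing each sample from the first
    (extrapolation) array with the given probability, and from the
    second (NWP) array otherwise."""
    rng = np.random.default_rng(seed)
    mask = rng.random(first_array.shape) < probability_first_array
    return np.where(mask, first_array, second_array)

# The resampled field would then serve as the target for the cdf matching;
# with probability 0 everything comes from the second (NWP) array
target = resample_distributions(
    first_array=np.array([1.0, 2.0, 3.0]),
    second_array=np.array([10.0, 20.0, 30.0]),
    probability_first_array=0.0,
    seed=42,
)
```

As the blending weight for the extrapolation component decays with lead time, the target distribution transitions smoothly from radar-like to NWP-like.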