pySTEPS / pysteps

Python framework for short-term ensemble prediction systems.
https://pysteps.github.io/
BSD 3-Clause "New" or "Revised" License
466 stars 168 forks source link

Main module blending #233

Closed RubenImhoff closed 2 years ago

RubenImhoff commented 3 years ago

First 'steps' towards a main module for a blended nowcasts-NWP forecast following the original STEPS approach as described in Bowler et al. (2006) and Seed et al. (2013).

Part of issue #212

ladc commented 3 years ago

Question: the V_models has no time dimension as it represents the velocity field of the NWP precipitation forecast at the valid time = the start of the nowcast (taking into account the last 3 rainfall fields from NWP). I wonder if it would be feasible to have a velocity field per lead time instead. This can be computed ahead of time from the NWP forecast. By the way, @pulkkins, is there anything we need to take into account when applying an optical flow algorithm on NWP output (5-minute accumulations or longer) instead of on the near-instantaneous precipitation intensities measured by radar?

codecov[bot] commented 3 years ago

Codecov Report

Merging #233 (d1b1ac4) into steps_blending (fa1f816) will decrease coverage by 1.03%. The diff coverage is 85.66%.

:exclamation: Current head d1b1ac4 differs from pull request most recent head dddf02e. Consider uploading reports for the commit dddf02e to get more accurate results Impacted file tree graph

@@                Coverage Diff                 @@
##           steps_blending     #233      +/-   ##
==================================================
- Coverage           75.67%   74.64%   -1.04%     
==================================================
  Files                 123      157      +34     
  Lines                9271    12075    +2804     
==================================================
+ Hits                 7016     9013    +1997     
- Misses               2255     3062     +807     
Flag Coverage Δ
unit_tests 74.64% <85.66%> (-1.04%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pysteps/cascade/bandpass_filters.py 80.00% <ø> (-12.14%) :arrow_down:
pysteps/cascade/decomposition.py 90.21% <ø> (+0.10%) :arrow_up:
pysteps/cascade/interface.py 92.85% <ø> (+0.54%) :arrow_up:
pysteps/datasets.py 20.23% <ø> (-18.09%) :arrow_down:
pysteps/downscaling/rainfarm.py 100.00% <ø> (ø)
pysteps/exceptions.py 100.00% <ø> (ø)
pysteps/extrapolation/__init__.py 100.00% <ø> (ø)
pysteps/feature/interface.py 93.33% <ø> (ø)
pysteps/feature/shitomasi.py 86.11% <0.00%> (+2.32%) :arrow_up:
pysteps/io/archive.py 82.22% <ø> (+0.40%) :arrow_up:
... and 206 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update e9059cb...dddf02e. Read the comment docs.

RubenImhoff commented 3 years ago

There are still some issues left, as also discussed in #233 and on the Slack page, but let's start the review, as this will surely lead to improvements (and mistakes that will be found).

RubenImhoff commented 3 years ago

The output of the blended forecasts is still too noisy. This could be partly explained by the high weight the noise component gets in the implementation by Bowler et al. (2006). There may also be an error in the code, and the masking and prob_matching has to be checked as well. Hence, I actually expect that the noisy output is a result from both. Let's see what we can find (and improve). Below two example forecasts with the RMI data where the first figure is the blended forecast for lead time +5 min and the second figure is the STEPS nowcast for the same lead time. Blended_forecast_0 Blended_forecast_0

The weights for the three components (nowcast, NWP and noise) were as follows: image , whereas the STEPS nowcast gave the following weights for the noise extrapolation and noise component: image. (by the way, the first steps of the blended forecast lead to the seem correlation- and phi-values as the STEPS nowcast - so the problems occur during the blending procedure). Hence, quite a high weight for the noise component in the blended forecast.

For two different dates, below the CRPS for the STEPS nowcast and the blended forecast (5-min time step, just 2 ensemble members, 8 cascade levels, mask_method is incremental and prob_matching method is cdf). As you can see, the CRPS of the blended forecast is not that bad compared to the nowcast. Skill_blended_forecast Skill_blended_forecast

RubenImhoff commented 3 years ago

Part of the problem found, I think. The cascade was in our implementation renormalized after blending the three components, which is not necessary due to the mean and sigma of the normalization which also get updated during this blending step. The skill scores follow soon. Blended_forecast_0_norenormalization

It still leaves a somewhat noisy forecast, but next step is to find any possible other issues. I will have a look at the weights implementation and the effect of the masking and prob_matching implementation next.

RubenImhoff commented 3 years ago

Updates skill plot after removal of renormalization: Skill_blended_forecast Skill_blended_forecast

RubenImhoff commented 3 years ago

Regarding the weights, for now the method described in Bowler et al. (2006, WRR) is implemented. The sum of the weights can exceed the value of 1.0, which is suboptimal: image

The noise weight (eq. 13) is the root of the residual of the sum of the squared nwp and nowcast weights. Taking the residual of the squared weights, is perhaps not odd seeing that we are using correlations (and covariances in that sense) to calculate the weights. However, wouldn't it make more sense to use the squared weights as weights (instead of taking the root)? In that case, the sum of the (squared) weights is always 1.0 and the noise term does not 'explode'.

(By the way, the effect of doing that is rather minor on the results, but it reduces the noisy character a bit)

ladc commented 3 years ago

Great, this is starting to look pretty good! Just thinking out loud here: the squared weights are smaller than the weights (since the weights are all smaller than 1), so scaling the contributions by the squared weights instead of the weights would reduce the overall intensity, reduce the relative importance of both NWP and noise in the beginning, and reduce that of the radar extrapolation at the end. It would also make the blending more abrupt. Not sure if that is what we want? I guess just try it out to see what it gives. It does still bother me that with the current weights, we potentially have a sum of the weights of NWP and radar extrapolation larger than 1, which could artificially increase intensities above the observed level; that's what the Seed et al implementation aims to avoid I guess.

dnerini commented 3 years ago

Very nice, @RubenImhoff !

The figure above looks much better now. Apart from the noisy structure, we can still see some artifacts that might be linked to masking and prob_matching. Can you briefly explain what you had to change in these two components?

Unfortunately I haven't had the time yet to look into this into detail, but I hope to do it soon. For the more fundamental questions, what about asking Alan directly? I'm sure he would have some interesting answers to our concerns...

Otherwise, I'm with Lesley: let's just try out and see if it helps!

loforest commented 3 years ago

Hi @RubenImhoff. Thanks for sharing these results! I'll try to give my contribution to understanding the problem.

The fact that weights do not sum to 1 is not a formula error, but a desired feature. I had asked the same question to Alan many years ago and he had answered that it is needed to ensure that we do not lose variance.

The noise term receives too much weight probably because the radar extrapolation does not receive enough. In Bowler et al. (2006) after Eq. 15, it is said that re is calculated from the AR-2 parameters. How do you calculate it? For AR1, it is simple re=phi1. For AR2 it is a bit more complicated. You can find useful equations here: pysteps.timeseries.autoregression. It can probably be derived also from phi0. Could check with Lesley the STEPS C++ implementation?

Another minor issue that I noticed is that both nowcast and NWP weights are not monotonically decreasing as a function of scale. I do not remember if it was also the case in the original STEPS implementation or if there was an adjustment for this, e.g. similar to the adjustment of AR2 parameters to make the process stationary (see functions adjust_lag2_corrcoef1 and adjust_lag2_corrcoef2 in the module pysteps.timeseries.autoregression). Are you applying these adjustments before computing the weights?

RubenImhoff commented 3 years ago

@dnerini, coming back to your question about the masking and prob_matching implementation and changes.

I also think that some of the artifacts result from that. When going through the code, I found a small error (as usual haha - I'm testing the changes now), so let's see what that brings. Anyway, we've adjusted the implementation in the following way:

In the pySTEPS nowcasting implementation, the advection of the rainfall fields took place after the post-processing steps, such as masking and prob_matching. In the blending procedure, however, this is not possible, as the extrapolation cascades are first advected before blending. After the blending step of the different components, the post-processing steps can take place. This leads to the problem that the precip fields are no longer in Lagrangian coordinates (and the NWP component never was) and thus that the original implementation of masking and prob_matching will not work.

Therefore, @ladc and I decided to test a method that (kind of) does the opposite. Instead of using the latest radar rainfall field for masking and prob_matching when having the forecast in Lagrangian coordinates, we advect the latest radar rainfall field with the same velocity field as used for the extrapolation component and blend this with the NWP forecast. The difference with the extrapolation component is that no field evolution has taken place and no noise is added. Hence, this gives sort of a benchmark from what @ladc called Lagrangian blended probability matching in issue #212. This 'benchmark' is used as observation in the masking and prob_matching methods.

Finally, the prob_matching methods have not changed. Masking has partly: we no longer use the S-PROG masking and the incremental mask had to be adjusted as well. The incremental mask is not optimally working, yet. In the original implementation, the incremental mask has a buffer that increases over time, but that is in the current blending implementation not yet the case. Hence, a critical look on that and all suggestions are more than welcome!

RubenImhoff commented 3 years ago

@loforest and @dnerini, thanks a lot for your inputs so far!

Regarding @loforest's comments, the calculation of r_e is described in the technical report of Bowler et al. (2004):

image

We have implemented the same, see also my response to @dnerini's comment.

Regarding the implementation of e.g. adjust_lag2_corrcoef2, thanks for mentioning this! It is applied to the extrapolation component, in the same way as in the pysteps.nowcasts.steps code, but not to the NWP component. I'll have to look into that.

RubenImhoff commented 3 years ago

Two other points:

RubenImhoff commented 3 years ago

By advecting the noise cascade as well, I think we're one step closer again: Blended_forecast_0 Skill_blended_forecast

However, Bowler et al. (2006) state that: "The advection of the noise cascade ensures the correct spatio-temporal development of the precipitation pattern, but does pose a practical problem. Upon advection of the cascade, some data will be lost to numerical diffusion and some will be lost by advection out of the domain. Therefore, the cascade must be refiltered to ensure that power is not lost at the lowest levels of the cascade. However, the Fourier filtering process cannot use the no-data areas which have been advected into the domain from the boundaries. These areas are replaced by Gaussian noise N(0, 1) before filtering. In this way, the new data may be injected from the boundaries of the domain, and the noise cascade will always contain valid data at every point. However, the introduction of noise may affect the data values in the cascade in a non-local manner (at large scales the cascade may be slightly adjusted over the whole domain). However, the new data are likely to be swamped by the data from the previous time step, so this should not pose a problem."

I have not yet found how to implement this refiltering. Any ideas about this from your side?

In addition, @ladc mentioned that the start time in the RMI NWP file was 2 hours shifted. After correction, it leads to even better (visual) results, so perhaps also good to show (note, we're looking at the nowcast issued two hours earlier now):

Blended_forecast_0 Skill_blended_forecast

ladc commented 3 years ago

Hmm, I see the problem. Is it possible that by not advecting the noise cascade, you leave the noise contribution behind and generate new noise in the new locations, resulting in extrapolation + spatially filtered but temporally white noise instead of extrapolation + temporally red / Brownian noise? It does indeed seem like the noise cascade should be advected, I'm not sure why we decided not to. I need to look at the code but it will be for after Nov 15. Let's hope the CRPS improved with the corrected time offset in the NWP output here ;-)

dnerini commented 3 years ago

Quick thought: would it be possible to use the same idea as in the nowcasts.steps method (keep Lagrangian frame, advect everything in the end) by simply transforming the nwp data into Lagrangian coordinates?

Would it help?

I also wonder if the refiltering problem is relevant to us, since numerical diffusion is limited in the semilagrangian advection scheme. Did they use a different one in Bowler et al 2006?

RubenImhoff commented 3 years ago

I recently thought of that too, @dnerini. It will be quite a change - also from the original concept - but maybe worth it. Let's see what the others think (perhaps we're missing something obvious why not to do that..) and then I might give it a try soon.

ladc commented 3 years ago

I think it would help initially, but the problem is that after more than 2 hours and definitely in case of fast-moving precipitation zones, you will lose your data for the same reason that your extrapolation becomes unusable: the original radar image is advected out of the domain. Conversely, considering your NWP data in Lagrangian space advects it out of the domain in the other direction. Ideally, the blending should still produce sensible results after 2+ hours and if I understand correctly that will not work in Lagrangian coordinates. But maybe I'm missing something too?

RubenImhoff commented 3 years ago

Valid point! Hmm, I can indeed imagine that going wrong with the NWP forecast..

dnerini commented 3 years ago

mmmmh right we would loose the advantage of blending with respect to boundaries! Forget what I said then ☺️

But I still think that the refiltering step might be obsolete in our case. So we are left with the problem of the boundaries when advecting the noise cascade.

Another idea: add a buffer to the noise cascade before advection? By using a larger domain we would at least limit the problem.

Another idea: with noise we could also simply "mirror" the boundaries: what goes out from one side, goes in from the opposite one... In practice this could be implemented as an option in the advection scheme. What do you think @spulkkin ?

RubenImhoff commented 2 years ago

@dnerini (and @pulkkins), after some tests, I think I agree with you. Indeed, we are left with the problem of the boundaries when advecting the noise cascade. I do like the idea of mirroring the boundaries, perhaps worth the try?

dnerini commented 2 years ago

Very good, I also think this currently looks like the most promising approach. I'm not sure exactly how yet (need to look at the details in the code), but I feel there should be an elegant solution to this, something like wrapping around the indices of the advection routine when these exceed the size of the domain, if you see what I mean? @aperezhortal might have some good inputs on this too

edit: I suggest we tackle this issue in a separate branch and PR if you agree

dnerini commented 2 years ago

@RubenImhoff I just realized that the routine map_coordinates used in semilagrangian already offers a mode="grid-wrap" option, which sounds very much like what we have in mind... maybe you could have a try at it? If we are lucky, this might be as simple as changing few optional keyword arguments ....

RubenImhoff commented 2 years ago

Good to know, I'll have a look at it!

RubenImhoff commented 2 years ago

@dnerini, this seems to work well! "grid-wrap" did not work, but "wrap" does. I do not know why the "grid-..." options do not work, but it must be based on the input we're providing (I have to admit that my scipy map_coordinates knowledge is not sufficient for that, haha).

Anyway, the results go from (lead time is 3h; zoomed in on the top of the composite, where no new noise was coming in at the top right corner - movement is south/southwest): image

to: image

I'll do some more tests to see if everything works the way we want (also for less symmetrical rainfall fields) and then commit the changes here.

dnerini commented 2 years ago

@RubenImhoff when you have some time would you be able to produce an animation for one full forecast? You can use pysteps' animate() method to produce all frames and then manually make a gif out of them. Sometimes the eye can catch better than anything else if there are problems ...

ladc commented 2 years ago

Great @RubenImhoff! Does it improve the scores in any way or is the effect too far outside the composite to be verified?

RubenImhoff commented 2 years ago

Hi @dnerini and @ladc,

The last addition, mirroring the noise, does not change the skill scores, because it is all outside the radar domain. I can imagine that there where the radar domain is closer to the edges of the extent (or for other radar domains), the effect is more pronounced.

Below an animation of one full forecast. Test_blended_nowcast_202107041405

The radar domain edge, where the blended nowcast changes into a blended NWP-noise forecast, remains visible. I'm thinking of a slightly smoother transition, maybe a simple linear transition in space from the extrapolation-NWP-noise forecast to NWP-noise forecast? Do you think something is needed for this and do you have any ideas for it?

dnerini commented 2 years ago

Thanks @RubenImhoff, fantastic work!

Personally I don't mind too much the discontinuity you mention, it's anyway being smoothed out by the noise itself with time. Instead, what do you think are the next important steps? Is there any important component that is still missing? And what about the performance of the code? Do you plan to conduct a more in-depth verification analysis with multiple events or is it too early?

RubenImhoff commented 2 years ago

Thanks, @dnerini!

It's true that this discontinuity is not too worrying. Plus your region of interest is likely within the radar domain, so you might not even notice it. Anyway, it would be more something from an aesthetics perspective.

The last step left is to implement the weights calculation as in Seed et al. (2013), based on the covariance matrix. I will try to do this the coming days. I think that after that, we're ready for a good review of the code. Once that has finished, we are planning to, indeed, conduct a more in-depth hydrometeorological verification analysis with multiple events for one (or a couple of) catchments.

pulkkins commented 2 years ago

Question: the V_models has no time dimension as it represents the velocity field of the NWP precipitation forecast at the valid time = the start of the nowcast (taking into account the last 3 rainfall fields from NWP). I wonder if it would be feasible to have a velocity field per lead time instead. This can be computed ahead of time from the NWP forecast. By the way, @pulkkins, is there anything we need to take into account when applying an optical flow algorithm on NWP output (5-minute accumulations or longer) instead of on the near-instantaneous precipitation intensities measured by radar?

I do not see any major issues. However, the default thresholds need to be adjusted because I expect the NWP output to be smoother (and have smaller gradients) than radar measurements.

pulkkins commented 2 years ago

mmmmh right we would loose the advantage of blending with respect to boundaries! Forget what I said then relaxed

But I still think that the refiltering step might be obsolete in our case. So we are left with the problem of the boundaries when advecting the noise cascade.

Another idea: add a buffer to the noise cascade before advection? By using a larger domain we would at least limit the problem.

Another idea: with noise we could also simply "mirror" the boundaries: what goes out from one side, goes in from the opposite one... In practice this could be implemented as an option in the advection scheme. What do you think @spulkkin ?

When applying mirroring, you assuming continuity when wrapping over the border. I would not do that. Adding a buffer to the noise cascade sounds like a good idea.

RubenImhoff commented 2 years ago

@pulkkins, thanks for your comments!

Regarding the adjustment of the thresholds for the NWP models, I'll put that on my list. Makes sense to have a look at that.

Regarding the noise mirroring, do you know if this 'buffer' functionality is in some way already part of scipy map_coordinates?

dnerini commented 2 years ago

mmmmh right we would loose the advantage of blending with respect to boundaries! Forget what I said then relaxed But I still think that the refiltering step might be obsolete in our case. So we are left with the problem of the boundaries when advecting the noise cascade. Another idea: add a buffer to the noise cascade before advection? By using a larger domain we would at least limit the problem. Another idea: with noise we could also simply "mirror" the boundaries: what goes out from one side, goes in from the opposite one... In practice this could be implemented as an option in the advection scheme. What do you think @spulkkin ?

When applying mirroring, you assuming continuity when wrapping over the border. I would not do that. Adding a buffer to the noise cascade sounds like a good idea.

I see your point, but I still think that in this particular case the approach would be justified since it applies to noise only and mostly to the edges of the domain... Adding a buffer would be more expensive computationally and be more involved code-wise.

pulkkins commented 2 years ago

@pulkkins, thanks for your comments!

Regarding the adjustment of the thresholds for the NWP models, I'll put that on my list. Makes sense to have a look at that.

Regarding the noise mirroring, do you know if this 'buffer' functionality is in some way already part of scipy map_coordinates?

I don't think that the buffer functionality we need is implemented in scipy.map_coordinates. What I was thinking is to generate the noise field in a larger domain and crop it to the original domain after advection. But as @dnerini said, that could be tricky to implement.

RubenImhoff commented 2 years ago

@pulkkins, thanks for your comments! Regarding the adjustment of the thresholds for the NWP models, I'll put that on my list. Makes sense to have a look at that. Regarding the noise mirroring, do you know if this 'buffer' functionality is in some way already part of scipy map_coordinates?

I don't think that the buffer functionality we need is implemented in scipy.map_coordinates. What I was thinking is to generate the noise field in a larger domain and crop it to the original domain after advection. But as @dnerini said, that could be tricky to implement.

Clear, also thanks for your input, @dnerini. Would it be an idea to stick to the mirroring for now, as this functionality is only for outside the domain (unless your radar domain covers or almost cover the entire extent..), and make an issue for further improvement of this?

dnerini commented 2 years ago

yes, if @pulkkins agress, I think we can leave the current approach and focus on the remaining issues.

Il giorno gio 11 nov 2021 alle ore 08:21 Ruben Imhoff < @.***> ha scritto:

@pulkkins https://github.com/pulkkins, thanks for your comments! Regarding the adjustment of the thresholds for the NWP models, I'll put that on my list. Makes sense to have a look at that. Regarding the noise mirroring, do you know if this 'buffer' functionality is in some way already part of scipy map_coordinates?

I don't think that the buffer functionality we need is implemented in scipy.map_coordinates. What I was thinking is to generate the noise field in a larger domain and crop it to the original domain after advection. But as @dnerini https://github.com/dnerini said, that could be tricky to implement.

Clear, also thanks for your input, @dnerini https://github.com/dnerini. Would it be an idea to stick to the mirroring for now, as this functionality is only for outside the domain (unless you're radar domain covers or almost cover the entire extent..), and make an issue for further improvement of this?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/pySTEPS/pysteps/pull/233#issuecomment-966051607, or unsubscribe https://github.com/notifications/unsubscribe-auth/AC3J3YYXCDIRSKA6RVHS5BLULNVI5ANCNFSM5CWJDUZA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

-- Daniele

RubenImhoff commented 2 years ago

After some consultation with @cvelascof and also Alan Seed about how the weights in Seed et al. (2013, WRR) were implemented, I've made a first try to implement it in the blending code as well. So far so good, it seems! Below for two runs (the same ones as you've seen earlier in the conversation) the CRPS.

The implementation of the weights method is as such that the user can choose which one to use (BPS2006 or SPN2013), with for now the SPN2013 method as default.

Note that the implementation does not exactly follow the SPN2013 paper, as the notation in that paper turned out to be partly erroneous. However, we've decided to continue with the normalized variance-covariance matrix, filled with the cross-correlations between the models.

Skill_blended_forecast

Skill_blended_forecast

dnerini commented 2 years ago

Hi @RubenImhoff , sorry for the late feedback... It's really exciting what you achieved! So this now means that you managed to include all the features needed for this module and we should get started on code review? Or are you planning to run a validation analysis first with some more precipitation cases?

RubenImhoff commented 2 years ago

Thanks, @dnerini! Yes, I'm working on some tests (call it a small validation) to also double check that all functionalities work. After that, I would say, it is time for the code review!

RubenImhoff commented 2 years ago

I think we're ready for the code review! :)

dnerini commented 2 years ago

Very good, @RubenImhoff, let's have a look then !

Just a comment for you and @ladc: I fear that the original plan with pysteps v2 will take some more time than expected, in particular for the refactoring with xarray and the object-oriented approach for forecasting methods.

To avoid postponing for too long the release of the blending module, one option could be to merge it into master and release a new v1.6. Although you already merged into this branch already several changes from v2 (e.g. xarray-utils), it should still be relatively easy to isolate just your contributions for the blending module, perhaps in a clean branch?

What do you think?

RubenImhoff commented 2 years ago

That is not a bad idea, Daniele. How much time do you think we still need for v2? I'm thinking of our idea (at least @ladc and mine - but not limited to) to work towards a blending paper, including the open-source implementation of the STEPS code in pySTEPS and a hydrometeorological analysis. As long as we have a release ready by that time, it should suffice.

@ladc, what do you think?

Only changes I can think of, is that the importers and subsequent testing schemes are (partly) based on the xarray implementation.

dnerini commented 2 years ago

How much time do you think we still need for v2?

Difficult to say. Unfortunately I don't have a lot of time to dedicate to this, and the same appears to be true for other core developers. We might have a pre-release ready by end of summer 2022, but this is optimistic and won't be stable.

So yes, I recommend to make this PR suitable for v1 (or cherry-pick from here into a new branch, as you prefer). I will anyway start reviewing your code here.

RubenImhoff commented 2 years ago

Sounds like the best decision then! Thanks for reviewing and I'll start thinking about how to nicely put this, together with the steps_blending and linear blending branches, into v1.

RubenImhoff commented 2 years ago

pystepsrc

You have included a new path_workdir. I'm not against, but what's its purpose? Perhaps you could include a short comment to help understand its scope?

Good point. We implemented this as a place to store the decomposed NWP fields and NWP velocity fields, but seeing how it used now, we can reach the same with 'just' the path_outputs in pystepsrc. @ladc, what do you think?

RubenImhoff commented 2 years ago

In the blended forecast, I have the impression that something is a bit off with the NWP values, as I observed a strong increase of the intensities with leadtime. I anyway still need to have a closer look at your implementation of the masking and probability matching.

I completely forgot to have a new look at the examples forecast after the latest changes. The minor code changes will follow soon. Indeed, something seems to go wrong with either the masking or prob_matching.. I don't see the same effect (at least it is not as clear if it is there) in the test forecasts with the RMI data, so I wonder what goes wrong.

RubenImhoff commented 2 years ago

In the blended forecast, I have the impression that something is a bit off with the NWP values, as I observed a strong increase of the intensities with leadtime. I anyway still need to have a closer look at your implementation of the masking and probability matching.

I completely forgot to have a new look at the examples forecast after the latest changes. The minor code changes will follow soon. Indeed, something seems to go wrong with either the masking or prob_matching.. I don't see the same effect (at least it is not as clear if it is there) in the test forecasts with the RMI data, so I wonder what goes wrong.

While working on the tests for pysteps.blending.utils, I came across an error in pysteps.blending.utils. In the function stack_cascades, the mean and sigma could be set to 0.0 and 1.0 (I don't know why), which eventually gives wrong values after blending with NWP. As this only happened with the NWP component(s), the prob_matching was compensating for this and that is why we saw the effect with increasing time in the examples scripts. However, the changes above fix the problems (nevertheless, masking and prob_matching still require a critical look)! Also for the RMI test scripts the results look better:

BlendedForecast_202107041405

dnerini commented 2 years ago

Very nice @RubenImhoff! The animation is very promising!

I'm having a closer look at the probability matching. Do I understand it right that you basically perform a linear blending of radar extrapolation and NWP based on the weights from the second cascade level in order to build the reference field for the probability matching? This is now a bit of a detail, but the problem with this is that you would loose the peak values, especially when both weights are around 0.5.

An alternative approach would be a simple resampling: for each grid point you sample either the radar or the NWP value (and this could be based on the same weights you are already using). This way you would build a new target distribution that is not smoother than the original radar and NWP fields.

I have some code for this (I used it in the EnKF approach). Should I give it a try to implement it in your code?

RubenImhoff commented 2 years ago

An alternative approach would be a simple resampling: for each grid point you sample either the radar or the NWP value (and this could be based on the same weights you are already using). This way you would build a new target distribution that is not smoother than the original radar and NWP fields.

I have some code for this (I used it in the EnKF approach). Should I give it a try to implement it in your code?

Great idea, @dnerini, and thanks if you want to implement that!