pySTEPS / pysteps

Python framework for short-term ensemble prediction systems.
https://pysteps.github.io/
BSD 3-Clause "New" or "Revised" License
441 stars 160 forks source link

Create deterministic blended "control member" for visualisation purposes #370

Open ladc opened 1 month ago

ladc commented 1 month ago

As mentioned in issue #333, there's currently no way to do a "control" or deterministic run (unperturbed) for the blended nowcast. However, this would be useful (or even necessary) for the visualisation in the app. The ensemble mean is too "blurry" and a single ensemble member might be too distorted.

dnerini commented 1 month ago

Would this be a simple extrapolation with linear blending?

mrtnrey commented 1 month ago

I find this a very relevant feature request. In fact, I'd like to add the idea of a "worst case" and "least case" ensemble member. I know each of the ensemble members is equally likely, but some of them contain more precipitation than others. I'd like to know whether you could order the members by severeness and if it is then possilbe to show the two extremes to your users (together with the "control member").

dnerini commented 1 month ago

I'd like to add the idea of a "worst case" and "least case" ensemble member.

Some time ago, I looked into this idea of "band depth", which goes in the direction of ranking multi-dimensional data and therefore could be relevant for your use case. https://doi.org/10.1198/jasa.2009.0108

There is even an implementation in pysteps, ready to be used and experimented with: https://pysteps.readthedocs.io/en/stable/generated/pysteps.postprocessing.ensemblestats.banddepth.html

This, however, would not solve the problem of having a "control" member, or did I get your comment wrong?

mrtnrey commented 1 month ago

Great! Is there also a notion of "median" of the band then, and could such a median then be used as "control member"?

dnerini commented 1 month ago

Yes and in fact the lowest band depth corresponds to the median, while larger values indicate more and more outliers (in both negative and positive senses).

Not sure this corresponds to a control run, strictly speaking, but it could be something worth exploring.

Another option could be to look at the total accumulations for the whole forecast so to consider also the advection speed as a risk factor (typically, the faster the rain cell moves, the lower the total accumulation at a given point on the ground). (By the way, do you perturb the motion field in your setup @ladc ?)

It's a very interesting problem, how to present the ensemble information to the public!

ladc commented 1 month ago

To (attempt to) answer your questions:

Would this be a simple extrapolation with linear blending?

Possibly, but the question is what the weights would be. One could take the weights of the second cascade level as we do to blend the velocity fields. Then still the question remains how to deal with the weight of the third, non-existing noise cascade. I would propose to use the NWP and radar weights as-is (don't try to renormalize them or anything) and just blend them without noise.

I'd like to add the idea of a "worst case" and "least case" ensemble member.

I'm not so fond of this approach - one member can be the worst for the east, and the other the worst for the west of the domain, for example - it is a bit arbitrary to define an "overall worst" one.

Is there also a notion of "median" of the band then, and could such a median then be used as "control member"?

I think this is useful for a single pixel (to compute and visualise the meteograms for example) but it will not have the right spatiotemporal correlations if you look at the resulting spatial fields. A basin average of the median (or other percentile) will give you something very unphysical. If the same storm is located in 3 different locations in different members, you will see three storms in such band products.

ladc commented 1 month ago

@sidekock and I will start working on a "control member" which does not include the noise cascade and does not become smooth over time.

First idea: create a simple non-decomposed blending of radar using the weights of the second cascade level (as is done for the velocity field). This way, the small scale features don't wash out immediately, and meanwhile the current skill of NWP (albeit only at second cascade level scale) is still taken into account. This is a bit hacky, but hopefully has the desired characteristics of providing something less smooth but still blended and in agreement with the pysteps blended ensemble.

Second idea: use the decomposed blending, but use the weights from level X for all levels >=X (e.g. X=2 but potentially higher) to avoid the washing out of the small-scale features.

aitaten commented 1 month ago

I did a small experiment to compare control run based on band depth, mean and median. I think the band depth looks realistic but still not that similar to the nowcasting. Here the figure for comparison purposes

Image

I think the approach you mentioned 30 minutes ago might be the way to go towards a "control run"

sidekock commented 1 month ago

@aitaten Am I correct in assuming the band depth control gives you one of the members of a run? Namely, the one that can be seen as the mean of the distribution? If this is the case, this member should look similar to other members. The downside it is still has random fields associated with it. If my assumption is wrong, please let me know because I am not that familiar with band depth.

ladc commented 1 month ago

Thanks @aitaten! Yes, very interesting! Both are interesting but I think they contain different information. We will try to create a control member according to the description I posted above and see what it looks like.

ladc commented 1 month ago

Update: we actually already created a cascade level-2 blended nowcast for the probability matching (see comment "8.7.1 first blend the extrapolated rainfall field (the field that is only used for post-processing steps) with the NWP rainfall forecast for this time step using the weights at scale level 2."). So let's try using this.

dnerini commented 1 month ago

@aitaten Am I correct in assuming the band depth control gives you one of the members of a run? Namely, the one that can be seen as the mean of the distribution? If this is the case, this member should look similar to other members. The downside it is still has random fields associated with it. If my assumption is wrong, please let me know because I am not that familiar with band depth.

yes, the band depth can be used to rank the members of the ensemble. The lowest depth corresponds to the median scenario.

@ladc I think this is useful for a single pixel (to compute and visualise the meteograms for example) but it will not have the right spatiotemporal correlations if you look at the resulting spatial fields. A basin average of the median (or other percentile) will give you something very unphysical. If the same storm is located in 3 different locations in different members, you will see three storms in such band products.

so no, what I mean is that the band depth will allow you to rank whole members thus conserving the spatio-temporal structure, you see?

aitaten commented 1 month ago

As @dnerini mentioned, the experiment I did used the band depth to rank the members, so yes ... we are getting the member closer to the mean/median as control-run (so you did assume correctly @sidekock ). Once said that, and also that the spatio-temporal structure is conserved (because it is a member of the realistic realizations created by pySTEPS), I have to admit that if you look at the nowcasting and the control member, they do not look similar enough so @ladc approach of creating a cascade level-2 blended nowcast might be closer to the deterministic nowcast, and consequently the desired control-run.

ladc commented 1 month ago

The naieve approach to just use the combination blended with the weights of the second cascade level doesn't seem to work too well (apologies for the crappy plots).

The weights of both components are clearly different for the STEPS ensemble and the control run, with a larger weight given to NWP in the second cascade level compared to the lower levels.

image

Alternative approach: blend all levels with their respective weight evolution but without adding noise...

aitaten commented 1 month ago

I am not sure the blend without adding noise will keep the small scale structure. For me there is a bit of a contradiction in these two points:

The first and second points kind of works again each other :(

Maybe one way of thinking of pySTEPS control run is still introducing the noise for reproducing the small scales but maybe no introduce other perturbations (no vector perturbations?), and keep the values of the initial observation without other perturbation (no NWP information?). I am not sure about the second point. That what the reason behind plotting the member corresponding to the lower band depth so that we have a realistic looking and close to the ensemble median.

However, this last sentence is not complying either with the idea that the control run at some point should correspond to another equiprobable ensemble member. By choosing the member closed to the median at all lead-times, maybe we are going against this remark.

I am referring to this sentence: "The spread around the ensemble mean as a measure of accuracy applies only to the ensemble mean forecast error. It does not apply to the median, nor control members, even if they happen to lie mid-range within the ensemble. The spread of the ensemble, relative to a particular ensemble member is, for example, about 41% larger than the spread around the ensemble mean. The spread with respect to the control members is initially the same as for the ensemble mean, but gradually increases, ultimately reaching the same 41% excess as any member (see Fig5-2)." from here

So, I solution might be to close the member closest to the deterministic nowcasting in the first (first? several?) lead-time and hope that after some time, it is just another ensemble member.

dnerini commented 1 month ago

I am not sure the blend without adding noise will keep the small scale structure.

this is true only if you don't renormalize the weights to account for the fact that the noise is not used. But otherwise you should be able to conserve all the variability, don't you think?

aitaten commented 1 month ago

Given two 2D fields nwc and nwp, and weights w1 and w2 = 1 - w1, the blended field is defined as:

blend = w1 * nwc + w2 * nwp

The variance of the blended field blend is given by:

Var(blend) = Var(w1 * nwc + w2 * nwp)

Using the properties of variance, for any two random variables X and Y, and constants a and b:

Var(aX + bY) = a^2 * Var(X) + b^2 * Var(Y) + 2ab * Cov(X, Y)

Applying this to your fields:

Var(blend) = w1^2 * Var(nwc) + w2^2 * Var(nwp) + 2 * w1 * w2 * Cov(nwc, nwp)

This equation shows that the variance of the blended field is not simply the weighted sum of the variances of nwc and nwp. It also includes a term that involves the covariance between nwc and nwp. The difference observed is due to the covariance term 2 * w1 * w2 * Cov(nwc, nwp) in the variance calculation.

If I understood correctly your question :P

dnerini commented 1 month ago

Ah yes, I agree, the variance isn't conserved, thanks for the detailed proof ;)

In a way, though, this would be imho the best definition on "control member", being the same forecast method without stochastic perturbations. No, you're right, it won't be realistic looking. For that something like the median member (according band depth or whatever other approach) might be a better option.

I'm not sure that both things (ie control/deterministic and realistic looking) are possible in this framework. This is because STEPS doesn't converge fully to NWP, but like s-prog it filters out unpredictable scales (and replace them with noise).

A more pragmatic approach might be to run a simple lagrangian extrapolation nowcast and define the linear blending so that it approximately matches the ensemble - maybe something as simple as computing the weights based on the correlation between the extrapolation nowcast and the ensemble mean of steps-blending?

ladc commented 1 month ago

OK, so I didn't manage to solve it in time ;-) The control member now "dies out" according to the AR(2) process which is not ideal, but closer to the behaviour of the ensemble and not outside it. To be continued....

ladc commented 3 weeks ago

Hi all, thanks a lot for the insightful remarks above, there's indeed a fundamental inconsistency in trying to create an unperturbed control member which would also lie within the STEPS ensemble. Renormalizing the weights to account for the lack of a noise cascade also gives a problem outside the mask where NWP is just fully included - which is not necessarily wrong but definitely different from the pysteps ensemble mean. This is the same figure I posted in #390 by the way control_ens_mean_obs

For a box within the radar domain it looks ok for this case:

Image