pySTEPS / pysteps

Python framework for short-term ensemble prediction systems.
https://pysteps.github.io/
BSD 3-Clause "New" or "Revised" License
468 stars 169 forks source link

Implement STEPS blending scheme #212

Closed cvelascof closed 2 years ago

cvelascof commented 3 years ago

A great addition to pySTEPS should be to include a module and support routines to blend multiple sources of forecast rainfall. For example, blending a radar rainfall nowcast with a Numerical Weather Prediction (NWP) rainfall forecast or multiple sources of rainfall estimates.

The branch steps_blending (https://github.com/pySTEPS/pysteps/tree/steps_blending) is created to develop a version of STEPS blending scheme as proposed by Seed et al, 2013 (https://agupubs.onlinelibrary.wiley.com/doi/full/10.1002/wrcr.20536)

Proposed modules and routines:

Modules:

Support:

Data - case studies:

This is just a first trial to describe what would be needed to complete this feature. Early versions of some of those modules are already in the branch but they need more work and development.

dnerini commented 3 years ago

Hello @cvelascof, very happy to see this important branch back on track, I look forward to the new blending module!

dnerini commented 3 years ago

Do I get it right that all methods will assume that NWP and radar data come at the same spatio-temporal resolution as well as extent?

Also, do you aim at blending with an ensemble or a deterministic NWP?

cvelascof commented 3 years ago

It is easier to assume to start that NWP and radar data come at the same spatial projection and at the same spatial-temporal resolution. First version should combine one deterministic NWP with the radar nowcast. The critical aspect now is to define an structure in the module that allows us to add more complications to this first version later on, for example more than one NWP, or an ensemble NWP.

RubenImhoff commented 3 years ago

The idea is indeed to start simple, but to later allow for different spatio-temporal resolutions, extents and ensemble members.

dnerini commented 3 years ago

Very good, I like the approach!

It would be easier if the blending module could fit into the existing data model, but I'm personally very much open to explore alternatives if needed.

ladc commented 3 years ago

Hi everyone, Indeed, good approach to start simple (while keeping in mind the desired features). Things will need refactoring anyway. Also it would be good to (eventually) have an interface that's generic enough to allow for more input fields than simply precipitaiton. I'm thinking for example about machine learning / deep learning based blending methods which require multiple predictors. Also some NWP models can output stratiform and convective precipitation separately, and that could also be exploited. But these are worries for later.

ladc commented 3 years ago

Wout (@wdewettin) is progressing well with the linear blending. But indeed the linear blending method will need a totally different interface than advanced blending methods. The problem is that the blending generally is not simply a post-processing step that combines a radar-based nowcast with NWP. For the linear blending this is indeed the case - we calculate a nowcast, and then combine the two forecast sources post-hoc. However the STEPS scale- and skill-dependent blending needs to be done inside the STEPS nowcast.

Keeping in mind potential machine learning-based nowcast methods, I think the line between blending and extrapolation-based nowcast will be blurred anyway, as multiple predictors (including, but not limited to NWP precip) can be used.

Thoughts?

dnerini commented 3 years ago

Hey Lesley

You raise some very good points here. Personally, I'd like to take this opportunity to refactor the main steps method: we need to try to break it down into smaller pieces, so that we can more easily test and reuse it elsewhere (ie. your option 2).

I also like option 3, although this risks making the whole implementation even more complicated. But from the point of view of the application, this seems the most natural way: add as many sources of information you have available to make your nowcast better. This relates to your last point concerning machine learning: NWP forecasts should be seen as additional predictors to the prediction problem.

We should also consider eventually going for an object-oriented approach if we realize this could make our life simpler (initially we decided to avoid classes but this doesn't need to stay like this).

RubenImhoff commented 3 years ago

Hi everyone,

We (@ladc, @wdewettin and me) discussed today how to implement the STEPS blending scheme. The implementation of the blending weights by Seed et al. 2013 was hard to follow for us, as it seems that equations 7 and 8 do not entirely resemble what is described in the text and what we see in the C-implementation of STEPS. If anyone exactly knows how to implement this, we would be happy to hear it and see if we can and will change to that.

However, this weights calculation method is particularly made for blending with more than two (that is nowcast and NWP) models, e.g. multiple NWP models (by focussing on covariations). We were wondering if we will make blended nowcasts this way. I can imagine that if we have multiple NWP models or ensemble NWP models, that we use every model/ensemble member separately in a blending procedure. This means that we will not first blend all models with the nowcast and add the noise cascade, but instead that every model is blended independently with the nowcast.

Therefore, we propose to stay for now (except when the discussion here results in a different conclusion) with the calculation of the weights as initially proposed by Bowler et al. (2006). @cvelascof already made a good start with this in the steps_blending branch. I am continuing with this today and will make one adjustment compared to Bowler et al. (2006): the weights have to sum to 1.0. Hence, the weight of the noise cascade is 1.0 - the sum of the other weights. @cvelascof, no worries, I still have your initial code, so if we really have to change it back, that will be possible.

dnerini commented 3 years ago

Hi @RubenImhoff Unfortunately I'm not very familiar with the Seed et al 2013 paper either (we could get in touch with Alan directly... does he have a github account?^^ ), but from what I remember, they introduced this covariance term which might prove very important for blending (@aitaten knows a lot about this), especially when working with NWP RUC systems that use radar data assimilation. Regarding the case of multiple NWP runs, I'm not sure that we want to blend them sequentially as you mentioned, precisely because we cannot assume that they are independent to each other. This said, I'm perfectly fine with starting with a simpler implementation, especially at this point when there are a lot of important changes to be made. I'm sure @loforest would also have valuable inputs on this.

RubenImhoff commented 3 years ago

I totally agree, @dnerini! Let's see what the others think (and whether they know how the Seed et al 2013 paper is really implemented concerning the weights ^^). For now, I'll just implement the Bowler et al. (2006) method and we can continue from there.

@cvelascof, I'm going to change the noise weight back to the original formulation in eq. 13 in Bowler et al. (2006), instead of making it the missing noise weight in summing all weights to one. The reason for this is that we saw that the weights can sum up to more than 1.0 (also without the noise weight). After some digging in the C-code, we found out that the merged cascade is first renormalized before the recomposition with mu and sigma takes place. So, that will work out. Some things are missing in the papers and technical notes, so that (as you can see) keeps us busy, haha. ;)

aitaten commented 3 years ago

Hi! I just had a quick look to Seed et al., 2013 and what I understand is that they take into account the covariance matrix (Eq 8) as a form of having into account the variability of the model (0,0) and radar forecast (1,1) and how much of this variability is common/shared (1,0 = 0,1). As @dnerini said this is important for NWP with radar assimilation, but it can also be important to take into account improvements due to variance reduction. So I would say it is not constraint to more than 2 models as it is shown in the M=2 example in the very same paper.

Said that, I don't know how this paper is implementing the weights :( ... please can you point me where the weights are computed now so I can try to see if I can figure out where the covariance matrices can be added (I am still figuring out the pySTEPS package). Thanks!

ladc commented 3 years ago

Yes this is also what we understood, taking into account the covariances seems like a sensible thing to do (regardless of radar data assimilation). The matrix in Eqn. (8) doesn't seem to be a covariance matrix to me, though. I'm also a bit puzzled by how the \rho_e and the other \rhos are defined exactly (Lagrangian autocorrelation for the radar extrapolation component?). Screenshot from 2021-08-12 15-22-54 Screenshot from 2021-08-12 15-23-02

Also note that the weights, apart from the addition of the covariance matrix, appear to be implemented quite differently here than in the Bowler 2006 paper which defines the weights in Eqn. (11)-(13) in terms of ratios of explained variance. Screenshot from 2021-08-12 15-24-47

RubenImhoff commented 3 years ago

Hi everyone, another question w.r.t. the blending. I am working on the main module now (getting there, actually!) and we were wondering what to do with the probability matching. In the original pysteps code, the prob. matching takes place prior to the extrapolation of the recomposed precipitation fields. However, in the blending procedure, the extrapolation step takes place on all cascade levels prior to the blending with NWP. Using the blended precipitation field for the prob. matching with the initial radar observations does not make much sense. Do you have any ideas when and how to perform the prob. matching within the blending procedure? We haven't found a clear procedure in the earlier papers.

loforest commented 3 years ago

Hello :-) I try to give my input on some of the topics hoping that it helps.

ladc commented 3 years ago

If I understand correctly (please correct me if I'm wrong here), the goal of the probability matching is to use a reference rain rate cumulative distribution function and remap the forecast rain rates to match this cdf. The main motivation was to reduce the spurious appearance of drizzle throughout the previously non-rainy areas due to the noise injection. A second reason to perform it was to correct the intensification of rain as it leaves the domain (the mean is added to the recomposed cascade, even if some rain has already left the domain, leading to a spurious intensification of the remaining rain).

So given this information I see several options (although they may all be flawed in some way):

I think PM works well with a pure advection nowcast, but I'm not sure how much sense it makes to perform PM when new rainfall can be produced, or reduced, by NWP. I believe probability matching was finally abandoned (replaced by scale and shift) for the STEPS blending algorithm at BOM, and I suppose with good reason ;-) Not sure if this new method is described in the literature?

loforest commented 3 years ago

I now remember that PM was turned off in the latest implementation. Thanks @ladc for reminding me! Indeed, PM was used, among others, to remove the strong bias arising from the cascade parameters and dBR mean being fixed and not accounting for rainfall leaving the domain. If we cannot avoid doing PM we should use the approach of @dnerini, which will start from the radar cdf and finally converge to the NWP cdf. The assumption is of course that the NWP cdf is kind of ok, which is not true for lower resolution models. Otherwise, we should do shift and scale, though I remember vaguely that Alan added a third parameter to that scheme. We should also ensure not to get too high rainrates (saturate to a given value?).

ladc commented 3 years ago

Thanks for the information Loris! We've been brainstorming about the optimal solution to the aforementioned problems and we would propose to use a kind of Lagrangian blended probability matching instead. As a reference for the CDF we would use a deterministic extrapolation, blended with NWP, using the blending weights from the second cascade level (as is also done for the velocity field blending). The idea is to use the weights obtained in the STEPS scale- and skill-dependent blending algorithm to perform a straightforward (but skill-dependent) blending without any cascade decomposition or noise injection, so there's no introduction of spurious drizzle or enhancement of precipitation as it leaves the domain. Therefore we expect the CDF to be a representative baseline. Then the precipitation intensities in the STEPS nowcasts, which are blended per cascade level and recomposed, are matched with the reference CDF. Are we missing something here, or does this sound like it could work?