pySTEPS / pysteps

Python framework for short-term ensemble prediction systems.
https://pysteps.github.io/
BSD 3-Clause "New" or "Revised" License
441 stars 160 forks source link

Add Kalman-filter based post-processing to blending scheme #353

Open RubenImhoff opened 3 months ago

RubenImhoff commented 3 months ago

As discussed at the start of our blending implementation and re-discussed during this year's EGU, we can still improve the post-processing that takes place in the blending procedure. Right now, the post-processing procedures from the nowcasting module are used and adjusted to work within the blending module. This is functional, but has as limitation that the post-processing is based on (1) having everything in Lagrangian coordinates and that is not the case in the blending procedure (we have incorporated that difference, but is suboptimal), and (2) it depends on the weights assigned to the NWP and extrapolation component, which could reduce extremes.

A Kalman-filter like method, such as in Nerini et al. (2019; https://doi.org/10.1175/MWR-D-18-0258.1), could do the trick. I think we should see if we can implement this in the blending module anytime soon. :)

dnerini commented 3 months ago

Thanks for initiating this @RubenImhoff ! Can you provide any hint on where we should make changes in the current code? I can then try to make a first draft.

RubenImhoff commented 1 month ago

Sorry for completely forgetting to respond to this, @dnerini! I will dive into the code and come back to your question early next week. Would be great to put this feature into the code. :)

RubenImhoff commented 1 month ago

Hi @dnerini, To finally come back to your question:

This is where the probability matching is called in the code: https://github.com/pySTEPS/pysteps/blob/953f799f932760dd5775ce4f9f0fc771e4d0a67d/pysteps/blending/steps.py#L1494.

The CDF probmatching method still calls this original function: https://github.com/pySTEPS/pysteps/blob/953f799f932760dd5775ce4f9f0fc771e4d0a67d/pysteps/postprocessing/probmatching.py#L54

And maybe most important, as the fields are not in Lagrangian coordinates (whereas this is the case in the nowcasting module until the probability matching part), we perform the post-processing on a blended extrapolation-NWP field (that is, excluding the noise cascade) to get some sort of indication of where the rainfall fields would be at that timestep and what they would look like given the original input files and no added noise: https://github.com/pySTEPS/pysteps/blob/953f799f932760dd5775ce4f9f0fc771e4d0a67d/pysteps/blending/steps.py#L1431.

Is this something you can work with? Let me know how I can assist, happy to help!

RubenImhoff commented 1 month ago

Another potential post-processing issue we have is that the incremental mask is not really incrementing over time, see https://github.com/pySTEPS/pysteps/blob/953f799f932760dd5775ce4f9f0fc771e4d0a67d/pysteps/blending/steps.py#L1471. This is also caused by the lack of Lagrangian coordinates, which makes the implementation different from the nowcasting approach. Anyway, any thoughts here are welcome too. :)

ladc commented 1 month ago

There are 2 separate issues here: 1) create a new target distribution for the resampling of the precipitation (instead of probability matching with a blended nowcast, sample from both NWP & nowcast according to the current skill to weight the sampling probability) 2) Kalman filter (separate issue)

dnerini commented 1 month ago

simple code to resample the distributions:

    def resample_distributions(a, b, weight):
        '''merge two distributions'''

        assert a.size == b.size
        asort = np.sort(a.flatten())[::-1]
        bsort = np.sort(b.flatten())[::-1]
        n = asort.shape[0]

        # resample
        idxsamples = np.random.binomial(1, weight, n).astype(bool)
        csort = bsort.copy()
        csort[idxsamples] = asort[idxsamples]
        csort = np.sort(csort)[::-1]
        return csort

@RubenImhoff maybe you can help finding where this could fit into the pysteps blending code, and also which value to use for weight

aitaten commented 1 month ago

This is the idea behind the "Empirical distribution matching" (section 3.3) in this paper. So in principle this part of the code can be included in the postprocessing folder? So then it can be an option as cdf, etc.

RubenImhoff commented 1 month ago

@dnerini, yes, it concerns these weights: https://github.com/pySTEPS/pysteps/blob/162985942dbd45be4d88c0f84a946e0501d71f3c/pysteps/blending/steps.py#L1405.

RubenImhoff commented 1 month ago

I agree with @aitaten that this would fit well in the post-processing and that we then call the method just like we call "cdf" now as prob_matching method.

dnerini commented 1 month ago

I agree with @aitaten that this would fit well in the post-processing and that we then call the method just like we call "cdf" now as prob_matching method.

Not sure about this. I see the resampling as an additional step you would do before the prob matching , which you still need, so to construct a new target distribution

RubenImhoff commented 1 month ago

Ah that way, I see what you mean. In any case, it will only be used by the probability matching in the end, right? If so, it would make sense to place that function there too.

RubenImhoff commented 1 month ago

@dnerini, I have opened a draft pull request that we can work in.