iiasa / rime

Rapid Impact Model Emulator
GNU General Public License v3.0

Add and validate GEV preprocessing step #31

Open perrette opened 1 month ago

perrette commented 1 month ago

After the NGFS workshop preparation call, and preliminary discussion with @NiklasSchwind and Carl, we'll want to do GEV fitting inside the 21-year window instead of calculating the climatic average. This would require some validation and possible variations such as detrending the data inside the window, to avoid artificially distorting the GEV distribution.

NiklasSchwind commented 1 month ago

Good idea! However, as the GEV median and the mean won't exactly match, and this introduces another assumption, I would keep it as an optional preprocessing step :)

perrette commented 1 month ago

Does it come to mind because of the importance of skewness in GEV fitting? On the other hand, precisely because of its importance, we might want to correct for it. Anyway, we can do sensitivity tests, see how strong the impact is, and decide then what the default should be. On a more general note, the processing of climate model data is not an exact science...

perrette commented 1 month ago

But I do agree detrending comes with tradeoffs, and it needs careful assessment of whether the cure is better than the ill. E.g. it's probably not a good idea to detrend selectively on 21 years' worth of local data. If anything, we'd need to model the trend in a robust manner that does not add to the variability. A reasonably elegant (because self-consistent) idea would be to use our model for the 21-year mean, perhaps (but not necessarily) calibrated on a per-model basis, to remove the mean.

NiklasSchwind commented 1 month ago

I wouldn't put it as the default, as I think doing that would limit the applicability of the emulator to GEV-distributed extreme indicators. E.g. rx1day follows a GEV, but tas is probably normally distributed, and flood depth probably has a completely different underlying distribution. So using the GEV per default would limit the applicability of our emulator to rx1day in this case, while keeping it optional adds to the applicability.

I even wouldn't count the GEV fit/mean as a part of the emulator per se but as a part of the definition of the underlying indicator. (So we emulate indicators like e.g. "1-in-20 year event of rx1day" or "21-year-mean precipitation").

NiklasSchwind commented 1 month ago

Note about an idea by Carl (will elaborate further tomorrow): Instead of fitting one GEV on every 20-year window, one could try to fit a (linear?) function predicting the parameters of the GEV distribution from GMT to each simulation.
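
For illustration, here is a rough sketch of what that idea could look like: a GEV whose location parameter is a linear function of GMT, fitted by maximum likelihood. Everything here (function name, parameters, the choice to keep scale and shape constant) is made up for the sketch, not the rime implementation:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import genextreme


def fit_gmt_dependent_gev(x, gmt):
    """Illustrative sketch: fit a GEV with loc(t) = a + b * gmt(t),
    constant scale and shape, by minimizing the negative log-likelihood.
    Returns (a, b, scale, shape) in scipy's genextreme convention."""
    x = np.asarray(x, dtype=float)
    gmt = np.asarray(gmt, dtype=float)

    def nll(params):
        a, b, log_scale, shape = params
        # log_scale keeps the scale parameter positive during optimization
        return -genextreme.logpdf(
            x, shape, loc=a + b * gmt, scale=np.exp(log_scale)
        ).sum()

    p0 = np.array([x.mean(), 0.0, np.log(x.std()), -0.1])
    res = minimize(nll, p0, method="Nelder-Mead")
    a, b, log_scale, shape = res.x
    return a, b, np.exp(log_scale), shape
```

Scale and shape could of course be made GMT-dependent too, at the cost of more parameters to constrain from a short record.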

perrette commented 1 month ago

Here I am only talking about the variables that we assume follow a GEV distribution, like 1-in-X year events. The reason I propose a dedicated processing for these variables, instead of treating them as regular variables, is that they are already a statistic over time. There would be no justification to take another 21-year, "climatological" mean of these variables, because they are already a climatological indicator.

The question remains whether they need detrending or not, and whether that should be done by default, but I tend to think they do. I maintain the line put forward above: we could use our classical emulator for the "mean" climatological variable as the trend, and do the GEV fitting on the daily variable minus that trend (I'd probably use a smoothed GMT value to compute the trend).

I don't see exactly what you find problematic here, nor the difference with predicting the parameters of the GEV. To predict the parameters of the GEV, you need to fit the GEV first, right? So according to my understanding, in every 21-year period you'd do a GEV fit on the (probably detrended) time series and derive the 1-in-X year event values, thus obtaining a new time series for each of the desired return periods. Later on, these would not be treated fundamentally differently from the other indicators, except that you wouldn't take the 21-year mean of these values; you'd just use them directly.

We can discuss that and other ideas on a call perhaps. Also with Carl if needed.
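
To make the procedure concrete, a minimal sketch (not the rime code; `return_levels` and its arguments are made up, and the optional `trend` stands in for the emulated 21-year mean):

```python
import numpy as np
from scipy.stats import genextreme


def return_levels(annual_max, window=21, return_period=20, trend=None):
    """Illustrative sketch: fit a GEV in each sliding `window`-year block
    of annual maxima and return the 1-in-`return_period` year level per
    window. If `trend` (a smooth series, e.g. an emulated mean) is given,
    it is removed before fitting; the window-mean trend is added back so
    the levels stay in absolute units."""
    x = np.asarray(annual_max, dtype=float)
    if trend is not None:
        trend = np.asarray(trend, dtype=float)
        x = x - trend
    levels = []
    for start in range(len(x) - window + 1):
        chunk = x[start:start + window]
        shape, loc, scale = genextreme.fit(chunk)
        # value exceeded with probability 1/return_period in a given year
        level = genextreme.isf(1.0 / return_period, shape, loc, scale)
        if trend is not None:
            level += trend[start:start + window].mean()
        levels.append(level)
    return np.array(levels)
```

The output is a new time series of 1-in-X year levels (one value per window), which would then feed into the emulator directly, without a further 21-year mean.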

NiklasSchwind commented 1 week ago

My preferred way to go here would be to:

1. calculate a warming-level-dependent GEV for each input ISIMIP simulation;
2. extract the indicator (e.g. 1-in-20 year event of rx1day) per warming level from every warming-level-dependent GEV extracted from the simulation;
3. calculate quantiles of the indicator values, weighted by the probability of the warming level.
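
For step (3), a minimal sketch of a probability-weighted quantile based on the weighted empirical CDF (the function name is made up, and other interpolation conventions exist):

```python
import numpy as np


def weighted_quantile(values, weights, q):
    """Illustrative sketch: quantile(s) q of `values` under probability
    `weights`, by linear interpolation of the weighted empirical CDF."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cdf = np.cumsum(w) / w.sum()
    return np.interp(q, cdf, v)
```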

Of course this would require some kind of detrending; let me check whether I can find a paper that does something similar after lunch. I think they do similar GEV fits in attribution studies. I think detrending with our own method would likely not work, as we calculate medians across all models, but maybe I am also misunderstanding the approach.

perrette commented 1 week ago

I guess we're on the same line. About the details, in my understanding (1) would mean doing this in a 21-year moving window: a 101-year time series (2000-2100) is split up into 81 overlapping 21-year sets (2000-2020, 2001-2021, ..., 2080-2100) and the GEV is fitted for every set. What is the input data frequency here, daily?
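
For concreteness, the splitting could look like this (a minimal sketch; `sliding_windows` is a made-up name):

```python
import numpy as np


def sliding_windows(series, width=21):
    """Illustrative sketch: split a series into all overlapping windows
    of `width` consecutive values (len(series) - width + 1 of them)."""
    s = np.asarray(series)
    return [s[i:i + width] for i in range(len(s) - width + 1)]
```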

Regarding detrending, I don't have a strong opinion. Simplest is probably to just fit a linear trend and remove it within each 21-year period (keeping the mean, though). Not sure whether there could be side effects (like a noisy trend over time).

perrette commented 1 week ago

Yeah, I'd do a linear detrending in every 21-year window. That would tend to minimize the GEV range, so it's conservative, which is good.
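
A minimal sketch of that step (illustrative; `detrend_keep_mean` is a made-up name): remove the least-squares linear trend within a window while keeping its mean.

```python
import numpy as np


def detrend_keep_mean(window_values):
    """Illustrative sketch: subtract the least-squares linear trend
    within a window, then add the window mean back, so the detrended
    values stay in absolute units."""
    y = np.asarray(window_values, dtype=float)
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, 1)
    return y - (slope * t + intercept) + y.mean()
```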