kikocorreoso / scikit-extremes

scikit-extremes is a basic statistical package to perform univariate extreme value calculations using Python
MIT License
41 stars 10 forks source link

Sign Convention Used in this package #12

Open chiaral opened 2 years ago

chiaral commented 2 years ago

@jaymay2002 and I are using your package to fit some annual maxima. I sort of stumbled on the sign convention of this package, and I am not sure anymore what is the sign convention for it.

Little summary: The typical convention I am used to - but definitely is not necessary right or wrong - is the one presented also for reference in the Wiki page: Positive shape parameters are Frechet/type II distribution (+/II), Negative shape parameters are Weibull/type III distribution (-/III).

If I use the scipy.stats.genextreme I know that they use the opposite sign convention, they mention it in the documentation:

import xarray as xr
import numpy as np
from scipy.stats import genextreme
import matplotlib.pyplot as plt
import scipy.stats as st

#generate some values mimicking 110 AM
N = 110
px = np.arange(1, N+1)/(N+1)
negative_shape_st = genextreme.ppf(px, -0.25, loc=20, scale=10)
positive_shape_st = genextreme.ppf(px, 0.25, loc=20, scale=10)

When I plot them - and I think I am doing it correctly here, I am going to copy the full code for clarity on my steps

plt.figure()
ax= plt.subplot(111)
plt.plot(negative_shape_st, -np.log(-np.log(px)), 'r.-', label= 'negative genextreme.shape')
plt.plot(positive_shape_st, -np.log(-np.log(px)), 'b.-', label= 'positive genextreme.shape')
RT= np.array([1.1,2,5,10,15,20,25,50, 100])
px_RT = 1-1/RT
ax.set_yticks(-np.log(-np.log(px_RT)))
ax.set_yticklabels(RT)
plt.legend()

I get this figure (please note that the probability/return period are on the y-axis): image

The Frechet like distribution (fat tail in red) corresponds to a negative shape parameter, and viceversa for the Weibull like distribution (positively bounded in blue). This confirms me that the scipy.stats.genextreme has the opposite behaviour of the typical, let's call it WIKI, WIKI convention.

Now let's move to this package. In the documentation you state that:

In all my experience with L-moments, I always used the +/II and -/III convention, so I was expecting the same from this package, however when I pass these generated values to scikit-extremes I get

param_neg = skextremes.models.classic.GEV(negative_shape,fit_method="lmoments",frec=1) 
print(param_neg.params)

param_pos = skextremes.models.classic.GEV(positive_shape,fit_method="lmoments",frec=1) 
print(param_pos.params)

OrderedDict([('shape', -0.2060769098499644), ('location', 20.136579831214807), ('scale', 9.937463870947118)])
OrderedDict([('shape', 0.2485270265980506), ('location', 20.05459337496466), ('scale', 9.81974150422217)])

So scikit-extremes estimates a negative (positive) parameter for the generated values with a negative (positive) paramaeters in scipy.stats.genextreme. So scikit-extremes seems to have a consistent sign convention withscipy.stats.genextreme. (We checked all fitting methods and it is across methods).

Instead if we use the WIKI equations:

x = np.arange(1, 120) # generate  i.e. precipitation values

# fix parameters
location = 20
scale= 10

# WEIBULL
neg_shape= -0.25
x_neg= x[x<location-scale/neg_shape] # the weibull has a upper bound
t_x_neg = (1+neg_shape*((x_neg-location)/scale))**(-1/neg_shape)
CDF_neg = np.exp(-t_x_neg)

# FRECHET
pos_shape=  0.25
t_x_pos = (1+pos_shape*((x-location)/scale))**(-1/pos_shape)
CDF_pos = np.exp(-t_x_pos)

plt.figure()
ax= plt.subplot(111)
plt.plot(x_neg, -np.log(-np.log(CDF_neg)), 'r.-', label= 'negative genextreme.shape')
plt.plot(x, -np.log(-np.log(CDF_pos)), 'b.-', label= 'positive genextreme.shape')
RT= np.array([1.1,2,5,10,15,20,25,50, 100])
px_RT = 1-1/RT
ax.set_yticks(-np.log(-np.log(px_RT)))
ax.set_yticklabels(RT)

plt.plot(negative_shape_st, -np.log(-np.log(px)), 'ro', label= 'negative genextreme.shape')
plt.plot(positive_shape_st, -np.log(-np.log(px)), 'bo', label= 'positive genextreme.shape')
plt.legend()

image

The convention is opposite, in that using the WIKI convention a positive shape corresponds to a fat tail, a negative shape to a bounded distribution.

Did I get this right? No sign convention "is right or wrong" but the documentation now is misleading.

Thanks!

EDIT: I had tagged the wrong Jaymay apologies

kikocorreoso commented 2 years ago

Hi Chiara,

Thanks for sharing your thoughts.

I did this package for a professional need I had some time ago. During that time I tried to read as much as possible and tried to follow standards.

The main references used are: -Book: "An Introduction to Statistical Modeling of Extreme Values" by Stuart Coles. -R Package: extRemes, https://www.jstatsoft.org/article/view/v072i08 (and other R packages and books).

The issue with scipy having different internal conventions about signs was not helpful. See some discussions here.

In the end I tried to mimic the results obtained from the R packages I inspected in order to provide similar results. I expect they have the correct convention as they are by far more experts in the field than me and theses packages are like standard packages in the field.

If you think there is something wrong somewhere or the docs are misleading, please, let me know how it can be improved and I will try to fix it (very few time these days) or, better, if you have a fix you can pull it and we can discuss it.

I hope it helps.

chiaral commented 2 years ago

Hi! thanks for your reply!

I absolutely love this package, and it is something missing in the python world. L-moments fitting is a standard for some of us, and R extRemes is awesome. So I am grateful for the time you put it. In fact, this package is becoming obsolete, because of the package it depends on for the l-moments, and I might try and see if I can contribute to both of them.

The sign convention is what it is, a convention, so it is not wrong or right, what it needs to be clarified for the users of this package is whether the shape sign that this package outputs across all methods of fitting (mom, l-mom, mle) follows the more common convention (i.e. positive sign for II/Frechet, negative sign for III/Weibull) or the convention that scipy follows (which is the opposite way).

Based on my analysis above seems clear that this package follows the scipy convention. The documentation on the website reports the theory related to EV and the GEV and the sign convention that is typically used (i.e. positive sign for II/Frechet, negative sign for III/Weibull), but although it doesn't say that this is the convention it uses, it also doesn't state the opposite.

So if you confirm that the sign convention used in this package is the same as scipy (i.e. negative sign for II/Frechet, positive sign for III/Weibull), all we need to do is add text to the documentation. I can try and open a PR for it.

Please let me know if I got it right. Best, and thanks for your time!

kikocorreoso commented 2 years ago

If you have a look here, for instance, it seems it follows the scipy convention (I don't remember what I decided at the moment so I have to dig into the code 😄 ).

If you send PRs to ammend the gaps in the docs or to improve the dependency issues I will be more than happy to review it (but I have very few time these days so do not expect extremely quick answers 😞 ). Regarding the dependencies I was thinking in a fork of lmoments3.

kikocorreoso commented 2 years ago

BTW, I've updated the README including this: https://github.com/georgebv/pyextremes

It is a more updated and complete EVA package also written in Python. I've been thinking in merge/add/help on this other package but I have had few time these years. Maybe it is worth to have a look to this package.