monocongo / climate_indices

Climate indices for drought monitoring
https://monocongo.github.io/climate_indices/
Other
338 stars 163 forks source link

The algorithm for the estimation of shape parameters of gamma distribution #374

Open zbc123a opened 4 years ago

zbc123a commented 4 years ago

Hi James,

When I went through the scripts of fitting the gamma distribution, I was confused by the way you used to estimate the shape parameters. means = np.nanmean(calibration_values, axis=0) log_means = np.log(means) logs = np.log(calibration_values) mean_logs = np.nanmean(logs, axis=0) a = log_means - mean_logs alphas = (1 + np.sqrt(1 + 4 * a / 3)) / (4 * a) betas = means / alphas

The typical functions I found for the estimated alpha is the quotient of the squared sample mean, and the squared sample variance and the beta is the estimated alpha divides by sample mean (https://cran.r-project.org/doc/contrib/Ricci-distributions-en.pdf). I would appreciate it if you would like to briefly explain the algorithm you used in the code or provide some references that I could read by myself.

Thanks! Beichen

monocongo commented 4 years ago

Hi Beichen,

Thanks for your attention to this. I am at a loss to describe this aspect of the code. To be honest I am not sure where I first found the code that is used here but I do know that it's not original as I'm very weak with statistics (my contribution to this effort has been providing project structure and software engineering best practices, etc.). This approach for computing gamma parameters and/or its results were vetted by the scientists I worked with at NOAA at the time I wrote the code. I'm happy to entertain updates and/or modifications if it can be shown that this code is somehow in error or if there are better implementations for this. Please advise! Thanks in advance for your help...

--James

monocongo commented 4 years ago

Also, the gamma is provided here since much of the research using SPI uses a gamma distribution, but it's been shown that other distribution fittings are more valid, such as the Pearson Type III which is the other distribution available in this package. Some literature on this topic:

https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1752-1688.1999.tb03592.x http://www.scielo.br/scielo.php?script=sci_arttext&pid=S0102-77862011000200001

BTW the implementation of SPI using Pearson Type III is based on the Fortran SPI code written by Guttman, although it has gone through a number of iterations since then and bears little resemblance by now.

If I can be of help with comparing or vetting this implementation against others to determine its validity then please advise. This project is an attempt to provide some standardization, and as such, it aims to provide the "best of breed" implementations for these various drought indices. If this SPI is not "best of breed" and can be improved upon then we are very open to changes, please submit a PR. Thanks!

zbc123a commented 4 years ago

Hi James,

Thank you so much for your exhaustive explanation and suggestion! I did compare the SPI that calculated in the Gamma distribution with the SPI that calculated in the Pearson Type III distribution. The correlations between those two data sets are over 0.99 in my study region. I also noticed that the SPI in Pearson distribution had less missing values as compared to the SPI in Gamma distribution in some areas. However, as you said, many studies of drought used the Gamma distribution. I will reconsider what kind of distribution will work better. Also, thank you for the paper you shared with me~

I have contacted a professor in the statistics department since the first time you gave the response. I will let you know once I got his feedback.

Take care, Beichen