Formula of get_AICc

hayato-n commented 2 years ago

In diagnostics.py, the get_AICc formula is defined here: https://github.com/pysal/mgwr/blob/2a955355e31fe4a124b49ce4723335d308ee09d2/mgwr/diagnostics.py#L11-L30

However, as noted in its comment, the formula depends on the GLM settings. As a result, it differs from the AICc definition described in Li et al. (2019). (Even when I changed the sigma2_v1 parameter of gwr.GWR, the resulting AICc value did not change.)

I suspect that the following code is consistent with the definition above.

gwr.n * (np.log(np.sum(np.square(gwr.resid_response))) - np.log(gwr.n-gwr.ENP) + np.log(2*np.pi) + (gwr.n+gwr.ENP) / (gwr.n-2-gwr.ENP))

Is there any reason why the current implementation is employed?

Li, Z., Fotheringham, A. S., Li, W., & Oshan, T. (2019). Fast Geographically Weighted Regression (FastGWR): a scalable algorithm to investigate spatial process heterogeneity in millions of observations. International Journal of Geographical Information Science, 33(1), 155–175. https://doi.org/10.1080/13658816.2018.1521523

Ziqi-Li commented 2 years ago

Hi @hayato-n. The only difference is the denominator used when calculating the error variance: mgwr uses the MLE (RSS/n), whereas the Li et al. paper describes the unbiased estimator (RSS/(n-k)). I think the MLE implemented here is the more common choice, so for consistency the later update of fastgwr also uses the MLE (Link).
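
For reference, the two variants can be written out side by side. This is only a sketch in the notation of this thread, assuming k denotes the effective number of parameters (ENP) and RSS the residual sum of squares; the subscripts ML and unb are labels introduced here for illustration.

```latex
\begin{align}
\hat\sigma^2_{\mathrm{ML}} &= \frac{\mathrm{RSS}}{n}, \qquad
\hat\sigma^2_{\mathrm{unb}} = \frac{\mathrm{RSS}}{n-k} \\
\mathrm{AICc}(\hat\sigma^2) &= n\log\hat\sigma^2 + n\log(2\pi) + n\,\frac{n+k}{n-k-2} \\
\mathrm{AICc}(\hat\sigma^2_{\mathrm{unb}}) - \mathrm{AICc}(\hat\sigma^2_{\mathrm{ML}})
  &= n\log\frac{n}{n-k}
\end{align}
```

The gap between the two grows with k, so the two conventions are not just a constant shift apart when comparing models with different ENP.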

hayato-n commented 2 years ago

Hi @Ziqi-Li, thanks for your reply. I confirmed that the following code is consistent with the get_AICc's behaviour.

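import numpy as np

# `gwr` is assumed to be a fitted GWR results object (provides resid_response, n, ENP)

# ML estimate of the error variance: RSS / n
sigma2 = np.sum(np.square(gwr.resid_response)) / gwr.n

# Unbiased estimate: RSS / (n - ENP), as described in Li et al. (2019)
# sigma2 = np.sum(np.square(gwr.resid_response)) / (gwr.n - gwr.ENP)

# AICc
aicc = gwr.n * (np.log(sigma2) + np.log(2*np.pi) + (gwr.n+gwr.ENP) / (gwr.n-2-gwr.ENP))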
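
The effect of the two denominators can also be checked without a fitted model. The following is a minimal sketch using synthetic residuals and a made-up ENP; `aicc_gaussian` is a hypothetical helper written for this comment, not part of mgwr.

```python
import numpy as np


def aicc_gaussian(resid, enp, unbiased=False):
    """AICc from residuals (hypothetical helper, not an mgwr function).

    unbiased=False uses sigma2 = RSS / n (ML);
    unbiased=True uses sigma2 = RSS / (n - enp).
    """
    resid = np.asarray(resid, dtype=float)
    n = resid.size
    rss = np.sum(np.square(resid))
    denom = (n - enp) if unbiased else n
    sigma2 = rss / denom
    return n * (np.log(sigma2) + np.log(2 * np.pi) + (n + enp) / (n - 2 - enp))


# Synthetic inputs, for illustration only
rng = np.random.default_rng(0)
resid = rng.normal(size=200)
enp = 12.5

aicc_ml = aicc_gaussian(resid, enp)
aicc_unbiased = aicc_gaussian(resid, enp, unbiased=True)

# The two variants should differ by exactly n * log(n / (n - ENP))
gap = aicc_unbiased - aicc_ml
expected_gap = resid.size * np.log(resid.size / (resid.size - enp))
```

With a real model, `resid` and `enp` would presumably come from `gwr.resid_response` and `gwr.ENP` as in the snippet above.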

I think it is not intuitive that the sigma2_v1 parameter does not affect the AICc value. It would be helpful if the reason were explained in a comment in get_AICc or in its documentation.

Thank you again for your helpful comment!

Ziqi-Li commented 2 years ago

Hi @hayato-n, great that you find it consistent now. I think sigma2_v1 (which is calculated based on the denominator n-k) is actually not used in the AIC formula, so modifying it doesn't affect the outcome.
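
To illustrate why no variance setting enters the result, here is a rough sketch. It assumes, from my reading of the linked lines, that the Gaussian branch of get_AICc is computed from the log-likelihood as -2*llf + 2n(k+1)/(n-k-2), and that the log-likelihood is evaluated at the ML variance RSS/n. The function names below are hypothetical and not mgwr's API.

```python
import numpy as np


def gaussian_llf(resid):
    """Gaussian log-likelihood at the ML variance RSS / n.

    Assumption: this mirrors what the GLM family reports as llf.
    """
    resid = np.asarray(resid, dtype=float)
    n = resid.size
    sigma2_ml = np.sum(np.square(resid)) / n
    return -0.5 * n * (np.log(2 * np.pi) + np.log(sigma2_ml) + 1.0)


def aicc_from_llf(resid, k):
    """Assumed log-likelihood form: no separate sigma2 argument is needed."""
    n = np.asarray(resid).size
    return -2.0 * gaussian_llf(resid) + 2.0 * n * (k + 1.0) / (n - k - 2.0)


def aicc_closed_form(resid, k):
    """Closed form from this thread, with sigma2 = RSS / n."""
    resid = np.asarray(resid, dtype=float)
    n = resid.size
    sigma2 = np.sum(np.square(resid)) / n
    return n * (np.log(sigma2) + np.log(2 * np.pi) + (n + k) / (n - 2 - k))


# Synthetic residuals and a made-up effective number of parameters
rng = np.random.default_rng(1)
resid = rng.normal(scale=2.0, size=150)
k = 8.3

a_llf = aicc_from_llf(resid, k)
a_closed = aicc_closed_form(resid, k)
```

Under those assumptions both functions agree up to floating-point error, and the only inputs are the residuals, n, and k, which is why a separately stored variance estimate cannot change the value.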

hayato-n commented 2 years ago

Yes, you are right, sigma2_v1 has no effect on the result. I think your comments here are informative, so I will send a small pull request to clarify the AICc formula. Please review and merge it if you like it.

pysal / mgwr

Formula of get_AICc #117