pysal / mgwr

Multiscale Geographically Weighted Regression (MGWR)
https://mgwr.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Scaling of kernel functions #157

Open DOSull opened 1 month ago

DOSull commented 1 month ago

Not really an issue, as I don't use mgwr (or haven't yet), more a question.

Are the constant scale factors applied to the kernel functions in `_kernel_funcs(self, zs)` here correct?

Perhaps more relevant, in the GWR context are they even necessary?

I ask because there is an odd mixture of constants applied to the kernels in some cases and not applied in others.

In density estimation applications of spatial kernels, volume preservation under the density surface is important. If this matters in the GWR case, then the triangular function is clearly not volume preserving as written: the volume of a cone of height $h$ and base radius $r$ is $\pi r^2h/3$, so a volume-preserving triangular kernel of bandwidth $w$ would have height (i.e. constant multiplier) $3/(\pi w^2)$. That multiplier is not applied, and similarly the Gaussian and quartic kernels as written are not volume preserving, and none of the others are either, given that none of them are parameterised by a bandwidth.
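A minimal numerical sketch of the cone argument above. It does not use mgwr's code: `radial_volume` and the bandwidth value `w = 2.5` are hypothetical, chosen only for illustration. The function integrates a radially symmetric kernel over a disc of radius `w` in polar coordinates, and the check is that a triangular kernel of height $3/(\pi w^2)$ has unit volume.

```python
import numpy as np

def radial_volume(kernel, w, n=200_000):
    """Midpoint-rule volume under a radially symmetric 2D kernel on a disc of radius w."""
    dr = w / n
    r = (np.arange(n) + 0.5) * dr          # midpoints of the radial bins
    return float(np.sum(kernel(r) * 2.0 * np.pi * r) * dr)

w = 2.5                                     # hypothetical bandwidth
height = 3.0 / (np.pi * w**2)               # multiplier from the cone-volume argument

def triangular_unscaled(r):
    return 1.0 - r / w                      # cone of height 1 and base radius w

def triangular_scaled(r):
    return height * (1.0 - r / w)           # cone of height 3 / (pi w^2)

vol_unscaled = radial_volume(triangular_unscaled, w)   # compare with pi * w**2 / 3
vol_scaled = radial_volume(triangular_scaled, w)       # compare with 1

print(vol_unscaled, np.pi * w**2 / 3)
print(vol_scaled)
```

The unscaled volume grows with $w^2$, so any volume-preserving constant has to depend on the bandwidth.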

They're not really kernels, which usually implies estimation of a PDF, so much as spatial interaction functions.

Anyway... if volume preservation doesn't matter, then there's no point in applying constant scaling factors to any of them, so the `(3. / 4)` applied to the quadratic and the `(15. / 16)` applied to the quartic can safely be dropped, and some infinitesimal amount of time saved!

TaylorOshan commented 6 days ago

Hi @DOSull, volume preservation doesn’t matter in the sense of reproducing 'canonical' results. However, I'm not sure whether it would impact GWR estimation. I suspect not, or at least not significantly.

> They're not really kernels, which usually implies estimation of a PDF, so much as spatial interaction functions.

I think that’s a fair point.

I didn’t follow the comment about none of them being parameterized by a bandwidth. GWR always uses a bandwidth to determine the distances (zs) used in those functions - I guess that's fundamentally different than the bandwidth, w, you mention.

FWIW, the triangular, quadratic, and quartic functions don’t see much play in GWR. The set of functions was ported from the kernel module of the weights package and then modified to remove truncations (i.e., arbitrary removal of near-zero interactions) that existed for memory optimization.

ljwolf commented 6 days ago

I think @DOSull means that, without a (1/(sqrt(2 pi)bw)) scaling factor, the Gaussian kernel we estimate is only proportional to the "correct" one that has that scaling factor.

Imho we should keep everything volume preserving. I'm not aware of this being studied in GWR, but I bet it's been studied in GAMs?

Reasoning through it, the constants should cancel out within the WLS estimator, no? However, we would get in trouble for many other estimators or statistics, so the kernel should be fully specified.
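A minimal sketch of that cancellation, using synthetic data and a hand-written weighted least squares solve rather than mgwr itself. The names `wls`, `X`, `y`, and `zs` are assumptions made for illustration. Since $\hat\beta = (X^\top W X)^{-1} X^\top W y$, replacing $W$ with $cW$ for any $c > 0$ leaves $\hat\beta$ unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Synthetic design matrix (intercept + one covariate) and response.
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.1, size=n)

# Hypothetical bandwidth-scaled distances in [0, 1) and unscaled quartic weights.
zs = rng.uniform(0.0, 1.0, size=n)
weights = (1.0 - zs**2) ** 2

def wls(X, y, w):
    """Weighted least squares: solve (X' W X) beta = X' W y with W = diag(w)."""
    XtW = X.T * w                      # broadcasting scales each observation's column
    return np.linalg.solve(XtW @ X, XtW @ y)

beta_plain = wls(X, y, weights)
beta_scaled = wls(X, y, (15.0 / 16.0) * weights)

print(beta_plain)
print(beta_scaled)                     # same coefficients up to rounding
```

Quantities that depend on the absolute size of the weights, such as their sum, do change with the constant, which is where a fully specified kernel would start to matter.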

DOSull commented 6 days ago

My (limited) understanding of GWR would be that all these scaling factors will cancel out since it's all relative anyway.

This was just something I came across along the way and as a good FOSS citizen thought I should mention. I came across it in the context of a round of updating of the CSR model in our NetLogo model zoo where I realised that our quartic kernel method and diffusion smoothing method were yielding different total volumes under the surface, because we were certainly using the wrong scaling factor for the quartic function option!

That led me to the books, one of which was Fotheringham et al. 2002 on GWR, and from there to here. There are surprisingly few explicit statements in the literature of the $k$ scaling factor on kernel functions, and many of them restrict themselves to the scaling factor in the 1D case, which, it need hardly be said, is irrelevant here. I also have no idea where the reference in the code, `# functions follow Anselin and Rey (2010) table 5.4`, is pointing, as there is no Table 5.4 in either of the obvious possible references (Perspectives on Spatial Data Analysis has no Table 5.4 and only one reference to kernels; and Serge and Luc's chapter in Handbook of Applied Spatial Analysis about PySAL is Chapter A10).
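On the 1D versus 2D point, a small numerical sketch (not taken from mgwr; `integral_1d`, `integral_2d`, and `shapes` are hypothetical names) computes the normalising constants for the quadratic and quartic shapes at unit bandwidth. The 1D values come out as 3/4 and 15/16, which is consistent with the constants in the code being 1D factors; the 2D values are $2/\pi$ and $3/\pi$.

```python
import numpy as np

def integral_1d(shape, n=200_000):
    """Midpoint-rule integral of shape(z) over the interval [-1, 1]."""
    dz = 2.0 / n
    z = -1.0 + (np.arange(n) + 0.5) * dz
    return float(np.sum(shape(z)) * dz)

def integral_2d(shape, n=200_000):
    """Midpoint-rule integral of shape(r) over the unit disc, in polar coordinates."""
    dr = 1.0 / n
    r = (np.arange(n) + 0.5) * dr
    return float(np.sum(shape(r) * 2.0 * np.pi * r) * dr)

shapes = {
    "quadratic": lambda z: 1.0 - z**2,
    "quartic": lambda z: (1.0 - z**2) ** 2,
}

# Normalising constant = 1 / integral of the unscaled shape.
constants = {
    name: (1.0 / integral_1d(f), 1.0 / integral_2d(f))
    for name, f in shapes.items()
}

for name, (k1, k2) in constants.items():
    print(f"{name}: 1D constant {k1:.6f}, 2D constant {k2:.6f}")
```

For a bandwidth $w$ other than 1, the 1D constants are divided by $w$ and the 2D constants by $w^2$.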

Returning to mgwr, it isn't so much the correctness (or not) in terms of volume preservation as the inconsistency that struck me. As noted in my comment, if volume preservation isn't a concern then there is no point at all in the various scaling factors for some of the kernels, and you might as well get rid of them for whatever minor performance improvement that might yield.

Alternatively, as @ljwolf points out, the kernel should be fully specified on some principled basis if only so that competing implementations are consistent on such things and don't inadvertently yield different results for obscure reasons related to this kind of minor difference!

geoglrb commented 6 days ago

Hello! This is an exciting Issue, at least to me. Whether the normalization of those functions matters seems (to me) to be a matter of how you've coded the rest of things, which I think @DOSull and I plead guilty to not having explored beyond the code directly in question above, either by testing or reading/reasoning.

I am selfishly interested in the answer as it has some bearing on something I'm interested in seeing, feature-wise. I wonder whether I could get a spatial statistician cleverer than I am to comment on what would be needed to implement a generalized data-driven kernel option--an analogue to a W matrix for GWR. Both @TaylorOshan and @ljwolf seem to be well-positioned to think on this, given https://doi.org/10.1080/13658816.2022.2034829 and many others! ;-)

Also https://doi.org/10.1111/tgis.12020