r-spatial / spdep

Spatial Dependence: Weighting Schemes and Statistics
https://r-spatial.github.io/spdep/
Is it necessary to introduce a scaling factor in geary #151

Closed xiangpin closed 4 weeks ago

xiangpin commented 1 month ago

Thanks for developing this package. I'm using it to process spatial transcriptome data, and I want to calculate autocorrelation with the geary.test function. However, I found that the code introduces a scaling factor that is not in the formula for Geary's C index. I don't understand the purpose of this.

https://github.com/r-spatial/spdep/blob/fe24501d233e27d8c0e4bb98e09aa5686d717d96/R/geary.R#L11C1-L12C1

rsbivand commented 1 month ago

I cannot see anything that is not explained in https://doi.org/10.1007/s11749-018-0599-x p. 722, based directly on the original documentation in Cliff & Ord 1981, who generalised the measure to general weights. Please provide references to any literature or implementations that conflict with the findings in Bivand & Wong 2018, pp. 733-734.

xiangpin commented 1 month ago

Thank you, I have also read the reference. However, the zi in the formula for Geary's C is zi = xi - x_mean, whereas the scale function in base R by default both centers the data and then scales it. It seems more appropriate to use z = x - mean(x, na.rm=TRUE) or z = scale(x, scale=FALSE) in https://github.com/r-spatial/spdep/blob/fe24501d233e27d8c0e4bb98e09aa5686d717d96/R/geary.R#L11C1-L12C1

rsbivand commented 1 month ago

I suspect that the reason for the change was just implementation. I'll check and see if that is correct. I'm also cross-checking with the implementation in the Python PySAL esda module.

rsbivand commented 1 month ago

The specific change was https://github.com/r-spatial/spdep/commit/732a6e2bb2c40ab21d6f5b9206e831e1a684c12a; see issue #68 and the discussion there. I'll try to add an argument to geary.test and geary to use either scale() as now, or scale( , scale=FALSE) as before, which only takes deviations from the mean without setting the sd to unity. My feeling is that the former probably leads to a division by unity of a function of the scaled values, rather than a division by the sd of unscaled values, but I need to check.
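The expectation above can be worked through on paper. This is only a sketch, and it assumes that the numerator and the denominator of the statistic are both computed from the same transformed vector (an assumption about the code path, not something confirmed here). Write s for the sample standard deviation of x, S_0 for the sum of all weights w_{ij}, and \tilde z_i = (x_i - \bar x)/s for the values returned by scale(x):

```latex
\begin{aligned}
C(\tilde z)
  &= \frac{n-1}{2 S_0}\,
     \frac{\sum_i \sum_j w_{ij}\,(\tilde z_i - \tilde z_j)^2}{\sum_i \tilde z_i^2}
   = \frac{n-1}{2 S_0}\,
     \frac{s^{-2}\sum_i \sum_j w_{ij}\,(x_i - x_j)^2}{s^{-2}\sum_i (x_i - \bar x)^2}
   = C(x), \\
\sum_i \tilde z_i^2 &= n - 1
  \quad\Longrightarrow\quad
  C(\tilde z) = \frac{1}{2 S_0} \sum_i \sum_j w_{ij}\,(\tilde z_i - \tilde z_j)^2 .
\end{aligned}
```

The factor s^{-2} cancels between numerator and denominator, so under this assumption centering only and centering plus scaling give the same C. With scaled values the denominator reduces to the constant n - 1, which is the "division by unity" of the variance.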

xiangpin commented 1 month ago

Thank you very much, I'm aware of that.

rsbivand commented 1 month ago

Just pushed https://github.com/r-spatial/spdep/commit/803636581398294aad9928aa7d4b1133647277b4 to branch geary. If you would like to try that, you'll see that both paths (scale(..., scale=TRUE) and scale(..., scale=FALSE)) give the same outcome. As far as I can tell, the reason for the change three years ago was the way localC uses geary.intern, but both give the same output.
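The equivalence of the two paths can also be checked without spdep. The following is a minimal base-R sketch, not the package's implementation: geary_c, its standardize argument, and the dense weights matrix W are hypothetical names introduced only for this illustration, and the data are simulated.

```r
# Hypothetical helper (NOT spdep code): global Geary's C from a dense
# weights matrix W, with optional scaling of x to unit sd.
geary_c <- function(x, W, standardize = TRUE) {
  n <- length(x)
  # scale() always centers here; standardize controls division by sd
  z <- as.numeric(scale(x, center = TRUE, scale = standardize))
  S0 <- sum(W)                          # sum of all weights
  num <- sum(W * outer(z, z, "-")^2)    # sum_ij w_ij (z_i - z_j)^2
  den <- sum(z^2)                       # sum_i z_i^2
  ((n - 1) / (2 * S0)) * (num / den)
}

# Simulated data and a simple binary "neighbours on a line" weights matrix
set.seed(1)
n <- 10
x <- rnorm(n, mean = 5, sd = 3)
W <- matrix(0, n, n)
W[abs(row(W) - col(W)) == 1] <- 1

c_scaled  <- geary_c(x, W, standardize = TRUE)   # like scale(x)
c_centred <- geary_c(x, W, standardize = FALSE)  # like scale(x, scale=FALSE)

# The two values should agree up to floating-point tolerance
print(all.equal(c_scaled, c_centred))
```

In this sketch the sd cancels between numerator and denominator, so the choice between the two scale() calls does not affect the global statistic; whether the same holds for every code path in the package (for example localC) is not something this example establishes.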