r-spatial / spdep

Spatial Dependence: Weighting Schemes and Statistics
https://r-spatial.github.io/spdep/
123 stars 27 forks source link

Monte Carlo pseudo p underestimated? #172

Closed LucSteinbuch closed 13 hours ago

LucSteinbuch commented 2 weeks ago

In reference to https://github.com/r-spatial/spdep/blob/dbd3074b8822ec669fea628061bce88fe77c5bf2/R/moran.R#L153

I wonder if this shouldn't be diff <- nsim + 1 – xrank ? I have the impression that the Monte Carlo pseudo-p is currently underestimated. I tried to reverse engineer the code, using a convenience example:

Line 148 and 149: We perform nsim MC simulations; we shuffle the values among the spatial locations, so H0 is true (=no spatial correlation). The resulting Moran’s I values are stored in the numerical vector res[1:nsim]. Let’s for convenience assume that nsim = 5, and that we get the following simulated values:

res[1:nsim] = 0.1, 0.2, 0.3, 0.4, 0.5

Line 150: We calculate the actual Moran’s I, and store it vector element res[sim+1]. Let’s assume that the actual value of Moran’s I is 0.35. Note that two simulated values in res[1:nsim] are thus above the actual I, so the resulting Monte Carlo pseudo p-value for a right tail test (alternative = “greater”) should become (2+1)/(5+1) = 0.5. In line 150 we get thus:

res = 0.1 0.2 0.3 0.4 0.5 0.35

Line 151: Calculating the ranks gives rankres = 1 2 3 5 6 4

Line 152: What is the rank of the actual I? xrank = 4

Line 153: the diff(erence), as I understood the number of simulations with an I above the actual I, is calculated to be nsim – xrank = 5 – 4 = 1. Now I do now get (in line 158) a MC pseudo p-value of (1 + 1) / (5 + 1) = 2/6 = 0.333. Therefore, I wonder if this line is entirely correct?

If I made a reasoning mistake, I am happy to hear which one!

rsbivand commented 2 weeks ago

@LucSteinbuch yes, you neglected to look at the following lines: https://github.com/r-spatial/spdep/blob/dbd3074b8822ec669fea628061bce88fe77c5bf2/R/moran.R#L147-L160.

In all alternatives - greater, less and two-sided, the rank or diff is unit incremented. Using punif does the same as a Hope-type test (Cliff & Ord 1973).

In general, return_boot is to be preferred to the Hope-type test, see also https://cran.r-project.org/package=DCluster.