ncss-tech / soilReports

An R package that assists with the setup and operation of a collection of soil data summary, comparison, and evaluation reports. These reports are primarily used by USDA-NRCS soil scientists in both initial and update mapping.
15 stars 5 forks source link

estimate effective DF from spatial samples #11

Closed dylanbeaudette closed 8 years ago

dylanbeaudette commented 8 years ago

Graphical and formalized comparison are difficult due to the VERY high spatial autocorrelation. How can we down-weight the DF accordingly?

Ideas

1. http://www.inside-r.org/packages/cran/SpatialPack/docs/modified.ttest
2. use `clhs` output for bwplots and distributional tests... realistic?
3. weighting via local Moran's I
4. https://github.com/jebyrnes/spatial_correction_lavaan
5. [comparison of methods](http://www.petrkeil.com/?p=1050)
6. [faster ? calculation via ape::Moran.I and distance matrix](http://www.ats.ucla.edu/stat/r/faq/morans_i.htm)

Once we figure out an accurate and scale-able approach, we still need to figure out how to get these adjust sample size numbers into custom.bwplot() via bwplot(). This may require a custom panel function that performs a look-up against the spatial stats summary table.

dylanbeaudette commented 8 years ago

Progress

Using an idea from (Fortin and Dale, 2005) where n_eff is estimated based on a correlation. I am using Moran's I (truncated to {0,1}) as the "correlation".

Since we don't need p-values, set randomisation = FALSE.

## TODO: should this perform tests at increasing lags?
.Moran <- function(s, val, k=10) {
  # compute spatial weights matrix from k-nearest neighbors
  s.n <- spdep::knearneigh(s, k=k)
  s.nb <- spdep::knn2nb(s.n)
  s.listw <- spdep::nb2listw(s.nb)
  # get Global Moran's I
  I <- as.vector(spdep::moran.test(val, s.listw, rank=TRUE. randomisation = FALSE)$estimate[1])
  return(I)
}

## TODO: should this account for rho(lag) ?
# simple correction
# (Fortin & Dale 2005, p. 223, Equation 5.15
# using global Moran's I as 'rho'
.effective_n <- function(n, rho) {

  # TODO: what about negative spatial autocorrelation?
  # hack: clamping rho {0,1}
  rho <- ifelse(rho < 0, 0, rho)
  n_eff <- n * ((1-rho) / (1+rho))

  return(n_eff)
}

Results

Some results from map units in CA630. These are the median Moran I values.

variable 7011 7085 7086
(Estimated) MAST (degrees C) 0.2890 0.4350 0.5750
Annual Beam Radiance (MJ/sq.m) 0.2230 0.4000 0.5440
Compound Topographic Index 0.0950 0.2530 0.1765
Curvature Classes 0.0840 0.1575 0.1630
Effective Precipitation (mm) 0.7655 0.8295 0.8250
Elevation (m) 0.7740 0.8310 0.7840
Frost-Free Days 0.7440 0.8120 0.8170
Geomorphon Landforms 0.4105 0.4390 0.3640
Growing Degree Days (degrees C) 0.7240 0.8115 0.8065
Mean Annual Air Temperature (degrees C) 0.7200 0.8145 0.8090
Mean Annual Precipitation (mm) 0.7920 0.8360 0.8245
MRVBF -0.0265 0.0980 0.0830
SAGA TWI 0.6245 0.6010 0.5110
Slope Aspect (degrees) 0.2360 0.2560 0.3810
Slope Gradient (%) 0.1145 0.1950 0.2500
dylanbeaudette commented 8 years ago

OK, this works when applied to all samples associated with a map unit--however, it is far too slow when the number of samples > ~ 1000. Possible solution: compute Moran's I and effective sample size within each polygon. The final effective sampling size is the sum across all polygons.

This seems to work OK, still rather slow.

dylanbeaudette commented 8 years ago

The current report code works when sampleRasterStackByMU() is sourced from a local file, but not when loaded from the sharpshootR package namespace...!?

Computing the original sample size / polygon via

# starting sample size
n <- nrow(s.polys[[this.poly]])

Result:

   Estimating effective sample size: Mean Annual Air Temperature (degrees C)
Quitting from lines 191-269 (report.Rmd) 
Error in s.polys[[this.poly]] : 
  no [[ method for object without attributes
Calls: <Anonymous> ... eval -> eval -> sampleRasterStackByMU -> nrow -> [[ -> [[

Later on, when indexing the current polygon:

# attempt to compute Moran's I
rho <- try(.Moran(s.polys[[this.poly]], v.polys[[this.poly]]), silent = FALSE)

Result:

Error in s.polys[[this.poly]] : 
  no [[ method for object without attributes
dylanbeaudette commented 8 years ago

Why does this error occur?

Error in na.fail.default(x) : missing values in object
dylanbeaudette commented 8 years ago

A different approach (that seems to work better)

  1. load rasters into memory if possible
  2. perform cursory grid-based sampling to determine Moran's I of each raster
  3. set sampling intensity based on I of each raster

using this method now, need to test

Follow-up in #26