region2/mu-comparison: add option to set quantile levels for circular variable summaries.

brownag commented 4 years ago

circ.stats <- aspect.plot(i$value, q=c(0.1, 0.5, 0.9), plot.title=mu, pch=NA, bg='RoyalBlue', col='black', arrow.col=c('grey', 'red', 'grey'), stack=FALSE, p.bw=90)

Currently quantile levels for circular summaries are hard-coded in report.Rmd. These values could be retained as default for backwards compatibility, and allow for (optional) definition in config.R -- alternately we could just use the default p.quantiles that are used for non-circular.

This will make the default output with standard quantiles (q=c(0, 0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99, 1)) much more complicated, but will offer more flexibility and ability to distinguish central tendency in more dispersed circular sample sets.

EDIT: A neat point about adding a new option to config file is that if we force the quantile vector to have length three then we can calculate "clockwise" vs "counterclockwise". While there is no general relationship between e,g 10th/90th percentiles and "clockwise" vs. "counterclockwise", if the RV is known, one can calculate and re-arrange table output.

dylanbeaudette commented 4 years ago

Not a bad idea. We may need to do some simulation studies to determine if there is a critical number of observations relative to radial "pattern" required to make much sense of the figures. Or, even simpler, we could use the Rayleigh Test of Uniformity to do a simple test of "given the sample size, is this pattern sufficiently different than uniform circular data"? We don't need to use p-values to enforce a decision, but the output is relatively simple.

I'll work on some ideas and add to the aspect.plot documentation. Also, this is a good reminder to visit NASIS data population guidance: should aspect values be populated when no angular pattern is present (i.e. low-RV-high are arbitrary)?

brownag commented 4 years ago

It would be helpful to standardize interpretation of those plots -- but that is more for cases where we have limited data (e.g. pedon observations) not for the mu-comp report. Seems like a good issue for {sharpshootR}.

We generally do not populate aspect (left empty) for components where it is "irrelevant" (low slopes) or "occurs on all aspects." That convention is not at all consistent across SSURGO.

For mu-comp report, I don't rely much on the figures other than for confirmation of (lack of) concept. My interest in using other quantiles is to see what the rest of the distribution looks like without going down the rabbit hole of modifying bandwidth and visualizations of points [or applying a rigorous statistical test to mapunit samples where my concept is at component level]

I don't think there is anything in the NASIS guidance/metadata about when (in terms of how strong of an aspect effect you have) you should or shouldn't populate aspect. I suppose you mean that you want to provide new guidance on this?

Was curious how this is done elsewhere. Here is a quick demo looking at how wide ranges are in SSURGO components. Not sure that the first 100k component records are representative but this generally confirmed my suspicions about how the data have been populated. The most common case of the non-NA aspects is the full circle, followed by 180, 270 and 90 degree portions and as a few others (135, 315?). There are some ranges that are not snapped to cardinal direction that could have been statistically derived.

library(soilDB)

# take first 100k records from component data in SDA
q <- "select top(100000) aspectccwise, aspectrep, aspectcwise from component"
res <- SDA_query(q)

# about half have aspect populated
sum(complete.cases(res))

# > sum(complete.cases(res))
# [1] 52090 

# remove NA
res1 <- res[complete.cases(res), c("aspectccwise","aspectrep","aspectcwise")]

# test data
# res1 <- res1[500:505,]
#
# two more simple test cases
# res1 <- structure(
#     list(
#       aspectccwise = c(0, 45L, 90L, 180L, 270L, 360L),
#       aspectrep = c(0L, 0L, 0L, 0L, 0L, 0L),
#       aspectcwise = c(0, 45L, 90L, 180L, 270L, 360L)
#     ),
#     row.names = c(124L, 125L, 127L, 128L, 129L, 130L),
#     class = "data.frame"
#   )
# 
# res1 <- structure(
#   list(
#     aspectccwise = c(360L, 315L, 270L, 180L, 90L, 0L),
#     aspectrep = c(360L, 360L, 360L, 360L, 360L, 360L),
#     aspectcwise = c(0, 45L, 90L, 180L, 270L, 360L)
#   ),
#   row.names = c(124L, 125L, 127L, 128L, 129L, 130L),
#   class = "data.frame"
# )

# convert to arc length (radians) on unit circle
res2 <- res1 / 180 * pi

# calculate sum of arc lengths to RV 
aspect.range <- function(ccw, rv, cw) {

  # CCW, RV, CW XY positions on unit circle
  y_rv <- sin(rv); y_cc <- sin(ccw); y_cw <- sin(cw) 
  x_rv <- cos(rv); x_cc <- cos(ccw); x_cw <- cos(cw)

  # when ccw == cw below routine won't work 
  #  this is a short-circuit commonly used for "everything" 
  if(x_cc == x_cw & y_cc == y_cw)
    return(2*pi)

  # calculate euclidean distance (chord length)
  d1 <- sqrt((x_rv - x_cc) ^ 2 + (y_rv - y_cc) ^ 2)
  d2 <- sqrt((x_rv - x_cw) ^ 2 + (y_rv - y_cw) ^ 2)

  # calculate arc length from chord length
  t1 <- 2 * asin(d1 / 2)
  t2 <- 2 * asin(d2 / 2)

  # return the sum (this is the part that doesn't work with "everything")
  return(t1 + t2)
}

# apply to each record
res3 <- apply(res2, 1, function(x) aspect.range(x[1], x[2], x[3]))

# convert back to degrees
res4 <- 180 * res3 / pi

# convert small values to 360
res4[abs(res4) < 1e-5] <- 360

# inspect RVs
hist(abs(res4), main = "Width of L-H range SSURGO components\n[non-NA in top 100,000 records]",
     xlab = "Distance from Counter-clockwise to Clockwise Aspect Angle")
abline(v=c(90, 180, 270), lty=2, col="blue")

brownag commented 4 years ago

Repeat above analysis for Live Region 02 (2-SON) SSURGO:

res <- sf::read_sf("E:/Geodata/soils/MLRA_2_SON_FY2021.gdb", "component")

# remove NA
res1 <- res[, c("aspectccwise","aspectrep","aspectcwise")]
res1 <- res1[complete.cases(res1),]

# 32% have aspect populated
nrow(res1) / nrow(res)

ncss-tech / soilReports

region2/mu-comparison: add option to set quantile levels for circular variable summaries. #106