Question: Why does 'california_current_grid' have so many rows?

ericward-noaa commented 1 month ago

I'm puzzled about the grid size of california_current_grid, and am probably missing something. I'm just wanting to make sure we're providing the right grid in the 'surveyjoin' package.

The california_current_grid object has 45628 rows. I realize some of those are from non-WCBTS surveys, but filtering those out leaves me with 30835.

grid <- dplyr::filter(california_current_grid, propInWCGBTS == 1)

What I'm trying to compare this with is the description of the WCGBTS grid in Keller et al. (~ 12000 cells, 1.5nm x 2nm). We've been using the grid based on that in the surveyjoin package (dim(surveyjoin::nwfsc_grid) has 11512 rows) -- and I can't figure out the difference. Thanks!

kellijohnson-NOAA commented 1 month ago

When you filter the california_current_grid object you actually want to filter for

grid <- dplyr::filter(california_current_grid, propInWCGBTS > 1)

but I think that the difference is because the grid cells in Keller et al. are 3 nm² where the grid cells in the california_current_grid are 4 km², which is about 1.17 nm². Disclaimer, I did not actually create the grid but instead stole it from FishStatsUtils I think.

ericward-noaa commented 1 month ago

Thanks -- that makes sense. The propInWCGBTS variable has a max of 1

I did compare some indices with both grids, and they're perfectly correlated (just off by a scaling factor, as we would expect because of the difference in units)

iantaylor-NOAA commented 1 month ago

@ericward-noaa, how big is the scaling factor (e.g. can most of us just ignore the grid density)?

ericward-noaa commented 1 month ago

The scaling issue is actually not anything to be concerned about -- it was an issue on my end, where I forgot to adjust the area for one model in the get_index function. Here's a reproducible example that shows they give you the exact same index,

library(sdmTMB)
library(ggplot2)
library(dplyr)

set.seed(123)

# make fake predictor(s) (a1) and sampling locations:
n_each <- 500
predictor_dat <- data.frame(
  X = runif(n_each*10), Y = runif(n_each*10), 
  year = rep(1:10, each = n_each)
)
predictor_dat$fyear <- as.factor(predictor_dat$year)
mesh <- make_mesh(predictor_dat, xy_cols = c("X", "Y"), cutoff = 0.1)
betas <- runif(10, 3, 7)
sim_dat <- sdmTMB_simulate(
  formula = ~ 0 + fyear,
  data = predictor_dat,
  time = "year",
  mesh = mesh,
  family = gaussian(),
  range = 0.5,
  sigma_E = 0.1,
  phi = 0.1,
  sigma_O = 0.2,
  seed = 42,
  B = betas 
)

# make a new mesh for simulated data
mesh <- make_mesh(sim_dat, xy_cols = c("X","Y"), cutoff=0.13)

# fit the model
sim_dat$fyear <- as.factor(sim_dat$year)
fit <- sdmTMB(observed ~ 0 + fyear,
              spatiotemporal = "iid",
              mesh=mesh,
              dat=sim_dat,
              time="year")

# Make 2 prediction grids
grid_coarse <- expand.grid(year = 1:10, 
                           X = seq(0.5,9.5,by=1), 
                           Y = seq(0.5,9.5,by=1))
grid_fine <- expand.grid(year = 1:10, 
                         X = seq(0.25,9.75,by=0.5), 
                         Y = seq(0.25,9.75,by=0.5))
grid_coarse$fyear <- as.factor(grid_coarse$year)
grid_fine$fyear <- as.factor(grid_fine$year)

pred_coarse <- predict(fit, grid_coarse, return_tmb_object = TRUE)
pred_fine <- predict(fit, grid_fine, return_tmb_object = TRUE)

index_coarse <- get_index(pred_coarse, 
                          bias_correct = TRUE, area = 1)
index_coarse$type <- "coarse"
index_fine <- get_index(pred_fine, 
                        bias_correct = TRUE, area = 0.25)
index_fine$type <- "fine"

rbind(index_coarse, index_fine) %>%
  ggplot(aes(year, log_est, group=type)) + 
  geom_ribbon(aes(ymin=log_est-2*se, 
                  ymax=log_est+2*se, fill=type), 
              alpha=0.3) + 
  geom_line(aes(col = type))

^{Created on 2024-08-05 with reprex v2.1.1}

pfmc-assessments / indexwc

Question: Why does 'california_current_grid' have so many rows? #35