mdscheuerell opened 5 years ago
If you guys have any scripts for model fits and selection with covariates from other projects can you send those along? I'd like to start playing with those.
Hi @DavidTallmon, I'd be happy to do that, but I was planning on adding some plots etc. to the script / RMarkdown in this repo. I imagine that would be simpler than asking you to modify existing code from other projects. If you can give me a day or two I'll try to carve out some time to work on this.
Sounds great. Thanks.
> If you can give me a day or two I'll try to carve out some time to work on this.
I'm trying to wrap up some other things so I can focus some time on this as well.
I'm also happy to modify any existing code if you guys can't find time and want to send some along.
I'm running the Auke coho models today...standby...
Things seem to be working fairly well now. The Beverton-Holt model still has some unpleasantness when it comes to `tau_M` and `tau_S`:
and the energy diagnostics for all three models suggest some thick tails that are difficult to sample from:
All in all, though, regularizing away the left tail of the obs error SDs makes these models much better-behaved, and faster: at least an order of magnitude speedup in wall time per iteration, and even more than that per effective sample.
Here's what the regularization looks like for the Ricker model:
The LOO comparison is still hopeless; 100% of the pointwise likelihood posteriors are in the "bad" or "very bad" categories. For the time being I'm proceeding with the Ricker, since the productivity estimate is more reasonable than the B-H. As you can see, there's still basically no information about the slope at the origin:
That said, the fits to the abundance data don't look bad at all:
<img src="https://user-images.githubusercontent.com/22503560/88860660-17fb8e80-d1b1-11ea-9080-b23680c5d04b.png" height="50%" width="50%">
I have some more ideas for plots to show some of the interesting life-history patterns we can infer from the parameter estimates -- for example, annual SAR and ocean-age proportions for 2- and 3-year-old smolts -- but for now the existing plots (and other code chunks in the script) are incorporated into the RMarkdown via `knitr::read_chunk()`.
Here are some (arguably) nifty plots showing life-history variation through time.
For the freshwater stage, we have the smolt recruitment process error residuals (i.e., cohort strength after adjusting for the expected density dependence) and the proportion of each cohort that outmigrated as 2- vs. 3-year-old smolts. Age-2 is usually dominant, but not always.
And for the marine phase, SAR and the propensity to return as a jack given smolt age. SAR for 2- and 3-year-old smolts is remarkably uncorrelated, which is also reflected in the estimates of `R_MS`. Older smolts usually survive better, but not always (what was going on in the early 2000s?) and are more likely than age-2 smolts to return as jacks.
I added some more plots comparing model fit (states) to observed data, in this case smolt and spawner age composition. Together with the smolt and spawner abundance time series shown above, these constitute all the likelihood components for this model.
It's a little surprising to see such lack of fit to the adult ocean age and Gilbert-Rich age data, considering the (perhaps dubious) precision implied by a multinomial likelihood with such large sample sizes (indicated by the sample binomial CIs). I suspect this is a, if not the, major contributor to the too-skewed-to-use pointwise likelihood posteriors and thus bad / very bad Pareto ks. (I can see @mdscheuerell nodding.)
On that note, @DavidTallmon, can you remind me of the provenance of these age-composition samples? It looks like every spawner over the weir is classified by ocean age (jack vs. 1-salt), a subset of them are sampled for full fresh-salt age, and a subset of outmigrating smolts are aged. Do I have that right?
I also considered whether we ought to exclude jacks from the "spawners" in the spawner-recruit function as we've done in the OR coho analysis, on the grounds that it gets closer to a female-only model and removes noise associated with variable jack proportions. However, it turns out there's no discernible relationship between the proportion of spawners that are jacks (either observed or estimated) and the recruitment process errors, so...meh. On the other hand, that doesn't mean variable sex ratio might not explain some of the process error. I can't remember, do we have sex-composition data?
Different but related: the model currently assumes jacks are not subject to harvest. Is that a good assumption?
These are definitely nifty figures to study. Scott is looking into the early 00s marine survival/age structure anomaly to see if there's a reasonable biological or sampling explanation. Maybe the covars will help with that, though I wish we had some early marine predator info to include. You're correct on age comp. All spawners are ocean aged and sexed at the weir. Approx 5-10% of smolts and 20% of spawners are aged from scales each year for at least many years and probably many, many years back. The zero harvest of jacks seems a reasonable assumption. I can't think of any fishery in SEAK that would harvest jacks in any appreciable numbers.
Thanks @DavidTallmon. I guess it might be worth looking at the sex ratios to see if they're associated with the smolt recruitment anomalies.
Re spawner age structure, the plot (LOL) thickens. It dawned on me that the weir data and the scale samples might imply different ocean-age proportions, which would explain why the model struggles to fit either one. Sure enough:
I'm on the fence about whether it's even statistically valid to include both of these as likelihood components, since the scale samples are a subset of the same individuals aged at the weir, hence obviously nonindependent. Technically the answer is clearly no, although there is some research suggesting nonindependent data in an IPM can sometimes be innocuous. @mdscheuerell, any thoughts on this? We could try leaving out the weir data on ocean age and just using the scale samples (the reverse won't work, as the full Gilbert-Rich age is needed to identify the smolt age-specific SAR and ocean age probabilities). Or I suppose we could remove the individuals with scale samples from the weir samples so they're independent, which wouldn't require any additional info. That would only increase the conflict b/w the two data sets, though.
I think we could really benefit from bringing in Scott's insights on daily operations at the weir. Perhaps there is/was historically some scale sampling weirdness occurring -- it seems like it would be really difficult to misassign ocean ages at the weir given the limited classes and obvious differences b/w classes. (I've emailed Scott w/ the fig above w/ a request for any insights.)
Yeah, @svulstek's input would be helpful. He's a collaborator on this repo, so would be best if he weighed in here to keep the discussion in one place.
Re: the use of both visual age data and scale samples, am I correct in the that the scale samples were taken from fish that were also aged visually, or are they a separate group?
This also strikes me as an opportunity to show the comparison between the information you could expect from the two different sample types by trying @ebuhle's suggestion to only use one or the other data type.
Yes, I think all spawners over the weir are aged visually. `n_MSage0_obs + n_MSage1_obs == S_obs`.
(Edited for insufficient caffeine.)
Yes, to my limited knowledge, there are only jacks and 1 SW coho in all of SEAK. Ages and sexes are assigned (if I can say that) at the weir.
The decline in age-3 smolt survival is really interesting. Does the smolt-to-adult survival include jacks as adults (all spawners), or are spawners separated into adults and jacks? I think it's the former, but want to be sure. Also, is there a cheat sheet for the model parameter abbreviations for IPM_SMaS_np? Some parameters are easy to infer from the abbreviations (a, Rmax, etc.), but others take a bit of time to suss out. I couldn't find this in the few places I looked.
> The decline in age-3 smolt survival is really interesting. Does the smolt-to-adult survival include jacks as adults (all spawners), or are spawners separated into adults and jacks? I think it's the former, but want to be sure.
That caught my eye too. And yes, you are correct. `s_MS[t,j]` (SAR, or survival from smolts `M` to spawners `S`, for smolts of age class `j` in ocean entry year `t`) applies to returning spawners of both ages, and `p_MS[t,j,k]` is the conditional probability of returning at ocean age class `k` given that you survive. The notation `p_x` refers to conditional age probabilities for life stage `x` indexed by cohort, and `q_x` is the age distribution indexed by calendar year. So, for example, the number of 2-year-old smolts (smolt age class 1) that return as jacks (ocean age class 1, not subject to harvest) in the same calendar year they outmigrated is `M[t]*q_M[t,1]*s_MS[t,1]*p_MS[t,1,1]`.
When I have some time I'll write out the math in the RMarkdown, which will hopefully clarify things. The motivation for using this SAR / conditional-age-at-return parameterization is that it rearranges the otherwise nonidentifiable annual marine survival and maturation probabilities into identifiable functionals.
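To make the bookkeeping concrete, here's a toy version of that last calculation in R. All numbers are made up for illustration; they are not estimates from the Auke Creek fit.

```r
# Hypothetical values for a single ocean-entry year t (illustration only)
M_t    <- 50000                 # total smolts outmigrating in year t
q_M_t  <- c(0.7, 0.3)           # smolt age distribution: age-2, age-3
s_MS_t <- c(0.02, 0.03)         # SAR by smolt age class
p_MS_t <- rbind(c(0.15, 0.85),  # age-2 smolts: P(jack), P(1SW) given survival
                c(0.05, 0.95))  # age-3 smolts: P(jack), P(1SW) given survival

# Expected number of 2-year-old smolts (age class 1) returning as jacks
# (ocean age class 1): M[t] * q_M[t,1] * s_MS[t,1] * p_MS[t,1,1]
jacks_age2 <- M_t * q_M_t[1] * s_MS_t[1] * p_MS_t[1, 1]
jacks_age2  # 105
```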
> Also, is there a cheat sheet for the model parameter abbreviations for IPM_SMaS_np? Some parameters are easy to infer from the abbreviations (a, Rmax, etc.), but others take a bit of time to suss out. I couldn't find this in the few places I looked.
`par_defs("IPM_SMaS_np")` will give you a cheat sheet. The `Dimensions` column is a little off in some places, due to a combination of transcription errors from the Stan code and the fact that multiple models may contain parameters with the same name but different dimensions (we have an open issue in salmonIPM to address the latter), and the order of appearance is a bit scattershot, but the definitions are right.
```
> par_defs("IPM_SMaS_np")[,-2]
      Parameter/state Definition
 [1,] "alpha"         "Intrinsic productivity"
 [2,] "Rmax"          "Asymptotic recruitment"
 [3,] "p_HOS"         "True prop of hatchery-origin spawners (HOS) in years with H fish present"
 [4,] "B_rate_all"    "True broodstock take rate in all years"
 [5,] "S"             "True total spawner abundance"
 [6,] "beta_M"        "Regression coefs for spawner-smolt productivity"
 [7,] "rho_M"         "AR(1) coefs for spawner-smolt productivity"
 [8,] "sigma_M"       "SD of spawner-smolt process error"
 [9,] "mu_MS"         "Mean smolt-to-adult return (SAR)"
[10,] "beta_MS"       "Regression coefs for SAR"
[11,] "rho_MS"        "AR(1) coefs for SAR"
[12,] "sigma_MS"      "SD of SAR process error"
[13,] "s_MS"          "True SAR by outmigration year"
[14,] "tau_M"         "SD of smolt observation error"
[15,] "tau_S"         "SD of spawner observation error"
[16,] "M"             "True smolt abundance (not density) by outmigration year"
[17,] "mu_p_M"        "Popn mean smolt age distributions"
[18,] "sigma_p_M"     "SDs of log-ratio cohort smolt age distribution"
[19,] "R_p_M"         "Correlation matrices of log-ratio smolt age distns"
[20,] "p_M"           "True smolt age distributions by brood year"
[21,] "q_M"           "True smolt age distributions by calendar year"
[22,] "R_MS"          "Correlation matrices of logit SAR by smolt age"
[23,] "mu_p_MS"       "Popn mean ocean age distributions for each smolt age"
[24,] "sigma_p_MS"    "SDs of log-ratio ocean age for each smolt age"
[25,] "R_p_MS"        "Correlation matrices of log-ratio ocean age distns"
[26,] "p_MS"          "True ocean age distns by outmigration year"
[27,] "q_MS"          "True ocean age distns of spawners"
[28,] "q_GR"          "True Gilbert-Rich age distns of spawners"
```
Perfect. Thanks. (I'm sure you told me about par_defs a couple years ago, but I forgot. My bad.)
No problem, I sometimes forget how some of this stuff in salmonIPM works myself! (Exhibit A: the fact that I had to edit my last post after realizing I left out a key part of the "equation".)
This is an opportune time to mention that over the weekend, I finally tackled the "new and improved" priors on initial states in `IPM_SMaS_np`. Briefly, what I had done before was to put diffuse lognormal priors on smolt abundance `M` and spawner abundance `S`, and simplex-uniform priors on the smolt age distribution `q_M` and the Gilbert-Rich spawner age distribution `q_GR`, in the early years of the time series -- that is, the years before all age classes could be generated from previous years' states using the process model. That's an inefficient use of information, though, because some age classes during that initial window can in fact be generated from the process model. It's only the "orphan" age classes -- smolts that were born or spawners that outmigrated, respectively, before year 1 -- that need priors. By not using all the available information, I was depriving the process model of "data points" (really states) that could be used to help estimate the hyperparameters. This matters especially in short time series.
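(For anyone following along: "simplex-uniform" means a Dirichlet(1, ..., 1) prior. A minimal base-R sketch of drawing from such a prior -- purely illustrative, not the salmonIPM internals -- over, say, the 4 Gilbert-Rich age classes:

```r
# A uniform prior over the K-simplex is Dirichlet(1, ..., 1).
# One standard way to draw from it: normalize K iid Gamma(1) (= Exponential) draws.
rdirichlet_unif <- function(K) {
  g <- rgamma(K, shape = 1)
  g / sum(g)
}

set.seed(1)
q_GR_init <- rdirichlet_unif(4)  # one draw over the 4 Gilbert-Rich age classes
sum(q_GR_init)                   # sums to 1 by construction
```
)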
It took me an annoyingly long time to work out a general solution to this problem that, crucially, preserves the prior simplex-uniformity over the orphan age classes, and even longer to implement it in this particular model because the whole smolt age / ocean age bookkeeping is such a headache. After some teeth-gnashing, I'm pretty sure I got it right. The updates are in the `master` branch of salmonIPM now and available to use if you uninstall and reinstall from GitHub. And the code and results in this repo are updated in the `develop` branch, and will be in `master` pending approval of a PR.
Most of the changes are pretty subtle, but this rather striking one is relevant to the above discussion. The uncertainty in the early years of SAR and ocean age probabilities really tightens up, and it turns out that SAR of age-3 smolts has been relatively low in the past too:
I'm having no luck uninstalling the master version and installing from the develop branch. I tried:
`devtools::uninstall_packages("ebuhle/salmonIPM")`, which I thought would uninstall the current salmonIPM, but I got: `Error: 'uninstall_packages' is not an exported object from 'namespace:devtools'`
I had planned to follow with `devtools::install_packages("ebuhle/salmonIPM@develop")`. I'm assuming this will give me an error, too, but again, I'm not sure what is wrong w/ this syntax.
My devtools package is up to date. Can you help w/ proper un/install from the master and develop branches?
Hey @DavidTallmon, I think you'll have to use
`remove.packages("salmonIPM")`
Weird. That didn't work at first. But this did:

`install.packages("salmonIPM")`
`remove.packages("salmonIPM")`

I don't know how the package could've been removed before that; I couldn't find any code showing that I had removed it previously. Could it be related to a previous "pull"? But, continuing on, this did not work:

`devtools::install_packages("ebuhle/salmonIPM@develop")`
`Error: 'install_packages' is not an exported object from 'namespace:devtools'`

I would like to install salmonIPM from the develop branch. Suggestions?
Yeah, the lingo is a bit confusing among the different install options. Here you want
devtools::install_github("ebuhle/salmonIPM@develop")
Of course. I should've seen that. Thanks.
Everything seemed to work, but got this failure message (at bottom). Is it really a failure, or can we just call it a very stern warning?
Using github PAT from envvar GITHUB_PAT
Downloading GitHub repo ebuhle/salmonIPM@develop
These packages have more recent versions available.
It is recommended to update all of them.
Which would you like to update?
1: All
2: CRAN packages only
3: None
4: backports (1.1.7 -> 1.1.8 ) [CRAN]
5: processx (3.4.2 -> 3.4.3 ) [CRAN]
6: rlang (0.4.6 -> 0.4.7 ) [CRAN]
7: vctrs (0.3.1 -> 0.3.2 ) [CRAN]
8: pillar (1.4.4 -> 1.4.6 ) [CRAN]
9: tibble (3.0.1 -> 3.0.3 ) [CRAN]
10: pkgbuild (1.0.8 -> 1.1.0 ) [CRAN]
11: loo (2.3.0 -> 2.3.1 ) [CRAN]
12: sys (3.3 -> 3.4 ) [CRAN]
13: xfun (0.15 -> 0.16 ) [CRAN]
14: dplyr (1.0.0 -> 1.0.1 ) [CRAN]
15: rstan (2.19.3 -> 2.21.2) [CRAN]
16: DT (0.14 -> 0.15 ) [CRAN]
17: rstantools (2.0.0 -> 2.1.1 ) [CRAN]
Enter one or more numbers, or an empty line to skip updates: 1

backports  (1.1.7  -> 1.1.8 ) [CRAN]
processx   (3.4.2  -> 3.4.3 ) [CRAN]
rlang      (0.4.6  -> 0.4.7 ) [CRAN]
vctrs      (0.3.1  -> 0.3.2 ) [CRAN]
pillar     (1.4.4  -> 1.4.6 ) [CRAN]
tibble     (3.0.1  -> 3.0.3 ) [CRAN]
V8         (NA     -> 3.2.0 ) [CRAN]
pkgbuild   (1.0.8  -> 1.1.0 ) [CRAN]
loo        (2.3.0  -> 2.3.1 ) [CRAN]
sys        (3.3    -> 3.4   ) [CRAN]
xfun       (0.15   -> 0.16  ) [CRAN]
dplyr      (1.0.0  -> 1.0.1 ) [CRAN]
rstan      (2.19.3 -> 2.21.2) [CRAN]
DT         (0.14   -> 0.15  ) [CRAN]
rstantools (2.0.0  -> 2.1.1 ) [CRAN]

Installing 15 packages: backports, processx, rlang, vctrs, pillar, tibble, V8, pkgbuild, loo, sys, xfun, dplyr, rstan, DT, rstantools
Installing packages into ‘C:/Users/datallmon/R_LIBS’ (as ‘lib’ is unspecified)
There are binary versions available but the source versions are later:

          binary source needs_compilation
backports  1.1.7  1.1.8              TRUE
DT          0.14   0.15             FALSE
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.0/processx_3.4.3.zip' Content type 'application/zip' length 1147681 bytes (1.1 MB) downloaded 1.1 MB
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.0/rlang_0.4.7.zip' Content type 'application/zip' length 1135164 bytes (1.1 MB) downloaded 1.1 MB
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.0/vctrs_0.3.2.zip' Content type 'application/zip' length 1160471 bytes (1.1 MB) downloaded 1.1 MB
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.0/pillar_1.4.6.zip' Content type 'application/zip' length 181902 bytes (177 KB) downloaded 177 KB
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.0/tibble_3.0.3.zip' Content type 'application/zip' length 416247 bytes (406 KB) downloaded 406 KB
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.0/V8_3.2.0.zip' Content type 'application/zip' length 9193571 bytes (8.8 MB) downloaded 8.8 MB
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.0/pkgbuild_1.1.0.zip' Content type 'application/zip' length 141218 bytes (137 KB) downloaded 137 KB
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.0/loo_2.3.1.zip' Content type 'application/zip' length 843911 bytes (824 KB) downloaded 824 KB
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.0/sys_3.4.zip' Content type 'application/zip' length 59680 bytes (58 KB) downloaded 58 KB
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.0/xfun_0.16.zip' Content type 'application/zip' length 232731 bytes (227 KB) downloaded 227 KB
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.0/dplyr_1.0.1.zip' Content type 'application/zip' length 1309742 bytes (1.2 MB) downloaded 1.2 MB
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.0/rstan_2.21.2.zip' Content type 'application/zip' length 5339661 bytes (5.1 MB) downloaded 5.1 MB
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.0/rstantools_2.1.1.zip' Content type 'application/zip' length 160718 bytes (156 KB) downloaded 156 KB
package ‘processx’ successfully unpacked and MD5 sums checked
Error: Failed to install 'salmonIPM' from GitHub: (converted from warning) cannot remove prior installation of package ‘processx’
So, first off, there's no need to install the `develop` branch of salmonIPM, because

> The updates are in the `master` branch of salmonIPM now and available to use if you uninstall and reinstall from GitHub.

The updated code in the `develop` branch of this repo is still awaiting approval of a PR, which I might just go ahead and approve myself as long as you (or @mdscheuerell) don't have any uncommitted changes sitting in your local `master`.
As for the error, try this: from a fresh R session, do `update.packages(checkBuilt = TRUE, ask = FALSE)`. Then start a new session, install salmonIPM, and decline any further updates (`3: None`). It's not clear to me that this is actually the reason for the error, but at least we can rule it out.
> The updated code in the develop branch of this repo is still awaiting approval of a PR
Done. Sorry I missed that previously.
> which I might just go ahead and approve myself
In the future, @ebuhle should just go ahead and approve PR's.
Cool, thanks. Not trying to make extra work for you, just to avoid the multiple-cooks-in-the-kitchen-at-the-same-time problem. :grin:
So, back to the proportions of jacks/age structure of spawners and the disparity between datasets (at weir vs scales)..... I talked to Josh today. The scale samples are not random or systematic samples....they are taken from about 20% of individuals as the opportunity presents itself. In contrast, the visual calls of sex/age at the weir are of all individuals as they are passed upstream. Given that, the visual calls seem the better data for likelihood construction and interpretation. Also, Josh confirmed that < 20 CWT jacks have been reported as harvested by ADFG since 1980, and the ADFG expansion is that < 34 jacks have been harvested over the time series. Seems like we can safely say not subject to harvest.
> So, back to the proportions of jacks/age structure of spawners and the disparity between datasets (at weir vs scales)..... I talked to Josh today. The scale samples are not random or systematic samples....they are taken from about 20% of individuals as the opportunity presents itself. In contrast, the visual calls of sex/age at the weir are of all individuals as they are passed upstream. Given that, the visual calls seem the better data for likelihood construction and interpretation.
Oh dear. That's too bad. Without the full FW/SW age, there's no way to identify the SAR and ocean age-proportions specific to each smolt age, which renders a big chunk of this model moot, or at least underdetermined.
I mean, I guess technically I should say that other likelihood components (total spawner abundance and ocean age composition) might partially identify or bound those parameters / states -- in a very general sense, that's part of the rationale for IPMs -- but I strongly suspect that the information will be weak at best. Of course we can try it and find out, but this seems like a major wrench in the works.
On another note, were you able to (re)install salmonIPM?
Uh-oh. Is there not a way to use full FW/SW age info from the scales from the sampled 20% of spawners to infer the smolt ages of the 100% of spawners IDd to SW age visually at the weir?
On the other note, I am still encountering some frustrations. For example:
update.packages(checkBuilt = TRUE, ask = FALSE)
Warning in install.packages(update[instlib == l, "Package"], l, repos = repos, :
  'lib = "C:/Program Files/R/R-4.0.2/library"' is not writable
Error in install.packages(update[instlib == l, "Package"], l, repos = repos, :
  unable to install packages
I have admin privileges, so I'm not sure why /library is not writable. I also noticed that many of the recently installed packages have installed to
C:\Users\datallmon\R_LIBS
I'm guessing this is all related to the synch weirdness from my uni that I'm not clever enough to outsmart. This is a super-frustrating indicator of that:
devtools::install_github("ebuhle/salmonIPM")
Error in loadNamespace(name) : there is no package called ‘devtools’

install.packages("devtools")
Installing package into ‘\sync-al/files$/JD-880061-A2/datallmon/My Documents/R/win-library/4.0’ (as ‘lib’ is unspecified)
also installing the dependencies ‘utf8’, ‘lifecycle’, ‘pillar’, ‘pkgconfig’, ‘vctrs’, ‘BH’, ‘sys’, ‘ini’, ‘tibble’, ‘backports’, ‘ps’, ‘lazyeval’, ‘base64enc’, ‘later’, ‘askpass’, ‘highr’, ‘markdown’, ‘xfun’, ‘clipr’, ‘crayon’, ‘curl’, ‘fs’, ‘gh’, ‘git2r’, ‘glue’, ‘purrr’, ‘rematch2’, ‘rprojroot’, ‘whisker’, ‘yaml’, ‘processx’, ‘R6’, ‘assertthat’, ‘fansi’, ‘digest’, ‘rex’, ‘htmltools’, ‘htmlwidgets’, ‘magrittr’, ‘crosstalk’, ‘promises’, ‘mime’, ‘openssl’, ‘prettyunits’, ‘xopen’, ‘brew’, ‘commonmark’, ‘knitr’, ‘Rcpp’, ‘stringi’, ‘stringr’, ‘xml2’, ‘evaluate’, ‘praise’, ‘usethis’, ‘callr’, ‘cli’, ‘covr’, ‘desc’, ‘DT’, ‘ellipsis’, ‘httr’, ‘jsonlite’, ‘memoise’, ‘pkgbuild’, ‘pkgload’, ‘rcmdcheck’, ‘remotes’, ‘rlang’, ‘roxygen2’, ‘rstudioapi’, ‘rversions’, ‘sessioninfo’, ‘testthat’, ‘withr’
> Uh-oh. Is there not a way to use full FW/SW age info from the scales from the sampled 20% of spawners to infer the smolt ages of the 100% of spawners IDd to SW age visually at the weir?
I'm not sure I follow. What we've been doing is using the scale samples of FW/SW age to inform the estimates of SAR and conditional age at maturity for each smolt age, for the entire population (so, yes, the 100% of visually aged spawners at the weir). You need observations of full FW/SW age to do this -- ocean age alone can't distinguish the fates of 2- vs. 3-year-old smolts -- so in that sense, the scale samples are essential and the visual observations of SW age are a nonessential bonus that provides some additional information.
The problem is that those two data types are in conflict, which we can see by comparing the piece of information (SW age) they have in common. That's why the model is having a hard time fitting either one, despite the high precision implied by the multinomial likelihood with large sample sizes. And now we have an explanation: the scale samples aren't representative of the population as a whole. Even if we were to ignore the SW age from the scales and only use the FW age (maybe what you were suggesting?) it wouldn't change this fact, just hide it. Anyway, I don't think the FW age alone would be enough to identify the states / params in question -- you need the full 2 x 2 contingency table, as it were, not just the margins. Maybe there'd be some sort of indirect partial identifiability like I alluded to earlier, but again, it's a moot point if the scales are a severely biased sample from the population.
Does that make sense? Or am I missing something?
> On the other note, I am still encountering some frustrations. For example:
Hrrrmm. Yeah, this does seem like some kind of Big Brother IT situation. (Who needs funding cuts when you can just prevent researchers from doing research?) And yet you've obviously managed to install and use packages before, so that's weird. What is `.libPaths()`? It may be that either (A) you have somehow changed the default library directory, or (B) you will need to change the default library directory to evade Big Brother. I never, ever futz with the default library and I always get confused by this stuff, so I would need to consult the interwebz for instructions, but let's see what it is currently.
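For reference, a quick way to check (and, if needed, redirect) where R installs packages. The Windows path in the commented line is just the one that appeared in the install log above, not a recommendation:

```r
# Show the library search path; packages install to the first writable entry
.libPaths()

# If the system library isn't writable, prepend a user-writable directory
# (path copied from the install log above; adjust as needed):
# .libPaths(c("C:/Users/datallmon/R_LIBS", .libPaths()))
```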
OK, so thinking about this as a multi-way contingency table analysis gave me an idea for a possible Hail Mary. Tell me if this makes sense.
We know that jacks are generally underrepresented in the opportunistically collected scale samples. I can think of two reasons why this might be the case (maybe there are others?):
If it's (1), then there's no immediately obvious reason why scale samples within each ocean age would be biased w.r.t. FW age. If it's (2), then who knows -- I could imagine that 2FW and 3FW fish come in at different times. If the date of each sample was recorded, then I suppose we could test this empirically.
Anyway, if (big, crucial IF) we can safely assume the scale samples within each SW age are unbiased w.r.t. FW age, then we could combine the visual SW age observations with the FW age from the scales to estimate what we need. Consider a toy example of age-frequency data for a given year (year indices are omitted for clarity), formulated as a 2 x 2 contingency table:
| FW \ SW | 0 | 1 |
|---|---|---|
| 2 | `n_GRage2_2_obs` | `n_GRage3_2_obs` |
| 3 | `n_GRage3_3_obs` | `n_GRage4_3_obs` |
The cell frequencies are the usual Gilbert-Rich age data as currently defined in `IPM_SMaS_np`, which we get from scale samples. We model them using a multinomial likelihood with the following cell probabilities (again, I'm dropping the year index `t` from `q_GR[t,a]` to focus on age class `a`):
| FW \ SW | 0 | 1 |
|---|---|---|
| 2 | `q_GR[1]` | `q_GR[2]` |
| 3 | `q_GR[3]` | `q_GR[4]` |
These cell probabilities are a function of several underlying states, including the smolt age-specific SAR and conditional age-at-maturity probabilities, but the important point is that they contain all the information about spawner age that we need to identify those states (and the hyperparameters that govern them).
The problem we've just discovered is that the observed cell frequencies `n_GRage_obs` are not generated by (even pseudo-) random sampling from the population, so the multinomial likelihood is invalid; we cannot use it to estimate the cell probabilities `q_GR`. But suppose the nonrandomness is only w.r.t. SW age, and that FW age is randomly sampled within each SW age. In contingency table parlance, we have fixed column margins, as would be the case in a designed experiment. It's as if we deliberately sampled some arbitrary number of scales from jacks `n0_obs` and 1SW fish `n1_obs`:
| FW \ SW | 0 | 1 |
|---|---|---|
| 2 | `n_GRage2_2_obs` | `n_GRage3_2_obs` |
| 3 | `n_GRage3_3_obs` | `n_GRage4_3_obs` |
| **Total** | `n0_obs` | `n1_obs` |
The column totals are meaningless, but the row frequencies within each column allow us to estimate the conditional FW age probabilities `q_FW`:
| FW \ SW | 0 | 1 |
|---|---|---|
| 2 | `q_FW[1]` | `q_FW[2]` |
| 3 | `q_FW[3]` | `q_FW[4]` |
where, e.g., `q_FW[1] = q_GR[1] / (q_GR[1] + q_GR[3])` and `q_FW[4] = q_GR[4] / (q_GR[2] + q_GR[4])`. (Yes, I know the notation is tortured, but I'm trying to keep the variable names in tables 1 & 2 as they currently exist, rather than starting from scratch with what makes sense mathematically in this toy example.) So the multinomial likelihood becomes two multinomials (binomials). In cartoon form:
```
[n_GRage2_2_obs, n_GRage3_3_obs] ~ multinomial([q_FW[1], q_FW[3]], n0_obs)  # P(FW age | SW age = 0)
[n_GRage3_2_obs, n_GRage4_3_obs] ~ multinomial([q_FW[2], q_FW[4]], n1_obs)  # P(FW age | SW age = 1)
```
Now we need to estimate the column-marginal ocean age probabilities in the population:
| FW \ SW | 0 | 1 |
|---|---|---|
| 2 | `q_GR[1]` | `q_GR[2]` |
| 3 | `q_GR[3]` | `q_GR[4]` |
| **Total** | `q_MS[1]` | `q_MS[2]` |
where `q_MS` is defined just as it currently is in the model. We use the visual ocean-age frequencies `n_MSage_obs` to do this, exactly as we've been doing all along:
```
[n_MSage0_obs, n_MSage1_obs] ~ multinomial([q_MS[1], q_MS[2]], n_MSage0_obs + n_MSage1_obs)
```
Conceptually, the conditional FW age probabilities and the marginal SW age probabilities are equivalent to the full Gilbert-Rich age probabilities that we need. E.g., `q_GR[1] = q_FW[1] * q_MS[1]`, and so on.
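A quick numerical sanity check of that identity in R, using made-up probabilities (not estimates from the model):

```r
# Toy Gilbert-Rich age probs, ordered (FW2.SW0, FW2.SW1, FW3.SW0, FW3.SW1)
q_GR <- c(0.15, 0.45, 0.10, 0.30)

# Marginal SW (ocean) age probs: the column totals
q_MS <- c(q_GR[1] + q_GR[3],  # P(SW = 0)
          q_GR[2] + q_GR[4])  # P(SW = 1)

# Conditional FW age probs within each SW age
q_FW <- c(q_GR[1] / q_MS[1],  # P(FW = 2 | SW = 0)
          q_GR[2] / q_MS[2],  # P(FW = 2 | SW = 1)
          q_GR[3] / q_MS[1],  # P(FW = 3 | SW = 0)
          q_GR[4] / q_MS[2])  # P(FW = 3 | SW = 1)

# Reconstruct the joint: q_GR[a] = q_FW[a] * q_MS[sw(a)]
q_GR_check <- q_FW * q_MS[c(1, 2, 1, 2)]
all.equal(q_GR, q_GR_check)  # TRUE
```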
Again, this is ONLY valid if the scale samples within each ocean age are unbiased w.r.t. smolt age.
Thoughts?
Sorry, I've been radio silent while going through the stages of grief. I think you've just saved me a couple final stages and articulated what I was trying and failing horribly to describe.....the SW scales should still be useful samples of their FW age composition even if SW scales are way more likely to be aged from 1 SW. The tables make sense after a walk-through. This could be fantastic. But, let me pass this by Scott and Josh for b.s. detection......I may be ignoring a fundamental sampling problem because I want so desperately for this to work. It seems like a brilliant work around.
Well, why didn't you just say so?!? :rofl: :laughing: :cry: :smile: :facepalm:
No, just kidding. This stuff is so much easier to hash out with a whiteboard. Another casualty of COVID. Glad I could save you depression and acceptance, at least. FWIW, I was feeling pretty bummed too, after all the work we've put into this and having just started making some real headway. I was hoping that somehow the visual age data would turn out to be bogus and the scales would be legit, because we could just drop the former and be fine. Was pretty excited when I hit upon this idea. Even better, I think it would be dead simple to implement. You could just pass in a logical flag `conditionGRonMS` that, if `TRUE`, would cause `q_GR` to be conditioned on the column totals as shown above instead of the usual unconditional probabilities. Then (I'm pretty sure, although I need to double-check this) the actual likelihood statement wouldn't even change.
I'm tempted to go code it up right now, but yes, we should definitely run this by @svulstek and Josh first. The thing that gives me pause is that I don't see how we could validate the assumption of random (-ish) sampling of FW age conditional on SW age; it'd be a leap of faith. But hopefully those guys have a sense of whether it's well-justified. Fingers crossed.
> Thoughts?
This sounds like a clever and opportunistic way of addressing this problem.
"There's such a fine line between stupid and clever."
It's on again. I just talked with the guys at the weir and they're confident that at least in the last decade there is no reason to think there's biased sampling of FW ages w/i SW ages.
OK, I got the new version with `conditionGRonMS = TRUE` working.
Most of the substantive results are...surprisingly not that different. See for yourself in the rendered HTML. The one major exception, of course, is that the fits to the spawner age-frequency data look a lot better and more sensible:
Alas, this doesn't do anything to cure the bad-to-very-bad Pareto ks that make the LOO estimates unusable, but you can't win 'em all. I'm just happy we were able to salvage the scale data and continue to fit the same underlying models!
Yes. Nice work. Those age data fits definitely look better. A relief. So, do we move on to model comparisons w/ covariates, or is that tenuous w/o a properly functioning LOO?
We can certainly go ahead and fit models with covariates, even if we don't yet have reliable machinery for doing model comparison based on out-of-sample predictive performance. The posteriors of the regression coefficients are also a valid way to look at effects; it's just asking a different question.
I may have to take a couple days off to get some "official" work done, though. :smile:
I'm hoping to make a push on this over the next few weeks. I'm feeling inspired and have a drop in my teaching load for awhile. My plan is to nuke everything and re-install R-studio, etc., with the hopes of evading UAS' synching and correlated issues for a few weeks. That seemed to work for awhile this summer. I could definitely use help formally describing the obs and process model likelihoods if anybody else feels similarly inspired (and flush with free time).
I can help with the model descriptions next week.
Hey Mark, did you have any luck finding time to work on the model descriptions? I hate to nag, but am feeling the pinch from NPRB. I think I have enough in the way of results to make them happy, but my model description attempts are pretty comical and likely dangerous.
Hey David. No, I didn't yet, but I can do so this weekend if you can wait that long.
Sounds great. Thanks, Mark.
Hey @DavidTallmon, I said this already via email, but I suspect I'm going to have to run point on translating `IPM_SMaS_np` from Stan to mathematical English. I might be able to devote some time to that later this week/end. (I mean, it's not like there's anything else going on, right?)
When I asked about NPRB, I didn't realize this work was already part of an existing proposal. We should talk about what that means, and the possibilities for dedicated funding; feel free to do that over email if that's better.
> My plan is to nuke everything and re-install R-studio, etc., with the hopes of evading UAS' synching and correlated issues for a few weeks.
Any success (or failure) on this front?
As @ebuhle demonstrated on 10/26, `shinystan()` is a convenient way to examine some of the diagnostics and summaries of model fits. However, we will likely want some additional R scripts for customizing our output. Eric and I have several options from other projects that we can draw upon.