r4atlantis / atlantisom

Atlantis operating model. Generates data sets from Atlantis scenarios.
https://r4atlantis.github.io/atlantisom

add sample_ages to wrappers to simulate ageing a subset of lengthed fish #42

Open sgaichas opened 3 years ago

sgaichas commented 3 years ago

The way sampling is set up, effN for each survey is the number of fish measured for length, age, and average weight at age. We could introduce further realism by using the sample_ages() function to take a subsample for age composition and optionally apply an ageing error matrix (specified in the survey and fishery config files). The age comp based on a larger sample size and without error is still used to generate the length composition as input to calc_age2length, so we would have to run sample_ages after we generate lengths and weight at age.
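A minimal sketch of the idea in plain Python, not the atlantisom `sample_ages()` implementation: take an age subsample from the fish already measured for length, then optionally pass the true ages through an ageing error matrix. The function names, the without-replacement draw, the counts, and the error matrix values are all assumptions made for illustration.

```python
import random

def subsample_ages(n_at_age, n_aged, rng):
    """Pick n_aged fish without replacement from the lengthed sample; return counts by age."""
    fish = [age for age, n in n_at_age.items() for _ in range(n)]
    if n_aged > len(fish):
        raise ValueError("cannot age more fish than were lengthed")
    counts = {age: 0 for age in n_at_age}
    for age in rng.sample(fish, n_aged):
        counts[age] += 1
    return counts

def apply_ageing_error(counts, error_matrix, rng):
    """error_matrix[true_age][read_age] is the probability of reading read_age."""
    observed = {age: 0 for age in counts}
    for true_age, n in counts.items():
        row = error_matrix[true_age]
        read_ages = rng.choices(list(row), weights=list(row.values()), k=n)
        for read_age in read_ages:
            observed[read_age] = observed.get(read_age, 0) + 1
    return observed

rng = random.Random(1)

# Hypothetical numbers at age among the 1000 fish measured for length.
lengthed = {1: 400, 2: 300, 3: 200, 4: 100}

# Hypothetical ageing error matrix; each row sums to 1.
ageing_error = {
    1: {1: 0.90, 2: 0.10},
    2: {1: 0.05, 2: 0.90, 3: 0.05},
    3: {2: 0.05, 3: 0.90, 4: 0.05},
    4: {3: 0.10, 4: 0.90},
}

aged_true = subsample_ages(lengthed, 100, rng)
aged_observed = apply_ageing_error(aged_true, ageing_error, rng)
print(aged_true, aged_observed)
```

The subsample keeps the same number of aged fish before and after ageing error; only their distribution across ages changes.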

This would require changing the om_comps() wrapper:

  1. move #save age comps lines to after the length comps and weight at age are saved
  2. add a step modifying the age_comp_data[[i]] object before the #save age comps lines by running sample_ages
  3. same two steps for fishery age comps, unless we age all the samples from fisheries
  4. add a step modifying the annage_comp_data[[i]] by running sample_ages (no need to shuffle because this is not an input to the length function)

Saved age comp objects remain the same. This still means weight at age is from an unrealistically large age sample; we could fix that later.

kellijohnson-NOAA commented 3 years ago

Sorry if I am missing the mark here, I did not review all of the code to see exactly what is going on, but does this assume that weight-at-age samples are independent of age-composition samples? I cannot think of any situation where you would age fish and weigh them and not use those ages in the model as marginal age compositions.

sgaichas commented 3 years ago

You are absolutely correct @kellijohnson-NOAA that we wouldn't leave out age samples if we had them. And thank you for helping me think this all the way through. So here is what I think we can do:

Atlantis outputs n at age and weight at age. We use the initial create_survey and sample_fish with an effN equal to the number of lengths measured, which we estimate with calc_age2length because length is not tracked by Atlantis itself. Testing to date has kept all lengths, ages, and mean weight at age output from this step, which I think gives us an unrealistically large age sample, and also a mean weight at age based on this unrealistically large age sample. (This is assuming most surveys measure 10-100x more lengths than ages. If a survey measures as many lengths as ages, then we can stop here.)

So I think we can keep the length output of calc_age2length, but the next step would be to run the n at age output of sample_fish through sample_ages to represent the age subsample (still a subset of fish originally collected on the survey and measured for length). Then we can re-run calc_age2length with the subsample of ages output from sample_ages (optionally with ageing error included) to get both a length composition of the subsampled aged fish and a mean weight at age based only on the subsampled aged fish.
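As a rough illustration of why the second pass matters, the sketch below compares mean weight at age computed from every lengthed fish with the same statistic computed from the age subsample only. It is plain Python and does not call `calc_age2length`; the individual weights, their normal spread around an age-specific mean, and the sample sizes are hypothetical.

```python
import random
from collections import defaultdict

def mean_weight_at_age(fish):
    """fish is a list of (age, weight) records; return mean weight keyed by age."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for age, weight in fish:
        totals[age] += weight
        counts[age] += 1
    return {age: totals[age] / counts[age] for age in sorted(counts)}

rng = random.Random(42)

# Hypothetical lengthed sample: 250 fish per age, weight ~ Normal(0.1 * age, 0.02).
lengthed = [(age, rng.gauss(0.1 * age, 0.02)) for age in (1, 2, 3, 4) for _ in range(250)]

# Age subsample: 50 of the 1000 lengthed fish.
aged = rng.sample(lengthed, 50)

wt_all = mean_weight_at_age(lengthed)   # based on all 1000 fish
wt_aged = mean_weight_at_age(aged)      # based on the 50 aged fish only
print(wt_all)
print(wt_aged)
```

The subsample estimate rests on far fewer fish per age, so it is noisier, which is closer to what a real survey would deliver.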

An extra step, but much more representative of how (at least US) surveys work.

Does this make more sense?

kellijohnson-NOAA commented 3 years ago

In fisheries data, we don't always get lengths and ages for a given fish. Sometimes there will be age data with no length information. So, would you always want to just sample ages from those that are lengthed? In ss3sim we allow the sampling to be separate, where we sample from the truth two times, (1) for ages and (2) for lengths. The exception is conditional age-at-length data, where we would sample for length and take a total number of ages from those lengthed based on the distribution of lengthed fish, i.e., more ages from the most abundant length bin and fewer ages in the bins near the tails. In contrast, many sampling protocols are length stratified and take an equal number per bin if available, but we don't allow for this latter kind of sampling in ss3sim.
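The two ways of spreading age samples across length bins can be sketched as follows. This is illustrative Python, not ss3sim code; the function names, the largest-remainder rounding, and the bin counts are assumptions.

```python
def proportional_allocation(length_counts, n_ages):
    """Allocate n_ages across bins in proportion to the lengthed fish in each bin."""
    total = sum(length_counts.values())
    raw = {b: n_ages * n / total for b, n in length_counts.items()}
    alloc = {b: int(r) for b, r in raw.items()}
    # Hand out any ages lost to rounding, largest fractional remainder first.
    leftover = n_ages - sum(alloc.values())
    by_remainder = sorted(raw, key=lambda b: raw[b] - alloc[b], reverse=True)
    for b in by_remainder[:leftover]:
        alloc[b] += 1
    return alloc

def equal_allocation(length_counts, per_bin):
    """Length-stratified protocol: the same number of ages per bin, if available."""
    return {b: min(per_bin, n) for b, n in length_counts.items()}

# Hypothetical number of lengthed fish per length bin (bin lower bound in cm).
length_counts = {10: 5, 20: 40, 30: 100, 40: 45, 50: 10}

proportional = proportional_allocation(length_counts, 40)
stratified = equal_allocation(length_counts, 8)
print(proportional)  # {10: 1, 20: 8, 30: 20, 40: 9, 50: 2}
print(stratified)    # {10: 5, 20: 8, 30: 8, 40: 8, 50: 8}
```

Proportional allocation concentrates ages in the abundant middle bins, while the length-stratified protocol oversamples the tails relative to their share of the catch.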

If you are sampling ages from those that are lengthed and putting the ages into the model as marginal age-composition samples, your information is not as independent as the model assumes because it is double counting each fish, i.e., assuming a length measurement is from an independent fish from the population and assuming an age measurement is from a new independent sample of the population.

Sorry if this is a bit in the weeds.

sgaichas commented 3 years ago

Not at all, I'd like to design this so users have options for different biological sampling methods and this is definitely helping.

We can make the age sample independent of fish sampled for length, similarly to ss3sim, if we re-do sampling at the sample_fish stage with an effN that reflects the age sample. We can then run sample_ages on this to add ageing error if necessary. We can get mean weight at age for this sample by running calc_age2length and ignoring or discarding the length output if it isn't wanted. That is a lot of overhead, so I should write a simpler function to calculate mean weight at age only if lengths aren't used (extracting that bit from calc_age2length would probably work).
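A sketch of the independent-sampling option, assuming each composition is a multinomial draw from the true proportions with its own effN. This is plain Python rather than `sample_fish`; the function name, the true proportions, and the effN values are hypothetical.

```python
import random
from collections import Counter

def sample_comp(true_props, eff_n, rng):
    """Multinomial draw of eff_n fish from the true proportions; return counts by category."""
    categories = list(true_props)
    weights = [true_props[c] for c in categories]
    draws = Counter(rng.choices(categories, weights=weights, k=eff_n))
    return {c: draws.get(c, 0) for c in categories}

rng = random.Random(7)

# Hypothetical true compositions from the operating model.
true_length_props = {10: 0.1, 20: 0.3, 30: 0.4, 40: 0.2}  # keyed by length bin (cm)
true_age_props = {1: 0.4, 2: 0.3, 3: 0.2, 4: 0.1}         # keyed by age

# Two separate draws from the truth: many lengths, far fewer ages.
length_comp = sample_comp(true_length_props, 1000, rng)
age_comp = sample_comp(true_age_props, 100, rng)
print(length_comp, age_comp)
```

Because the two draws are separate, no fish contributes to both compositions, which matches the independence a model assumes for marginal age and length data.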

I would also rather avoid an option that mimics length-stratified sampling for age, so I'm glad to hear ss3sim doesn't allow it. I think we are treating the survey ages as conditional age at length in the CC Atlantis-based sardine assessment, but @cstawitz can confirm.

cstawitz commented 3 years ago

Yup, we are using CAAL in the Atlantis-sardine assessment, where every lengthed fish is aged.

In the species I have looked at in Alaska, the effect of length-stratified sampling has been pretty minimal because the length bins they use are often very small. But it gets worse the larger the bins are. I have code that corrects for length-stratified sampling if that's of interest (though it sounds like we're not trying to replicate it in atlantisom).
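For readers unfamiliar with this kind of correction, one generic approach is an age-length key: weight the age proportions within each length bin by that bin's share of the length composition. The sketch below is not the code mentioned in the comment above; the function name, bin labels, and numbers are hypothetical.

```python
def alk_age_comp(aged_by_bin, length_comp):
    """Weight within-bin age proportions by each bin's share of the lengthed fish.

    aged_by_bin[bin][age] is the number of fish aged; length_comp[bin] is a proportion.
    """
    age_comp = {}
    for length_bin, ages in aged_by_bin.items():
        n_aged = sum(ages.values())
        if n_aged == 0:
            continue  # no ages taken in this bin, so it cannot inform the key
        for age, k in ages.items():
            age_comp[age] = age_comp.get(age, 0.0) + length_comp[length_bin] * k / n_aged
    total = sum(age_comp.values())
    return {age: p / total for age, p in sorted(age_comp.items())}

# Hypothetical length-stratified age sample: 10 fish aged per bin.
aged_by_bin = {"small": {1: 8, 2: 2}, "large": {2: 3, 3: 7}}
# Hypothetical length composition: most lengthed fish are small.
length_comp = {"small": 0.8, "large": 0.2}

corrected = alk_age_comp(aged_by_bin, length_comp)
print(corrected)
```

Pooling the raw ages would give proportions of 0.40, 0.25, and 0.35 for ages 1 to 3, whereas the weighted version gives roughly 0.64, 0.22, and 0.14, reflecting that small fish dominate the catch.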
