pfmc-assessments / PacFIN.Utilities

R code to manipulate data from the PacFIN database for assessments
http://pfmc-assessments.github.io/PacFIN.Utilities

Add functionality to remove fish with ages from length data when using marginal ages #110

Open chantelwetzel-noaa opened 1 year ago

chantelwetzel-noaa commented 1 year ago

Is your feature request related to a problem? Please describe.

In the past, we have used all length data to create length compositions while also using the ages from a subset of these fish as marginal age data, which means some of the data are used twice. The ad hoc "fix" was to apply a lambda weight of 0.50 in the model to both the lengths and the marginal ages. In reality, the share of records used twice across the lengths and ages is likely not 50 percent, or even close to it.
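As a rough illustration of how one might check the actual share of double-used records, here is a minimal sketch on toy data. The data frame and its column names (`year`, `lengthcm`, `Age`) are assumptions made for this example and may not match the columns in a given PacFIN extraction.

```r
# Toy biological samples; values and column names are assumed for illustration.
bds <- data.frame(
  year     = c(2020, 2020, 2020, 2020, 2021, 2021, 2021, 2021),
  lengthcm = c(30, 32, 35, 41, 29, 33, 38, 44),
  Age      = c(4, NA, NA, 7, NA, NA, NA, 9)
)

# Keep only records that have a length, then flag those that also have an age.
with_length <- bds[!is.na(bds$lengthcm), ]
has_age <- !is.na(with_length$Age)

# Proportion of length records per year that would be used twice
# if the ages were also entered as marginal age compositions.
prop_double_used <- tapply(has_age, with_length$year, mean)
print(prop_double_used)
```

If the proportions differ a lot from 0.5 or vary among years, a single lambda of 0.50 would be a poor match for the real overlap.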

Describe the solution you'd like

Across all of our data processing packages, we should subset the lengths used to create the length composition data to only those lengths that do not have associated ages if the user plans to use the age data as marginals. I think we could go about this in a few different ways:

1) add an input to the processing functions so the user can specify how they want the data processed,
2) nest the length processing within the age processing function such that all lengths are retained if someone works the data up as CAAL and the length data are subset if marginals are requested, or
3) have the length processing function automatically produce two outputs: one with all lengths and one with the subset of lengths.
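To make the first option concrete, here is a minimal sketch of what such an input could look like. The function `subset_lengths()`, its arguments, and the column names are hypothetical and are not part of PacFIN.Utilities; the sketch only shows the subsetting logic.

```r
# Hypothetical helper (not an existing PacFIN.Utilities function).
# age_type = "caal" keeps every record with a length;
# age_type = "marginal" keeps only lengths from fish without an age.
subset_lengths <- function(data,
                           age_type = c("caal", "marginal"),
                           length_col = "lengthcm",
                           age_col = "Age") {
  age_type <- match.arg(age_type)
  keep <- !is.na(data[[length_col]])
  if (age_type == "marginal") {
    keep <- keep & is.na(data[[age_col]])
  }
  data[keep, , drop = FALSE]
}

# Toy data with assumed column names.
fish <- data.frame(
  lengthcm = c(30, 32, NA, 41, 29),
  Age      = c(4, NA, 5, NA, NA)
)

lengths_for_caal     <- subset_lengths(fish, age_type = "caal")
lengths_for_marginal <- subset_lengths(fish, age_type = "marginal")
```

Returning both objects from a single call would correspond to the third option, at the cost of leaving the choice of which one to use to the analyst.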

Describe alternatives you've considered

Alternatively, users could do the subsetting themselves outside of the processing function, putting the choice and approach in their hands. This would be the current way to get around using the lambdas in the model to reduce the likelihood of these data.

Additional context

The one item that we may want to think through carefully is how subsetting out the fish lengths that have ages would affect the composition expansions. Removing a portion of the lengths from a trip or haul would give the remaining lengths, which may differ from those removed, increased influence in the expansions, which may not be ideal. Ultimately, I posted this issue to trigger thought on this topic so that we as a collective can decide if this is something we should be considering for future assessment cycles.
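The concern about expansions can be seen in a small toy example. This is only a sketch: the per-fish weight used here (an assumed haul total divided by the number of sampled lengths) is a simplified stand-in and is not the expansion method implemented in PacFIN.Utilities, and all values and column names are hypothetical.

```r
# Toy samples from two hauls; in haul A the aged fish happen to be the larger ones.
samples <- data.frame(
  haul     = c("A", "A", "A", "A", "B", "B", "B", "B"),
  lengthcm = c(30, 32, 40, 42, 31, 33, 35, 37),
  Age      = c(NA, NA, 8, 9, NA, NA, NA, 6)
)

# Assumed total number of fish caught in each haul (hypothetical values).
haul_total <- c(A = 400, B = 400)

# Simplified per-fish weight: haul total divided by number of sampled lengths.
per_fish_weight <- function(x) {
  n_sampled <- table(x$haul)
  haul_total[names(n_sampled)] / as.numeric(n_sampled)
}

unaged <- samples[is.na(samples$Age), ]

weight_all    <- per_fish_weight(samples)
weight_unaged <- per_fish_weight(unaged)

# Mean length per haul before and after removing the aged fish.
mean_all    <- tapply(samples$lengthcm, samples$haul, mean)
mean_unaged <- tapply(unaged$lengthcm, unaged$haul, mean)
```

In this toy case each remaining fish in haul A carries twice the weight it had before, and the mean length for that haul shifts toward smaller fish because the aged fish were the larger ones.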

kellijohnson-NOAA commented 1 year ago

@chantelwetzel-noaa thank you for bringing this up. Do you know of any published references that have looked into this? Or do you know what other centers do? @Cole-Monnahan-NOAA have you ever thought about removing the length information for fish that have both a length and an age when you are not using conditional age-at-length data for the Alaska assessments?

Cole-Monnahan-NOAA commented 1 year ago

@kellijohnson-NOAA I can't speak to the processing steps you're using, but yes, we have examples where marginal lengths are removed in favor of marginal ages, or are put in as ghost data. This happens because we don't have in-season ages available, so we use marginal lengths for a year and then, in the next cycle, replace them with marginal ages if they exist. I think we're more generally moving to CAAL, but this delay in age data would happen there too.

iantaylor-NOAA commented 1 year ago

I'm not convinced that this extra complexity would make any difference in the results. For many species we have lengths from 1000s of fish per year but end up with adjusted input sample sizes less than 100 because we are accounting for the fact that the fish are not independent and identically distributed random samples from a panmictic population. I don't see how having an age and a length from the same fish is fundamentally different than two lengths from similar size fish in the same haul. In both cases the data-weighting methods should account for lack of independence and we surely always end up with an adjusted sample size far below the number of unique fish that were sampled.

If folks really want this feature, I'm not against using it. I just don't think it will make any difference, and thus it would not be worth the effort to implement.