Closed kellijohnson-NOAA closed 1 year ago
@kellijohnson-NOAA and @KSten Washington's process on assigning ages in described in this issue. The matter of unassigned ages in PacFIN is an issue I plan to raise with them after the assessment. Ultimately, I am plan to exclude those 700 canary age samples.
Thanks for working on this @kellijohnson-NOAA. I didn't get to it today but will try out the branch tomorrow.
The fix-getAge
branch worked just fine for petrale.
Here's the relevent subset of the output from cleanPacFIN()
:
`getAge()` summary information -
v 0 rows were missing a final age
i The distribution (in numbers) for fish aged 0--32 years is 0, 11, 57, 1023, 6790, 17528, 22234, 19810, 13586, 8457, 5487, 3398, 2366, 1505, 996, 601,
347, 185, 156, 74, 61, 38, 16, 12, 14, 5, 5, 5, 1, 3, 1, 0, 2
i Age methods S and B were present
i Age methods 'B', or 'S' were desired
i 100 ages used undesired age methods
v Number of ages by age (years) changed to `NA_integer_` is age-3 (n = 1), age-4 (n = 11), age-5 (n = 19), age-6 (n = 28), age-7 (n = 22), age-8 (n =
10), age-9 (n = 7), age-10 (n = 1), age-12 (n = 1)
`getAgeMethod()` summary information -
i Age methods were originally coded to 'NA', 'S', '2', '9', '1', 'BB', 'N', 'B', or 'UNK'
i Age methods are now coded to B (n = 26342), B--S (n = 528), S (n = 77904), and NA (n = 158329)
i Number of samples (n) per ageing method for each read
AGE_METHOD1 AGE_METHOD2 AGE_METHOD3 n
1 B B B 28
2 B B <NA> 1376
3 B S <NA> 299
4 B <NA> B 13
5 B <NA> <NA> 24373
6 S B <NA> 583
7 S S <NA> 573
8 S <NA> <NA> 77330
9 <NA> B B 40
10 <NA> B <NA> 159
11 <NA> <NA> <NA> 158329
I skimmed over the code changes but it's a lot and I don't see value in digging in deeper. I have one question and two suggestion.
First, why NA_integer_
in this line? (Aha, I just read documentation of NA at https://stat.ethz.ch/R-manual/R-devel/library/base/html/NA.html. This is new to me. I'm still not sure that it's more clear this way than just "NA")
Second, I think it would be useful to have a list of the age methods that got removed instead of just "i 100 ages used undesired age methods". Since most of use don't know the mapping of things like 2, N, and UNK to "B" and "S".
Third, I think it would be helpful to add the resulting age_method
value to the table reported by summaryAgeMethod()
as in the following (which I hacked together by adding age_method
to Pdata
within getAgeMethod()
but you could find a more elegant solution).
AGE_METHOD1 AGE_METHOD2 AGE_METHOD3 age_method n
1 B B B B 28
2 B B <NA> B 1376
3 B S <NA> B 118
4 B S <NA> B--S 181
5 B <NA> B B 13
6 B <NA> <NA> B 24373
7 S B <NA> B 235
8 S B <NA> B--S 347
9 S B <NA> S 1
10 S S <NA> S 573
11 S <NA> <NA> S 77330
12 <NA> B B B 40
13 <NA> B <NA> B 159
14 <NA> <NA> <NA> <NA> 158329
@iantaylor-NOAA I was able to easily add the age_method
column by just including lower-case terms in the search when doing the summary but I think that users would benefit from a better column name for the 4th column in your third point. Do you have any suggestions? Once I fix this column name the changes can be merged in.
How about age_methods
? Is that too subtle? It could also be something like age_method_summary
.
thanks @iantaylor-NOAA for the suggestions. The actual column name is just for explanatory purposes of what is in the summary table ... it does not actually "stay with" the data. That column, set in cleanPacFIN is called age_method
. So, I do not know which is better (1) leave it age_method
and have it match the resulting column but need some text to explain what it means or (2) have a more verbose column name for the summary table and not need additional text explaining the table?
If it doesn't stay with the data, how about the more verbose: age_method_summary
.
I was able to leverage tidyverse style and have spaces in the column name; so, I think the following is sufficient:
ℹ Number of samples (n) per combinations of ageing methods
AGE_METHOD1 AGE_METHOD2 AGE_METHOD3 Age method for best age n
1 B B B B 28
2 B B <NA> B 1376
3 B S <NA> B 118
4 B S <NA> B--S 181
5 B <NA> B B 13
6 B <NA> <NA> B 24801
7 S B <NA> B 235
8 S B <NA> B--S 347
9 S B <NA> S 1
10 S S <NA> S 573
11 S <NA> <NA> S 77330
12 <NA> B B B 40
13 <NA> B <NA> B 159
14 <NA> <NA> <NA> <NA> 157904
@iantaylor-NOAA this one is for you 😍 ... I fixed
getAge()
to no longer average over ages. It wasn't doing this for petrale sole because all rows had a FINAL_FISH_AGE_IN_YEARS but this is not true for canary rockfish, black rockfish, or sablefish. See below. I also addedgetAgeMethod()
that now is used incleanPacFIN()
to add a column to Pdata calledage_method
that summarizes the ageing methods that were used to potentially come up with the FINAL_FISH_AGE_IN_YEARS, which is used for the Age column. All entries in Age were exposed to at least one of the desired ageing methods. If you could install this branch and runcleanPacFIN(verbose = TRUE)
on your bds data that would be amazing :trophy:. I basically just want to know if it works for you and if the new column will be sufficient? You do not have to go through the code line by line.@brianlangseth-NOAA the output of
getAge()
is a small attempt at your request for more informative output. If you could also install this branch and rungetAge(bds.pacfin, verbose = TRUE, keep = c("B", "S", "M"))
and provide feedback on the output that would be great.Thanks all! And, this is still a draft pull request because I still have a few things to clean up and I do not want the code merged in yet.
Below is a summary for @brianlangseth-NOAA, @okenk, @kellijohnson-NOAA, and @shcaba because they are the only 🍀 individuals this year that have assessments with biological data that have age reads but no FINAL_FISH_AGE_IN_YEARS entry for commercial catch (see below). I emailed Kristen Hinton to ask what we should do for Washington samples and Brenda Erwin for the only sablefish sample in California.
The addition of
getAgeMethod()
will close #95