pfmc-assessments / PacFIN.Utilities

R code to manipulate data from the PacFIN database for assessments
http://pfmc-assessments.github.io/PacFIN.Utilities
Other
7 stars 1 forks source link

Fix `getAge()` to **NOT** average #109

Closed kellijohnson-NOAA closed 1 year ago

kellijohnson-NOAA commented 1 year ago

@iantaylor-NOAA this one is for you 😍 ... I fixed getAge() to no longer average over ages. It wasn't doing this for petrale sole because all rows had a FINAL_FISH_AGE_IN_YEARS but this is not true for canary rockfish, black rockfish, or sablefish. See below. I also added getAgeMethod() that now is used in cleanPacFIN() to add a column to Pdata called age_method that summarizes the ageing methods that were used to potentially come up with the FINAL_FISH_AGE_IN_YEARS, which is used for the Age column. All entries in Age were exposed to at least one of the desired ageing methods. If you could install this branch and run cleanPacFIN(verbose = TRUE) on your bds data that would be amazing :trophy:. I basically just want to know if it works for you and if the new column will be sufficient? You do not have to go through the code line by line.

@brianlangseth-NOAA the output of getAge() is a small attempt at your request for more informative output. If you could also install this branch and run getAge(bds.pacfin, verbose = TRUE, keep = c("B", "S", "M")) and provide feedback on the output that would be great.

Thanks all! And, this is still a draft pull request because I still have a few things to clean up and I do not want the code merged in yet.

Below is a summary for @brianlangseth-NOAA, @okenk, @kellijohnson-NOAA, and @shcaba because they are the only 🍀 individuals this year that have assessments with biological data that have age reads but no FINAL_FISH_AGE_IN_YEARS entry for commercial catch (see below). I emailed Kristen Hinton to ask what we should do for Washington samples and Brenda Erwin for the only sablefish sample in California.

   PACFIN_SPECIES_CODE AGENCY_CODE    N
1                 BLCK           W   13
2                 CNRY           W  700
3                 DSRK           W   23
4                 EGLS           W  309
5                 LCOD           W   16
6                 NANC           W   10
7                  POP           W 2288
8                 REYE           W  472
9                 SABL           C    1
10                SABL           W    1
11                WDOW           W   27
12                YEYE           W   51
13                YTRK           W   20

The addition of getAgeMethod() will close #95

brianlangseth-NOAA commented 1 year ago

@kellijohnson-NOAA and @KSten Washington's process on assigning ages in described in this issue. The matter of unassigned ages in PacFIN is an issue I plan to raise with them after the assessment. Ultimately, I am plan to exclude those 700 canary age samples.

iantaylor-NOAA commented 1 year ago

Thanks for working on this @kellijohnson-NOAA. I didn't get to it today but will try out the branch tomorrow.

iantaylor-NOAA commented 1 year ago

The fix-getAge branch worked just fine for petrale. Here's the relevent subset of the output from cleanPacFIN():

  `getAge()` summary information -
v 0 rows were missing a final age
i The distribution (in numbers) for fish aged 0--32 years is 0, 11, 57, 1023, 6790, 17528, 22234, 19810, 13586, 8457, 5487, 3398, 2366, 1505, 996, 601, 
  347, 185, 156, 74, 61, 38, 16, 12, 14, 5, 5, 5, 1, 3, 1, 0, 2
i Age methods S and B were present
i Age methods 'B', or 'S' were desired
i 100 ages used undesired age methods
v Number of ages by age (years) changed to `NA_integer_` is age-3 (n = 1), age-4 (n = 11), age-5 (n = 19), age-6 (n = 28), age-7 (n = 22), age-8 (n =   
  10), age-9 (n = 7), age-10 (n = 1), age-12 (n = 1)
  `getAgeMethod()` summary information -
i Age methods were originally coded to 'NA', 'S', '2', '9', '1', 'BB', 'N', 'B', or 'UNK'
i Age methods are now coded to B (n = 26342), B--S (n = 528), S (n = 77904), and NA (n = 158329)
i Number of samples (n) per ageing method for each read
   AGE_METHOD1 AGE_METHOD2 AGE_METHOD3      n
1            B           B           B     28
2            B           B        <NA>   1376
3            B           S        <NA>    299
4            B        <NA>           B     13
5            B        <NA>        <NA>  24373
6            S           B        <NA>    583
7            S           S        <NA>    573
8            S        <NA>        <NA>  77330
9         <NA>           B           B     40
10        <NA>           B        <NA>    159
11        <NA>        <NA>        <NA> 158329

I skimmed over the code changes but it's a lot and I don't see value in digging in deeper. I have one question and two suggestion.

First, why NA_integer_ in this line? (Aha, I just read documentation of NA at https://stat.ethz.ch/R-manual/R-devel/library/base/html/NA.html. This is new to me. I'm still not sure that it's more clear this way than just "NA")

Second, I think it would be useful to have a list of the age methods that got removed instead of just "i 100 ages used undesired age methods". Since most of use don't know the mapping of things like 2, N, and UNK to "B" and "S".

Third, I think it would be helpful to add the resulting age_method value to the table reported by summaryAgeMethod() as in the following (which I hacked together by adding age_method to Pdata within getAgeMethod() but you could find a more elegant solution).

   AGE_METHOD1 AGE_METHOD2 AGE_METHOD3 age_method      n
1            B           B           B          B     28
2            B           B        <NA>          B   1376
3            B           S        <NA>          B    118
4            B           S        <NA>       B--S    181
5            B        <NA>           B          B     13
6            B        <NA>        <NA>          B  24373
7            S           B        <NA>          B    235
8            S           B        <NA>       B--S    347
9            S           B        <NA>          S      1
10           S           S        <NA>          S    573
11           S        <NA>        <NA>          S  77330
12        <NA>           B           B          B     40
13        <NA>           B        <NA>          B    159
14        <NA>        <NA>        <NA>       <NA> 158329
kellijohnson-NOAA commented 1 year ago

@iantaylor-NOAA I was able to easily add the age_method column by just including lower-case terms in the search when doing the summary but I think that users would benefit from a better column name for the 4th column in your third point. Do you have any suggestions? Once I fix this column name the changes can be merged in.

iantaylor-NOAA commented 1 year ago

How about age_methods? Is that too subtle? It could also be something like age_method_summary.

kellijohnson-NOAA commented 1 year ago

thanks @iantaylor-NOAA for the suggestions. The actual column name is just for explanatory purposes of what is in the summary table ... it does not actually "stay with" the data. That column, set in cleanPacFIN is called age_method. So, I do not know which is better (1) leave it age_method and have it match the resulting column but need some text to explain what it means or (2) have a more verbose column name for the summary table and not need additional text explaining the table?

iantaylor-NOAA commented 1 year ago

If it doesn't stay with the data, how about the more verbose: age_method_summary.

kellijohnson-NOAA commented 1 year ago

I was able to leverage tidyverse style and have spaces in the column name; so, I think the following is sufficient:

ℹ Number of samples (n) per combinations of ageing methods
   AGE_METHOD1 AGE_METHOD2 AGE_METHOD3 Age method for best age      n
1            B           B           B                       B     28
2            B           B        <NA>                       B   1376
3            B           S        <NA>                       B    118
4            B           S        <NA>                    B--S    181
5            B        <NA>           B                       B     13
6            B        <NA>        <NA>                       B  24801
7            S           B        <NA>                       B    235
8            S           B        <NA>                    B--S    347
9            S           B        <NA>                       S      1
10           S           S        <NA>                       S    573
11           S        <NA>        <NA>                       S  77330
12        <NA>           B           B                       B     40
13        <NA>           B        <NA>                       B    159
14        <NA>        <NA>        <NA>                    <NA> 157904