pfmc-assessments / PacFIN.Utilities

R code to manipulate data from the PacFIN database for assessments
http://pfmc-assessments.github.io/PacFIN.Utilities
Other
7 stars 1 forks source link

restore the combineCalCOM() function #108

Closed iantaylor-NOAA closed 1 year ago

iantaylor-NOAA commented 1 year ago

Adding back function as discussed in #101.

iantaylor-NOAA commented 1 year ago

Note that there remains a mismatch in capitalization between the function name and the .R filename: combineCalCOM() and CombineCALCOM.R (where the .Rd file matches the function).

kellijohnson-NOAA commented 1 year ago

@iantaylor-NOAA thanks for the pull request. I am working on this now. Do you mind if I push changes to this branch? There are some column names that change depending on certain things and I want to make the code robust to that. There are some additional columns that are not needed. Finally, I would like to use dplyr::join_*() rather than rbind() to bring the data together because I think the former is a safer choice.

kellijohnson-NOAA commented 1 year ago

@iantaylor-NOAA I think that this is ready to be squashed into a single commit and merged in. The kicker here is that there are double reads that are not present in your data but are present in CALCOM. So, the function works with the information in the file that you provided but it would not work if the CALCOM data were provided in a different manner. Users will need to set the age method manually, there is an alert telling users this.

iantaylor-NOAA commented 1 year ago

Thanks for all the work on this @kellijohnson-NOAA.

I just tried the current code in the CALCOM branch and noticed three minor issues.

First, I got the error

Error in changecol_pacfin(CalCOM) : 
  A fish-age column was not found in your data
Age or FISH_AGE_YEARS_FINAL work.

which seems to be caused by this line using "FINAL_FISH_AGE_IN_YEARS" instead of "FISH_AGE_YEARS_FINAL".

Second, the original combineCalCOM() defined SOURCE_AGID as "CalCOM" (link to code) but the current branch leads to NA values for that column.

Third, I see the message > Users must set AGE_METHOD for CalCOM data manually. but if I add an AGE_METHOD column to the CalCOM data before running combineCalCOM() I get busted with the warning about Error in combineCalCOM(Pdata = Pdata, CalCOM = CALCOM) : There are columns in CalCOM that cannot currently be processed by combineCalCOM(). I presume the intended workflow is to add AGE_METHOD after combining, but how about allowing it to happen before as well (which may be easier than adding it to the combined data).

kellijohnson-NOAA commented 1 year ago

@iantaylor-NOAA and @gertsevv I think the "errors" are a order of operations thing. I assumed that you would not have already "cleaned" the PacFIN data that you are combining with the calcom data because users would combine them and then run cleanPacFIN() which is why the age and agency column names are not aligning. Cleaning the PacFIN data twice does not make sense to me and combineCalCOM doesn't necessarily clean the data.

Regarding the AGE_METHOD. I can easily add code to accommodate if the ageing method exists. I will do that right now.

iantaylor-NOAA commented 1 year ago

@kellijohnson-NOAA, thank you for working on this.

The script we're using (linked in the original issue #101, link repeated here) is based on a script that was developed for petrale sole in 2019. In that script, combineCalCOM() came before cleanPacFIN(). However, that order of operations didn't work when I first tried it in 2023. Adding a cleanPacFIN() step before combineCalCOM() first fixed the problem but for some reason it didn't occur to me that I should remove the extra combineCalCOM() that came after, perhaps I thought that it would do some cleaning of the CalCOM data as well, but I didn't actually check for this. So I'm happy to just remove the unnecessary second cleaning step.

Thank you for working on allowing AGE_METHOD.

The warning There are columns in CalCOM that cannot currently be processed by combineCalCOM() which is triggered by the presence of AGE_METHOD also interacts with our status-quo workflow because we re-use steps used in 2019 where temporary columns are added to the CalCOM data as part of the process of cleaning up some outliers where the total weight is missing or more than 50% above or below the sum of the individual weights estimated from the observed lengths. It won't be hard to remove those temporary columns, but even easier would be an option in combineCalCOM() to just warn you that the columns are being stripped away rather than have a hard stop.

kellijohnson-NOAA commented 1 year ago

@iantaylor-NOAA would you mind trying out the latest changes but be sure to do them with an UNCLEANED Pdata.

kellijohnson-NOAA commented 1 year ago

@iantaylor-NOAA should I be merging this in or are there more changes needed?

iantaylor-NOAA commented 1 year ago

Thanks for following up @kellijohnson-NOAA. I tried out the function last week but got pulled away before I could provide feedback and was intending to circle back today so your timing is good.

Running through things again just now it seems to mostly be working great but I faced a couple minor issues:

  1. I got Error in changecol_pacfin(CalCOM) : A fish-age column was not found in your data Age or FISH_AGE_YEARS_FINAL work. because the column name is "AGE". Commenting out that check allowed things to play through but I presume it's there for a reason so a better solution would be good.
  2. I think that AGENCY_CODE = "CalCOM" should be SOURCE_AGID = "CalCOM" As the current code leads to the following mismatch:

    r$> table(CombinedDat$AGENCY_CODE, CombinedDat$SOURCE_AGID, useNA = "always")
    
             C     O     W  <NA>
    CalCOM     0     0     0 53794
    <NA>   57568 52623 77654     0
kellijohnson-NOAA commented 1 year ago

@iantaylor-NOAA I think this is finally ready to be merged in. I will leave that up to you when you see fit.

iantaylor-NOAA commented 1 year ago

Function works great. Thanks for all the help @kellijohnson-NOAA. I will try to add to the documentation based on the petrale experience in the future, but don't have time for that now.