pfmc-assessments / PacFIN.Utilities

R code to manipulate data from the PacFIN database for assessments
http://pfmc-assessments.github.io/PacFIN.Utilities
Other
7 stars 1 forks source link

Write comps #84

Closed chantelwetzel-noaa closed 1 year ago

chantelwetzel-noaa commented 1 year ago

This writeComps branch:

  1. Corrects how the sum1 and rounding of data frame is applied because the previous approach was not applying this function across all the composition data frames (FthenM, Mout, F,out, Uout).
  2. Update some of the column names for the output dataframes to align with the column names created by the nwfscSurvey package which facilitates the use of nwfscSurvey plot_comps function to work on a composition data frame. Note, the way writeComps outputs a csv file has not been changed and users will continue to need to cut out a specific composition data frame (e.g., FthenM) for use with nwfscSurvey::plot_comps function.

I pulled the main branch into the writeComps branch to confirm that there were no merge conflict, so all other commits in this pull request are commits made to the main branch.

kellijohnson-NOAA commented 1 year ago

@chantelwetzel-noaa it is difficult for me to see what the exact changes are in the branch because of merging in main. I am going to try and create a more linear commit history locally to see exactly what the changes are. Quickly though, the one thing that I see that makes me nervous is changing column names given that there is not a universal function to work up the data, i.e., changing a column name will more than likely break people's downstream code that they have for previous assessments. So, I would like to think about this a little bit more before it gets merged in.

chantelwetzel-noaa commented 1 year ago

@kellijohnson-NOAA The column change is minimal. There are three changes that were made to fix column names for a few reasons.

1, Modify "fishyr" to "year". This revision was done to align with how this column is labeled in the nwfscSurvey package to support use of the plot_comps function potentially on composition files created by this package. 2, Correction (lines 441:445) because the "Ninput" column name was being applied to the incorrect column.

  1. Change the column names for the length/age bins to be clearer. All bins in the past were either labeled "L" or "A" with no indication of the sex in the column name. This changes the column names for the composition bins to be clearer by either being F10, F12, M10, M12, U10, or F.10, M.10, U.10 for the repeater sex specific composition data.

Yes, these changes made break people's downstream processing code (it did mine but required minor revisions). However, I think the improved clarity and correction in column names is worth it.

There are two other revisions to the code that should be considered (perhaps just pull in these changes) for the main branch. The first revision are corrections to the sum1 and digits functions based on the revised bin column names to ensure that they were applied correctly.

The one change that could be of minor importance is on line 165 that deals with how the number of tows is set for FthenM composition data. The previous approach set it equal to all tows which results in a number of tows that is not representative when there is a large presences of unsexed fish in the data. The revision sets this value to the sum of ftows and mtows which more closely aligns with the sex-specific tows but still may not be perfect given that there could be one tow with both males and females. I think this can be further improved but a full fix would require additional work in the earlier expansion functions which was beyond what I was doing within this function.

chantelwetzel-noaa commented 1 year ago

I fine is this is not pulled into the main branch at this time, since I can still continue to use these revisions in my own work.

kellijohnson-NOAA commented 1 year ago

Thanks @chantelwetzel-noaa for the detailed explanations. It helped me tremendously to do a local revision of the order of commits so I could separate out what was being changed for this branch versus what was changed in the main branch between when you started this branch and now.

I love the change in column name from fishyr to year. I fully support year in all instances and I wonder if it would be worth changing it in one fell swoop for the entire package? I found 55 instances of fishyr that could still be changed :) Anyway I am still looking through the commits and I love the changes, I just want to find the best way to bring them in so when I look at the commit history it is clear for me what exactly changed.

chantelwetzel-noaa commented 1 year ago

Absolutely. I don't want to mess up anyone's workflow. Given that many people likely have not extensively started working of PacFIN bds data, I wanted to try to get this into the main branch now or wait until after the assessment cycle. For once, I finally followed @kellijohnson-NOAA wise words of pulling in changes in the main branch prior to submitting the pull request, however, I acknowledge that this made determining the specific changes a little more challenging.

In regards of changing fishyr to year through the package, I think that is reasonable if this can be easily done. However, since I think users interaction with column is probably limited except when the composition data is written out to a csv.

kellijohnson-NOAA commented 1 year ago

Merge away!

As a side note ... I checked out writeComps locally and rebased to origin/main to get all of the most recent commits at the top of the stack. The following led to

git checkout -b writeComps
git rebase origin writeComps
git reset --h 2452f4d
git rebase origin/main
git cherry-pick 4dda26c
git cherry-pick 8693650
git log --oneline -n 4

image