pfmc-assessments / PacFIN.Utilities

R code to manipulate data from the PacFIN database for assessments
http://pfmc-assessments.github.io/PacFIN.Utilities
Other
7 stars 1 forks source link

Issue with aggregate function in getExpansion_2 via application to canary age comp expansion #104

Open brianlangseth-NOAA opened 1 year ago

brianlangseth-NOAA commented 1 year ago

Describe the bug Im encountering an issue with the getExpansion_2 function when I try to expand age comps for canary rockfish across the entire coast. The specific line where this issue occurs in when trying to figure out the trips that dont have catch values associated with them, line 162-163. When I expand based on state, I dont have this issue.

The reason why this is occurring is that when I try to expand across the coast, all trips with no catch also have NA for Sum_Sampled_Lbs. Thus aggregate(Sum_Sampled_Lbs ~...) does not work. When I expand across each state, there are more trips with no catch, and most of these have values for Sum_Sampled_Lbs.

This appears to be a very case-specific issue. In this case, when expanding across the coast my catches run from 1981 on but i have age comps for 1980. It just so happens to be that the only records with Sum_Sampled_Lbs==NA occur in 1980. When expanding by state, other years are added because not every state starts with catches in 1981, and thus there are some postivie Sum_Sampled_Lbs.

To Reproduce I cant really reproduce because I would have to share catches. However, a minimally reproducible example is below showing that the formula formulation of aggregate doesn't work when the y ~ value is all NA:

test = data.frame("A" = NA, "B" = rep(c("T","N"),2))
aggregate(A~B, data=test, length) #does not work and gives me the error I get for canary
aggregate(test$A, by = list(test$B), length) #works`

Additional context I see two solutions to this. First use the less elegant formulation for aggregate, like I do above. Within lines 162-163 this would look like

NoCatch <- stats::aggregate(tows[is.na(tows[, "catch"]), ]$Sum_Sampled_Lbs, 
                            by = list("fishyr" = tows[is.na(tows[, "catch"]), ]$fishyr, 
                                      "stratification" = tows[is.na(tows[, "catch"]), ]$stratification),
                            FUN= length)

Second, use a dplyr group_by formulation.

NoCatch <- tows[is.na(tows[, "catch"]), ] %>%
      dplyr::group_by(fishyr, stratification, ) %>%
      dplyr::summarize("N" = length(Sum_Sampled_Lbs))

I tried these two options and both ran, overcoming my original issue. I ran though length comps output for one column and the results were the same across all three approaches (the original, and then my two choices).

kellijohnson-NOAA commented 1 year ago

Thanks Brian. Would you mind sharing your catch file with me on google drive so I can download it and go through your script that you have where you work up the commercial composition data? I am trying to determine what the outcome should be to the user if the code enters this if statement.

kellijohnson-NOAA commented 1 year ago

More information "The if statement was being triggered for age records from the pacfin bds in years without corresponding catch years for that fleet. For example, there are 1980 age records but the pacfin catch file does not include 1980."

kellijohnson-NOAA commented 1 year ago

Make an error message when trying to expand composition data without corresponding catch.

brianlangseth-NOAA commented 1 year ago

@kellijohnson-NOAA An error message would be different than what has been done in the past, which is to post a warning. I haven't paid much attention to those, and in those cases I think the comps are left as they are.

Id be hesitant to stop the script without also providing knowledge on how to resolve the issue. Otherwise, what has previously "worked" now wont and folks would be frustrated. Perhaps something off cycle to resolve?

chantelwetzel-noaa commented 1 year ago

If there are no catches associated with a fleet for a year that has bds to expand, I think this is exactly the type of situation where we would want the code to stop in order to force the user to understand this mismatch. I think many instances it would likely be due to a processing error either in the catches or how the bds data was filtered which should be fixed. If there are truly zero catches for a year with bds data I think the user should understand why and be forced to think about if those data should be used and how they should be expanded.

As you noted above, most people blow by the warnings printed to the screen, and would see this issue not being addressed.