Sample and tows calculated for sexed fish combined

chantelwetzel-noaa commented 1 year ago

These new commits add columns to the comp data frame output by the getComps() function that give the combined sample size and the number of unique tows for sexed fish (i.e., either F or M). writeComps() now uses the "both" column to determine the input effective sample size for the composition data formatted as sex = 3 in Stock Synthesis. The separate male and female comps use the corresponding sex-specific sample sizes and tows to calculate the input effective sample size for those data frames. The motivation for this change came from processing PacFIN data where there was a heavy mixture of sexed and unsexed data.
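The idea of separate and combined counts can be sketched in base R. This is only an illustration under stated assumptions, not the code in getComps(): the toy data, the column names (`fishyr`, `tow_id`, `SEX`), and the helper `count_by_sex()` are all hypothetical.

```r
# Hypothetical toy data; the column names are assumptions for illustration only
fish <- data.frame(
  fishyr = c(2020, 2020, 2020, 2020, 2020, 2021, 2021),
  tow_id = c("t1", "t1", "t2", "t2", "t3", "t4", "t4"),
  SEX = c("F", "M", "F", "U", "U", "M", "M"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count fish and unique tows per year for a set of sex codes
count_by_sex <- function(data, sexes) {
  keep <- data[data$SEX %in% sexes, ]
  years <- sort(unique(data$fishyr))
  data.frame(
    fishyr = years,
    # Number of fish with one of the requested sex codes in each year
    samps = vapply(years, function(y) sum(keep$fishyr == y), integer(1)),
    # Number of distinct tows that contributed at least one such fish
    tows = vapply(
      years,
      function(y) length(unique(keep$tow_id[keep$fishyr == y])),
      integer(1)
    )
  )
}

female <- count_by_sex(fish, "F")
male <- count_by_sex(fish, "M")
both <- count_by_sex(fish, c("F", "M"))
```

In this toy example the combined tow count has to be computed directly from the sexed fish rather than by adding the female and male tow counts, because tow `t1` contains both sexes and would otherwise be counted twice.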

The changes in getComps() and getcomps_long are not elegant, but they work. I spent a large (slightly embarrassing) amount of time trying to do these calculations via the existing unsexed approach but could not get it to fully work.

kellijohnson-NOAA commented 1 year ago

This will be reviewed by EOD, thanks for submitting @chantelwetzel-noaa.

kellijohnson-NOAA commented 1 year ago

@chantelwetzel-noaa I looked over the code yesterday and what you did was really easy to understand, thank you. I am not certain about the naming, or whether my future self will be able to understand what exactly "both" means without looking at the code. I do not think this is an issue, though, because we can always change the names later. I worked on some tidyverse code to get rid of the complexity that is in this function. The trouble with it is that I am uncertain what the best output is. Below is the code; there are still groupings that are hardwired, but I know how to change those to variables, I just haven't done it yet.

```r
test <- purrr::map(
  # Make different groups of sexes that you want to filter by
  list(
    females = c("F"),
    males = c("M"),
    sexed = c("F", "M"),
    unsexed = c("U"),
    all = c(LETTERS, NA)
  ),
  # Filter `data` into non-mutually exclusive data frames based on sex grouping
  # A list of data frames is returned
  .f = ~ dplyr::filter(data, .data[[sexn]] %in% .x) %>%
    # Group by everything but the age or length bins
    dplyr::group_by(
      fleet, fishyr, season
    ) %>%
    # Calculate the number of tows
    dplyr::mutate(
      tows = dplyr::n_distinct(.data[[towid]])
    ) %>%
    # Now group by everything
    dplyr::group_by(
      fleet, fishyr, season, lengthcm
    ) %>%
    # Calculate the weight and number of samples
    dplyr::summarise(
      # Testing out some fancy naming scheme but it will be unneeded if we leave
      # everything as a list of data frames
      "{LETTERS[1]}_value" := sum(.data[[weightid]]),
      samps = sum(FREQ),
      # This is just carrying forward the number of tows
      tows = unique(tows)
    )
) %>%
  # Bring everything into a single data frame
  purrr::reduce(
    .f = dplyr::full_join,
    by = c("fleet", "fishyr", "season", "lengthcm")
  ) %>%
  # Replace all NA values from the combining of data with zeros
  replace(is.na(.), 0)
```
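For the final combining step, one possible way to keep each group's columns distinguishable after the join is to prefix them with the group name first. The following is a minimal base R sketch of that idea, not a proposal for the package: the `summaries` list, its values, and the reduced set of key columns are hypothetical.

```r
# Hypothetical per-group summaries keyed by year and length bin
summaries <- list(
  females = data.frame(
    fishyr = c(2020, 2020), lengthcm = c(30, 32), samps = c(2, 1)
  ),
  males = data.frame(fishyr = 2020, lengthcm = 32, samps = 4)
)
keys <- c("fishyr", "lengthcm")

# Prefix every non-key column with its group name so joined columns stay distinct
renamed <- Map(
  function(x, group) {
    value_cols <- setdiff(names(x), keys)
    names(x)[match(value_cols, names(x))] <- paste(group, value_cols, sep = "_")
    x
  },
  summaries,
  names(summaries)
)

# Full outer join across all groups on the key columns
wide <- Reduce(function(a, b) merge(a, b, by = keys, all = TRUE), renamed)

# Replace the NA values created by the join with zeros
wide[is.na(wide)] <- 0
```

With this layout, a length bin that appears for only one sex still gets a row, and the missing group's count is filled with zero instead of NA.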

So, I think we should just merge in what you have and then worry about {tidyverse} changes.

chantelwetzel-noaa commented 1 year ago

@kellijohnson-NOAA Thank you for working out the potential replacement code. When I have a moment I will walk through it and make sure I grasp what it is doing (thank you for the comments!). I will go ahead and merge this branch and when we have time we can look at moving to the new code and whether additional revisions will be needed based on the output to the writeComps function.

pfmc-assessments / PacFIN.Utilities

Sample and tows calculated for sexed fish combined #89