Add new functions to calculate expanded composition data and input sample sizes

chantelwetzel-noaa commented 3 months ago

New functions:

get_expanded_comps() expands length or age composition data to the tow and strata level. This new function replaces the SurveyLFs.fn() and SurveyAFs.fn(). The new function calculates the expanded composition data in the same manner as the existing functions but more streamlined. If two_sex_model = TRUE and output = "full_expansion_ss3_format" in the get_expanded_comps(), then a list of sexed and unsexed composition data is returned. This new function also includes an input_sample_size_method input and will automatically calculate the input sample size.
get_input_n() calculates the input sample size for composition data and is designed to replace the GetN.fn. This function outputs a table that includes the number of unique tows, total samples, and the input sample size based on the Stewart and Hamel approach by composition type (sexed or unsexed). This function is now called within get_expanded_comps() that includes a function input input_sample_size_method which will add input sample size to the formatted Stock Synthesis composition data.
The get_raw_comps() is modified to have similar function inputs as get_expanded_comps() to improve consistency and to support alternative approaches to calculate input sample size within the function.
get_species_info() is designed to replace GetSpp.fn(). The get_species_info() adds a column to the dataframe output to specify the Stewart and Hamel species group for determining input samples sizes by species type.

The soft deprecation warning references a new tag number that still needs to be added. I can never remember if the tag should be added in the branch being merged in or should be added to the main branch post-merge. Please let me know your preference and I will deal with that.

kellijohnson-NOAA commented 3 months ago

@chantelwetzel-noaa I have set aside time both tomorrow and Friday to look at this.

chantelwetzel-noaa commented 3 months ago

Thank you. I was looking at the issue list for the package and saw the issue that suggests moving warning messages to the cli package. I would be happy to also incorporate that change in this pull request if you agree with that move.

kellijohnson-NOAA commented 3 months ago

{cli} is amazing, it is what sdmTMB uses. I have pirated some of their code for learning how to do messages and warnings in ss3sim this way 😃!

kellijohnson-NOAA commented 3 months ago

@chantelwetzel-noaa I am working with limited internet so suggestions will come in file by file. Sorry for the delay. I will start reviewing get_expanded_comps() now but probably won't be able to send comments for a little bit.

kellijohnson-NOAA commented 2 months ago

@chantelwetzel-noaa I am really struggling with get_expanded_comps() because the function name does not encompass all that the function does. I see that it

merges 3 data frames together
does some error checking, prints messages
filters the data without letting users know
creates stage-1 expansions for sexed comps, unsexed comps, and all comps
creates stage-2 expansions for sexed comps, unsexed comps, and all comps
formats the 3 types of output from stage-2 expansions for SS3 It is almost like each of the items in the list need their own function with documentation and then maybe a wrapper function can be created. It is fine if you do not agree with this logic, I will approve the pull request after the stage-1 and stage-2 expansion equations are added to the documentation so people know how things are being expanded. It was hard to check the code without knowing how the expansions should be done.

One more thing, I think that the code could have less duplication if you use purrr::map() after splitting the data. Here is an example, and a note that I am happy to help in any way that I can. I fully realize that code that there is no code that will be perfect and there was nothing wrong with this pull request, all of my comments are just suggestions :)

# Two helper functions
  make_long_comp <- function(data) {
    data |>
      dplyr::group_by(year, full_bin, sex) |>
      dplyr::summarize(
        expansion = sum(expansion)
      ) |>
      dplyr::group_by(year) |>
      dplyr::mutate(proportion = prop.table(expansion)) |>
      dplyr::select(-expansion)
  }
  pivot_wider_comp <- function(data) {
    data_long <- data |>
      tidyr::separate(full_bin, into = c("bin", "group")) |>
      dplyr::mutate(
        bin_factor = forcats::fct_cross(
          forcats::as_factor(as.numeric(bin)),
          forcats::as_factor(sex)
        )
      )
    temporary_levels <- levels(data_long[["bin_factor"]])
    data_wide <- data_long |>
      tidyr::pivot_wider(
        id_cols = year,
        names_from = bin_factor,
        values_from = proportion
      ) |>
      dplyr::select(year, temporary_levels)
    return(data_wide)
  }
# Using the helper functions to create a list of length 3, basically one list with each of the output types for SS3, this would replace quite a bit of code that starts with the creation of `total_by_year`
  attempt <- stratum_exp |>
    tidyr::pivot_longer(
      cols = c(-year, -strata, -bin),
      names_to = "sex",
      values_to = "expansion"
    ) |>
    dplyr::mutate(
      sex_group = ifelse(sex %in% c("female", "male"), "sexed", sex),
      full_bin = paste(bin, sex_group, sep = ":")
    ) |>
    dplyr::group_by(sex_group) |>
    dplyr::group_split() |>
    purrr::map(make_long_comp) |>
    purrr::map(pivot_wider_comp)

chantelwetzel-noaa commented 2 months ago

@kellijohnson-NOAA I have worked through all your comments and made changes based on most of them. There are some questions in my responses. The two key ones are 1) when should I apply a new tag? and 2) how to delete and transfer a commit history for a file? Let me know if you find new issues or issues that you would like looked at again.

chantelwetzel-noaa commented 2 months ago

@kellijohnson-NOAA I think I have gone back and addressed each of your new comments. I opened an issue to circle back to using helper functions in the future once I have time to fully understand how the code works and have time to address some minor issues in the functionality. I think the only two outstanding items are how to delete a file while transferring the commit history to the replacement function and when to apply a new tag to the package. Thanks.

chantelwetzel-noaa commented 2 months ago

@kellijohnson-NOAA Let me know if there are additional revisions you would like to see in this branch prior to merging.

pfmc-assessments / nwfscSurvey

Add new functions to calculate expanded composition data and input sample sizes #145