pfmc-assessments / PacFIN.Utilities

R code to manipulate data from the PacFIN database for assessments
http://pfmc-assessments.github.io/PacFIN.Utilities

Clarify summary output from cleanPacFIN #82

Open brianlangseth-NOAA opened 1 year ago

brianlangseth-NOAA commented 1 year ago

Is your feature request related to a problem? Please describe.

Yes. I had a hard time figuring out which elements were cleaned from the PacFIN BDS data and which were not.

Describe the solution you'd like

Can the summary output from cleanPacFIN (e.g., the output attached below) specify which concerning items are actually removed by the function? Take the canary BDS data, for example.

The samples 'cleaned' include the 9108 samples with "bad" sample type, the 199 with "bad" sample method, and an unspecified 8828 samples taken in areas outside the US. All other elements were not cleaned.

I have a few suggestions:

  1. Can the checks that result in samples being removed be marked differently from the checks that are purely informational? For the example above, can sample_type and sample_method be marked differently? One suggestion is to end those lines with "and removed if CLEAN".
  2. Can you add a summary of samples outside US waters to the end of the output? I know that the longer description includes it (i.e., the black text in my attached image). Maybe it is always directly above, but maybe not. It would be nice to have a single summary line at the end of the output, something like "N outside US waters, and removed if CLEAN: ....".
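To make the two suggestions concrete, here is a rough sketch of how the summary lines could be assembled. The helper `format_check()` and the label wording are hypothetical and are not part of PacFIN.Utilities; the counts are the ones from the canary example in this issue, and the `removed` flags reflect my reading of which checks currently lead to removal.

```r
# Hypothetical helper, not part of PacFIN.Utilities: build one summary line
# per check and flag the checks whose records are dropped when CLEAN = TRUE.
format_check <- function(label, n, removed) {
  suffix <- if (removed) ", and removed if CLEAN" else ""
  paste0("N ", label, suffix, ": ", n)
}

# Counts taken from the canary example in this issue; labels are illustrative.
checks <- data.frame(
  label = c(
    "with bad sample types",
    "with bad sample methods",
    "outside US waters",
    "without length and Age"
  ),
  n = c(9108, 199, 8828, 61614),
  removed = c(TRUE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# One line per check; informational checks get no suffix.
summary_lines <- mapply(
  format_check, checks$label, checks$n, checks$removed,
  USE.NAMES = FALSE
)
writeLines(summary_lines)
```

With this layout the checks that remove records are distinguishable at a glance, and the samples outside US waters get their own line at the end of the summary instead of only appearing in the longer description.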

Describe alternatives you've considered

Specify in the documentation which checks result in removal if CLEAN. I read the CLEAN section in the documentation for cleanPacFIN and it didn't specify. If anything, it confused me more because it says:

"many early length compositions do not have information on the weight of fish that were sampled, and thus, there is no way to infer how much the entire sample weighed or how much the tow/trip weighed. Therefore, these data cannot be expanded and are removed using CLEAN = TRUE"

Based on the code, the weight check only applies to Oregon. I would recommend that this be clarified in the documentation as well.

Additional context

I also noticed that the output "N without length and Age: 61614" should perhaps be "N without one of length or Age: 61614", based on the code.
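The difference between the two wordings is the difference between a logical AND and a logical OR on the missing-value checks. A minimal sketch with made-up data (the data frame and the column names `length` and `age` are hypothetical, not the actual PacFIN column names or the package's code):

```r
# Hypothetical toy data, not PacFIN records.
bds <- data.frame(
  length = c(30, NA, 42, NA),
  age = c(5, 7, NA, NA)
)

# "without length and Age": both values are missing.
n_missing_both <- sum(is.na(bds$length) & is.na(bds$age))

# "without one of length or Age": at least one value is missing.
n_missing_either <- sum(is.na(bds$length) | is.na(bds$age))

cat("N without length and Age:", n_missing_both, "\n")
cat("N without one of length or Age:", n_missing_either, "\n")
```

The two counts can differ a lot on real data, so the label should match whichever condition the code actually uses.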

[Attached image: cleanPacFIN summary output for the canary BDS data]