pfmc-assessments / PacFIN.Utilities

R code to manipulate data from the PacFIN database for assessments
http://pfmc-assessments.github.io/PacFIN.Utilities
Other
7 stars 1 forks source link

Update the sql.bds function to only pull from current data tables #68

Closed chantelwetzel-noaa closed 1 year ago

chantelwetzel-noaa commented 2 years ago

Issue The existing sql code pull data from both the pacfin_marts.COMPREHENSIVE_BDS_COMM and pacfin.bds_sample_odfw tables with the ODFW table providing the number of unknown sex samples and weights. However, the ODFW is no longer being maintained

Potential Fix Update the sql.bds function to only use the pacfin_marts.COMPREHENSIVE_BDS_COMM to retrieve bds data. Processing functions in the package will need to be checked to ensure that removing the ODFW UNK_NUM and UNK_WT columns will not impact calculations.

kellijohnson-NOAA commented 2 years ago

They are used in downstream calculations, so it won't be as easy as just removing them from the sql call, though I agree that they should be removed.

kellijohnson-NOAA commented 1 year ago

@chantelwetzel-noaa needs me to actually work on this, high priority.

chantelwetzel-noaa commented 1 year ago

I did a test of removing the portion of code in the sql.bds function that pulls from these older tables for Dover sole. All functions used in the data processing and expansion were able to run and there was no difference in the composition data. I modified the function as:

sql.bds <- function(pacfin_species_code) {
  spid <- paste0("('", paste(pacfin_species_code, collapse = "','"), "')")
  sqlcall <- paste0(
  "Select s.*",
  # Consider changing VESSEL_NUM to VESSEL_ID b/c less NULL
          # xxx TOTAL_WGT, might be spp weight
          # xxx WGTMAX, i don't think this one matters
          # xxx WGTMIN, i don't think this one matters
          # xxx all_cluster_sum, done with ave
   "from
          pacfin_marts.COMPREHENSIVE_BDS_COMM s
   where
              s.PACFIN_SPECIES_CODE = any ", spid)
  sqlcall <- gsub("\\n", "", sqlcall)
   return(sqlcall)
}

I think the portion of code that was using the unknown numbers and weight from Oregon samples has been updated and is no longer required. I propose that we update the sql.bds function.

kellijohnson-NOAA commented 1 year ago

@chantelwetzel-noaa I will get this merged in or fixed by tomorrow morning, 🤞

kellijohnson-NOAA commented 1 year ago

@chantelwetzel-noaa thank you so much for specifying which species you did the test for. Unfortunately, the data is VERY messy and dover has fairly clean data. The two columns needed from the OR table are still being used in the code, it was just that the backup protocol for dover is not really needed or provides the same information as the oregon table. For example there are only 9 sample_numbers that do not have the same information in CLUSTER_WEIGHT_LBS. This is low compared to other species like cabezon that has 857 😱 So, I am still investigating things. I think that I might need to talk to @aliwhitman to get this ironed out.

kellijohnson-NOAA commented 1 year ago

There are records in the UNK_WT column that do not match the total sample weight - weight of males - weight of females. So, I should of just listened to @chantelwetzel-noaa months ago when she suggested that we remove the dependency on the outdated pacfin.bds_sample_odfw table that PacFIN is no longer maintaining. Thank you @chantelwetzel-noaa for the suggestion and being patient with the lack of fix. I will have this removed by the EOW.