pbs-assess / gfiphc

🎣 An R package for data extraction and analysis of groundfish data from the IPHC Longline Survey in British Columbia

Extra NA rows in 2022 #27

Open seananderson opened 1 month ago

In 2022, the set counts contain many extra rows in which every count column is NA. For example, in the set counts for North Pacific spiny dogfish:

```r
# from my gfsynopsis cache:
gfiphc_dat <- readRDS("report/data-cache-2024-05/iphc/north-pacific-spiny-dogfish.rds")$set_counts
dplyr::filter(gfiphc_dat, year == 2022, is.na(N_it20), is.na(N_it))
# A tibble: 174 × 12
    year station   lat   lon  E_it  N_it  C_it E_it20 N_it20 C_it20 usable standard
   <dbl> <chr>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl> <chr>  <fct>   
 1  2022 2324     54.0 -131.    NA    NA    NA     NA     NA     NA Y      N       
 2  2022 2150     53.8 -131.    NA    NA    NA     NA     NA     NA Y      Y       
 3  2022 2318     53.8 -131.    NA    NA    NA     NA     NA     NA Y      N       
 4  2022 2147     53.7 -131.    NA    NA    NA     NA     NA     NA Y      Y       
 5  2022 2148     53.7 -131.    NA    NA    NA     NA     NA     NA Y      Y       
 6  2022 2145     53.5 -131.    NA    NA    NA     NA     NA     NA Y      Y       
 7  2022 2142     53.3 -131.    NA    NA    NA     NA     NA     NA Y      Y       
 8  2022 2139     53.2 -131.    NA    NA    NA     NA     NA     NA Y      Y       
 9  2022 2297     53.2 -131.    NA    NA    NA     NA     NA     NA Y      N       
10  2022 2295     53.2 -132.    NA    NA    NA     NA     NA     NA Y      N       
# ℹ 164 more rows
```

These appear to be artifacts of a bad join: the 'real' rows, with non-NA data, are also present for those same stations.

For example, station 2324 appears twice in 2022:

```r
dplyr::filter(gfiphc_dat, year == 2022, station == 2324)
# A tibble: 2 × 12
   year station   lat   lon  E_it  N_it  C_it E_it20 N_it20 C_it20 usable standard
  <dbl> <chr>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl> <chr>  <fct>   
1  2022 2324     54.0 -131.    NA    NA    NA   1.61      0      0 Y      N       
2  2022 2324     54.0 -131.    NA    NA    NA  NA        NA     NA Y      N
```
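Until the join itself is fixed in gfiphc, a downstream workaround might look like the minimal base R sketch below. It is not the package's fix, and it assumes that an all-NA row is a pure join artifact whenever the same year/station also has a row with real counts. The helper `drop_join_na_rows()` is hypothetical, and the toy data frame copies station 2324's values from the output above while station "9999" is an invented example with only an all-NA row.

```r
# Count columns in the set_counts table (names taken from the issue output).
count_cols <- c("E_it", "N_it", "C_it", "E_it20", "N_it20", "C_it20")

# Toy stand-in for the 2022 rows: station 2324 values are copied from the
# issue; station "9999" is hypothetical and has only an all-NA row.
set_counts <- data.frame(
  year = 2022,
  station = c("2324", "2324", "9999"),
  E_it = NA_real_, N_it = NA_real_, C_it = NA_real_,
  E_it20 = c(1.61, NA, NA),
  N_it20 = c(0, NA, NA),
  C_it20 = c(0, NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop rows whose count columns are all NA, but only
# when the same year/station also has a row with at least one non-NA count.
drop_join_na_rows <- function(d, cols = count_cols) {
  all_na <- rowSums(!is.na(d[cols])) == 0
  key <- paste(d$year, d$station)
  has_real_row <- key %in% key[!all_na]
  d[!(all_na & has_real_row), , drop = FALSE]
}

# Diagnostic: year/station combinations that occur more than once.
key <- paste(set_counts$year, set_counts$station)
unique(key[duplicated(key)])  # returns "2022 2324"

cleaned <- drop_join_na_rows(set_counts)
cleaned[, c("year", "station", "E_it20", "N_it20", "C_it20")]
```

The helper only removes an all-NA row when a non-NA row exists for the same station-year, so a station whose sole record is NA is kept rather than silently dropped.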