pbs-assess / gfiphc

:fishing_pole_and_fish: An R package for data extraction and analysis of groundfish data from the IPHC Longline Survey in British Columbia

For 2018 survey - Yelloweye assessment excludes the 131 new IPHC locations. Look into. #14

Open andrew-edwards opened 4 years ago

andrew-edwards commented 4 years ago

I just went through gfsynopsis to look for any striking anomalies.

For each of the following species, 2018 is the year with that species' highest catch rate for the time series plotted (looking at the figure by eye): Tope Shark, Big Skate, Giant Wrymouth, Vermilion Rockfish, Kelp Greenling (though it looks like 2018 is the only observation), Red Irish Lord (ditto), Great Sculpin (ditto), Cabezon (ditto).

For these species, 2018 is the year with that species' lowest catch rate (excluding when there are several years with zero catch) for the time series plotted: Redbanded Rockfish and Arrowtooth Flounder.

So for the other species, 2018 is within the range of previous years. Still needs to be looked into, but the issue is more pertinent for the above species.
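The by-eye check above could also be done programmatically. A minimal sketch in base R, assuming a data frame with one catch rate per species per year; the data frame, its column names, the species labels and the helper classify_year() are all hypothetical and not part of gfiphc or gfsynopsis:

```r
# Toy catch-rate series; species labels and values are invented for illustration.
rates <- data.frame(
  species = rep(c("sp_a", "sp_b", "sp_c"), each = 4),
  year    = rep(c(2015, 2016, 2017, 2018), times = 3),
  rate    = c(0.2, 0.3, 0.1, 0.9,   # sp_a: 2018 is the maximum
              1.5, 1.2, 1.4, 0.4,   # sp_b: 2018 is the minimum
              0.5, 0.7, 0.3, 0.6)   # sp_c: 2018 is within range
)

# Hypothetical helper: classify where the focal year sits in one species' series.
classify_year <- function(d, focal = 2018) {
  r      <- d$rate[d$year == focal]
  others <- d$rate[d$year != focal]
  if (length(r) == 0)      return("no focal-year data")
  if (length(others) == 0) return("only observation")
  if (r > max(others))     return("highest")
  # Strict inequality: a tie (e.g. several years with zero catch) is not flagged.
  if (r < min(others))     return("lowest")
  "within range"
}

# One label per species, named by species.
flags <- sapply(split(rates, rates$species), classify_year)
print(flags)
```

Species flagged "highest", "lowest" or "only observation" would be the ones to look at first.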

andrew-edwards commented 3 years ago

The issue is that 2018 had lots of extra stations. It seems from here that the Yelloweye assessment excluded those. Need to get that code into the standard analysis. This came up late when we updated gfsynopsis to include the 2018 data.

andrew-edwards commented 3 years ago

Have incorporated details (from Dana) of the expansion stations as setDataExpansion. Also checked that those stations only appear in 2018, by running the following in the data_for_one_species vignette (after commit 05c2c85):

summary(filter(sp_set_counts$set_counts, station %in% filter(setDataExpansion, standard == "N")$station))

The check includes the 2019 data but not yet the 2020 data. Am adding a 'standard' column to the species set-level calculations to allow easy exclusion of the expansion stations when calculating time series.
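To illustrate how a 'standard' column allows the expansion stations to be dropped before computing a time series, here is a minimal base R sketch. The data frame set_counts, its catch column and all values are invented for illustration; only the "Y"/"N" coding of standard follows the thread, and this is not the package's actual calculation:

```r
# Hypothetical set-level data: two standard stations fished in both years,
# plus two expansion stations (standard == "N") fished only in 2018.
set_counts <- data.frame(
  year     = c(2017, 2017, 2018, 2018, 2018, 2018),
  station  = c("2001", "2002", "2001", "2002", "2300", "2301"),
  catch    = c(2, 4, 3, 5, 20, 30),
  standard = c("Y", "Y", "Y", "Y", "N", "N")
)

# Check that the expansion stations occur only in 2018.
expansion <- set_counts[set_counts$standard == "N", ]
stopifnot(all(expansion$year == 2018))

# Yearly mean catch with and without the expansion stations.
all_mean <- aggregate(catch ~ year, data = set_counts, FUN = mean)
std_mean <- aggregate(catch ~ year,
                      data = set_counts[set_counts$standard == "Y", ],
                      FUN = mean)

print(all_mean)  # 2018 mean is pulled up by the expansion stations
print(std_mean)  # 2018 mean based on the standard stations only
```

In this toy example the 2018 value changes substantially once the expansion stations are excluded, which is the kind of anomaly described earlier in the thread.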

andrew-edwards commented 3 years ago

However, there seem to be 20 stations in Dana's list but not in gfbio, and 7 stations in gfbio but not in Dana's list:

> from_dana <- setDataExpansion$station
> from_gfbio <- dplyr::filter(get_iphc_sets_info(), year == 2018)$station
> setdiff(from_dana, from_gfbio)  # 2018 stations in Dana's list but not in gfbio:
 [1] "2094" "2102" "2116" "2138" "3001" "3002" "3003" "3004" "3005" "3008"
[11] "3009" "3010" "3011" "3012" "3013" "3210" "2241" "2248" "2256" "2257"
> setdiff(from_gfbio, from_dana)  # 2018 stations in gfbio but not in Dana's list:
[1] "2338" "2336" "2337" "2342" "2341" "2339" "2340"

The 7 gfbio stations are all declared 'usable' by the IPHC, so that is not the explanation. The 20 in Dana's list but not in gfbio aren't such an issue (presumably they just weren't fished), but the other 7 need clarifying.

andrew-edwards commented 3 years ago

See Dana's new file in Teams comment.

andrew-edwards commented 3 years ago

Fixed the 7 stations in gfbio but not in Dana's list, thanks to her updated file. Commit aed55ac. They all automatically now have standard = "N", whereas previously they were NA.
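A quick way to catch stations whose standard value ends up as NA is to left-join the fished sets to the expansion list and look for unmatched rows. A minimal base R sketch with invented data: the objects sets_2018 and expansion_list are hypothetical stand-ins (not the package's get_iphc_sets_info() output or setDataExpansion), and the standard values assigned here are made up:

```r
# Hypothetical 2018 sets and a deliberately incomplete expansion list.
sets_2018      <- data.frame(station = c("2001", "2338", "2336"))
expansion_list <- data.frame(station  = c("2001", "2336"),
                             standard = c("Y", "N"))

# Left join: keep every fished station; standard is NA where there is no match.
joined <- merge(sets_2018, expansion_list, by = "station", all.x = TRUE)

# Stations that still need a standard designation.
unmatched <- joined$station[is.na(joined$standard)]
print(unmatched)
```

An empty result would indicate that every fished station has been assigned either "Y" or "N".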

andrew-edwards commented 3 years ago

All automatically fixed now. Have gone through the data_for_all_species vignette for all species, compared the results with the original gfsynopsis, and made some notes. But going through the above list will be better done with the gfsynopsis figures, since I won't have to think about which Series is being plotted (and can just compare the new plot with the old).