pfmc-assessments / PacFIN.Utilities

R code to manipulate data from the PacFIN database for assessments
http://pfmc-assessments.github.io/PacFIN.Utilities
Other
7 stars 1 forks source link

restore combineCalCOM()? #101

Closed iantaylor-NOAA closed 1 year ago

iantaylor-NOAA commented 1 year ago

Is your feature request related to a problem? Please describe. PacFIN BDS has only 1 sample of Petrale Sole from California for the years prior to 1990, but CalCOM data pulled in 2011 and used in recent assessments includes 53,794 samples from 1949 to 1988. For the 2019 assessment @chantelwetzel-noaa used the combineCalCOM() function to merge these CalCOM samples with the PacFIN BDS data, but that function was deprecated in d5c0355795d10b945a9e28bfb05cb6811697c852 under the expectation that the CalCOM data were now all in PacFIN.

I have created a CALCOM branch in which I have restored the function to the state it was in before deprecation and then made some additional minor changes necessary to get it to work with the current PacFIN BDS code as extracted and processed by the current {PacFIN.Utilities}. It worked better if I ran cleanPacFIN() before running combineCalCOM() although I didn't actually check to see what was changing that made it work better that way.

The code where I'm using the restored combineCalCOM() is here: https://github.com/pfmc-assessments/petrale/blob/e0e20e04e837f98a2cf18c265b4dfcfb0b8546d0/R/process_pacfin_bds.R#L132-L161.

Describe the solution you'd like Option 1: If petrale is the only stock where biological data from CalCOM need to be combined with PacFIN BDS data, then we can move combineCalCOM() into the petrale repository. Option 2: If there are other stocks where CalCOM data isn't in PacFIN and combineCalCOM() would be useful, we could merge the branch and retain the function in {PacFIN.Utilities}, perhaps after testing it on the data for those stocks.

Describe alternatives you've considered

kellijohnson-NOAA commented 1 year ago

Thank you @iantaylor-NOAA for the pull request. Before I start looking at it, I am wondering if @EJDick-NOAA can comment on why these data are not in PacFIN?

kellijohnson-NOAA commented 1 year ago

Regarding your proposed solutions, I prefer option 2 to increase the visibility of the function. And, my apologies for assuming that all CalCOM data was in PacFIN and that we did not need this function. I now remember you telling me that you had old data you were dealing with but I blocked it out of my memory when I deleted the function.

EJDick-NOAA commented 1 year ago

I'd ping Brenda for the history. If memory serves, I think it's because the sampling procedure changed from "bin samples" to "cluster samples" and they want to avoid mixing them up. However, please check with Brenda to confirm.

iantaylor-NOAA commented 1 year ago

Thank you for the quick responses. @kellijohnson-NOAA, the function was deleted in 2021 when nobody was thinking as much about petrale. @EJDick-NOAA, I will write to Brenda for more info on the particulars of petrale data.

kellijohnson-NOAA commented 1 year ago

@iantaylor-NOAA any news from Brenda why these data are not included in PacFIN?

iantaylor-NOAA commented 1 year ago

@kellijohnson-NOAA, thanks for following up. I just found the complete but unsent email asking about this in my drafts folder. I presume that I got distracted (happens a lot these days). Now sent.

iantaylor-NOAA commented 1 year ago

Lightning-fast answer from Brenda:

Yes, we have the data but it's in separate tables because they are bin samples (# of fish) rather than cluster samples (weight based). PacFIN doesn't access those tables. It's not just PTRL but all the flatfish pre-1990.

Yes, it's probably something I should talk to PacFIN about.

If it's all the flatfish, then it seems like restoring combineCalCOM() is a good idea for now. I will create a pull request from the CALCOM branch.

@emilysellinger, do you and your team have California samples for Rex Sole from prior to 1990 from CalCOM as discussed in this issue?

kellijohnson-NOAA commented 1 year ago

I am slightly worried that the expansion code is not going to work for these samples 🤕

chantelwetzel-noaa commented 1 year ago

Yeah, there were issues with these data that required me to fill in some missing values. Ian has my messy code from 2019 that shows what I had to do at that time to include these data. However, I think this package has significantly evolved since then so I suspect additional work may be required to include these data.

iantaylor-NOAA commented 1 year ago

The code required to run the restored combineCalCOM() function was linked in the original post, but this link is updated and encompasses some additional steps: https://github.com/pfmc-assessments/petrale/blob/cfe76e3999adff848a95d34afcd85f041b2a765a/R/process_pacfin_bds.R#L31-L141.

I recognize that it's not ideal to have a function in the package that requires extra work and has question about it. But I think that's a better state to be in than having users track down the function history from before it was deprecated, or rely on one-off scripts that are more easily lost.

The figure below shows expanded California length comps for Petrale where the samples up to 1989 are from CalCOM and from 1990 onward from PacFIN after using the combineCalCOM() function in the CALCOM branch of this package. I don't see any obvious inconsistencies in distributions or sample sizes across that break so I'm OK going forward with these data in the model without further investigation this year.

I'll go forward with the pull request but don't feel obligated to merge it if this approach feels wrong.

comp_lendat_flt3mkt2_page2

kellijohnson-NOAA commented 1 year ago

Anyone know, maybe @EJDick-NOAA, what species this matters for? I am a terrible biologist, so when someone says "flatfish", a list of species does not automatically populate my 🤯. I am trying to write a check for certain pacfin species codes that would warn users they need to check for early CalCOM data.

iantaylor-NOAA commented 1 year ago

Thanks for working on this @kellijohnson-NOAA. Please feel free to push any changes to the branch you want. It appears that the function was written long ago and I have no ego attached to it's current state.

As for flatfish species, the PFMC Groundfish Fishery Management Plan has the following list of flatfish on page 17:

Common name scientific name
Arrowtooth flounder (turbot) Atheresthes stomias
Butter sole Isopsetta isolepis
Curlfin sole Pleuronichthys decurrens
Dover sole Microstomus pacificus
English sole Parophrys vetulus
Flathead sole Hippoglossoides elassodon
Pacific sanddab Citharichthys sordidus
Petrale sole Eopsetta jordani
Rex sole Glyptocephalus zachirus
Rock sole Lepidopsetta bilineata
Sand sole Psettichthys melanostictus
Starry flounder Platichthys stellatus
kellijohnson-NOAA commented 1 year ago

@iantaylor-NOAA can you put the CalCOM petrale data on the network so I can have an example data set?

iantaylor-NOAA commented 1 year ago

The file we used in the script referenced in https://github.com/pfmc-assessments/PacFIN.Utilities/issues/101#issuecomment-1499368894 is here: \nwcfile\FRAM\Assessments\Archives\PetraleSole\PetraleSole_2019_Update\6_data\fishery_comps\PetraleCALCOM_Query2011.csv

EJDick-NOAA commented 1 year ago

I used the following query to get samples sizes by species and year:

SELECT species, count(fish_no) as n_lengths, count(distinct(flatfish_bin_samples.sample_no)) as n_samples, year(sample_date) as yr from flatfish_bin_fish, flatfish_bin_samples where tlength is not null and flatfish_bin_fish.sample_no = flatfish_bin_samples.sample_no group by species, year(sample_date) order by species, year(sample_date)

Results:

species,n_lengths,n_samples,yr DOVR,252,5,1948 DOVR,100,2,1949 DOVR,857,18,1969 DOVR,671,14,1970 DOVR,725,29,1971 DOVR,1174,47,1972 DOVR,900,36,1973 DOVR,799,32,1974 DOVR,1358,40,1975 DOVR,1475,39,1976 DOVR,2425,65,1977 DOVR,2702,68,1978 DOVR,1583,38,1979 DOVR,4349,116,1980 DOVR,3561,97,1981 DOVR,3336,68,1982 DOVR,5193,106,1983 DOVR,4274,88,1984 DOVR,5140,121,1985 DOVR,4068,100,1986 DOVR,5700,114,1987 DOVR,4654,94,1988 DOVR,5450,111,1989 DOVR,450,9,1990 EGLS,51,1,1948 EGLS,50,1,1949 EGLS,49,1,1959 EGLS,50,1,1960 EGLS,25,1,1963 EGLS,934,35,1964 EGLS,499,25,1965 EGLS,3531,125,1966 EGLS,3055,114,1967 EGLS,5511,214,1968 EGLS,3446,123,1969 EGLS,1175,46,1970 EGLS,850,34,1971 EGLS,1499,57,1972 EGLS,2150,61,1973 EGLS,2084,61,1974 EGLS,1658,39,1975 EGLS,2742,62,1976 EGLS,4168,85,1977 EGLS,3288,66,1978 EGLS,1345,27,1979 EGLS,6273,126,1980 EGLS,5647,113,1981 EGLS,3796,78,1982 EGLS,3820,77,1983 EGLS,1927,38,1984 EGLS,3100,62,1985 EGLS,2949,58,1986 EGLS,1949,39,1987 EGLS,250,5,1988 EGLS,400,8,1989 EGLS,50,1,1990 PTRL,404,8,1948 PTRL,458,10,1949 PTRL,150,3,1962 PTRL,970,24,1964 PTRL,608,15,1965 PTRL,2248,53,1966 PTRL,2296,56,1967 PTRL,3936,100,1968 PTRL,2616,68,1969 PTRL,1146,35,1970 PTRL,1700,49,1971 PTRL,2182,60,1972 PTRL,2221,59,1973 PTRL,2107,63,1974 PTRL,1198,27,1975 PTRL,1730,36,1976 PTRL,2555,54,1977 PTRL,2097,42,1978 PTRL,847,17,1979 PTRL,6172,124,1980 PTRL,5754,118,1981 PTRL,3860,84,1982 PTRL,2725,56,1983 PTRL,1570,32,1984 PTRL,1222,25,1985 PTRL,1351,27,1986 PTRL,1050,21,1987 PTRL,516,11,1988 PTRL,842,17,1989 PTRL,50,1,1990 REX,2144,43,1980 REX,2774,56,1981 REX,1011,20,1982 REX,1552,31,1983 REX,838,15,1984 REX,1401,28,1985 REX,1450,29,1987

kellijohnson-NOAA commented 1 year ago

Thanks @iantaylor-NOAA for the file link. Thanks @EJDick-NOAA for the table. One more question. What units are CalCOM lengths in? Are they all mm?

EJDick-NOAA commented 1 year ago

You're welcome! The flatfish bin samples were measured in mm total length. All other lengths (i.e., the ones that are imported to PacFIN's BDS) have been either measured in or converted to mm fork length.

kellijohnson-NOAA commented 1 year ago

@EJDick-NOAA two more questions. I think these will be my last 🤣, on this topic 😉 .

  1. What gear type was used for these samples?
  2. What ageing method was used for these samples? The data used for petrale do not have this information even though it is available in CalCOM for bin samples. If all the samples are the same we can just fix the values but if they are variable then we will need a new data pull for Petrale and Rex that has this information.
EJDick-NOAA commented 1 year ago

I added gear type to the query (below). It's reasonable to assume all gears were trawl variations. GFT is "Groundfish (Otter) Trawl" and RLT is "Roller Trawl." For petrale 1869 fish did not have an ageing method. For the rest, ~1320 ages were break and burn, with 39 radiocarbon ages (reader=Haltuch) and 33 surface ages (all 3-year-olds, with 100% agreement for 10 double reads; readers N Atkins and T Johnson).

SELECT species, year(sample_date) as yr, gear, count(fish_no) as n_lengths, count(distinct(flatfish_bin_samples.sample_no)) as n_samples from flatfish_bin_fish, flatfish_bin_samples where tlength is not null and flatfish_bin_fish.sample_no = flatfish_bin_samples.sample_no group by species, year(sample_date), gear order by species, year(sample_date), gear

species | yr | gear | n_lengths | n_samples DOVR | 1948 | NULL | 252 | 5 DOVR | 1949 | NULL | 100 | 2 DOVR | 1969 | GFT | 857 | 18 DOVR | 1970 | GFT | 671 | 14 DOVR | 1971 | GFT | 725 | 29 DOVR | 1972 | GFT | 1174 | 47 DOVR | 1973 | GFT | 900 | 36 DOVR | 1974 | GFT | 799 | 32 DOVR | 1975 | GFT | 1358 | 40 DOVR | 1976 | GFT | 1475 | 39 DOVR | 1977 | GFT | 2425 | 65 DOVR | 1978 | GFT | 2702 | 68 DOVR | 1979 | GFT | 1583 | 38 DOVR | 1980 | GFT | 4349 | 116 DOVR | 1981 | GFT | 3561 | 97 DOVR | 1982 | GFT | 3336 | 68 DOVR | 1983 | GFT | 5193 | 106 DOVR | 1984 | GFT | 2724 | 57 DOVR | 1984 | RLT | 1550 | 31 DOVR | 1985 | GFT | 3540 | 89 DOVR | 1985 | RLT | 1600 | 32 DOVR | 1986 | GFT | 4068 | 100 DOVR | 1987 | GFT | 5700 | 114 DOVR | 1988 | GFT | 4654 | 94 DOVR | 1989 | GFT | 5225 | 106 DOVR | 1989 | RLT | 225 | 5 DOVR | 1990 | GFT | 450 | 9 EGLS | 1948 | NULL | 51 | 1 EGLS | 1949 | NULL | 50 | 1 EGLS | 1959 | NULL | 49 | 1 EGLS | 1960 | NULL | 50 | 1 EGLS | 1963 | NULL | 25 | 1 EGLS | 1964 | NULL | 934 | 35 EGLS | 1965 | NULL | 499 | 25 EGLS | 1966 | NULL | 3531 | 125 EGLS | 1967 | NULL | 3055 | 114 EGLS | 1968 | NULL | 5511 | 214 EGLS | 1969 | GFT | 3446 | 123 EGLS | 1970 | GFT | 1175 | 46 EGLS | 1971 | GFT | 850 | 34 EGLS | 1972 | GFT | 1499 | 57 EGLS | 1973 | GFT | 2150 | 61 EGLS | 1974 | GFT | 2084 | 61 EGLS | 1975 | GFT | 1658 | 39 EGLS | 1976 | NULL | 1841 | 44 EGLS | 1976 | GFT | 901 | 18 EGLS | 1977 | NULL | 3681 | 75 EGLS | 1977 | GFT | 487 | 10 EGLS | 1978 | NULL | 2089 | 42 EGLS | 1978 | GFT | 1199 | 24 EGLS | 1979 | NULL | 1046 | 21 EGLS | 1979 | GFT | 299 | 6 EGLS | 1980 | NULL | 6023 | 121 EGLS | 1980 | GFT | 250 | 5 EGLS | 1981 | NULL | 5296 | 106 EGLS | 1981 | GFT | 351 | 7 EGLS | 1982 | GFT | 3796 | 78 EGLS | 1983 | GFT | 3820 | 77 EGLS | 1984 | GFT | 1927 | 38 EGLS | 1985 | NULL | 2950 | 59 EGLS | 1985 | GFT | 150 | 3 EGLS | 1986 | NULL | 2949 | 58 EGLS | 1987 | NULL | 100 | 2 EGLS | 1987 | GFT | 1849 | 37 EGLS | 1988 | GFT | 250 | 5 EGLS | 1989 | GFT | 400 | 8 EGLS | 1990 | GFT | 50 | 1 PTRL | 1948 | NULL | 404 | 8 PTRL | 1949 | NULL | 458 | 10 PTRL | 1962 | NULL | 150 | 3 PTRL | 1964 | NULL | 970 | 24 PTRL | 1965 | NULL | 608 | 15 PTRL | 1966 | NULL | 2248 | 53 PTRL | 1967 | NULL | 2296 | 56 PTRL | 1968 | NULL | 3936 | 100 PTRL | 1969 | GFT | 2616 | 68 PTRL | 1970 | GFT | 1146 | 35 PTRL | 1971 | GFT | 1700 | 49 PTRL | 1972 | GFT | 2182 | 60 PTRL | 1973 | GFT | 2221 | 59 PTRL | 1974 | GFT | 2107 | 63 PTRL | 1975 | GFT | 1198 | 27 PTRL | 1976 | NULL | 976 | 21 PTRL | 1976 | GFT | 754 | 15 PTRL | 1977 | NULL | 1906 | 40 PTRL | 1977 | GFT | 649 | 14 PTRL | 1978 | NULL | 848 | 17 PTRL | 1978 | GFT | 1249 | 25 PTRL | 1979 | NULL | 647 | 13 PTRL | 1979 | GFT | 200 | 4 PTRL | 1980 | NULL | 4791 | 96 PTRL | 1980 | GFT | 1381 | 28 PTRL | 1981 | NULL | 4369 | 90 PTRL | 1981 | GFT | 1385 | 28 PTRL | 1982 | GFT | 3860 | 84 PTRL | 1983 | GFT | 2725 | 56 PTRL | 1984 | NULL | 50 | 1 PTRL | 1984 | GFT | 1520 | 31 PTRL | 1985 | NULL | 1122 | 23 PTRL | 1985 | GFT | 100 | 2 PTRL | 1986 | NULL | 1351 | 27 PTRL | 1987 | GFT | 1050 | 21 PTRL | 1988 | GFT | 516 | 11 PTRL | 1989 | GFT | 842 | 17 PTRL | 1990 | GFT | 50 | 1 REX | 1980 | NULL | 1843 | 37 REX | 1980 | GFT | 301 | 6 REX | 1981 | NULL | 2538 | 51 REX | 1981 | GFT | 236 | 5 REX | 1982 | GFT | 1011 | 20 REX | 1983 | GFT | 1552 | 31 REX | 1984 | GFT | 838 | 15 REX | 1985 | NULL | 1351 | 27 REX | 1985 | GFT | 50 | 1 REX | 1987 | GFT | 1450 | 29

kellijohnson-NOAA commented 1 year ago

Thanks @EJDick-NOAA I fixed AGENCY_GEAR_CODE to "GFT" and I will leave the ageing method blank for now. But, perhaps in the future we can find a way to get these double reads to the assessment authors in a similar way to how I do it for data pulled from PacFIN.

iantaylor-NOAA commented 1 year ago

@EJDick-NOAA, thank you for all your input on the CalCOM data for flatfish!

The previous petrale models assumed all CalCOM ages up through 1984 were surface reads and 1985-1989 were break and burn? Does this match the pattern you're seeing? If not, could you send me the CalCOM petrale data you have in which age method is recorded (or blank for a subset of samples)?

EJDick-NOAA commented 1 year ago

I'll email you all of the petrale ages from the FLATFISH_BIN_AGES table, with the query I used to get them. That will give you the method (or null value) for each age.

iantaylor-NOAA commented 1 year ago

Thanks @EJDick-NOAA for sending the data. I'm commenting here for anyone coming across this thread in the future that 1986-1989 ages are almost all break and burn as assigned in the previous petrale assessments, but 1985 is all "UNKNOWN" and 1982-1984 all "Unknown", so presumably previous authors had information from those familiar with the ageing to know that the 1985 values should be assigned as break and burn and the earlier ones as surface. I'm just going to follow the previously used ageing error assignments without digging further into this.

I agree with @kellijohnson-NOAA that in some future cycle it would be great to add the CalCOM double read info to workflow in future assessment cycles but I don't think that will have a big impact this year and we're all too busy to worry about it right now.