nmfs-swfsc-ast / ast-tasks


Obtain catch data from Lisa Marie #53

Closed: jrenfree closed this issue 9 months ago

jrenfree commented 10 months ago

We currently have set data that has not been quality-controlled, so we still need to obtain QA/QC'd set and specimen data.

kstierhoff commented 10 months ago

We have received a QA/QC'd spreadsheet from K. Hinton containing set, catch, and specimen data from Lisa Marie. Age and DNA data are not yet completed, but will be provided later.

The original Excel file (2023_CPS_Survey_Final_noages.xlsx) has been placed on AST4 here: \\swc-storage4-s\AST4\SURVEYS\20230703_LISA-MARIE_SummerCPS\DATA\SEINE

I have created a copy of this file (lm_data_2307RL.xlsx), which I intend to modify (renaming columns to be compatible with existing code) and then use for the nearshore biomass estimates and for generating figures for the survey and biomass reports. This will be the master file, and I will not keep a local copy on my machine, so please discuss any changes to this file that may affect other analyses.

jrenfree commented 10 months ago

In the past, at least for Lisa Marie, we have used three spreadsheets:

- lm_sets: General info about each set (e.g., date, time, location)
- lm_catch: Breakdown of abundance and weight of each species for each basket from each set
- lm_specimen: Individual specimen info (e.g., length, weight) for a number of fish from each species from each set
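The three spreadsheets form a simple hierarchy: one row per set, many catch (basket) rows per set, and many specimen rows per set, all linked by a set identifier. A minimal Python sketch of that layout is below; the project's actual code is in R, and the class names and columns (set_id, weight_kg, etc.) are hypothetical, not the real sheet headers. The join helper shows the kind of check that catches catch rows pointing at a set missing from lm_sets.

```python
from dataclasses import dataclass


@dataclass
class SetRecord:
    """One row of lm_sets (hypothetical columns)."""
    set_id: str  # key shared by all three tables (assumed name)
    date: str
    lat: float
    lon: float


@dataclass
class CatchRecord:
    """One row of lm_catch: one species in one basket of one set."""
    set_id: str
    basket: int
    species: str
    count: int
    weight_kg: float


@dataclass
class SpecimenRecord:
    """One row of lm_specimen: one measured fish."""
    set_id: str
    species: str
    length_mm: float
    weight_g: float


def catch_with_set_info(sets, catches):
    """Pair each catch row with its parent set; raise if the set is missing."""
    by_id = {s.set_id: s for s in sets}
    joined = []
    for c in catches:
        if c.set_id not in by_id:
            raise KeyError(f"catch row references unknown set {c.set_id!r}")
        joined.append((by_id[c.set_id], c))
    return joined
```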

Currently it looks like "plot_purseSeine_2307RL" is configured to read each of these data files; however, for 2307LM we only have the single data file, "lm_data_2307RL", which appears to contain specimen info (i.e., lm_specimen). The non-QA/QC file we obtained (lm_sets_noQAQC) does appear to have the set and catch info.

Do you think we should update plot_purseSeine_2307RL to use these two spreadsheets, or should we instead modify the existing spreadsheets to mimic the three we've used in the past?

kstierhoff commented 10 months ago

I've been updating processSeine_2307RL to work with Excel files, with set, catch (bucket), and specimen data in separate tabs. The Lisa Marie file (lm_data_2307RL.xlsx) is on AST4: \\swc-storage4-s\AST4\SURVEYS\20230703_LISA-MARIE_SummerCPS\DATA\SEINE

While we wait for equivalent data from Long Beach Carnage, I've been transcribing the Bucket Sample forms from the Logs into a Google Sheet: https://docs.google.com/spreadsheets/d/1LdEnwL62ySTDjSd9hLZscqSyGlucUniydj60yctCuZc/edit#gid=1182220588

I've entered the sets and am working on the buckets. I never got an answer from Dianna about what the sample numbers corresponded to, or why they differed from the set numbers, but that shouldn't affect the survey report.

For the sake of repeatability/posterity, I will move these data from the Google Sheet to an Excel file on AST4, here: \\swc-storage4-s\AST4\SURVEYS\20230708_CARNAGE_SummerCPS\DATA\SEINE

I'll let you know when they are posted there.

kstierhoff commented 10 months ago

plotPurseSeine_2XXX has contained code that is also present in processSeine_2XXX (i.e., it is redundant), which I think is a bad approach. However, having separate scripts has been somewhat necessary given the different data inputs and formats each year. Ideally, processSeine would contain only the survey-specific processing steps, unless we standardize the input data, but I'm not sure that'll happen. Either way, processSeine would standardize the purse seine data. Then I'd like to think we could also standardize the plotPurseSeine script, unless we have to do something wacky like we did in 2022. In either case, I'd like to be able to call processSeine and plotPurseSeine from reportSurvey and reportBiomass, so that we get the same outputs and only have to maintain one set of code.

Or, we could combine these two scripts to do the processing and plotting in one script per survey, and put the plotting in a conditional block (i.e., if (save.figs)), so that the data could be processed without saving the figures. Let's think about it.
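The combined-script option amounts to "always process, plot only when a flag is set." A minimal Python sketch of that control flow is below (the real scripts are R); process_seine and run_seine are hypothetical stand-ins, the per-set weight total is only a placeholder for the real processing, and the plotting step is passed in as a function so the sketch has no plotting-library dependency.

```python
from collections import defaultdict
from typing import Callable, Optional


def process_seine(catch_rows):
    """Hypothetical processing step: total catch weight (kg) per set."""
    totals = defaultdict(float)
    for row in catch_rows:
        totals[row["set_id"]] += row["weight_kg"]
    return dict(totals)


def run_seine(catch_rows, save_figs=False,
              plot_fn: Optional[Callable[[dict], None]] = None):
    """Always process the data; call the plotting step only if save_figs is True."""
    results = process_seine(catch_rows)
    if save_figs and plot_fn is not None:
        plot_fn(results)
    return results
```

With save_figs=False the processed results are still returned, which is what reportSurvey or reportBiomass would need when only the numbers are required.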

jrenfree commented 9 months ago

The parsing should be relatively static, and not change from year to year. That is, it should be able to read some standardized input and then output some basic results, like the pie chart calculations for each haul and cluster. But I could see the plotting changing depending on what we need, such as a standard plot for the reports but also one-off plots for various analyses we do.
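The pie chart calculations mentioned above reduce to species proportions within a grouping (haul or cluster). A minimal Python sketch is below. It assumes proportions by catch weight (proportions by count would work the same way) and uses hypothetical column names (haul, cluster, species, weight_kg); it is not the project's R implementation.

```python
from collections import defaultdict


def species_proportions(catch_rows, group_key="haul"):
    """Return {group: {species: fraction of that group's total catch weight}}."""
    totals = defaultdict(lambda: defaultdict(float))
    for row in catch_rows:
        totals[row[group_key]][row["species"]] += row["weight_kg"]

    props = {}
    for group, by_species in totals.items():
        grand = sum(by_species.values())
        # Guard against zero-weight groups (e.g., empty sets) to avoid dividing by zero
        props[group] = {sp: (w / grand if grand > 0 else 0.0)
                        for sp, w in by_species.items()}
    return props
```

The same function serves both levels: call it with group_key="haul" for per-haul pies and group_key="cluster" for per-cluster pies.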

Also, the plotting may sometimes require other dependencies (e.g., data from other vessels), so I think it should be separate from the parsing.

So I think I'm in agreement with your plan to call processSeine and plotPurseSeine separately from reportSurvey and reportBiomass. If any other one-off plots are needed, we can just call processSeine and then write an ad hoc plotting script to do what we need.

kstierhoff commented 9 months ago

I'm glad that my comment made sense and I'm in agreement that we can standardize processSeine and customize plotSeine, or even call other helper scripts for one-off needs. This will require a little work up front to reformat the seine data, but that already has to be done, so I think that is a good approach. I'll try to develop a seine data template that will provide the necessary inputs to processSeine, which will produce the objects required by plotSeine.

jrenfree commented 9 months ago

I think we should try to handle it like we do the acoustic data (like process_NASC). That is, we provide processSeine a base directory of seine data files, then it iterates through each subdirectory and looks for the pertinent data (e.g., *_catch.xlsx), which it reads and appends to some variable holding results for all vessels.
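A minimal Python sketch of the directory walk described above is below. It assumes one subdirectory per vessel under the base directory and reads flat CSV files named *_catch.csv rather than *_catch.xlsx, to stay within the standard library. read_all_catch and the added vessel column are hypothetical; this is not how process_NASC or processSeine actually work.

```python
import csv
from pathlib import Path


def read_all_catch(base_dir, pattern="*_catch.csv"):
    """Walk each vessel subdirectory of base_dir and append every catch row.

    Each row gets a 'vessel' field taken from its subdirectory name, so the
    combined results still record which vessel the data came from.
    """
    rows = []
    vessel_dirs = sorted(p for p in Path(base_dir).iterdir() if p.is_dir())
    for vessel_dir in vessel_dirs:
        for f in sorted(vessel_dir.glob(pattern)):
            with f.open(newline="") as fh:
                for row in csv.DictReader(fh):
                    row["vessel"] = vessel_dir.name
                    rows.append(row)
    return rows
```

Adding a vessel (e.g., Long Beach Carnage) then only requires dropping its files into a new subdirectory; the code itself does not change.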

Is this something that you'll be actively working on right now? Just wondering if I should stand down until that's ready, or if there's anything I can do to help. The survey report is certainly near the top of my priority list, and handling the nearshore data is one of the main tasks remaining, but we're also getting Saildrone data back now so data processing will be starting on that soon.

kstierhoff commented 9 months ago

I pretty much have the processSeine code working for the data that we have. I've commented out the LBC data that's missing. The latest version is on Github (master branch). I'd advocate for keeping the seine data that we're provided in a subdir for each vessel, but have a master Excel sheet or collection of CSV files (probably preferable to have flat, plain text files) in a main directory that we point the code to. I'm also actively working on this to prepare for nearshore biomass estimates, so I can keep you in the loop. Maybe try to run processSeine_2307RL and see if it works for you as-is?

jrenfree commented 9 months ago

Yeah, I can work with this for now; it just seems like it will be heavily updated in the future.

kstierhoff commented 9 months ago

processSeine.R is now available in the main branch. This can replace calls to processSeine_2307RL.R. The code for creating the 2307RL plots must now be updated. Individual plotting scripts will probably be required each year, but unless extreme deviations from the original plan occur, these should be fairly routine. I recommend starting with plot_purseSeine_2207RL as the basis for the 2307RL script, since that was a more traditional survey.

kstierhoff commented 9 months ago

Presently working with the LM data here: //swc-storage4-s/AST4/SURVEYS/20230703_LISA-MARIE_SummerCPS/DATA/SEINE/lm_data_2307RL.xlsx

Contains set, catch, and specimen data and works with processSeine. Also contains data from both the nearshore survey, and what they're calling the "comparative" survey, which are sets where LM sampled Shimada hake trawl areas. These data have been kept separate from the data used to estimate biomasses during the nearshore survey.

kstierhoff commented 9 months ago

These data have been more or less finalized. Closing this issue; it can be reopened if any significant changes to these data arise.

kstierhoff commented 7 months ago

Revised catch data were provided by Kristen Hinton, which add catch data for several sets and correct the catch weight data for another set (all updated values are highlighted yellow in the sheet below). @jrenfree, this update will likely affect the pie charts for the nearshore survey, so the plotting routines should be re-run for the survey and biomass reports.

\\swc-storage4-s\AST4\SURVEYS\20230703_LISA-MARIE_SummerCPS\DATA\SEINE\editedSpeciesComps_dec2023.xlsx