international-year-of-the-salmon / IYS-Integrated-Data-Collection

Repository to integrate all the trawl data collected throughout the IYS Pan-Pacific Winter High Seas Expeditions
0 stars 0 forks source link

Version 4 To do: #1

Closed Br-Johnson closed 1 year ago

Br-Johnson commented 1 year ago
timvdstap commented 1 year ago

Should we remove blank columns from the Integrated Data Collection? Empty columns:

timvdstap commented 1 year ago

Enhancement to-do's:

Br-Johnson commented 1 year ago

@timvdstap

Br-Johnson commented 1 year ago

IYS_events

IYS_trawl_catch

IYS_trawl_specimen

IYS_Bongo_specimen

timvdstap commented 1 year ago

IYS_events

  • IYS_events$day_night includes the values dawn, dusk, night/dawn. The data dictionary also doesn't totally clarify what day/night means... dark, light? Or are the hours between 6 am to 6 pm is day for example.. Not fully sure, I think this information was added arbitarily? Can investigate, but I don't think it has to do with specific times.
  • IYS_events$maximum_sampling_depth_meters has 5 decimal point precision. I think those probably need rounding to no decimal points to present the precision of the measurements correctly. addressed
  • Same comment as above for IYS_events$tow_speed_kilometers_per_hour addressed
  • Same comment as above for IYS_events$tow_distance_nautical_miles, wind_direction_degrees, wind_speed_km_per_hour addressed

IYS_trawl_catch

  • $taxonomic_rank has inconsistent capitalization: class Class family Family genus Genus order Order species Species Addressed: removed any capitalization
  • $lifestage: should larvae and larval be combined? Addressed: agreed, combined into larvae as this term has controlled vocabulary in the NVS.
  • $catch_weight_units: inconsistent capitalization ie Kilograms vs kilograms Addressed: removed capitalization
  • $weight_method: value 'Trace' used 11 times but not explained in data dictionary Addressed: included in data dictionary
  • $catch_count has four decimal places. Change to zero. Addressed
  • $count_method values inconsistently named: Count Estimate Subsample Subsampled Total Weight estimate Addressed: Count renamed to 'Total' (as per Andrew Dimond), 'Subsample' renamed to 'Weight estimate' (following discussion with Andrew Dimond). 'Subsampled' is now included in the data dictionary, it indicated (practically) whole (gelatinous) individuals that could be counted. 'Estimate' changed to visual estimate as per Amy T.
  • It's unclear if when the catch_count_method is sub_sample whether or not the catch_count has been adjusted to represent the total catch or if it is just the count of the sub sample. If the latter, what is the multiplier required to multiply to sub-sample by to arrive at a total count? Probably just needs some clarification in the data dictionary. Addressed: changed the description in the data dictionary: weight_method: Method used to determine the catch_weight. Options are: total (all individuals are used to determine the weight), weight estimate (total catch_weight is extrapolated from a subsampled weight), volume estimate (catch_weight is extrapolated from a subsampled volume, trace (weights that are too small to be weighed accurately), or this is unknown. catch_method:
  • I think it would be better if the alternative_event_id immediately followed the event_id in the ordering of the columns. Same for alternative_catch_id and catch_id. Addressed: agreed, changed order. I've also renamed the columns to indicate original instead of alternative

IYS_trawl_specimen

  • $sex values are inconsistent: f f? female m male unknown Unknown unsexed Addressed: changed f and m to female and male respectively. Changed f? to female? Not sure how best to address that, welcome suggestions. Uncapitalized the sexes as well.
  • I think it would be good to list length_units explicitly like: specimen_length_units, specimen_length2_units. Right now there's just one column length_units and it's not super intuitive that it applies to both length measurements. Addressed: Agreed and changed.
  • Unclear what it means when number_measured is more than 1. Is the resulting measurement an average? Data dictionary says: "Number of individuals within measurement. Actual for specimen measured in groups (myctophids, squid etc)." Not sure what that means. **I think this indicates an average specimen length value if number_measured > 1. If that's the case, would it be better to add a column 'average_specimen_length', or indicate in the comment column that it's an average value?
  • Rename original_specimen_id, original_event_id, original_catch_id, with the word alternative to match what is used in IYS_events and IYS_catch Addressed: Did not change it in the IYS_trawl_specimen tab, rather I changed the word alternative in other data files to original to match.

IYS_Bongo_specimen

  • No specimen_id column. Need one to conform to relational structure. Addressed: Awesome catch! I had made IYS2022-TINRO-3-28-Bongo-1 but this should've been a catch_id, with IYS2022-TINRO-3-28-Bongo-1-49 being the specimen_id.
  • Does the catch_id column in IYS_Bongo_specimen have a corresponding catch_id in IYS_trawl_catch? Or does Bongo_specimen only join to IYS_events using station_event_id? Good question, currently the catch_id columns for the bongo data does not have an catch_id in the IYS_trawl_catch data file. I think that was done on purpose to keep that data file + its column headers trawl data specific. I've now included catch_id and specimen_id in the bongo_specimen data tab -- does that make sense?
  • $lifestage naming inconsistent: adult Adult C4 C5 C6 juvenile Addressed: removed capitalization.
  • $sf_proc includes date values: 10-May for eg. while others are '< 5' Addressed: This was also noted by Joanne Breckenridge. She suggested adding 'mm' to ensure Excel doesn't convert it to a date immediately. Open to other suggestions if you have them, but otherwise I think that's a good (temporary?) work around.
  • How many sig figs are relevant for $abundance_m3? Addressed: Think perhaps best to round to 3 decimal places?
  • $preservation_method naming inconsistent: formalin v. Formalin Addressed: changed to 'formalin'.
  • representation of size_class is inconsistent. ie 20mm+ while others are for eg > 10 mm **I'm not sure how to address this/whether we should address it. Perhaps we can change it to two columns: 'minimum_size' and 'maximum_size', but I'm not sure if that's useful for data users.
timvdstap commented 1 year ago

@timvdstap

  • Should we remove the Raw_Spirit from the IYS_events table? Currently all we have is a lat/lon... Addressed: Yeah I think that's a good idea, I removed it. There's no useful information attached to Raw Spirit event_ids currently anyway.

  • There are 3 NA values for catch_id column in the IYS_trawl_catch file. One was for testing the live box on the Kaganovksy (could probably be removed). One other is from the TINRO which was successful but nothing was caught, perhaps that comment should just be captured in the IYS_events table and the row removed from the IYS_trawl_catch file. Lastly there is a trawl from the Shimada for which the codend wasn't closed: this could probably be removed from the catch file and commented in the IYS_events files. Addressed: I've removed these from the catch_id column in the IYS_trawl_catch and included the comments in the Sampling_Event tab.

  • It looks like there are a number of specimens from the NW Explorer that has a catch_id that doesn't exist in the IYS_trawl_catch$catch_id: TableIYS_trawl_specimen: foreign keycatch_idinto tableIYS_trawl_catch: values ofIYS_trawl_specimen$catch_idnot inIYS_trawl_catch$catch_id: IYS2022-NW-2-13-Trawl-15 (57), IYS2022-NW-2-6-Trawl-10 (35), IYS2022-NW-2-6-Trawl-13 (19), IYS2022-NW-2-6-Trawl-9 (16), IYS2022-NW-2-6-Trawl-12 (15), … It looks to me like this is related to some of the catch_ids in IYS_trawl_catch say Trawl2 rather than Trawl: eg IYS2022-NW-2-10-Trawl2. Run this code to see what I mean: nw_issues <- dplyr::anti_join(IYS_trawl_specimen, IYS_trawl_catch, by = "catch_id") Addressed: Resolved the issue - I've sent a revised data file to Andrew for verification just to be sure.

Br-Johnson commented 1 year ago
  • Does the catch_id column in IYS_Bongo_specimen have a corresponding catch_id in IYS_trawl_catch? Or does Bongo_specimen only join to IYS_events using station_event_id? Good question, currently the catch_id columns for the bongo data does not have an catch_id in the IYS_trawl_catch data file. I think that was done on purpose to keep that data file + its column headers trawl data specific. I've now included catch_id and specimen_id in the bongo_specimen data tab -- does that make sense?

What is the purpose of the catch_id column? I assume the station_event_id relates to a specific bongo tow... what does the catch_id relate to? Is it a subsample? If it doesn't identify a sub sample and it's not a primary or foreign key to another table, it seems superfluous.

Br-Johnson commented 1 year ago

Relational structure:

Br-Johnson commented 1 year ago

representation of size_class is inconsistent. ie 20mm+ while others are for eg > 10 mm **I'm not sure how to address this/whether we should address it. Perhaps we can change it to two columns: 'minimum_size' and 'maximum_size', but I'm not sure if that's useful for data users.

Perhaps leave as is if it's not possible to align the values

$sf_proc includes date values: 10-May for eg. while others are '< 5' Addressed: This was also noted by Joanne Breckenridge. She suggested adding 'mm' to ensure Excel doesn't convert it to a date immediately. Open to other suggestions if you have them, but otherwise I think that's a good (temporary?) work around.

Sounds OK to me.

IYS_events$day_night includes the values dawn, dusk, night/dawn. The data dictionary also doesn't totally clarify what day/night means... dark, light? Or are the hours between 6 am to 6 pm is day for example.. Not fully sure, I think this information was added arbitarily? Can investigate, but I don't think it has to do with specific times.

It would make sense if it's just daylight vs dark. Up to you if you think it's necessary to confirm...