Closed Br-Johnson closed 1 year ago
Should we remove blank columns from the Integrated Data Collection? Empty columns:
Enhancement to-do's:
@timvdstap
Table
IYS_trawl_specimen: foreign key
catch_idinto table
IYS_trawl_catch: values of
IYS_trawl_specimen$catch_idnot in
IYS_trawl_catch$catch_id: IYS2022-NW-2-13-Trawl-15 (57), IYS2022-NW-2-6-Trawl-10 (35), IYS2022-NW-2-6-Trawl-13 (19), IYS2022-NW-2-6-Trawl-9 (16), IYS2022-NW-2-6-Trawl-12 (15), …
It looks to me like this is related to some of the catch_ids in IYS_trawl_catch say Trawl2 rather than Trawl: eg IYS2022-NW-2-10-Trawl2. Run this code to see what I mean: nw_issues <- dplyr::anti_join(IYS_trawl_specimen, IYS_trawl_catch, by = "catch_id")
IYS_events
IYS_trawl_catch
IYS_trawl_specimen
specimen_length_units
, specimen_length2_units
. Right now there's just one column length_units
and it's not super intuitive that it applies to both length measurements.number_measured
is more than 1. Is the resulting measurement an average? Data dictionary says: "Number of individuals within measurement. Actual for specimen measured in groups (myctophids, squid etc)." Not sure what that means.IYS_Bongo_specimen
No specimen_id column. Need one to conform to relational structure.
Does the catch_id column in IYS_Bongo_specimen have a corresponding catch_id in IYS_trawl_catch? Or does Bongo_specimen only join to IYS_events using station_event_id
?
$lifestage naming inconsistent: adult Adult C4 C5 C6 juvenile
$sf_proc includes date values: 10-May for eg. while others are '< 5'
How many sig figs are relevant for $abundance_m3?
$preservation_method naming inconsistent: formalin v. Formalin
representation of size_class is inconsistent. ie 20mm+ while others are for eg > 10 mm
IYS_events
- IYS_events$day_night includes the values dawn, dusk, night/dawn. The data dictionary also doesn't totally clarify what day/night means... dark, light? Or are the hours between 6 am to 6 pm is day for example.. Not fully sure, I think this information was added arbitarily? Can investigate, but I don't think it has to do with specific times.
IYS_events$maximum_sampling_depth_meters has 5 decimal point precision. I think those probably need rounding to no decimal points to present the precision of the measurements correctly.addressedSame comment as above for IYS_events$tow_speed_kilometers_per_houraddressedSame comment as above for IYS_events$tow_distance_nautical_miles, wind_direction_degrees, wind_speed_km_per_houraddressedIYS_trawl_catch
$taxonomic_rank has inconsistent capitalization: class Class family Family genus Genus order Order species SpeciesAddressed: removed any capitalization$lifestage: should larvae and larval be combined?Addressed: agreed, combined into larvae as this term has controlled vocabulary in the NVS.$catch_weight_units: inconsistent capitalization ie Kilograms vs kilogramsAddressed: removed capitalization$weight_method: value 'Trace' used 11 times but not explained in data dictionaryAddressed: included in data dictionary$catch_count has four decimal places. Change to zero.Addressed$count_method values inconsistently named: Count Estimate Subsample Subsampled Total Weight estimateAddressed: Count renamed to 'Total' (as per Andrew Dimond), 'Subsample' renamed to 'Weight estimate' (following discussion with Andrew Dimond). 'Subsampled' is now included in the data dictionary, it indicated (practically) whole (gelatinous) individuals that could be counted. 'Estimate' changed to visual estimate as per Amy T.It's unclear if when the catch_count_method is sub_sample whether or not the catch_count has been adjusted to represent the total catch or if it is just the count of the sub sample. If the latter, what is the multiplier required to multiply to sub-sample by to arrive at a total count? Probably just needs some clarification in the data dictionary.Addressed: changed the description in the data dictionary: weight_method: Method used to determine the catch_weight. Options are: total (all individuals are used to determine the weight), weight estimate (total catch_weight is extrapolated from a subsampled weight), volume estimate (catch_weight is extrapolated from a subsampled volume, trace (weights that are too small to be weighed accurately), or this is unknown. catch_method:I think it would be better if the alternative_event_id immediately followed the event_id in the ordering of the columns. Same for alternative_catch_id and catch_id.Addressed: agreed, changed order. I've also renamed the columns to indicate original instead of alternativeIYS_trawl_specimen
$sex values are inconsistent: f f? female m male unknown Unknown unsexedAddressed: changed f and m to female and male respectively. Changed f? to female? Not sure how best to address that, welcome suggestions. Uncapitalized the sexes as well.I think it would be good to list length_units explicitly like:Addressed: Agreed and changed.specimen_length_units
,specimen_length2_units
. Right now there's just one columnlength_units
and it's not super intuitive that it applies to both length measurements.- Unclear what it means when
number_measured
is more than 1. Is the resulting measurement an average? Data dictionary says: "Number of individuals within measurement. Actual for specimen measured in groups (myctophids, squid etc)." Not sure what that means. **I think this indicates an average specimen length value if number_measured > 1. If that's the case, would it be better to add a column 'average_specimen_length', or indicate in the comment column that it's an average value?Rename original_specimen_id, original_event_id, original_catch_id, with the word alternative to match what is used in IYS_events and IYS_catchAddressed: Did not change it in the IYS_trawl_specimen tab, rather I changed the word alternative in other data files to original to match.IYS_Bongo_specimen
No specimen_id column. Need one to conform to relational structure.Addressed: Awesome catch! I had made IYS2022-TINRO-3-28-Bongo-1 but this should've been a catch_id, with IYS2022-TINRO-3-28-Bongo-1-49 being the specimen_id.- Does the catch_id column in IYS_Bongo_specimen have a corresponding catch_id in IYS_trawl_catch? Or does Bongo_specimen only join to IYS_events using
station_event_id
? Good question, currently the catch_id columns for the bongo data does not have an catch_id in the IYS_trawl_catch data file. I think that was done on purpose to keep that data file + its column headers trawl data specific. I've now included catch_id and specimen_id in the bongo_specimen data tab -- does that make sense?$lifestage naming inconsistent: adult Adult C4 C5 C6 juvenileAddressed: removed capitalization.$sf_proc includes date values: 10-May for eg. while others are '< 5'Addressed: This was also noted by Joanne Breckenridge. She suggested adding 'mm' to ensure Excel doesn't convert it to a date immediately. Open to other suggestions if you have them, but otherwise I think that's a good (temporary?) work around.How many sig figs are relevant for $abundance_m3?Addressed: Think perhaps best to round to 3 decimal places?$preservation_method naming inconsistent: formalin v. FormalinAddressed: changed to 'formalin'.- representation of size_class is inconsistent. ie 20mm+ while others are for eg > 10 mm **I'm not sure how to address this/whether we should address it. Perhaps we can change it to two columns: 'minimum_size' and 'maximum_size', but I'm not sure if that's useful for data users.
@timvdstap
Should we remove the Raw_Spirit from the IYS_events table? Currently all we have is a lat/lon...Addressed: Yeah I think that's a good idea, I removed it. There's no useful information attached to Raw Spirit event_ids currently anyway.
There are 3 NA values for catch_id column in the IYS_trawl_catch file. One was for testing the live box on the Kaganovksy (could probably be removed). One other is from the TINRO which was successful but nothing was caught, perhaps that comment should just be captured in the IYS_events table and the row removed from the IYS_trawl_catch file. Lastly there is a trawl from the Shimada for which the codend wasn't closed: this could probably be removed from the catch file and commented in the IYS_events files.Addressed: I've removed these from the catch_id column in the IYS_trawl_catch and included the comments in the Sampling_Event tab.
It looks like there are a number of specimens from the NW Explorer that has a catch_id that doesn't exist in the IYS_trawl_catch$catch_id:Addressed: Resolved the issue - I've sent a revised data file to Andrew for verification just to be sure.Table
IYS_trawl_specimen: foreign key
catch_idinto table
IYS_trawl_catch: values of
IYS_trawl_specimen$catch_idnot in
IYS_trawl_catch$catch_id: IYS2022-NW-2-13-Trawl-15 (57), IYS2022-NW-2-6-Trawl-10 (35), IYS2022-NW-2-6-Trawl-13 (19), IYS2022-NW-2-6-Trawl-9 (16), IYS2022-NW-2-6-Trawl-12 (15), …
It looks to me like this is related to some of the catch_ids in IYS_trawl_catch say Trawl2 rather than Trawl: eg IYS2022-NW-2-10-Trawl2. Run this code to see what I mean:nw_issues <- dplyr::anti_join(IYS_trawl_specimen, IYS_trawl_catch, by = "catch_id")
- Does the catch_id column in IYS_Bongo_specimen have a corresponding catch_id in IYS_trawl_catch? Or does Bongo_specimen only join to IYS_events using
station_event_id
? Good question, currently the catch_id columns for the bongo data does not have an catch_id in the IYS_trawl_catch data file. I think that was done on purpose to keep that data file + its column headers trawl data specific. I've now included catch_id and specimen_id in the bongo_specimen data tab -- does that make sense?
What is the purpose of the catch_id
column? I assume the station_event_id relates to a specific bongo tow... what does the catch_id
relate to? Is it a subsample? If it doesn't identify a sub sample and it's not a primary or foreign key to another table, it seems superfluous.
Relational structure:
IYS_Bongo_specimen$station_event_ID
do not have a match in IYS_events$station_event_ID
. One issue I see is "GoA2019-Kaganovsky" in IYS_events vs "IYS2019-Kaganovsky" in IYS_Bongo_specimenrepresentation of size_class is inconsistent. ie 20mm+ while others are for eg > 10 mm **I'm not sure how to address this/whether we should address it. Perhaps we can change it to two columns: 'minimum_size' and 'maximum_size', but I'm not sure if that's useful for data users.
Perhaps leave as is if it's not possible to align the values
$sf_proc includes date values: 10-May for eg. while others are '< 5' Addressed: This was also noted by Joanne Breckenridge. She suggested adding 'mm' to ensure Excel doesn't convert it to a date immediately. Open to other suggestions if you have them, but otherwise I think that's a good (temporary?) work around.
Sounds OK to me.
IYS_events$day_night includes the values dawn, dusk, night/dawn. The data dictionary also doesn't totally clarify what day/night means... dark, light? Or are the hours between 6 am to 6 pm is day for example.. Not fully sure, I think this information was added arbitarily? Can investigate, but I don't think it has to do with specific times.
It would make sense if it's just daylight vs dark. Up to you if you think it's necessary to confirm...