ioos / bio_data_guide

Standardizing Marine Biological Data Working Group - An open community to facilitate the mobilization of biological data to OBIS.
https://ioos.github.io/bio_data_guide/
MIT License

[dataset]: Animal Satellite Telemetry data #145

Open MathewBiddle opened 1 year ago

MathewBiddle commented 1 year ago

Contact details

mathew.biddle@noaa.gov

Dataset Title

ATN satellite telemetry data

Describe your dataset and any specific challenges or blockers you have or anticipate.

We are very close to a final netCDF template for ATN's satellite trajectory deployment files.

https://github.com/ioos/ioos-atn-data/blob/main/templates/atn_trajectory_template.cdl

Last year, I developed an R script to read in the template and start creating a DwC-A package. This year I'd like to finish that work, assuming we finish the template and create some example files.

https://github.com/MathewBiddle/ioos_code_lab/blob/r_nc2dwc/jupyterbook/content/code_gallery/data_management_notebooks/DRAFT-R-netCDF2DwC.ipynb

xref:

Link to "raw" Data Files.

https://github.com/ioos/ioos-atn-data/tree/main/data

MathewBiddle commented 1 year ago

The netCDF specification will be documented at https://ioos.github.io/ioos-atn-data/

MathewBiddle commented 1 year ago

We need to decide on a decimation strategy. The interval between observations ranges from 2 minutes to multiple days. Below are some examples of time differences between points in an example dataset:
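
As a rough illustration of how such gaps can be computed, the sketch below uses five timestamps copied from the example deployment (ptt 45866) data table posted later in this thread. It relies only on the Python standard library, and the function name `time_gaps` is illustrative rather than part of any ATN tooling.

```python
from datetime import datetime

# Timestamps copied from the example data table for ptt 45866.
times = [
    "2009-09-23 00:00:00",
    "2009-09-25 06:42:00",
    "2009-09-25 11:09:00",
    "2009-09-25 11:11:00",
    "2009-09-27 17:58:00",
]


def time_gaps(timestamps):
    """Return the timedelta between each pair of consecutive observations."""
    parsed = sorted(datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in timestamps)
    return [later - earlier for earlier, later in zip(parsed, parsed[1:])]


gaps = time_gaps(times)
for gap in gaps:
    print(gap)
print("shortest:", min(gaps), "longest:", max(gaps))
```

Even in these five rows the spacing runs from a couple of minutes to more than two days, which is why a fixed aggregation window is worth discussing.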

jdpye commented 1 year ago

The decimation strategy that ETN and OTN are working on for acoustic telemetry data is down to a lot of hard work by Peter Desmet and Jonas Mortelmans, and is based on some of Peter's work on camtrap-dp and with other satellite tagged animals. It employs an aggregation strategy of 'take the first detection/location per hour', with other Darwin Core fields like dataGeneralizations helping characterize the summarization by indicating how many detections have been obfuscated by the aggregation.

The benefit of using this method is that each detection is a real point in space and time where the animal was observed, and it also puts a hard upper bound per tag on how many occurrences can be generated by a single individual/tag. There's a lot of background information on the ancillary decisions, such as how to characterize things like coordinateUncertainty (https://github.com/inbo/etn/issues/256), and the logic for the decimation of the events themselves is here: https://github.com/inbo/etn/blob/main/inst/sql/dwc_occurrence.sql

I've got more code coming that deals with pulling together an Event Core version, with the Occurrences still being generated in a decimated way like this, but with tag attachment and listening station deployments being handled as Events and more things being reported as Extended Measurement or Facts.

MathewBiddle commented 1 year ago

I created an example DwC-A package in this PR https://github.com/ioos/ioos_code_lab/pull/13/commits/e58b2b5a340053ee82b0b4da532afc853b1182cf

The template still isn't finalized so I don't want to go too far down the road, but @albenson-usgs gave some great feedback on the initial package, to start addressing:

MathewBiddle commented 1 year ago

For reference, below is a table of the data available (dumped from the netCDF file), followed by the netCDF header of the metadata available. THESE ARE EXAMPLE DATA and therefore I have redacted some information about the PI.

I think we can address all of the comments above from the available data and metadata.

data table:

| obs | deploy_id | time | z | lat | lon | ptt | instrument | type | location_class | error_radius | semi_major_axis | semi_minor_axis | ellipse_orientation | offset | offset_orientation | gpe_msd | gpe_u | count | qartod_time_flag | qartod_speed_flag | qartod_location_flag | qartod_rollup_flag | crs | trajectory | animal_age | animal_life_stage | animal_sex | animal_weight | animal_length | animal_length_2 | animal | instrument_tag | instrument_location | taxon_name | taxon_lsid | comment |
|---:|:---|:---|---:|---:|---:|---:|:---|:---|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|:---|---:|:---|:---|---:|---:|---:|---:|:---|:---|:---|:---|:---|
| 0 | 09_13-45866 | 2009-09-23 00:00:00 | 0 | 34.03 | -118.56 | 45866 | SPOT | User | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 1 | 2 | 1 | 1 | -2147483647 | 5f0668a86321be13bc7ef628 | nan | juvenile | male | nan | 213 | nan | 09_13 | Wildlife Computers SPOT5 | Wildlife Computers SPOT5 | Carcharodon carcharias | urn:lsid:marinespecies.org:taxname:105838 | |
| 1 | 09_13-45866 | 2009-09-25 06:42:00 | 0 | 23.59 | -166.18 | 45866 | SPOT | Argos | A | nan | nan | nan | nan | nan | nan | nan | nan | nan | 1 | 4 | 1 | 4 | -2147483647 | 5f0668a86321be13bc7ef628 | nan | juvenile | male | nan | 213 | nan | 09_13 | Wildlife Computers SPOT5 | Wildlife Computers SPOT5 | Carcharodon carcharias | urn:lsid:marinespecies.org:taxname:105838 | |
| 2 | 09_13-45866 | 2009-09-25 11:09:00 | 0 | 34.024 | -118.556 | 45866 | SPOT | Argos | 1 | nan | nan | nan | nan | nan | nan | nan | nan | nan | 1 | 4 | 1 | 4 | -2147483647 | 5f0668a86321be13bc7ef628 | nan | juvenile | male | nan | 213 | nan | 09_13 | Wildlife Computers SPOT5 | Wildlife Computers SPOT5 | Carcharodon carcharias | urn:lsid:marinespecies.org:taxname:105838 | |
| 3 | 09_13-45866 | 2009-09-25 11:11:00 | 0 | 34.035 | -118.549 | 45866 | SPOT | Argos | 0 | nan | nan | nan | nan | nan | nan | nan | nan | nan | 1 | 4 | 1 | 4 | -2147483647 | 5f0668a86321be13bc7ef628 | nan | juvenile | male | nan | 213 | nan | 09_13 | Wildlife Computers SPOT5 | Wildlife Computers SPOT5 | Carcharodon carcharias | urn:lsid:marinespecies.org:taxname:105838 | |
| 4 | 09_13-45866 | 2009-09-27 17:58:00 | 0 | 34.033 | -118.547 | 45866 | SPOT | Argos | 1 | nan | nan | nan | nan | nan | nan | nan | nan | nan | 1 | 1 | 1 | 1 | -2147483647 | 5f0668a86321be13bc7ef628 | nan | juvenile | male | nan | 213 | nan | 09_13 | Wildlife Computers SPOT5 | Wildlife Computers SPOT5 | Carcharodon carcharias | urn:lsid:marinespecies.org:taxname:105838 | |

netCDF metadata:

```
xarray.Dataset {
dimensions:
    obs = 29 ;

variables:
    object deploy_id() ;
        deploy_id:long_name = id for this deployment. This is typically the tag ptt ;
        deploy_id:comment = Friendly name given to the tag by the user. If no specific friendly name is given, this is the PTT id. ;
        deploy_id:instrument = instrument_location ;
        deploy_id:platform = animal ;
        deploy_id:coverage_content_type = referenceInformation ;
    datetime64[ns] time(obs) ;
        time:standard_name = time ;
        time:axis = T ;
        time:_CoordinateAxisType = Time ;
        time:long_name = Time of the measurement, in seconds since 1990-01-01 ;
        time:actual_min = 2009-09-23T00:00:00Z ;
        time:actual_max = 2009-11-23T05:12:00Z ;
        time:ancillary_variables = qartod_time_flag qartod_rollup_flag qartod_speed_flag ;
        time:instrument = instrument_location ;
        time:platform = animal ;
        time:coverage_content_type = coordinate ;
    float64 z(obs) ;
        z:axis = Z ;
        z:long_name = depth of measurement ;
        z:positive = down ;
        z:standard_name = depth ;
        z:units = m ;
        z:actual_min = 0.0 ;
        z:actual_max = 0.0 ;
        z:instrument = ;
        z:platform = animal ;
        z:comment = This variable is synthetically generated to represent the depth of observations ;
        z:coverage_content_type = coordinate ;
    float64 lat(obs) ;
        lat:axis = Y ;
        lat:_CoordinateAxisType = Lat ;
        lat:long_name = Latitude portion of location in decimal degrees North ;
        lat:standard_name = latitude ;
        lat:units = degrees_north ;
        lat:valid_max = 90.0 ;
        lat:valid_min = -90.0 ;
        lat:actual_min = 23.59 ;
        lat:actual_max = 34.045 ;
        lat:ancillary_variables = qartod_location_flag qartod_rollup_flag qartod_speed_flag error_radius semi_major_axis semi_minor_axis ellipse_orientation offset offset_orientation ;
        lat:instrument = instrument_location ;
        lat:platform = animal ;
        lat:coverage_content_type = coordinate ;
    float64 lon(obs) ;
        lon:axis = X ;
        lon:_CoordinateAxisType = Lon ;
        lon:long_name = Longitude portion of location in decimal degrees East ;
        lon:standard_name = longitude ;
        lon:units = degrees_east ;
        lon:valid_max = 180.0 ;
        lon:valid_min = -180.0 ;
        lon:actual_min = -166.18 ;
        lon:actual_max = -118.504 ;
        lon:ancillary_variables = qartod_location_flag qartod_rollup_flag qartod_speed_flag error_radius semi_major_axis semi_minor_axis ellipse_orientation offset offset_orientation ;
        lon:instrument = instrument_location ;
        lon:platform = animal ;
        lon:coverage_content_type = coordinate ;
    float64 ptt(obs) ;
        ptt:long_name = Platform Transmitter Terminal (PTT) id used for Argos transmissions ;
        ptt:comment = PTT id for this deployment. PTT ids may be used on multiple deployments, but not concurrently. When combined with deployment dates, PTTs can uniquely identify a deployment. ;
        ptt:coverage_content_type = referenceInformation ;
        ptt:instrument = instrument_location ;
        ptt:platform = animal ;
    object instrument(obs) ;
        instrument:comment = Wildlife Computers instrument family. Variable may report manufacturer default values (e.g., Mk10) and may not match correctly defined instrument_location or instrument_tag variables and attributes. ;
        instrument:long_name = Instrument family ;
        instrument:instrument = instrument_location ;
        instrument:platform = animal ;
        instrument:coverage_content_type = referenceInformation ;
    object type(obs) ;
        type:comment = Type of location: Argos, FastGPS or User ;
        type:long_name = Type of location information - Argos, GPS satellite or user provided location ;
        type:instrument = instrument_location ;
        type:platform = animal ;
        type:coverage_content_type = referenceInformation ;
    object location_class(obs) ;
        location_class:standard_name = quality_flag ;
        location_class:comment = Quality codes from the ARGOS satellite (in meters): G,3,2,1,0,A,B,Z. See http://www.argos-system.org/manual/3-location/34_location_classes.htm ;
        location_class:long_name = Location Quality Code from ARGOS satellite system ;
        location_class:code_values = G,3,2,1,0,A,B,Z ;
        location_class:code_meanings = estimated error less than 100m and 1+ messages received per satellite pass, estimated error less than 250m and 4+ messages received per satellite pass, estimated error between 250m and 500m and 4+ messages per satellite pass, estimated error between 500m and 1500m and 4+ messages per satellite pass, estimated error greater than 1500m and 4+ messages received per satellite pass, no least squares estimated error or unbounded kalman filter estimated error and 3 messages received per satellite pass, no least squares estimated error or unbounded kalman filter estimated error and 1 or 2 messages received per satellite pass, invalid location (available for Service Plus or Auxilliary Location Processing) ;
        location_class:instrument = instrument_location ;
        location_class:platform = animal ;
        location_class:ancillary_variables = lat lon ;
        location_class:coverage_content_type = qualityInformation ;
    float64 error_radius(obs) ;
        error_radius:long_name = Error radius ;
        error_radius:units = m ;
        error_radius:comment = If the position is best represented as a circle, this field gives the radius of that circle in meters. ;
        error_radius:instrument = instrument_location ;
        error_radius:platform = animal ;
        error_radius:ancillary_variables = lat lon offset offset_orientation ;
        error_radius:coverage_content_type = qualityInformation ;
    float64 semi_major_axis(obs) ;
        semi_major_axis:long_name = Error - ellipse semi-major axis ;
        semi_major_axis:units = m ;
        semi_major_axis:comment = If the estimated position error is best expressed as an ellipse, this field gives the length in meters of the semi-major elliptical axis (one half of the major axis). ;
        semi_major_axis:instrument = instrument_location ;
        semi_major_axis:platform = animal ;
        semi_major_axis:ancillary_variables = lat lon ellipse_orientation offset offset_orientation ;
        semi_major_axis:coverage_content_type = qualityInformation ;
    float64 semi_minor_axis(obs) ;
        semi_minor_axis:long_name = Error - ellipse semi-minor axis ;
        semi_minor_axis:units = m ;
        semi_minor_axis:comment = If the estimated position error is best expressed as an ellipse, this field gives the length in meters of the semi-minor elliptical axis (one half of the minor axis). ;
        semi_minor_axis:instrument = instrument_location ;
        semi_minor_axis:platform = animal ;
        semi_minor_axis:ancillary_variables = lat lon ellipse_orientation offset offset_orientation ;
        semi_minor_axis:coverage_content_type = qualityInformation ;
    float64 ellipse_orientation(obs) ;
        ellipse_orientation:long_name = Error - ellipse orientation in degrees clockwise from true north ;
        ellipse_orientation:units = degrees ;
        ellipse_orientation:comment = The angle in degrees of the ellipse from true north, proceeding clockwise (0 to 360). A blank field represents 0 degrees. ;
        ellipse_orientation:instrument = instrument_location ;
        ellipse_orientation:platform = animal ;
        ellipse_orientation:ancillary_variables = lat lon semi_major_axis semi_minor_axis offset offset_orientation ;
        ellipse_orientation:coverage_content_type = qualityInformation ;
    float64 offset(obs) ;
        offset:long_name = Error - offset in meters to center of error ellipse or circle ;
        offset:units = m ;
        offset:comment = This field is non-zero if the circle or ellipse are not centered on the (Latitude, Longitude) values on this row. "Offset" gives the distance in meters from (Latitude, Longitude) to the center of the ellipse. ;
        offset:instrument = instrument_location ;
        offset:platform = animal ;
        offset:ancillary_variables = lat lon error_radius semi_major_axis semi_minor_axis offset_orientation ;
        offset:coverage_content_type = qualityInformation ;
    float64 offset_orientation(obs) ;
        offset_orientation:long_name = Error - offset orientation angle to ellipse center ;
        offset_orientation:units = degrees ;
        offset_orientation:comment = If the "Offset" field is non-zero, this field is the angle in degrees from (Latitude, Longitude) to the center of the ellipse. Zero degrees is true north; a blank field represents 0 degrees. ;
        offset_orientation:instrument = instrument_location ;
        offset_orientation:platform = animal ;
        offset_orientation:ancillary_variables = lat lon error_radius semi_major_axis semi_minor_axis offset ;
        offset_orientation:coverage_content_type = qualityInformation ;
    float64 gpe_msd(obs) ;
        gpe_msd:comment = Historical. No longer applicable. ;
        gpe_msd:long_name = ;
        gpe_msd:units = ;
        gpe_msd:instrument = instrument_location ;
        gpe_msd:platform = animal ;
        gpe_msd:coverage_content_type = auxillaryInformation ;
    float64 gpe_u(obs) ;
        gpe_u:comment = Historical. No longer applicable. ;
        gpe_u:long_name = ;
        gpe_u:units = ;
        gpe_u:instrument = instrument_location ;
        gpe_u:platform = animal ;
        gpe_u:coverage_content_type = auxillaryInformation ;
    float64 count(obs) ;
        count:comment = Total number of times a particular data item was received, verified, and successfully decoded. ;
        count:long_name = Count ;
        count:units = count ;
        count:instrument = instrument_location ;
        count:platform = animal ;
        count:coverage_content_type = auxillaryInformation ;
    float32 qartod_time_flag(obs) ;
        qartod_time_flag:standard_name = gross_range_test_quality_flag ;
        qartod_time_flag:long_name = Time QC test - gross range test ;
        qartod_time_flag:implementation = https://github.com/ioos/ioos_qc/ ;
        qartod_time_flag:flag_meanings = PASS NOT_EVALUATED SUSPECT FAIL MISSING ;
        qartod_time_flag:flag_values = [1 2 3 4 9] ;
        qartod_time_flag:references = https://cdn.ioos.noaa.gov/media/2020/03/QARTOD_TS_Manual_Update2_200324_final.pdf ;
        qartod_time_flag:coverage_content_type = qualityInformation ;
    float32 qartod_speed_flag(obs) ;
        qartod_speed_flag:standard_name = gross_range_test_quality_flag ;
        qartod_speed_flag:long_name = Speed QC test - gross range test ;
        qartod_speed_flag:references = https://cdn.ioos.noaa.gov/media/2020/03/QARTOD_TS_Manual_Update2_200324_final.pdf ;
        qartod_speed_flag:implementation = https://github.com/ioos/ioos_qc/ ;
        qartod_speed_flag:flag_meanings = PASS NOT_EVALUATED SUSPECT FAIL MISSING ;
        qartod_speed_flag:flag_values = [1 2 3 4 9] ;
        qartod_speed_flag:coverage_content_type = qualityInformation ;
    float32 qartod_location_flag(obs) ;
        qartod_location_flag:standard_name = location_test_quality_flag ;
        qartod_location_flag:long_name = Location QC test - Location test ;
        qartod_location_flag:implementation = https://github.com/ioos/ioos_qc/ ;
        qartod_location_flag:flag_meanings = PASS NOT_EVALUATED SUSPECT FAIL MISSING ;
        qartod_location_flag:flag_values = [1 2 3 4 9] ;
        qartod_location_flag:references = https://cdn.ioos.noaa.gov/media/2020/03/QARTOD_TS_Manual_Update2_200324_final.pdf ;
        qartod_location_flag:coverage_content_type = qualityInformation ;
    float32 qartod_rollup_flag(obs) ;
        qartod_rollup_flag:standard_name = aggregate_quality_flag ;
        qartod_rollup_flag:long_name = Aggregate QC value ;
        qartod_rollup_flag:implementation = https://github.com/ioos/ioos_qc/ ;
        qartod_rollup_flag:flag_meanings = PASS NOT_EVALUATED SUSPECT FAIL MISSING ;
        qartod_rollup_flag:flag_values = [1 2 3 4 9] ;
        qartod_rollup_flag:references = https://cdn.ioos.noaa.gov/media/2020/03/QARTOD_TS_Manual_Update2_200324_final.pdf ;
        qartod_rollup_flag:coverage_content_type = qualityInformation ;
    int32 crs() ;
        crs:epsg_code = EPSG:4326 ;
        crs:grid_mapping_name = latitude_longitude ;
        crs:inverse_flattening = 298.257223563 ;
        crs:long_name = Coordinate Reference System - http://www.opengis.net/def/crs/EPSG/0/4326 ;
        crs:semi_major_axis = 6378137.0 ;
        crs:coverage_content_type = referenceInformation ;
    object trajectory() ;
        trajectory:cf_role = trajectory_id ;
        trajectory:long_name = trajectory identifier ;
    float64 animal_age() ;
        animal_age:units = ;
        animal_age:long_name = age of the animal as measured or estimated at deployment ;
        animal_age:coverage_content_type = referenceInformation ;
        animal_age:animal_age = Not provided ;
    object animal_life_stage() ;
        animal_life_stage:animal_life_stage = juvenile ;
        animal_life_stage:long_name = Lifestage of the animal at time of deployment ;
        animal_life_stage:coverage_content_type = referenceInformation ;
    object animal_sex() ;
        animal_sex:animal_sex = male ;
        animal_sex:long_name = sex of the animal at time of tag deployment ;
        animal_sex:coverage_content_type = referenceInformation ;
    float32 animal_weight() ;
        animal_weight:units = kg ;
        animal_weight:long_name = mass of the animal as measured or estimated at deployment ;
        animal_weight:animal_weight = Not provided ;
        animal_weight:coverage_content_type = referenceInformation ;
    float32 animal_length() ;
        animal_length:animal_length_type = total length ;
        animal_length:units = cm ;
        animal_length:animal_length = 213.0 (cm) total length ;
        animal_length:long_name = length of the animal as measured or estimated at deployment ;
        animal_length:coverage_content_type = referenceInformation ;
    float32 animal_length_2() ;
        animal_length_2:animal_length_2_type = Not provided ;
        animal_length_2:units = ;
        animal_length_2:animal_length_2 = Not provided ;
        animal_length_2:long_name = length of the animal as measured or estimated at deployment ;
        animal_length_2:coverage_content_type = referenceInformation ;
    object animal() ;
        animal:suborder = ;
        animal:infraorder = ;
        animal:scientificname = Carcharodon carcharias ;
        animal:long_name = tagged animal id ;
        animal:superdomain = Biota ;
        animal:order = Lamniformes ;
        animal:authority = (Linnaeus, 1758) ;
        animal:kingdom = Animalia ;
        animal:species = Carcharodon carcharias ;
        animal:genus = Carcharodon ;
        animal:megaclass = ;
        animal:family = Lamnidae ;
        animal:taxonRankID = 220 ;
        animal:class = Elasmobranchii ;
        animal:cf_role = trajectory_id ;
        animal:coverage_content_type = referenceInformation ;
        animal:subphylum = Vertebrata ;
        animal:phylum = Chordata ;
        animal:AphiaID = 105838 ;
        animal:valid_name = Carcharodon carcharias ;
        animal:infraphylum = Gnathostomata ;
        animal:subclass = Neoselachii ;
        animal:rank = Species ;
    object instrument_tag() ;
        instrument_tag:manufacturer = Wildlife Computers ;
        instrument_tag:make_model = SPOT5 ;
        instrument_tag:serial_number = 07S0230 ;
        instrument_tag:long_name = telemetry tag applied to animal ;
        instrument_tag:coverage_content_type = referenceInformation ;
        instrument_tag:calibration_date = Not Provided ;
    object instrument_location() ;
        instrument_location:manufacturer = Wildlife Computers ;
        instrument_location:make_model = SPOT5 ;
        instrument_location:serial_number = 07S0230 ;
        instrument_location:long_name = Wildlife Computers SPOT5 ;
        instrument_location:location_type = argos / modeled ;
        instrument_location:comment = Location ;
        instrument_location:coverage_content_type = referenceInformation ;
        instrument_location:calibration_date = Not Provided ;
    object taxon_name() ;
        taxon_name:standard_name = biological_taxon_name ;
        taxon_name:long_name = most precise taxonomic classification for the tagged animal ;
        taxon_name:coverage_content_type = referenceInformation ;
        taxon_name:source = Froese, R. and D. Pauly. Editors. (2023). FishBase. Carcharodon carcharias (Linnaeus, 1758). Accessed through: World Register of Marine Species at: https://www.marinespecies.org/aphia.php?p=taxdetails&id=105838 on 2023-08-16 ;
        taxon_name:url = https://www.marinespecies.org/aphia.php?p=taxdetails&id=105838 ;
    AGRICULTURE > ANIMAL SCIENCE > ANIMAL ECOLOGY AND BEHAVIOR, EARTH SCIENCE > BIOSPHERE > ECOLOGICAL DYNAMICS > SPECIES/POPULATION INTERACTIONS > MIGRATORY RATES/ROUTES, EARTH SCIENCE > OCEANS, EARTH SCIENCE > CLIMATE INDICATORS > BIOSPHERIC INDICATORS > SPECIES MIGRATION, EARTH SCIENCE > OCEANS, EARTH SCIENCE > BIOLOGICAL CLASSIFICATION > ANIMALS/VERTEBRATES, EARTH SCIENCE > BIOSPHERE > ECOSYSTEMS > MARINE ECOSYSTEMS, PROVIDERS > GOVERNMENT AGENCIES-U.S. FEDERAL AGENCIES > DOC > NOAA > IOOS, PROVIDERS > COMMERCIAL > Axiom Data Science ;
    :license = These data may be used and redistributed for free, but are not intended for legal use, since they may contain inaccuracies. No person or group associated with these data makes any warranty, expressed or implied, including warranties of merchantability and fitness for a particular purpose, or assumes any legal liability for the accuracy, completeness or usefulness of this information. This disclaimer applies to both individual use of these data and aggregate use with other data. It is strongly recommended that users read and fully comprehend associated metadata prior to use. Please acknowledge the U.S. Animal Telemetry Network (ATN) or the specified citation as the source from which these data were obtained in any publications and/or representations of these data. Communication and collaboration with dataset authors are strongly encouraged. ;
    :metadata_link = ;
    :naming_authority = com.wildlifecomputers ;
    :platform_category = animal ;
    :platform = fish ;
    :platform_vocabulary = https://vocab.nerc.ac.uk/collection/L06/current/ ;
    :processing_level = NetCDF file created from position data obtained from Wildlife Computers API. ;
    :project = Project White Shark: Juvenile Satellite Biotelemetry, 2001-2020 ;
    :publisher_email = atndata@ioos.us ;
    :publisher_institution = US Integrated Ocean Observing System Office ;
    :publisher_name = US Integrated Ocean Observing System (IOOS) Animal Telemetry Network (ATN) ;
    :publisher_url = https://atn.ioos.us/ ;
    :publisher_country = USA ;
    :standard_name_vocabulary = CF-v78 ;
    :vendor = Wildlife Computers ;
    :geospatial_lat_min = 23.59 ;
    :geospatial_lat_max = 34.045 ;
    :geospatial_lon_min = -166.18 ;
    :geospatial_lon_max = -118.504 ;
    :geospatial_bbox = POLYGON ((-118.504 23.59, -118.504 34.045, -166.18 34.045, -166.18 23.59, -118.504 23.59)) ;
    :geospatial_bounds = POLYGON ((-166.18 23.59, -118.581 34.038, -118.53 34.045, -118.504 33.989, -118.534 33.972, -119.75 33.517, -166.18 23.59)) ;
    :geospatial_bounds_crs = EPSG:4326 ;
    :time_coverage_start = 2009-09-23T00:00:00Z ;
    :time_coverage_end = 2009-11-23T05:12:00Z ;
    :time_coverage_duration = P61DT5H12M0S ;
    :time_coverage_resolution = P2DT2H39M43S ;
    :date_issued = 2023-08-16T20:00:00Z ;
    :date_modified = 2023-08-16T20:00:00Z ;
    :history = 2023-08-07T20:24:04Z - Created by the IOOS ATN DAC from the Wildlife Computers API ;
    :summary = Wildlife Computers SPOT5 tag (ptt id 45866) deployed on a great white shark (Carcharodon carcharias) by Chris G. Lowe in the North Pacific Ocean from 2009-09-23 to 2009-11-23 ;
    :title = Great white shark (Carcharodon carcharias) location data from a satellite telemetry tag (ptt id 45866) deployed in the North Pacific Ocean from 2009-09-23 to 2009-11-23, deployment id 5f0668a86321be13bc7ef628 ;
    :uuid = ff554ebf-bf4b-5a82-8a90-9c0ceb799d96 ;
    :platform_name = Carcharodon carcharias ;
    :platform_id = 105838 ;
    :vendor_id = 5f0668a86321be13bc7ef628 ;
    :sea_name = North Pacific Ocean ;
    :arbitrary_keywords = ATN, Animal Telemetry Network, IOOS, Integrated Ocean Observing System, trajectory, satellite telemetry tag ;
    :contributor_role_vocabulary = https://vocab.nerc.ac.uk/collection/G04/current/ ;
    :creator_role_vocabulary = https://vocab.nerc.ac.uk/collection/G04/current/ ;
    :creator_sector_vocabulary = https://mmisw.org/ont/ioos/sector ;
    :creator_type = person ;
    :date_metadata_modified = 20230816 ;
    :instrument = Satellite telemetry tag ;
    :instrument_vocabulary = ;
    :keywords_vocabulary = GCMD Science Keywords v15.1 ;
    :ncei_template_version = NCEI_NetCDF_Trajectory_Template_v2.0 ;
    :product_version = ;
    :program = IOOS Animal Telemetry Network ;
    :publisher_type = institution ;
    :references = ;
    :animal_common_name = great white shark ;
    :animal_id = 09_13 ;
    :animal_scientific_name = Carcharodon carcharias ;
    :deployment_id = 5f0668a86321be13bc7ef628 ;
    :deployment_start_datetime = 2009-09-23T00:00:00Z ;
    :deployment_end_datetime = 2009-11-23T00:00:00Z ;
    :wmo_platform_code = ;
    :comment = 09_13-45866 ;
    :ptt_id = 45866 ;
    :deployment_start_lat = 34.03 ;
    :deployment_start_lon = -118.56 ;
    :contributor_name = ;
    :contributor_email = ;
    :contributor_role = collaborator ;
    :contributor_institution = ;
    :contributor_url = ;
    :creator_role = principalInvestigator ;
    :creator_sector = academic ;
    :creator_country = USA ;
    :creator_institution = ;
    :creator_institution_url = ;
    :citation = ;
}
```

MathewBiddle commented 1 year ago

@albenson-usgs I'm poking around in this now.

For locationID I followed the guidance at https://github.com/tdwg/dwc-for-biologging/wiki/Acoustic-sensor-enabled-tracking-of-blue-sharks

But maybe that's only for the tagging event?

Now that I'm fiddling with the data more, I'm wondering if there should be two/three events.

  1. Tagging of the animal
  2. automated tracking of the animal via satellite telemetry
  3. recovery of animal (if applicable?)

cc @mmckinzie

MathewBiddle commented 1 year ago

Maybe https://github.com/tdwg/dwc-for-biologging/wiki/Movebank-GPS-data#darwin-core-recommendation is the right way?

MathewBiddle commented 1 year ago

This is what I understand from the text on movebank GPS data:

```mermaid
flowchart LR

A([Deployment])
B([Tag attachment])
C([GPS positions])

A --parentEventID--> B
A --parentEventID--> C

subgraph parent event
A
end

subgraph child events
B
C
end
```

MathewBiddle commented 1 year ago

I worked through some reorganizing after discussion on the Slack space. I think I have addressed most of the comments in https://github.com/ioos/bio_data_guide/issues/145#issuecomment-1692201277

It was decided to go with occurrence and emof (no event).

Here are the files and notebook for review:

I am most curious about additional information we could be porting into the occurrence or emof record. For example, we have information about the Instrument family (e.g. SPOT), Type of location (Argos, FastGPS or User), Location Quality Code from ARGOS satellite system, Platform Transmitter Terminal (PTT) id used for Argos transmissions, instrument_tag (telemetry tag applied to animal, including serial_number and make_model), and instrument_location (serial_number and make_model). Further information about each of those variables is included in the netCDF metadata in this comment https://github.com/ioos/bio_data_guide/issues/145#issuecomment-1692211792

We also have a few flag variables (time, speed, location, and rollup) and a bunch of metadata that could be stuck somewhere.

MathewBiddle commented 1 year ago

ATN data are now being archived at NCEI. For the notebook I'm working on here, I would like to pull the source data from this archival information package. https://www.ncei.noaa.gov/archive/accession/0282699

File - https://www.nodc.noaa.gov/archive/arc0217/0282699/1.1/data/0-data/atn_45866_great-white-shark_trajectory_20090923-20091123.nc

albenson-usgs commented 1 year ago

@sformel-usgs will handle the next review on this. Also I know that @jdpye published some (lots?) of data to OBIS somewhat recently and might have some words of wisdom to share.

jdpye commented 1 year ago

We did!

I looked over Mat's shoulder briefly at the IOOS DMAC but I would gently recommend we further align this to the standard that OTN and ETN had worked out for all our satellite and acoustic telemetry data publishing, if it's possible. Just a bit of summarization of the occurrences to keep the row count manageable when our datasets get included in general queries against OBIS in the future.

MathewBiddle commented 1 year ago

Here is the mapping table for the occurrence record:

| DarwinCore | netCDF |
|:---|:---|
| basisOfRecord | data contained in the type variable where type of User = HumanObservation and Argos = MachineObservation. |
| organismID | platform_id global attribute plus the animal_common_name global attribute. |
| eventDate | data contained in time variable. Converted to ISO8601. |
| occurrenceID | eventDate, plus data contained in z variable, plus animal_common_name global attribute. |
| decimalLatitude | data in lat variable. |
| decimalLongitude | data in lon variable. |
| geodeticDatum | attribute epsg_code in the crs variable. |
| eventID | animal_common_name global attribute plus the eventDate. |
| kingdom | kingdom attribute in the animal variable. |
| taxonRank | rank attribute in the animal variable. |
| occurrenceStatus | hardcoded to present. |
| sex | data from the variable animal_sex. |
| lifeStage | data from the variable animal_life_stage. |
| scientificName | data from the variable taxon_name. |
| scientificNameID | data from the variable taxon_lsid. |
| coordinateUncertaintyInMeters | maximum value of the data from the variables error_radius, semi_major_axis, and offset. |

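
A minimal sketch of how a few rows of this mapping might look in code, using the first row of the example data table. This is not the notebook's implementation: the underscore separator used to build the identifiers, the fallback of any non-User location type (for example FastGPS) to MachineObservation, and the function names are all assumptions made for illustration.

```python
import math
from datetime import datetime

# Mapping stated in the table; any other type falls back to
# MachineObservation, which is an assumption of this sketch.
BASIS_OF_RECORD = {"User": "HumanObservation", "Argos": "MachineObservation"}


def coordinate_uncertainty(row):
    """Largest of error_radius, semi_major_axis and offset, ignoring missing values."""
    values = [row.get(key) for key in ("error_radius", "semi_major_axis", "offset")]
    values = [v for v in values if v is not None and not math.isnan(v)]
    return max(values) if values else None


def to_occurrence(row, attrs):
    """Build one Darwin Core occurrence record from a data row and file attributes."""
    stamp = datetime.strptime(row["time"], "%Y-%m-%d %H:%M:%S")
    event_date = stamp.strftime("%Y-%m-%dT%H:%M:%SZ")  # times in the file are UTC
    name = attrs["animal_common_name"].replace(" ", "_")
    return {
        "basisOfRecord": BASIS_OF_RECORD.get(row["type"], "MachineObservation"),
        "organismID": f"{attrs['platform_id']}_{name}",  # separator is hypothetical
        "eventDate": event_date,
        "occurrenceID": f"{event_date}_{row['z']}_{name}",  # separator is hypothetical
        "eventID": f"{name}_{event_date}",
        "decimalLatitude": row["lat"],
        "decimalLongitude": row["lon"],
        "geodeticDatum": attrs["epsg_code"],
        "occurrenceStatus": "present",
        "coordinateUncertaintyInMeters": coordinate_uncertainty(row),
    }


nan = float("nan")
row = {"time": "2009-09-23 00:00:00", "z": 0, "lat": 34.03, "lon": -118.56,
       "type": "User", "error_radius": nan, "semi_major_axis": nan, "offset": nan}
attrs = {"animal_common_name": "great white shark", "platform_id": "105838",
         "epsg_code": "EPSG:4326"}

occ = to_occurrence(row, attrs)
print(occ)
```

In the example file every error variable is nan, so coordinateUncertaintyInMeters comes out empty; deriving it from location_class instead would be a separate decision.
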
MathewBiddle commented 1 year ago

And for the measurement or fact file: the measurementOrFact file will only contain information for records where basisOfRecord = HumanObservation, as these observations were made in person when the animal was tagged.

| DarwinCore Term | Status | netCDF |
|---|---|---|
| organismID | | The platform_id global attribute plus the animal_common_name global attribute. |
| occurrenceID | Required | eventDate, plus data contained in z variable, plus animal_common_name global attribute. |
| measurementType | Required | long_name attribute of the animal_weight, animal_length, animal_length_2 variables. |
| measurementValue | Required | The data from the animal_weight, animal_length, animal_length_2 variables. |
| eventID | Strongly Recommended | animal_common_name global attribute plus the eventDate. |
| measurementUnit | Strongly Recommended | unit attribute of the animal_weight, animal_length, animal_length_2 variables. |
| measurementMethod | Strongly Recommended | animal_weight, animal_length, animal_length_2 attributes of their respective variables. |
| measurementTypeID | Strongly Recommended | mapping table somewhere? |
| measurementMethodID | Strongly Recommended | mapping table somewhere? |
| measurementUnitID | Strongly Recommended | mapping table somewhere? |
| measurementAccuracy | Share if available | |
| measurementDeterminedDate | Share if available | |
| measurementDeterminedBy | Share if available | |
| measurementRemarks | Share if available | |
| measurementValueID | Share if available | |
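
A rough Python sketch of how the measurement variables could be reshaped into long-format measurementOrFact rows, one row per measured variable. The dictionary layout, the function name, and every value in the example are invented for illustration; only the variable names and the target Darwin Core terms come from the mapping above.

```python
def build_emof_rows(occurrence_id, event_id, measurements):
    """Turn per-variable measurement metadata into long-format eMoF rows."""
    rows = []
    for var_name, var in measurements.items():
        value = var.get("value")
        if value is None:
            continue  # nothing was measured for this variable, so no row
        rows.append({
            "occurrenceID": occurrence_id,
            "eventID": event_id,
            "measurementType": var["long_name"],
            "measurementValue": value,
            "measurementUnit": var.get("unit", ""),
        })
    return rows


# Hypothetical values standing in for what would be read from the netCDF file.
measurements = {
    "animal_weight": {"long_name": "animal weight", "unit": "kg", "value": None},
    "animal_length": {"long_name": "animal length", "unit": "cm", "value": 213.0},
}
emof = build_emof_rows("2009-09-23T00:00:00Z_0_blue_shark",
                       "blue_shark_2009-09-23T00:00:00Z",
                       measurements)
print(emof)
```

The measurementTypeID, measurementMethodID, and measurementUnitID columns are left out here because the vocabulary mapping is still an open question.
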
sformel-usgs commented 1 year ago

@MathewBiddle I'm still getting up to speed on this. Does anything need review right now?

MathewBiddle commented 1 year ago

@jdpye From https://github.com/ioos/bio_data_guide/issues/145#issuecomment-1686715497, my understanding is the decimation strategy for these satellite telemetry observations should be:

'take the first detection/location per hour', with other Darwin Core fields like dataGeneralizations helping characterize the summarization by indicating how many detections have been obfuscated by the aggregation.

So, I will work on taking my occurrence table and decimating it to the first detection each hour. Does that sound reasonable?

MathewBiddle commented 1 year ago

@sformel-usgs Yes! If you don't mind taking a look at the csv files I reference in https://github.com/ioos/bio_data_guide/issues/145#issuecomment-1710385902, that will help us with the overarching organization of these data. I think the decimation strategy will simply reduce the number of rows from what we have above.

jdpye commented 1 year ago

> @jdpye From #145 (comment), my understanding is the decimation strategy for these satellite telemetry observations should be:
>
> 'take the first detection/location per hour', with other Darwin Core fields like dataGeneralizations helping characterize the summarization by indicating how many detections have been obfuscated by the aggregation.
>
> So, I will work on taking my occurrence table and decimating it to the first detection each hour. Does that sound reasonable?

Yep! With this, you can add into dataGeneralizations a string like 'first of # records' to indicate there are more records in the raw dataset to be discovered by the super-curious.
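
One way the "first per hour" rule and the dataGeneralizations note could fit together, sketched in plain Python. It assumes the rows belong to a single animal and that eventDate is a UTC ISO 8601 string like 2009-09-23T00:02:00Z, so the first 13 characters identify the hour; the function name and the example rows are made up.

```python
def decimate_first_per_hour(rows):
    """Keep the earliest row in each hour and note how many rows it stands for."""
    # ISO 8601 UTC strings in one fixed format sort chronologically as text.
    ordered = sorted(rows, key=lambda r: r["eventDate"])

    # Group rows by "YYYY-MM-DDTHH"; dicts keep insertion order, so the
    # hours stay in chronological order.
    by_hour = {}
    for row in ordered:
        by_hour.setdefault(row["eventDate"][:13], []).append(row)

    decimated = []
    for group in by_hour.values():
        first = dict(group[0])  # copy so the input rows are left untouched
        first["dataGeneralizations"] = f"first of {len(group)} records"
        decimated.append(first)
    return decimated


# Invented rows for illustration.
rows = [
    {"occurrenceID": "a", "eventDate": "2009-09-23T00:10:00Z"},
    {"occurrenceID": "b", "eventDate": "2009-09-23T00:02:00Z"},
    {"occurrenceID": "c", "eventDate": "2009-09-23T00:58:00Z"},
    {"occurrenceID": "d", "eventDate": "2009-09-23T03:15:00Z"},
]
decimated = decimate_first_per_hour(rows)
print(decimated)
```

Counting the group before dropping rows is the important part: once the table is decimated, the number of hidden records can no longer be recovered from it.
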

jdpye commented 1 year ago

I just finished prototyping a DwC archive to lonboard / Deck.gl vis tool, so I will attempt to eat your DwC archive with it when I get time!

MathewBiddle commented 1 year ago

Here's a stab at filtering the occurrence record down to the first occurrence per hour (in Python). https://gist.github.com/MathewBiddle/d434ac2b538b2728aa80c6a7945f94be

Now to write that in R...

MathewBiddle commented 1 year ago

Figured out how to do it in R (hacky but works for now):

library(dplyr)      # provides %>%, arrange(), and distinct()
library(lubridate)

# sort by date
occurrencedf <- occurrencedf %>% arrange(eventDate)

# create column of date to the hour which will be our decimation strategy
# (tz = "UTC" because the timestamps end in "Z")
occurrencedf$eventDateHrs <- format(as.POSIXct(occurrencedf$eventDate, format="%Y-%m-%dT%H:%M:%SZ", tz="UTC"),"%Y-%m-%dT%H")

# filter table to only unique date + hour and pick the first row keeping all the columns
occurrencedf <- distinct(occurrencedf,eventDateHrs,.keep_all = TRUE)

# nuke the invented column
occurrencedf$eventDateHrs <- NULL

occurrencedf

Filtering by data quality codes

In these data we also have additional information about the Location Quality Code from the ARGOS satellite system and from QARTOD tests. Below are the codes and their meanings.

ARGOS Codes

| code_values | code meanings |
|---|---|
| G | estimated error less than 100m and 1+ messages received per satellite pass |
| 3 | estimated error less than 250m and 4+ messages received per satellite pass |
| 2 | estimated error between 250m and 500m and 4+ messages per satellite pass |
| 1 | estimated error between 500m and 1500m and 4+ messages per satellite pass |
| 0 | estimated error greater than 1500m and 4+ messages received per satellite pass |
| A | no least squares estimated error or unbounded Kalman filter estimated error and 3 messages received per satellite pass |
| B | no least squares estimated error or unbounded Kalman filter estimated error and 1 or 2 messages received per satellite pass |
| Z | invalid location (available for Service Plus or Auxiliary Location Processing) |

Since codes A, B, and Z are essentially bad values, I propose that we filter those out.

I also propose creating a mapping table that sets coordinateUncertaintyInMeters to the maximum estimated error for each ARGOS code, as shown in the table below:

| code | coordinateUncertaintyInMeters |
|---|---|
| G | 100 |
| 3 | 250 |
| 2 | 500 |
| 1 | 1500 |
| 0 | >1500 (not sure what would go there?) |
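
A small Python sketch of the proposal: drop classes A, B, and Z, then look up the uncertainty for the rest. The column name location_class is a guess at what the occurrence table would call it, and class 0 is left blank (None) here purely as a placeholder, since its upper bound is still an open question.

```python
# Maximum estimated error in metres per ARGOS location class.
# Class 0 has no stated upper bound, so it is left blank (None) here.
ARGOS_MAX_ERROR_M = {"G": 100, "3": 250, "2": 500, "1": 1500, "0": None}
ARGOS_DROPPED = {"A", "B", "Z"}


def apply_argos_classes(rows, class_field="location_class"):
    """Drop A/B/Z locations and attach coordinateUncertaintyInMeters."""
    kept = []
    for row in rows:
        code = str(row[class_field]).strip()
        if code in ARGOS_DROPPED:
            continue
        if code not in ARGOS_MAX_ERROR_M:
            raise ValueError(f"unexpected ARGOS location class: {code!r}")
        updated = dict(row)  # copy so the input rows are left untouched
        updated["coordinateUncertaintyInMeters"] = ARGOS_MAX_ERROR_M[code]
        kept.append(updated)
    return kept


# Invented rows for illustration.
rows = [
    {"occurrenceID": "a", "location_class": "3"},
    {"occurrenceID": "b", "location_class": "B"},
    {"occurrenceID": "c", "location_class": 0},
    {"occurrenceID": "d", "location_class": "G"},
]
filtered = apply_argos_classes(rows)
print(filtered)
```

Raising on an unknown code is a deliberate choice in this sketch: a silently blank uncertainty would be hard to tell apart from the class 0 case.
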

QARTOD Codes

| value | meaning |
|---|---|
| 1 | PASS |
| 2 | NOT_EVALUATED |
| 3 | SUSPECT |
| 4 | FAIL |
| 9 | MISSING |

The QARTOD tests are:

| variable | long_name |
|---|---|
| qartod_time_flag | Time QC test - gross range test |
| qartod_speed_flag | Speed QC test - gross range test |
| qartod_location_flag | Location QC test - Location test |
| qartod_rollup_flag | Aggregate QC value |

I'm not sure what to do here. My preference would be to include all rows where qartod_rollup_flag == 1 and drop the rest. But I'm open to suggestions.
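
For discussion, the "keep only qartod_rollup_flag == 1" option could look something like this in Python. The function name is made up, and the tally of flags is only there so we can see how many rows each choice would cost before committing to it.

```python
from collections import Counter

QARTOD_PASS = 1


def filter_by_rollup(rows, flag_field="qartod_rollup_flag"):
    """Keep rows whose aggregate QARTOD flag is PASS; also tally every flag."""
    counts = Counter(int(r[flag_field]) for r in rows)
    kept = [r for r in rows if int(r[flag_field]) == QARTOD_PASS]
    return kept, dict(counts)


# Invented rows for illustration; flags may arrive as numbers or strings.
rows = [
    {"occurrenceID": "a", "qartod_rollup_flag": 1},
    {"occurrenceID": "b", "qartod_rollup_flag": 3},
    {"occurrenceID": "c", "qartod_rollup_flag": "1"},
    {"occurrenceID": "d", "qartod_rollup_flag": 4},
    {"occurrenceID": "e", "qartod_rollup_flag": 9},
]
kept, flag_counts = filter_by_rollup(rows)
print(len(kept), flag_counts)
```
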

MathewBiddle commented 1 year ago

@sformel-usgs @jdpye I've updated the notebook (and on nbviewer) to include this decimation strategy as well as adding in some initial filtering based on location class and the inclusion of dataGeneralizations to the occurrence record. I've filtered down the emof to only contain rows where a measurement was actually recorded.

If you don't mind taking a look when you get a chance, it would be much appreciated! I think there are some additional details we can add to the occurrence/emof from the netCDF files, I'm just not sure what.

sformel-usgs commented 1 year ago

@MathewBiddle here are some thoughts. I'm still feeling like I don't have a good grasp on all the moving parts, so please ping me here or in Slack if there is anything I didn't address specifically, no matter how small. I don't see any big issues; what you've derived works as a DwC-A. But I'm going to dig through the data a little more and see if there is anything else I think could be included.

  1. I was able to work through most of the R notebook with no big issues. There are some spots where I think I could help make things more succinct and/or readable. I just forked the repo and will submit a PR with some suggestions. I'll try to do this tomorrow morning.

  2. I couldn't quickly identify where to grab the file atn_trajectory_template.nc, which is referenced in the EML building (cell 54).

  3. coordinateUncertaintyInMeters needs to be an integer or blank. So, if you can't put a confident maximum boundary on > 1500, then you can leave it blank for unknown. I'll take a closer look at that data when I have some more time.

  4. I understand that the QARTOD flags are for QC, but I don't know enough about them to say if they should be filtered out. If not, the flags could be included through eMOF (although that might be easy to overlook, and therefore a bad idea).

  5. For the eMOF attributes that you've marked as "mapping table somewhere?", I'm not sure if this is what you're after, but I think these would need to be found on a case-by-case basis. It should be easy to find some examples for measurementUnitID. The other two would depend on whether or not anyone has published the method and type in a database like NERC.

jdpye commented 1 year ago

I think I can help find your P01 codes for the measurements, sorry, I didn't look at the emof file on the first pass.

I'll look at this today!

jdpye commented 1 year ago

for the coordinateUncertaintyInMeters distance for Argos location class 0, this paper suggests an upper bound of ~ 10km. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0063051

From that paper, this quote:

In brief, “good” positions (location codes 3, 2, 1, A) are accurate to about 2 km, while 0 and B locations are accurate to about 5–10 km. However, due to the lognormal distribution of the errors, larger outliers are to be expected in all location codes and need to be accounted for in the user’s data processing.

does not fill my heart with joy, so the upper bound of the estimate is probably a safer value to include.

MathewBiddle commented 1 year ago

Thanks for taking a look! I should have mentioned the EML section of the notebook is a work in progress. It should reference the same netcdf file that is used to generate the dwc files (the one from NCEI). I just haven't updated it in a few months.

Something to discuss is whether generating the EML is even necessary. Would OBIS-USA generate the EML? Is there a way for a provider to upload an EML xml file? How should we deal with this, given the expectation that we might want to automate the process?

jdpye commented 1 year ago

If everyone has filled in their metadata for the NetCDF files in the same way, we should get a simple EML template for this flavour of data and map our incoming data to it, and submit that to your OBIS publication endpoint along with the data, as an initial pass of the metadata for the archive. Amendments can be made after the initial metadata harvest from the source NetCDF, but we should have a good start from there.

If we build a simple eml.xml and zip it up, the metadata pre-populates and will save your OBIS data manager a bit of headache :D
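
A bare-bones Python sketch of the "build a simple eml.xml and zip it up" idea, using only the standard library. It is a skeleton, not a schema-valid EML document: required elements such as contact are missing, and the EML 2.2.0 namespace is an assumption (the publication endpoint may expect a different profile or version). The global attribute names title, creator_name, and summary are also assumed, as are the function names and the package id.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Assumption: EML 2.2.0; swap in whatever version the endpoint expects.
EML_NS = "https://eml.ecoinformatics.org/eml-2.2.0"


def build_eml(global_attrs, package_id):
    """Map a few netCDF global attributes onto a minimal EML skeleton."""
    ET.register_namespace("eml", EML_NS)
    root = ET.Element(f"{{{EML_NS}}}eml",
                      {"packageId": package_id, "system": "uuid"})
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = global_attrs["title"]
    creator = ET.SubElement(dataset, "creator")
    individual = ET.SubElement(creator, "individualName")
    ET.SubElement(individual, "surName").text = global_attrs["creator_name"]
    abstract = ET.SubElement(dataset, "abstract")
    ET.SubElement(abstract, "para").text = global_attrs["summary"]
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def zip_archive(files):
    """Zip a {filename: content} mapping in memory and return the bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, content in files.items():
            zf.writestr(name, content)
    return buffer.getvalue()


# Placeholder metadata, not taken from a real deployment file.
attrs = {
    "title": "Example ATN satellite telemetry deployment",
    "creator_name": "Example Researcher",
    "summary": "Placeholder abstract.",
}
eml_xml = build_eml(attrs, package_id="example-package-id")
archive = zip_archive({"eml.xml": eml_xml, "occurrence.csv": "occurrenceID\n"})
```

A complete Darwin Core Archive would also carry a meta.xml describing the data files; that part is left out to keep the sketch short.
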

sformel-usgs commented 1 year ago

@MathewBiddle the IPT is all fat fingers. So, the more EML you can generate programmatically, the less time it will take and the less chance of human error. But just do the easy stuff, don't worry about getting every detail.

MathewBiddle commented 1 year ago

Since these are satellite telemetry observations, our depth of measurement is always == 0, so minimumDepthInMeters and maximumDepthInMeters should be 0, correct? Does it cause an issue if they are the same value?

albenson-usgs commented 1 year ago

No that's fine that they are the same value.

MathewBiddle commented 1 year ago

I have added in min/max depth to the occurrence file https://github.com/MathewBiddle/ioos_code_lab/blob/r_nc2dwc/jupyterbook/content/code_gallery/data_management_notebooks/atn_45866_occurrence.csv

I've merged the optimizations @sformel-usgs proposed and cleaned up some of the comments.

As far as the metadata goes, the source netCDF files are built via an automated pipeline, so we know what content is going where and how much (or little) it will be standardized. It's merely a mapping exercise to get the information into EML for the records. However, I am curious to get @mmckinzie to weigh in on the granularity of the "datasets" for OBIS. Right now, we are archiving at NCEI on a deployment by deployment basis, is that too granular for OBIS?

Obviously, it would be much simpler to have 1 ATN dataset that is updated with new deployments as they make it to NCEI. But, we lose some granularity in the credits at OBIS when we do that.

Some items to consider:

Maybe there are lessons learned from the CREMP datasets we should explore?

I think answering those questions will help us decide what needs to be mapped into the metadata record.

MathewBiddle commented 1 year ago

Should we also include samplingProtocol == satellite telemetry? Similar to https://github.com/inbo/etn/blob/abfe5b000913706f50a7563c92e9024f668046a1/inst/sql/dwc_occurrence.sql#L222C45-L222C61

sformel-usgs commented 1 year ago

@MathewBiddle sorry if I'm overlooking it in the above comments, could you point me to an example metadata record from ATN/NCEI? I don't have a sense of what is included, how many people are credited, and how often it's updated.

MathewBiddle commented 1 year ago

Here is the NCEI landing page for this dataset https://www.ncei.noaa.gov/archive/accession/0282699

That metadata record is built at NCEI directly from the netCDF file, plus any additional NCEI metadata. My hope would be that we would build the EML metadata directly from the netCDF file instead of harvesting from another source. But, I'm open to suggestions.

In a perfect world, these data wouldn't have updates. The archive packages will be updated only when there are additions of other observing methods, like profile observations or modeled tracks (foie gras analysis), which would be added in separate files. So, the satellite telemetry data files would be static. But, we all know that perfect worlds are hard to come by, so building in an update process would be wise of us.

As for the number of people credited, that could be anywhere from 1-n, some of these will be one PI, others could have ten, it's highly variable.

Note: ATN and NCEI are still working out the authorship and acknowledgements in the files and resultant NCEI metadata, as some pieces weren't mapped correctly. That should be addressed very soon.

MathewBiddle commented 1 year ago

I got confused with the files in different repos. So, I've added the mobilization notebook here as a PR and converted it to .Rmd

rmarkdown:::convert_ipynb('atn_satellite_telemetry_netCDF2DwC.ipynb',"atn_satellite_telemetry_netCDF2DwC.Rmd")

The .Rmd, source data, and resultant DwC can be found in this directory: https://github.com/MathewBiddle/bio_data_guide/tree/add_atnsat_telem/datasets/atn_satellite_telemetry

jdpye commented 1 year ago

I like the samplingProtocol as 'satellite telemetry', we were talking with the rest of the tdwg MOBS group about deciding on a controlled or suggested vocabulary for samplingProtocol and any steps we take towards that will help us down the line.

I would argue strongly for creating granular datasets, first because attribution can be precise and comprehensive without overattributing researchers to unrelated tracks held at ATN, but also because that would allow individual researchers to revise/update/extend their program or their individual track data as needed without triggering a major update of some ATN-wide archive.

MathewBiddle commented 8 months ago

Is there a place in Darwin Core where we could have a link that goes to the NCEI archived raw data?

@laurabrenskelle was looking into this.

MathewBiddle commented 8 months ago

associatedMedia? associatedReferences?

We would also want to do this for passive acoustic data. Pointing back to the raw audio files at NCEI.

laurabrenskelle commented 8 months ago

Created an issue to discuss this in the DwC Q&A repo: https://github.com/tdwg/dwc-qa/issues/207

laurabrenskelle commented 8 months ago

dcterms:references is the term to use. We will just need to make sure there is a way to trace an occurrence from OBIS to a particular record in the ATN NCEI archive. Probably occurrenceID?

sformel-usgs commented 7 months ago

I agree, if the identifier for the observation record can stay consistent across service endpoints, that would be ideal, and occurrenceID would be the way to go.

@MathewBiddle for PAM and the raw audio, I think we should use associatedMedia to describe a single wav/flac file within an archive, since this is analogous to the raw sequences from DNA data that are referenced in associatedSequences. However, the entire archive of sound files could either be described with associatedMedia or references (maybe let the community guide us on this one).

MathewBiddle commented 7 months ago

I don't want to close this just yet.

TODO:

MathewBiddle commented 5 months ago

@laurabrenskelle can you take a look at the rmd and see how we can add the NCEI url into the dwc archive?

See https://github.com/ioos/bio_data_guide/tree/main/datasets/atn_satellite_telemetry/

And https://github.com/ioos/bio_data_guide/tree/main/datasets/atn_satellite_telemetry/data/dwc

laurabrenskelle commented 5 months ago

@MathewBiddle Are we just wanting to add the link to the landing page to references or are we trying to be more granular than that?

MathewBiddle commented 5 months ago

Good question. By granular, what do you mean? I don't think we can get much more granular than that from NCEI. Unless we're talking about the specific url to the data file?

I think either including the url to the landing page (eg. https://www.ncei.noaa.gov/archive/accession/0282699) or one of the identifiers from the landing page (screenshot below) would suffice.

laurabrenskelle commented 5 months ago

Sorry, I guess because this is just one dataset from one shark's track, the landing page should suffice. Is that the case for all ATN data, or are they ever aggregated with data from multiple animals mixed together?

MathewBiddle commented 5 months ago

They will be archived on a deployment by deployment basis. So it should be one animal for each netCDF file.

laurabrenskelle commented 5 months ago

An "old" but still ATN-relevant conversation from the TDWG Darwin Core Q&A issues: https://github.com/tdwg/dwc-qa/issues/173 I thought it was worth dropping here for future reference.

MathewBiddle commented 5 months ago

Thanks, @laurabrenskelle ! I think I forgot to include dwc:samplingProtocol as satellite telemetry per https://github.com/ioos/bio_data_guide/issues/145#issuecomment-1814954547
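
Pulling the two decisions together (dcterms:references pointing at the NCEI landing page, and samplingProtocol set to satellite telemetry), a minimal Python sketch of stamping both onto every occurrence row. It leans on each netCDF file holding one deployment, so a single landing page URL applies to the whole table; the function name and the column name references are assumptions about how the occurrence table would spell them.

```python
def add_provenance(rows, landing_page_url,
                   sampling_protocol="satellite telemetry"):
    """Attach the NCEI landing page and sampling protocol to each row."""
    stamped = []
    for row in rows:
        updated = dict(row)  # copy so the input rows are left untouched
        updated["references"] = landing_page_url
        updated["samplingProtocol"] = sampling_protocol
        stamped.append(updated)
    return stamped


# The accession URL is the landing page mentioned earlier in this thread;
# the occurrence rows are invented.
rows = [{"occurrenceID": "a"}, {"occurrenceID": "b"}]
stamped = add_provenance(
    rows, "https://www.ncei.noaa.gov/archive/accession/0282699")
print(stamped)
```
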