Closed peterdesmet closed 3 years ago
This is great, Peter. The layout is super simple and easy to follow. In entering a brave new world without necessarily using Event Core, I suspect that we should examine whether there are some terms here that will be particularly useful for identifying different types of machine observations that will fit into the Occurrence Core. For example, I feel that it would be really useful to differentiate types of study, e.g. acoustic telemetry vs GPS telemetry vs geolocation vs radio tracking. Is this the right job for samplingProtocol? Should we be aiming for a semi-formal vocabulary, at least for these terms?
Just FYI, we did some work a while back on lossy transformation of GPS data using a spatiotemporal grid. The difference with this approach is that more data are retained when the individual is covering larger distances (e.g. every kilometer in addition to every hour). See https://github.com/iobis/ziptrack but use with caution as we didn't test this extensively.
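To make the idea of a spatiotemporal grid concrete, here is a minimal sketch. It is not the ziptrack implementation (which I have not reproduced here); the function name `thin_track`, the grid size in degrees and the "keep a fix when the hour or grid cell changes" rule are all assumptions made for illustration.

```python
import math
from datetime import datetime


def thin_track(fixes, cell_deg=0.01):
    """Keep a fix whenever it falls in a new hour or a new grid cell.

    `fixes` is a time-ordered list of (timestamp, lat, lon) tuples.
    `cell_deg` is a hypothetical grid size in degrees (0.01 degrees is
    roughly 1 km in latitude); a real implementation may use a projected grid.
    """
    kept = []
    last_key = None
    for ts, lat, lon in fixes:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        key = (hour, cell)
        if key != last_key:
            kept.append((ts, lat, lon))
            last_key = key
    return kept


fixes = [
    (datetime(2020, 5, 1, 10, 0), 51.005, 3.005),   # first fix: kept
    (datetime(2020, 5, 1, 10, 10), 51.006, 3.006),  # same hour, same cell: dropped
    (datetime(2020, 5, 1, 10, 20), 51.025, 3.005),  # same hour, new cell: kept
    (datetime(2020, 5, 1, 11, 5), 51.025, 3.005),   # same cell, new hour: kept
]
thinned = thin_track(fixes)
```

With this rule a stationary animal yields about one fix per hour, while a fast-moving one yields an extra fix each time it crosses into a new cell.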
@peggynewman: a controlled vocabulary for samplingProtocol: that is a good idea, I will make a new issue.
@pieterprovoost nice, good to know! I have opted for one position per hour, because it is simple to explain and implement (a basic window function in SQL).
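For readers unfamiliar with the window-function approach, a minimal sketch of hourly subsampling in SQLite follows. The table name `gps`, its columns and the choice to keep the first fix of each hour are assumptions for illustration, not the actual queries of this use case. Window functions require SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical table and column names; the real source schema may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gps (tag_id TEXT, animal_id TEXT, timestamp TEXT, lat REAL, lon REAL)"
)
conn.executemany(
    "INSERT INTO gps VALUES (?, ?, ?, ?, ?)",
    [
        ("t1", "a1", "2020-05-01 10:05:00", 51.10, 3.10),
        ("t1", "a1", "2020-05-01 10:35:00", 51.11, 3.11),
        ("t1", "a1", "2020-05-01 11:02:00", 51.12, 3.12),
        ("t2", "a2", "2020-05-01 10:50:00", 52.00, 4.00),
    ],
)

# Rank fixes within each tag + animal + hour, then keep only the first one.
query = """
SELECT tag_id, animal_id, timestamp, lat, lon
FROM (
    SELECT *,
        ROW_NUMBER() OVER (
            PARTITION BY tag_id, animal_id, strftime('%Y-%m-%d %H', timestamp)
            ORDER BY timestamp
        ) AS rank_in_hour
    FROM gps
)
WHERE rank_in_hour = 1
ORDER BY tag_id, timestamp
"""
subsampled = conn.execute(query).fetchall()
for row in subsampled:
    print(row)
```

Here the second fix of tag t1 in the 10:00 hour is dropped, and all other fixes are kept.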
I have updated the README with a summary of the transformation approach, and will now close this issue. I will extend the use case next year to all available Movebank terms and - with the help of @niconoe - make the transformation steps generic so they can run on any Movebank gps dataset expressed as a frictionless data package.
I have finished a new use case:
It is similar to the Mahoney use case @sarahcd made, but rather than attempting to map all source data to Darwin Core, it is lossy (as suggested at our TDWG WG session). It extracts the more basic biological occurrence data that can be harvested by GBIF/OBIS. E.g. it does not include tag, deployment end and acceleration data and subsamples the data per hour. The result is 2 occurrence files:
dwc_occurrence_deployment.csv: a HumanObservation for the tag deployment, with sex, lifeStage, etc. of the animal
dwc_occurrence_gps.csv: MachineObservations by the gps tag, with an indication that it is subsampled per hour

There is no Event Core, since there is not really location or time information to group the occurrences by. Occurrences do have an eventID though (a tag-id + animal-id combination) to allow grouping these in deployments (each one containing a single HumanObservation and a number of MachineObservations).

Transformations are done in documented SQL queries based on a SQLite database derived from the source data (a data package). These transformations have been reviewed by @sarahcd, but you are all welcome to leave comments.
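The eventID grouping described above can be sketched as follows. This is an illustration only: the helper `make_event_id`, the underscore separator and the sample records are assumptions, not the values used in the actual files.

```python
from collections import defaultdict


def make_event_id(tag_id, animal_id):
    # Hypothetical format: the separator used in the real data may differ.
    return f"{tag_id}_{animal_id}"


# Made-up records standing in for rows of the two occurrence files.
occurrences = [
    {"occurrenceID": "dep-1", "basisOfRecord": "HumanObservation", "tag_id": "t1", "animal_id": "a1"},
    {"occurrenceID": "gps-1", "basisOfRecord": "MachineObservation", "tag_id": "t1", "animal_id": "a1"},
    {"occurrenceID": "gps-2", "basisOfRecord": "MachineObservation", "tag_id": "t1", "animal_id": "a1"},
    {"occurrenceID": "dep-2", "basisOfRecord": "HumanObservation", "tag_id": "t2", "animal_id": "a2"},
    {"occurrenceID": "gps-3", "basisOfRecord": "MachineObservation", "tag_id": "t2", "animal_id": "a2"},
]

# Group occurrences into deployments by their shared eventID.
deployments = defaultdict(list)
for occ in occurrences:
    occ["eventID"] = make_event_id(occ["tag_id"], occ["animal_id"])
    deployments[occ["eventID"]].append(occ["basisOfRecord"])

# Each deployment should hold exactly one HumanObservation.
human_counts = {
    event_id: records.count("HumanObservation")
    for event_id, records in deployments.items()
}
```

A check like `human_counts` is a cheap way to confirm that every deployment has its single HumanObservation before publishing.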