tdwg / dwc-for-biologging

Darwin Core recommendations for biologging data
Creative Commons Attribution 4.0 International

Ready for review: Movebank GPS → Darwin Core #34

Closed by peterdesmet 3 years ago

peterdesmet commented 3 years ago

I have finished a new use case:

Lossy transformation of GPS data formatted in the Movebank Attribute Dictionary to Darwin Core.

It is similar to the Mahoney use case @sarahcd made, but rather than attempting to map all source data to Darwin Core, it is lossy (as suggested at our TDWG WG session). It extracts only the basic biological occurrence data that can be harvested by GBIF/OBIS. For example, it does not include tag, deployment end and acceleration data, and it subsamples the GPS data to one position per hour. The result is 2 occurrence files:

There is no Event Core, since there is not really any location or time information to group the occurrences by. Occurrences do have an eventID though (a tag-id + animal-id combination), which allows grouping them into deployments (each one containing a single HumanObservation and a number of MachineObservations).
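A minimal sketch of that grouping, for illustration only. The `event_id` helper, the `-` separator, the sample identifiers and the `tag_id`/`animal_id` keys are assumptions; the thread only states that eventID is a tag-id + animal-id combination.

```python
from collections import defaultdict


def event_id(tag_id, animal_id):
    # Assumed format: the thread only says "tag-id + animal-id combination".
    return f"{tag_id}-{animal_id}"


# Hypothetical occurrence records for two deployments.
occurrences = [
    {"occurrenceID": "occ-1", "tag_id": "T1", "animal_id": "A1", "basisOfRecord": "HumanObservation"},
    {"occurrenceID": "occ-2", "tag_id": "T1", "animal_id": "A1", "basisOfRecord": "MachineObservation"},
    {"occurrenceID": "occ-3", "tag_id": "T1", "animal_id": "A1", "basisOfRecord": "MachineObservation"},
    {"occurrenceID": "occ-4", "tag_id": "T2", "animal_id": "A2", "basisOfRecord": "HumanObservation"},
    {"occurrenceID": "occ-5", "tag_id": "T2", "animal_id": "A2", "basisOfRecord": "MachineObservation"},
]

# Assign the eventID, then group the records per deployment.
deployments = defaultdict(list)
for occ in occurrences:
    occ["eventID"] = event_id(occ["tag_id"], occ["animal_id"])
    deployments[occ["eventID"]].append(occ["basisOfRecord"])

for eid, bases in deployments.items():
    print(eid, bases.count("HumanObservation"), bases.count("MachineObservation"))
```

Because the eventID sits on every occurrence, a consumer can reconstruct the deployments without a separate Event Core file.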

Transformations are done in documented SQL queries that run on a SQLite database derived from the source data (a data package). These transformations have been reviewed by @sarahcd, but you are all welcome to leave comments.

peggynewman commented 3 years ago

This is great, Peter. The layout is super simple and easy to follow. In entering a brave new world without necessarily using Event Core, I suspect that we should examine whether there are terms here that will be particularly useful for identifying the different types of machine observations that fit into the Occurrence Core. For example, I feel that it would be really useful to differentiate types of study, e.g. acoustic telemetry vs GPS telemetry vs geolocation vs radio tracking. Is this the right job for samplingProtocol? Should we be aiming for at least a semi-formal vocabulary for these terms?

pieterprovoost commented 3 years ago

Just FYI, we did some work a while back on lossy transformation of GPS data using a spatiotemporal grid. The difference with this approach is that more data are retained when the individual is covering larger distances (e.g. every kilometer in addition to every hour). See https://github.com/iobis/ziptrack but use with caution as we didn't test this extensively.
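To illustrate the general idea only (this is not the ziptrack implementation, which is not reproduced here), a spatiotemporal grid can be sketched as keeping the first fix per animal, hour and grid cell. The function name `thin_by_grid`, the tuple layout and the cell size of 0.01 degrees are assumptions made for this example.

```python
import math
from datetime import datetime


def thin_by_grid(fixes, cell_deg=0.01):
    """Keep the first fix per (animal, hour, grid cell).

    cell_deg=0.01 is roughly 1.1 km in latitude; in longitude the cell
    gets narrower away from the equator. This is a crude assumption,
    not a proper equal-area grid.
    """
    seen = set()
    kept = []
    # ISO-formatted timestamps sort chronologically as strings.
    for animal_id, ts, lat, lon in sorted(fixes, key=lambda f: (f[0], f[1])):
        hour = datetime.fromisoformat(ts).strftime("%Y-%m-%d %H")
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        key = (animal_id, hour, cell)
        if key not in seen:
            seen.add(key)
            kept.append((animal_id, ts, lat, lon))
    return kept


# Hypothetical fixes for one animal.
fixes = [
    ("A1", "2020-05-01 10:05:00", 51.1001, 3.1001),
    ("A1", "2020-05-01 10:20:00", 51.1002, 3.1003),  # same hour, same cell: dropped
    ("A1", "2020-05-01 10:40:00", 51.1501, 3.1001),  # same hour, new cell: kept
    ("A1", "2020-05-01 11:10:00", 51.1502, 3.1002),  # new hour: kept
]

thinned = thin_by_grid(fixes)
for fix in thinned:
    print(fix)
```

With this key, a stationary animal yields one record per hour, while a fast-moving animal yields one record per cell it crosses within that hour.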

peterdesmet commented 3 years ago

@peggynewman a controlled vocabulary for samplingProtocol is a good idea. I will make a new issue for it.

peterdesmet commented 3 years ago

@pieterprovoost nice, good to know! I have opted for one position per hour, because it is simple to explain and implement (a basic window function in SQL).
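A minimal sketch of what such a one-per-hour window function could look like, run here through Python's `sqlite3` module so it is self-contained. The table name `gps`, its columns and the sample rows are assumptions for illustration and are not the actual queries of the use case. Window functions require SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical table and columns, loosely modelled on Movebank GPS data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gps (
    animal_id TEXT, tag_id TEXT, timestamp TEXT, lat REAL, lon REAL
);
INSERT INTO gps VALUES
    ('A1', 'T1', '2020-05-01 10:05:00', 51.10, 3.10),
    ('A1', 'T1', '2020-05-01 10:35:00', 51.11, 3.11),
    ('A1', 'T1', '2020-05-01 11:02:00', 51.12, 3.12),
    ('A2', 'T2', '2020-05-01 10:50:00', 52.00, 4.00),
    ('A2', 'T2', '2020-05-01 10:20:00', 52.01, 4.01);
""")

# Number the fixes within each animal + tag + hour, then keep the first one.
HOURLY_SQL = """
SELECT animal_id, tag_id, timestamp, lat, lon
FROM (
    SELECT *,
        ROW_NUMBER() OVER (
            PARTITION BY animal_id, tag_id, strftime('%Y-%m-%d %H', timestamp)
            ORDER BY timestamp
        ) AS row_in_hour
    FROM gps
)
WHERE row_in_hour = 1
ORDER BY animal_id, timestamp
"""

hourly = conn.execute(HOURLY_SQL).fetchall()
for row in hourly:
    print(row)
```

Keeping the first fix of each hour is one possible choice; ordering by something else inside the window (for example a quality attribute) would select a different representative.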

peterdesmet commented 3 years ago

I have updated the README with a summary of the transformation approach, and will now close this issue. Next year I will extend the use case to all available Movebank terms and, with the help of @niconoe, make the transformation steps generic so they can run on any Movebank GPS dataset expressed as a Frictionless Data Package.