inbo / movepub

R package to prepare animal tracking data from Movebank for publication in a research repository or GBIF
https://inbo.github.io/movepub/

Grouping events #10

Closed peterdesmet closed 2 years ago

peterdesmet commented 2 years ago

The setup suggested in #7 has a grouping event (deployment) and occurrences (1 tag attachment, 0 or more GPS positions). In line with the new model, @timrobertson100 suggests having a separate event for every occurrence (basically eventID = occurrenceID), and grouping all GPS positions under the deployment using parentEventID. The deployment itself also has an occurrence attached: the tag attachment. See table:

occurrenceID | eventID   | parentEventID | basisOfRecord | eventRemarks
------------ | --------- | ------------- | ------------- | ------------------
ani1_tag1    | ani1_tag1 |               | HumanObs      | deployment remarks
occ1         | occ1      | ani1_tag1     | MachineObs    |
occ2         | occ2      | ani1_tag1     | MachineObs    |

@sarahcd I'm trying this approach to see how it looks at GBIF. Given that the capture and deployment event are one and the same in this model, I would name it tag deployment rather than tag attachment.
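A minimal sketch of the grouping in the table above, not the package's actual SQL. The function name `group_under_deployment` and its arguments are hypothetical, and the abbreviated `HumanObs`/`MachineObs` values are written out as the Darwin Core terms `HumanObservation`/`MachineObservation`:

```python
def group_under_deployment(deployment_id, position_ids, deployment_remarks=""):
    """Return Occurrence-core rows for one deployment and its GPS positions."""
    # The deployment row carries the tag deployment occurrence and has no parent.
    rows = [{
        "occurrenceID": deployment_id,
        "eventID": deployment_id,
        "parentEventID": "",
        "basisOfRecord": "HumanObservation",
        "eventRemarks": deployment_remarks,
    }]
    # Every GPS position is its own event (eventID = occurrenceID),
    # grouped under the deployment through parentEventID.
    for position_id in position_ids:
        rows.append({
            "occurrenceID": position_id,
            "eventID": position_id,
            "parentEventID": deployment_id,
            "basisOfRecord": "MachineObservation",
            "eventRemarks": "",
        })
    return rows


rows = group_under_deployment("ani1_tag1", ["occ1", "occ2"], "deployment remarks")
for row in rows:
    print(row)
```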

peterdesmet commented 2 years ago

@sarahcd we could give the first record the occurrenceID: ani1_tag1_capture as you suggested in the SQL, but I think it is easier to name the occurrenceID and eventID the same.

timrobertson100 commented 2 years ago

@peterdesmet - did you intend to have the parentEventID set on the deployment event? (we didn't in our slack chat) I imagine it's there either as a mistake or perhaps you're thinking it'd be a utility to group all related records, but semantically it seems wrong to me and I don't think it'd be consistently applied by others so couldn't be relied upon.

peterdesmet commented 2 years ago

@timrobertson100 oh right, my mistake. I didn't in my SQL:

https://github.com/inbo/movepub/blob/24c314dc135ad3fec965961997c96da58123d143/inst/sql/movebank_dwc_occurrence.sql#L49

sarahcd commented 2 years ago

I used parentEventID in previous attempts at this, it makes sense to me. Would be good to see how it looks on GBIF. And tag deployment sounds right. In this case, is the humanObs identified as a tag attachment event anywhere?

timrobertson100 commented 2 years ago

> Would be good to see how it looks on GBIF

I anticipate we'll need to do two things - firstly to index parentEventID and then rework the UI a little. Neither should be huge tasks though.

peterdesmet commented 2 years ago

> In this case, is the humanObs identified as a tag attachment event anywhere?

It is not. The current setup considers the deployment (a period) and the attachment (a timestamp) as the same thing. IMO that makes it a bit tricky, e.g. do you assign a duration (deploy-on timestamp/deploy-off timestamp) or a single timestamp (deploy-on timestamp) to that parent event? The first makes sense for an encompassing parent event, but less so for a single point in time when e.g. the biometric measurements were taken.

Here's how it would look if you separated those. It would require an event core. Note that as far as I understand, Movebank doesn't consider those as separate entities (e.g. what would the attach1 ID be?, how to split remarks?). @sarahcd @timrobertson100 thoughts?

occurrenceID | eventID   | parentEventID | eventDate | basisOfRecord | eventRemarks
------------ | --------- | ------------- | --------- | ------------- | ------------------
ani1_tag1    | ani1_tag1 |               | start/end | -             | Deployment remarks <-- in EVENT core
attach1      | attach1   | ani1_tag1     | timestamp | HumanObs      | Attachment remarks
occ1         | occ1      | ani1_tag1     | timestamp | MachineObs    |
occ2         | occ2      | ani1_tag1     | timestamp | MachineObs    |

sarahcd commented 2 years ago

I would lean towards just 1 event for the attachment/deployment because

Downsides are

MortenHofft commented 2 years ago

The way events are handled in GBIF isn't perfect, but we can do something:

Search by parentEventId is already possible: https://api.gbif.org/v1/occurrence/search?parentEventId=2512a2db-851e-4b80-8ecd-b35ff6cbd54e with the caveat that it only searches the immediate parent (not grandparents)

You can see the siblings and a link to the parent event https://www.gbif.org/dataset/aea17af8-5578-4b04-b5d3-7adf0c5a1e60/event/934265ab-3e45-45c4-9ebe-3d1acffe3798

And from the parent event you can see the children https://www.gbif.org/dataset/aea17af8-5578-4b04-b5d3-7adf0c5a1e60/parentevent/2512a2db-851e-4b80-8ecd-b35ff6cbd54e

And for individual occurrences there is a link to the event https://www.gbif.org/occurrence/3738499222
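As a rough illustration of the parentEventId search mentioned above, the request URL can be assembled like this. The endpoint and the `parentEventId` parameter are taken from the link in this comment; the helper name `parent_event_search_url` is hypothetical, and the sketch only builds the URL rather than calling the API:

```python
from urllib.parse import urlencode

# Endpoint as used in the example link above.
GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def parent_event_search_url(parent_event_id):
    """Build an occurrence search URL filtered on the immediate parent event."""
    query = urlencode({"parentEventId": parent_event_id})
    return f"{GBIF_OCCURRENCE_SEARCH}?{query}"


url = parent_event_search_url("2512a2db-851e-4b80-8ecd-b35ff6cbd54e")
print(url)
```

Since only the immediate parent is searched, records grouped under a grandparent event would need one such query per intermediate event.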

timrobertson100 commented 2 years ago

> I don't know if this use is within the scope of how others use it in practice. @timrobertson100 ?

I checked with @tucotuco (Darwin Core lead author) and he agreed that it seems reasonable to use parentEventID in this manner.

peterdesmet commented 2 years ago

Great, let’s go with version 1.3 then:

This model does not require a separate event core

peterdesmet commented 2 years ago

Sorry to come back to this, but there is something bothering me about making the parentEvent (in reality a deployment time window) the same as the tag attachment (a specific moment in time). In addition, it does not allow deploy-off information to be included, even though that information can be present in the Movebank deployments table. That table basically includes two events (the deploy-on and the deploy-off) in a single row, but these are still two events.

I would suggest making all records siblings of the same parentEventID, including the deploy-on (and when available deploy-off) event. This is similar to the first approach, except that all records get their own eventID. I discussed this approach with @tucotuco

parentEventID | eventID | eventDate       | coordinates
A             | s       | deploy-on-date  | deploy-on-coord
A             | 1       | timestamp1      | position1
A             | 2       | timestamp2      | position2
A             | 3       | timestamp3      | position3
A             | e       | deploy-off-date | deploy-off-coord

There is no need to create a separate record for the parentEvent A (and thus an Event core), because all the information (start date, end date, location) is already included in detail in the child events. The only information that needs to be retained is the fact that it groups those child events (i.e. an identifier), which can be added to the Occurrence core as a parentEventID column.

I'll work out an example.
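A rough sketch of the sibling model described above, under stated assumptions: the function name `sibling_events`, the `(date, coordinates)` tuples, and the `s`/`e`/counter event IDs are illustrative only and simply mirror the table. It is not the package's actual mapping.

```python
def sibling_events(parent_event_id, deploy_on, positions, deploy_off=None):
    """Return one row per event, all sharing the same parentEventID.

    deploy_on and deploy_off are (date, coordinates) tuples; positions is a
    list of such tuples. No row is created for the parent event itself.
    """
    rows = []

    def add(event_id, date, coordinates):
        rows.append({
            "parentEventID": parent_event_id,
            "eventID": event_id,
            "eventDate": date,
            "coordinates": coordinates,
        })

    add("s", *deploy_on)  # deploy-on event
    for number, (date, coordinates) in enumerate(positions, start=1):
        add(str(number), date, coordinates)  # GPS positions
    if deploy_off is not None:
        add("e", *deploy_off)  # deploy-off event, only when available
    return rows


rows = sibling_events(
    "A",
    ("deploy-on-date", "deploy-on-coord"),
    [("timestamp1", "position1"), ("timestamp2", "position2"), ("timestamp3", "position3")],
    ("deploy-off-date", "deploy-off-coord"),
)
for row in rows:
    print(row)
```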

sarahcd commented 2 years ago

Sounds good to me, if you want to send me the updated sql when you have an example, I can check for other info we could populate in the deploy-off event. We'll also want to see how it behaves in datasets missing this information. Note deploy-on information can also be missing. (In organizing data on Movebank, it is only really essential when pre/post-deployment data have been imported, or there have been multiple deployments of tags/animals.)

peterdesmet commented 2 years ago

A new version has been published at https://ipt.gbif.org/resource?r=o_westerschelde&v=1.4.

tucotuco commented 2 years ago

I don't know if you have any examples among existing data, but to protect against it, maybe consider having sex be a changing property.

peterdesmet commented 2 years ago

@tucotuco I'm unaware of tracked animals that change sex during their lifetime, but misidentification can happen. We have a case of a bird sexed as male in the first deployment and as female in the second. I'm not sure whether that would count against adding sex to all occurrences within a deployment.

tucotuco commented 2 years ago

Right, the ever-perplexing balance of what's possible versus what's usual. I'm not qualified to make that judgement, but I wanted to point it out in case it could one day matter and break everything.

peterdesmet commented 2 years ago

Interesting remark by @sarahcd in https://github.com/inbo/movepub/issues/24#issuecomment-1125252039:

> [...] one thing to think about: The deploy start/end times are sometimes based on something other than a human observation. For the deploy-on time, we can assume with current methods that the tag had to be attached by a person, and therefore assume it at least approximates the time of a human observation. For the deploy-off-time, this is regularly used to define the end of the reliable tracking data, e.g., to exclude locations from the track determined to have been recorded after the tag stopped moving or sending reliable data. Even when tags are physically retrieved, this often happens after an animal has died, or after it has been automatically released from the animal, so doesn't necessarily represent an observation of the live animal.

@tucotuco @timrobertson100 does that change anything regarding what model to use?

sarahcd commented 2 years ago

It would be easy to change the code or output so that the sex only applied to the deployment event, or update if a correction is needed. For the DwC, we set this animal attribute to apply to the full track, along with species, because we thought it might be interpreted as inconsistent or unreliable if the value was present just once in many records of the same animal.

tucotuco commented 2 years ago

> @tucotuco @timrobertson100 does that change anything regarding what model to use?

I don't think so. The cause of the event doesn't really change what its parent is or the need to give more metadata about the parent than an identifier for grouping, no?

peterdesmet commented 2 years ago

Thanks @tucotuco. Our end deployment observations/events will just not always be of the quality and precision we hope for. E.g. observations will be created for “around that date the deployment likely stopped, no location information”. But I guess that is not worse than some other occurrences. 🤷‍♂️

sarahcd commented 2 years ago

I am hesitant to automatically create HumanObservation records for deployment ends. This is because

Ideas: We could either leave them out, or have some kind of user selection with an explanation and the option to create deployment end events where appropriate (but do not create them by default).

peterdesmet commented 2 years ago

@sarahcd I think these are good arguments to leave them out, but still keep the same model:

parentEventID | eventID | eventDate       | coordinates
A             | s       | deploy-on-date  | deploy-on-coord
A             | 1       | timestamp1      | position1
A             | 2       | timestamp2      | position2
A             | 3       | timestamp3      | position3
# no deploy-off records

Information that might be available that we don't include then:

I think that is reasonable given the often low quality of deployment end information.

peterdesmet commented 2 years ago

@timrobertson100 @tucotuco @sarahcd can you indicate if you find that a reasonable approach?

tucotuco commented 2 years ago

Could you not still include a deployment-off type record for those cases that are concrete? That is, the animal was there, and dead, the purpose for the deployment was at a definitive end, and there is an Occurrence involved.

timrobertson100 commented 2 years ago

Thanks for all your work on this @peterdesmet - it makes sense to me.

We can explore what we can do to improve how it is displayed on GBIF.org to better indicate these are groupings under a deployment. It's not there yet but eventType will help disambiguate things too.

peterdesmet commented 2 years ago

@tucotuco it is just very hard to identify such records, especially since we are creating a generic function for the mapping.

tucotuco commented 2 years ago

@peterdesmet Then yes, it makes sense not to include records that can be ambiguous. I guess you have algorithms to determine when to stop including records in a deployment, and that these usually exclude the deployment-off event.

peterdesmet commented 2 years ago

> I guess you have algorithms to determine when to stop including records in a deployment, and that these usually exclude the deployment-off event.

Yes, either:

In all the above cases, the datetime of the last GPS position can be considered the actual deployment end date. It will be included in the Darwin Core view and is a valid occurrence of the animal.
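As a small illustration of treating the last GPS position as the deployment end, here is a sketch that assumes ISO 8601 timestamps; the function name `effective_deployment_end` and the sample timestamps are hypothetical:

```python
from datetime import datetime


def effective_deployment_end(position_timestamps):
    """Return the datetime of the last GPS position, or None without positions."""
    parsed = [datetime.fromisoformat(ts) for ts in position_timestamps]
    return max(parsed) if parsed else None


# Hypothetical timestamps, deliberately out of order.
end = effective_deployment_end([
    "2022-05-01T10:00:00",
    "2022-05-03T08:30:00",
    "2022-05-02T23:15:00",
])
print(end.isoformat())
```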

peterdesmet commented 2 years ago

Updated in a68d6d2, data now looks like this (first record = human obs):

[Image: Screenshot 2022-05-20 at 08 53 46]