Closed peterdesmet closed 2 years ago
@sarahcd we could give the first record the occurrenceID `ani1_tag1_capture`, as you suggested in the SQL, but I think it is easier to name the occurrenceID and eventID the same.
@peterdesmet - did you intend to have the `parentEventID` set on the deployment event? (We didn't in our Slack chat.)
I imagine it's there either as a mistake or perhaps you're thinking it'd be a utility to group all related records, but semantically it seems wrong to me and I don't think it'd be consistently applied by others so couldn't be relied upon.
@timrobertson100 oh right, my mistake. I didn't in my SQL:
I used `parentEventID` in previous attempts at this, it makes sense to me. Would be good to see how it looks on GBIF. And `tag deployment` sounds right. In this case, is the `humanObs` identified as a tag attachment event anywhere?
> Would be good to see how it looks on GBIF
I anticipate we'll need to do two things - firstly to index parentEventID and then rework the UI a little. Neither should be huge tasks though.
> In this case, is the humanObs identified as a tag attachment event anywhere?
It is not. The current setup considers the deployment (a period) and the attachment (a timestamp) as the same thing. IMO that makes it a bit tricky, e.g. do you assign a duration (`deploy on timestamp/deploy off timestamp`) or a single timestamp (`deploy on timestamp`) to that parent event? The first makes sense as an encompassing parent event, but less so for a single point in time when e.g. the biometric measurements were taken.
Here's how it would look if you separated those. It would require an event core. Note that as far as I understand, Movebank doesn't consider those as separate entities (e.g. what would the `attach1` ID be? How to split remarks?). @sarahcd @timrobertson100 thoughts?
occurrenceID | eventID | parentEventID | eventDate | basisOfRecord | eventRemarks
------------ | --------- | ------------- | --------- | ------------- | ------------------
ani1_tag1 | ani1_tag1 | | start/end | - | Deployment remarks <-- in EVENT core
attach1 | attach1 | ani1_tag1 | timestamp | HumanObs | Attachment remarks
occ1 | occ1 | ani1_tag1 | timestamp | MachineObs |
occ2 | occ2 | ani1_tag1 | timestamp | MachineObs |
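In this layout every `parentEventID` has to resolve to an event record, which is why the deployment row needs an event core. A minimal Python sketch of that check, assuming the rows are held as plain dictionaries (the function name `missing_parents` and the dictionary layout are hypothetical, not part of any GBIF or Movebank tooling):

```python
def missing_parents(records):
    """Return parentEventID values that do not match any eventID in records."""
    event_ids = {r["eventID"] for r in records}
    parents = {r["parentEventID"] for r in records if r.get("parentEventID")}
    return sorted(parents - event_ids)


# Identifiers taken from the table above; other columns are omitted.
records = [
    {"eventID": "ani1_tag1", "parentEventID": None},  # deployment (event core)
    {"eventID": "attach1", "parentEventID": "ani1_tag1"},
    {"eventID": "occ1", "parentEventID": "ani1_tag1"},
    {"eventID": "occ2", "parentEventID": "ani1_tag1"},
]

print(missing_parents(records))      # [] : the parent record is present
print(missing_parents(records[1:]))  # ['ani1_tag1'] : parent row dropped
```

Dropping the deployment row leaves the children pointing at an identifier that exists only as a grouping key.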
I would lean towards just 1 event for the attachment/deployment because `eventRemarks` now describe the tag attachment. To me that resolves this issue.

Downsides are `parentEventID` ("An identifier for the broader Event that groups this and potentially other Events."). I don't know if this use is within the scope of how others use it in practice. @timrobertson100 ?

The way events are handled in GBIF isn't perfect, but we can do something:
- Search by `parentEventId` is already possible (https://api.gbif.org/v1/occurrence/search?parentEventId=2512a2db-851e-4b80-8ecd-b35ff6cbd54e), with the caveat that it only searches the immediate parent (not grandparents).
- You can see the siblings and a link to the parent event: https://www.gbif.org/dataset/aea17af8-5578-4b04-b5d3-7adf0c5a1e60/event/934265ab-3e45-45c4-9ebe-3d1acffe3798
- From the parent event you can see the children: https://www.gbif.org/dataset/aea17af8-5578-4b04-b5d3-7adf0c5a1e60/parentevent/2512a2db-851e-4b80-8ecd-b35ff6cbd54e
- For individual occurrences there is a link to the event: https://www.gbif.org/occurrence/3738499222
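For scripting, the search URL quoted in this comment can be assembled with the Python standard library. This is only a sketch: the helper name `children_search_url` is made up here, and it builds the URL without calling the API.

```python
from urllib.parse import urlencode

# Endpoint as quoted in this thread.
GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def children_search_url(parent_event_id):
    """Build an occurrence search URL filtered on the immediate parent event."""
    return f"{GBIF_OCCURRENCE_SEARCH}?{urlencode({'parentEventId': parent_event_id})}"


url = children_search_url("2512a2db-851e-4b80-8ecd-b35ff6cbd54e")
print(url)
```

As noted in the comment, the filter only matches the immediate parent, so grandchildren would need one request per intermediate event.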
> I don't know if this use is within the scope of how others use it in practice. @timrobertson100 ?
I checked with @tucotuco (Darwin Core lead author) and he agreed that it seems reasonable to use `parentEventID` in this manner.
Great, let’s go with version 1.3 then:
This model does not require a separate event core
Sorry to come back to this, but there is something bothering me about making the parentEvent (in reality a deployment time window) the same as the tag attachment (a specific moment in time). In addition, it does not allow deploy-off information to be included, even though that can be recorded in the Movebank deployments table. That table basically includes two events (the `deploy-on` and the `deploy-off`) in a single row, but these are still two events.
I would suggest making all records siblings of the same `parentEventID`, including the deploy-on (and when available deploy-off) event. This is similar to the first approach, except that all records get their own `eventID`. I discussed this approach with @tucotuco:
parentEventID | eventID | eventDate | coordinates
------------- | ------- | --------------- | ----------------
A | s | deploy-on-date | deploy-on-coord
A | 1 | timestamp1 | position1
A | 2 | timestamp2 | position2
A | 3 | timestamp3 | position3
A | e | deploy-off-date | deploy-off-coord
There is no need to create a separate record for the parentEvent A (and thus an Event Core), because all the information (start date, end date, location) is already included in detail in the child events. The only information that needs to be retained is the fact that it groups those child events (i.e. an identifier), which can be added in the Occurrence core as a `parentEventID` column.
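A rough Python sketch of this sibling model, for illustration only. The function name, the `_start`/`_end` suffixes for the deploy-on and deploy-off eventIDs, and the timestamps are assumptions made for the example; coordinates are left out for brevity.

```python
def sibling_records(parent_event_id, positions, deploy_on=None, deploy_off=None):
    """Build child records that all share one parentEventID.

    positions is a list of (event_id, timestamp) pairs. deploy_on and
    deploy_off are optional timestamps; a missing value produces no record.
    """
    rows = []
    if deploy_on is not None:
        rows.append({"parentEventID": parent_event_id,
                     "eventID": f"{parent_event_id}_start",  # assumed ID scheme
                     "eventDate": deploy_on})
    for event_id, timestamp in positions:
        rows.append({"parentEventID": parent_event_id,
                     "eventID": event_id,
                     "eventDate": timestamp})
    if deploy_off is not None:
        rows.append({"parentEventID": parent_event_id,
                     "eventID": f"{parent_event_id}_end",  # assumed ID scheme
                     "eventDate": deploy_off})
    return rows


# Made-up example values, not real tracking data.
rows = sibling_records(
    "A",
    positions=[("1", "2020-01-01T10:00:00Z"), ("2", "2020-01-01T11:00:00Z")],
    deploy_on="2020-01-01T09:00:00Z",
)
for row in rows:
    print(row)
```

No record is created for the parent `A` itself; it exists only as the shared `parentEventID` value.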
I'll work out an example.
Sounds good to me, if you want to send me the updated sql when you have an example, I can check for other info we could populate in the deploy-off event. We'll also want to see how it behaves in datasets missing this information. Note deploy-on information can also be missing. (In organizing data on Movebank, it is only really essential when pre/post-deployment data have been imported, or there have been multiple deployments of tags/animals.)
A new version has been updated at https://ipt.gbif.org/resource?r=o_westerschelde&v=1.4.

- `ani1_tag1` parent events
- `tag deployment start`, `gps`, `tag deployment end`
- `ani1_tag1_start` and `ani1_tag1_end`, since they don't have their own identifiers in the database.

I don't know if you have any examples among existing data, but to protect against it, maybe consider having sex be a changing property.
@tucotuco I'm unaware of tracked animals that change sex during their lifetime, but misidentification can happen. We have a case of a bird sexed as male in the first deployment and as female in the second. I'm not sure whether that would count against adding sex to all occurrences within a deployment.
Right, the ever-perplexing balance of what's possible versus what's usual. I'm not qualified to make that judgement, but I wanted to point it out in case it could one day matter and break everything.
Interesting remark by @sarahcd in https://github.com/inbo/movepub/issues/24#issuecomment-1125252039:
> [...] one thing to think about: The deploy start/end times are sometimes based on something other than a human observation. For the deploy-on time, we can assume with current methods that the tag had to be attached by a person, and therefore assume it at least approximates the time of a human observation. For the deploy-off-time, this is regularly used to define the end of the reliable tracking data, e.g., to exclude locations from the track determined to have been recorded after the tag stopped moving or sending reliable data. Even when tags are physically retrieved, this often happens after an animal has died, or after it has been automatically released from the animal, so doesn't necessarily represent an observation of the live animal.
@tucotuco @timrobertson100 does that change anything regarding what model to use?
It would be easy to change the code or output so that the sex only applied to the deployment event, or update if a correction is needed. For the DwC, we set this animal attribute to apply to the full track, along with species, because we thought it might be interpreted as inconsistent or unreliable if the value was present just once in many records of the same animal.
> @tucotuco @timrobertson100 does that change anything regarding what model to use?
I don't think so. The cause of the event doesn't really change what its parent is or the need to give more metadata about the parent than an identifier for grouping, no?
Thanks @tucotuco. Our end deployment observations/events will just not always be of the quality and precision we hope for. E.g. observations will be created for “around that date the deployment likely stopped, no location information”. But I guess that is not worse than some other occurrences. 🤷♂️
I am hesitant to automatically create HumanObservation records for deployment ends. This is because
Ideas: We could either leave them out, or have some kind of user selection with an explanation and the option to create deployment end events where appropriate (but do not create them by default).
@sarahcd I think these are good arguments to leave them out, but still keep the same model:
parentEventID | eventID | eventDate | coordinates
------------- | ------- | -------------- | ---------------
A | s | deploy-on-date | deploy-on-coord
A | 1 | timestamp1 | position1
A | 2 | timestamp2 | position2
A | 3 | timestamp3 | position3

(no deploy-off records)
Information that might be available that we don't include then:
I think that is reasonable given the often low quality of deployment end information.
@timrobertson100 @tucotuco @sarahcd can you indicate if you find that a reasonable approach?
Could you not still include a deployment-off type record for those cases that are concrete? That is, the animal was there, and dead, the purpose for the deployment was at a definitive end, and there is an Occurrence involved.
Thanks for all your work on this @peterdesmet - it makes sense to me.
We can explore what we can do to improve how it is displayed on GBIF.org to better indicate these are groupings under a deployment. It's not there yet but eventType will help disambiguate things too.
@tucotuco it is just very hard to identify such records, especially since we are creating a generic function for the mapping.
@peterdesmet Then yes, it makes sense not to include records that can be ambiguous. I guess you have algorithms to determine when to stop including records in a deployment, and that these usually exclude the deployment-off event.
> I guess you have algorithms to determine when to stop including records in a deployment, and that these usually exclude the deployment-off event.
Yes, either:
In all the above cases, the datetime of the last GPS position can be considered the actual deployment end date. It will be included in the Darwin Core view and is a valid occurrence of the animal.
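Treating the last GPS position as the deployment end can be sketched in a few lines of Python. The function name and the timestamps below are made up for illustration; this is not the actual mapping code.

```python
from datetime import datetime


def deployment_end(gps_timestamps):
    """Return the latest GPS timestamp, or None when there are no positions."""
    return max(gps_timestamps, default=None)


# Hypothetical fixes for one deployment, deliberately out of order.
stamps = [
    datetime(2021, 5, 2, 8, 30),
    datetime(2021, 5, 3, 12, 0),
    datetime(2021, 5, 1, 6, 15),
]

print(deployment_end(stamps))  # 2021-05-03 12:00:00
print(deployment_end([]))      # None
```

A deployment with no GPS positions has no derived end date, which matches the earlier note that some datasets will be missing this information.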
Updated in a68d6d2, data now looks like this (first record = human obs):
The setup suggested in #7 has a grouping event (deployment) and occurrences (1 tag attachment, 0 or more GPS positions). In line with the new model, @timrobertson100 suggests having a separate event for each occurrence (basically eventID = occurrenceID), and grouping all GPS positions under the deployment using `parentEventID`. The deployment itself also has an occurrence attached: the tag attachment. See table:

@sarahcd I'm trying this approach to see how it looks at GBIF. Given that the capture and deployment event are one and the same in this model, I would name it `tag deployment` rather than `tag attachment`.