No way of linking media and events?

jevansbio commented 9 months ago

As far as I can tell, there is no key in the current media CSV file linking the event to the media. For event-based observations, this makes the media less useful: the only way to work out which event a media item belongs to is to use the start/end time of the event and the deployment ID. I feel that having at least the option to provide an event ID in the media file would be useful?

bencevans commented 9 months ago

I’ve also just encountered this issue as our labelling is done at the sequence/event level, and for validation & visualisation purposes, we need to identify the media related to the event.

As I understand it, when an observation is made at an event level, the eventID is set in observations.csv, but there is no relationship to individual mediaIDs.

Two methods could work...

Method 1 (Extension) - Creating Events CSV

Create a new events.csv file with the following structure to link eventIDs with mediaIDs:

eventID	mediaID
E00001	M00001
E00001	M00002
E00001	M00003
E00002	M00004

This method has the advantage that it doesn't break the current specification, but can optionally be used as an extension. However, if it were considered for adoption into the core specification, it would be great if it became a required table whenever event-level annotations are included.
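As a rough illustration of how a consumer might read such a link table, here is a minimal Python sketch. It assumes the tab-separated eventID/mediaID layout shown above; the events.csv file is the proposed extension, not part of the current specification.

```python
import csv
import io
from collections import defaultdict

# Hypothetical events.csv content, mirroring the eventID/mediaID table above
# (tab-separated, as in the example).
EVENTS_TSV = """eventID\tmediaID
E00001\tM00001
E00001\tM00002
E00001\tM00003
E00002\tM00004
"""


def media_by_event(text: str) -> dict:
    """Group mediaIDs by eventID from an eventID/mediaID link table."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        groups[row["eventID"]].append(row["mediaID"])
    return dict(groups)


links = media_by_event(EVENTS_TSV)
print(links)  # {'E00001': ['M00001', 'M00002', 'M00003'], 'E00002': ['M00004']}
```

Because this is a separate link table, the same mediaID can appear under more than one eventID, which a single eventID column in media.csv could not express.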

Method 2 :warning: Change to `mediaID` rules/type

Currently in the specification for mediaID (string):

Identifier of the media file that was classified. Only applicable for media-based observations (observationLevel = media). Foreign key to media.mediaID.

The idea would be to utilise the otherwise blank mediaID for event observations with a comma-separated list of associated mediaIDs.

E.g.

observationID,deploymentID,mediaID,eventID,eventStart,eventEnd,observationLevel,observationType
O0001,D001,"M001,M002,M003",E001,2020-03-01T22:00:00Z,2020-04-01T22:00:00Z,event,animal

This would break the current rules/validation of the specification and may complicate parsing due to commas in the field.
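To make the parsing concern concrete, here is a small Python sketch that reads the Method 2 example row. A proper CSV parser keeps the quoted mediaID list as one field, which then has to be split again; splitting the raw line on commas yields the wrong number of fields. The comma-separated mediaID convention is the hypothetical Method 2 proposal, not the current specification.

```python
import csv
import io

# The Method 2 example, with straight quotes so the mediaID list is one CSV field.
OBSERVATIONS_CSV = (
    "observationID,deploymentID,mediaID,eventID,eventStart,eventEnd,"
    "observationLevel,observationType\n"
    'O0001,D001,"M001,M002,M003",E001,2020-03-01T22:00:00Z,'
    "2020-04-01T22:00:00Z,event,animal\n"
)


def parse_media_ids(field: str) -> list:
    """Split a (hypothetical) comma-separated mediaID field into individual IDs."""
    return [part.strip() for part in field.split(",") if part.strip()]


rows = list(csv.DictReader(io.StringIO(OBSERVATIONS_CSV)))
media_ids = parse_media_ids(rows[0]["mediaID"])
print(media_ids)  # ['M001', 'M002', 'M003']

# A naive split of the data line ignores the quotes: 10 fields for an 8-column header.
naive_fields = OBSERVATIONS_CSV.splitlines()[1].split(",")
print(len(naive_fields))  # 10
```

Any tool that reads observations.csv would need this extra split step, and any tool that does not quote the field correctly would misalign every column after mediaID.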

I'm currently exploring Method 1 so as not to break compatibility with the current spec / tooling.

jevansbio commented 9 months ago

I actually went with Method 1 in my implementation of a Camtrap DP export, and mentioned it in the datapackage.json. I feel this is the simplest solution. Having an eventID column in the media table would of course be even simpler, but it would not work in situations where a media item is used in multiple events.

peterdesmet commented 7 months ago

Hi (and sorry for the late answer)

The only way to work out what event a media item belongs to is to use the start/end time of the event and the deployment ID.

Indeed, that is by design. We have gone back and forth on multiple ways to express events, but settled on the current model because:

It keeps media.csv to just the facts: "these are the media collected per deployment"
It provides a high level of flexibility in defining events in observations.csv (only): 1) at media level, 2) without assigning an eventID (just eventStart and eventEnd), 3) having multiple events for the same media file (e.g. a video file), etc.

That flexibility does put a bit of burden on software that reads Camtrap DP, since events need to be extracted or created.

Create your own events from media-based observations

Filter observations on observationLevel=media
Define an event rule (e.g. observations that were seen within 120 seconds of each other belong together)
Group observations on that rule and assign eventIDs
Link events to media files using obs.eventStart <= media.timestamp <= obs.eventEnd and obs.deploymentID = media.deploymentID
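The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not camtraptor's implementation: it assumes rows are plain dicts with parsed datetimes, that a media-based observation's eventStart equals its media timestamp, and that the 120-second rule is measured between consecutive observations in the same deployment. The generated IDs ("E1", "E2", ...) and the helper names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import groupby
from typing import Optional

# Example event rule: observations within 120 s of the previous one belong together.
GAP = timedelta(seconds=120)


@dataclass
class Event:
    event_id: str  # hypothetical generated ID, e.g. "E1"
    deployment_id: str
    start: datetime
    end: datetime


def build_events(observations: list, gap: timedelta = GAP) -> list:
    """Group media-based observations into events, per deployment, by time gap."""
    media_obs = [o for o in observations if o["observationLevel"] == "media"]
    media_obs.sort(key=lambda o: (o["deploymentID"], o["eventStart"]))
    events = []
    for dep_id, group in groupby(media_obs, key=lambda o: o["deploymentID"]):
        current: Optional[Event] = None
        for obs in group:
            ts = obs["eventStart"]  # assumed to equal the media timestamp
            if current is None or ts - current.end > gap:
                current = Event(f"E{len(events) + 1}", dep_id, ts, ts)
                events.append(current)
            else:
                current.end = ts  # input is sorted, so this only extends the event
    return events


def link_media(events: list, media: list) -> dict:
    """Map eventID -> mediaIDs with matching deploymentID and start <= timestamp <= end."""
    return {
        e.event_id: [
            m["mediaID"]
            for m in media
            if m["deploymentID"] == e.deployment_id and e.start <= m["timestamp"] <= e.end
        ]
        for e in events
    }


t0 = datetime(2020, 3, 1, 22, 0, 0)
observations = [
    {"deploymentID": "D001", "eventStart": t0, "observationLevel": "media"},
    {"deploymentID": "D001", "eventStart": t0 + timedelta(seconds=30), "observationLevel": "media"},
    {"deploymentID": "D001", "eventStart": t0 + timedelta(minutes=10), "observationLevel": "media"},
]
media = [
    {"mediaID": "M001", "deploymentID": "D001", "timestamp": t0},
    {"mediaID": "M002", "deploymentID": "D001", "timestamp": t0 + timedelta(seconds=30)},
    {"mediaID": "M003", "deploymentID": "D001", "timestamp": t0 + timedelta(minutes=10)},
]
events = build_events(observations)
print(link_media(events, media))  # {'E1': ['M001', 'M002'], 'E2': ['M003']}
```

The gap is measured from the end of the current event, so a long run of closely spaced detections stays one event even if it spans more than 120 seconds in total.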

Extract (predefined) events:

Filter observations on observationLevel=event
Take the unique eventID + eventStart + eventEnd combinations to get the events
Link events to media files using obs.eventStart <= media.timestamp <= obs.eventEnd and obs.deploymentID = media.deploymentID
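A comparable sketch for predefined events, again assuming dict rows with parsed datetimes. It also keeps deploymentID in the unique combination because the join to media needs it. The function names are illustrative only and not part of any package.

```python
from datetime import datetime, timedelta


def extract_events(observations: list) -> list:
    """Unique (deploymentID, eventID, eventStart, eventEnd) from event-level observations."""
    seen = {}
    for o in observations:
        if o["observationLevel"] != "event":
            continue
        key = (o["deploymentID"], o["eventID"], o["eventStart"], o["eventEnd"])
        seen[key] = None  # a dict keeps first-seen order and drops duplicates
    return list(seen)


def link_media_to_events(events: list, media: list) -> dict:
    """Map eventID -> mediaIDs with matching deploymentID and start <= timestamp <= end."""
    return {
        eid: [
            m["mediaID"]
            for m in media
            if m["deploymentID"] == dep and start <= m["timestamp"] <= end
        ]
        for dep, eid, start, end in events
    }


t0 = datetime(2020, 3, 1, 22, 0, 0)
e1_end = t0 + timedelta(seconds=20)
observations = [
    # Two observations (e.g. two species) on the same event E001.
    {"deploymentID": "D001", "eventID": "E001", "eventStart": t0, "eventEnd": e1_end, "observationLevel": "event"},
    {"deploymentID": "D001", "eventID": "E001", "eventStart": t0, "eventEnd": e1_end, "observationLevel": "event"},
    {"deploymentID": "D001", "eventID": "E002", "eventStart": t0 + timedelta(minutes=5),
     "eventEnd": t0 + timedelta(minutes=5, seconds=10), "observationLevel": "event"},
]
media = [
    {"mediaID": "M001", "deploymentID": "D001", "timestamp": t0},
    {"mediaID": "M002", "deploymentID": "D001", "timestamp": t0 + timedelta(seconds=10)},
    {"mediaID": "M003", "deploymentID": "D001", "timestamp": t0 + timedelta(minutes=5, seconds=5)},
    {"mediaID": "M004", "deploymentID": "D002", "timestamp": t0},  # other deployment
]
events = extract_events(observations)
links = link_media_to_events(events, media)
print(links)  # {'E001': ['M001', 'M002'], 'E002': ['M003']}
```

This assumes eventIDs are unique across deployments; if they are not, the result should be keyed on (deploymentID, eventID) instead.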

For R users, we plan to incorporate this functionality in camtraptor. This file could indeed be saved as an events.csv (your suggested method 1), but we want to avoid adding tables to the core specifications that can be generated. The beauty of Data Package is that you can extend it as you want (like you did).

lrdijkhuis commented 5 months ago

Hi @peterdesmet,

Today I ran into an issue with the same origin as @bencevans and @jevansbio: the absence of event (sequence) information in the media. I have many video files for which Agouti cannot reliably obtain the timestamp from the metadata. All their timestamps are therefore set to 01-01-1970 00:00:00.

The method you suggest, creating events from media-based observations, is unfeasible in this situation: every event will be linked to every media file. Hence, building events up from the media is impossible.
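A small Python sketch of the failure mode described here, under the assumption that every video in one deployment received the fallback epoch timestamp and that each video is its own event. The timestamp-based join then cannot tell the events apart. The IDs are made up for illustration.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Hypothetical videos in one deployment whose timestamps fell back to the epoch.
media = [{"mediaID": f"M{i:03d}", "deploymentID": "D001", "timestamp": EPOCH} for i in range(1, 4)]
# One event per video, so every event also starts and ends at the epoch.
events = [("E001", EPOCH, EPOCH), ("E002", EPOCH, EPOCH), ("E003", EPOCH, EPOCH)]

# The interval join: deploymentID matches and eventStart <= timestamp <= eventEnd.
links = {
    eid: [m["mediaID"] for m in media if m["deploymentID"] == "D001" and start <= m["timestamp"] <= end]
    for eid, start, end in events
}
print(links)  # every event is linked to all three videos
```

With 3 events and 3 videos the join produces 9 links instead of 3, and the number of spurious links grows quadratically with the number of affected videos.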

I've been a user of Agouti for a very long time now, and this has not always been an issue. In the past, media were linked not only to the deployment, but also to the (sequence of) bursts within a deployment. This is very important information for data management, since it makes it possible to separate different triggers from each other, including the issue of simultaneous timelapse and motion triggers (https://github.com/tdwg/camtrap-dp/issues/375#issue-2093581669).

Secondly, not everyone has programmer-level experience and can build their own event data (as you suggest). Your suggested solution is therefore absolutely not user-oriented or user-friendly.

It would solve many issues for users if eventID were added to the media information, just as it was before this year. This would be solution number 3.

Please contact me if you would like to discuss this further. Laurens

lrdijkhuis commented 5 months ago

Dear @peterdesmet,

I've looked a little further into this issue myself. It seems to originate from an assumption in the code that all timestamps are unique within a deployment (https://inbo.github.io/camtrapdp/reference/read_camtrapdp.html#assign-eventids). That would make sense, but it does not hold: timestamps of videos, for example, cannot be read reliably from the video metadata (pers. comm. Agouti staff). Every video is an event, yet without an eventID it cannot be connected to its own metadata.
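One way to check whether that uniqueness assumption holds for a given export is to count media files per (deploymentID, timestamp) pair. The Python sketch below uses dicts standing in for media.csv rows; the function name is hypothetical and it is not part of camtrapdp.

```python
from collections import Counter


def duplicate_timestamps(media: list) -> dict:
    """Return {(deploymentID, timestamp): count} for timestamps shared by more than one media file."""
    counts = Counter((m["deploymentID"], m["timestamp"]) for m in media)
    return {key: n for key, n in counts.items() if n > 1}


media = [
    {"mediaID": "M001", "deploymentID": "D001", "timestamp": "1970-01-01T00:00:00Z"},
    {"mediaID": "M002", "deploymentID": "D001", "timestamp": "1970-01-01T00:00:00Z"},
    {"mediaID": "M003", "deploymentID": "D001", "timestamp": "2020-03-01T22:00:00Z"},
    {"mediaID": "M004", "deploymentID": "D002", "timestamp": "1970-01-01T00:00:00Z"},
]
print(duplicate_timestamps(media))  # {('D001', '1970-01-01T00:00:00Z'): 2}
```

A non-empty result means timestamp-based event assignment within that deployment is ambiguous.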

Secondly, eventID is a key variable in the structure LocationID > deploymentID > eventID (i.e. trigger, formerly referred to as sequenceID) > (observationID(s) >) mediaID. Hence it should not be derived using the method you propose ("eventStart <= media.timestamp <= eventEnd"), but should always be present in the data export.

In camera trapping, the trigger would be the event (an animal passing by OR a timelapse/standardized photo being taken). These two cannot be regarded as one event (as suggested in the README), since each trigger originates from a different cause. The "sub-event" (ref.) would be what happens in the sequence of photos within the trigger, i.e. the observation (observationID).
Following from this, and given the issues the current approach poses, hard-linking mediaIDs to eventIDs would be more suitable than letting the code do the job (with all the errors that introduces). In the end, the package should return the raw data as it is in Agouti or any other program, without extensive preprocessing or modification of the data.

My main concern is that this was never an issue until the beginning of this year, and I fear that with the current approach, issues like this will keep coming up.
