tdwg / camtrap-dp

Camera Trap Data Package (Camtrap DP)
https://camtrap-dp.tdwg.org
MIT License

How can sequences group and split multimedia files? #110

Closed peterdesmet closed 3 years ago

peterdesmet commented 3 years ago

@ben-norton in ChapmanCore, multimedia files have a sequenceID. This allows observations to be media-based (one distinct sequenceID per file) or sequence-based (one sequenceID for multiple files).

video1   seq1
imageA1  seq2
imageA2  seq2
imageA3  seq2
imageB1  seq3
imageB2  seq4

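Do I understand correctly that, without creating duplicate file records, this approach:

  1. Does not allow for videos to be split into sequences, as you can only assign one sequence ID per file?
  2. Does not allow for file and sequence based observations to exist in the same package, as you can only assign either a unique or grouping sequenceID per file?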

Just trying to wrap my head around this.
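
A minimal sketch of the file-to-sequence listing above, assuming each file carries exactly one sequenceID. The helper names (`group_by_sequence`, `observation_basis`) are hypothetical and not part of ChapmanCore or Camtrap DP; the rows are copied from the listing.

```python
from collections import defaultdict

# File-to-sequence assignments copied from the listing above.
FILE_SEQUENCES = [
    ("video1", "seq1"),
    ("imageA1", "seq2"),
    ("imageA2", "seq2"),
    ("imageA3", "seq2"),
    ("imageB1", "seq3"),
    ("imageB2", "seq4"),
]


def group_by_sequence(rows):
    """Return {sequence_id: [file names]}, keeping the input order."""
    groups = defaultdict(list)
    for file_name, sequence_id in rows:
        groups[sequence_id].append(file_name)
    return dict(groups)


def observation_basis(files):
    """Label a sequence as media-based (one file) or sequence-based (several)."""
    return "media-based" if len(files) == 1 else "sequence-based"


groups = group_by_sequence(FILE_SEQUENCES)
basis = {seq: observation_basis(files) for seq, files in groups.items()}
```

Because the mapping is keyed on the file, a file can appear in only one group, which is the limitation the two questions above ask about.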

ben-norton commented 3 years ago

Awesome question. Give me a day or so. After reading and responding, I need to make sure my documentation is in line with these responses.

ben-norton commented 3 years ago

_Does not allow for videos to be split into sequences, as you can only assign one sequence ID per file?_ You could tackle this in two ways.

  1. Each video file is a single sequence, so each sequence is defined by a single property, video_id. To split a video into sequences, you would split the video into multiple files, each of which is a sequence. Here's an example scenario: you start with one video file and want to associate it with 10 sequences, each associated with 4 identifications. Since you can't associate a single video with 10 sequences, you would split the video file into 10 separate files, each belonging to one sequence -> 10 video files = 10 sequences.
  2. Allow each video file to be associated with one to many sequences. Each sequence is then defined by three properties: video_id, start time, and end time. This gets a bit complicated with images. Ideally, I'd like to go with number 2, but I need to think it through to make sure there are no unintended consequences.
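
Option 2 can be sketched as a small record type, under the assumption that start and end times are offsets in seconds from the start of the video. The class `VideoSequence`, its field names `start_s` and `end_s`, and the even split in `split_evenly` are all hypothetical illustrations, not part of ChapmanCore.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoSequence:
    """A sequence identified by a video and a time window within it."""
    video_id: str
    start_s: float  # assumed: offset from the start of the video, in seconds
    end_s: float

    def __post_init__(self):
        if self.end_s <= self.start_s:
            raise ValueError("end_s must be greater than start_s")


def split_evenly(video_id, duration_s, parts):
    """Describe `parts` equal-length sequences without cutting the file."""
    step = duration_s / parts
    return [
        VideoSequence(video_id, i * step, (i + 1) * step)
        for i in range(parts)
    ]


# One 100-second video described as 10 sequences, all pointing at one file.
sequences = split_evenly("video1", 100.0, 10)
```

Compared with option 1, the video stays a single file and the ten sequences are rows that reference it, so no duplicate file records are needed.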

ben-norton commented 3 years ago

_Does not allow for file and sequence based observations to exist in the same package, as you can only assign either a unique or grouping sequenceID per file?_ It should, so if it doesn't, I need to fix my documentation. This is a key feature of Chapman Core after learning from eMammal (sequence-based) and Wildlife Insights (image and sequence-based). Even though Wildlife Insights can accommodate both, you must declare the type at the project level. A single project cannot contain both file and sequence based observations.

The challenge is how you determine the number of animals associated with a single identification. In analytics, the number of animals in a single file and the number of animals in a sequence are two very different things. Roland can elaborate better on this specific challenge.

ben-norton commented 3 years ago

You can't use a single-image animal count and a sequence count in the same dataset for modeling purposes. You must group images by an independent interval (60 seconds in eMammal) to process them alongside sequences. Otherwise, your occupancy models are all wrong.
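
A minimal sketch of grouping images by an independent interval. It assumes an image joins the current group when it was taken within 60 seconds of the previous image; the exact rule eMammal applies (for example, measuring from the first image instead) is not stated in this thread, and `group_by_interval` and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def group_by_interval(timestamps, interval=timedelta(seconds=60)):
    """Group timestamps so that consecutive ones at most `interval` apart
    end up in the same group (assumed rule, see the note above)."""
    groups = []
    for ts in sorted(timestamps):
        if groups and ts - groups[-1][-1] <= interval:
            groups[-1].append(ts)  # close enough to the previous image
        else:
            groups.append([ts])  # gap too large: start a new group
    return groups


# Hypothetical capture times for one deployment.
captures = [
    datetime(2021, 5, 1, 12, 0, 0),
    datetime(2021, 5, 1, 12, 0, 30),
    datetime(2021, 5, 1, 12, 1, 20),
    datetime(2021, 5, 1, 12, 5, 0),
]
events = group_by_interval(captures)
```

Here the first three images form one group (gaps of 30 and 50 seconds) and the last image starts a second group.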

kbubnicki commented 3 years ago

A single project cannot contain both file and sequence based observations.

It makes a lot of sense to me. That's also how we designed Camtrap DP, i.e. to support both cases. However, thinking about the future (ML/AI), I would put more emphasis on file-based identification.

In analytics, the number of animals in a single file and the number of animals in a sequence are two very different things.

Definitely. For expert-based annotations/identifications, we solved this issue by introducing a field count_new which, next to the total number of animals identified in an image/video (i.e. count), stores information about new individuals in the current image/video, taking the entire sequence into account. This, together with a predefined sequence_interval, makes automatic aggregation of count data possible. How to solve the same problem for ML/AI is less clear to me; likely some kind of smart object-tracking algorithm would need to be involved.
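
A sketch of how count and count_new could feed an automatic aggregation, assuming count_new holds a number and that the individuals in a sequence are obtained by summing it. The thread does not spell out the aggregation rule, so this is one plausible reading; the rows and the function `individuals_per_sequence` are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (sequence_id, count, count_new).
# count     = animals visible in this image/video
# count_new = animals in this image/video not already seen in the sequence
ROWS = [
    ("seq2", 2, 2),
    ("seq2", 3, 1),
    ("seq2", 1, 0),
    ("seq3", 1, 1),
]


def individuals_per_sequence(rows):
    """Sum count_new per sequence (assumed aggregation rule)."""
    totals = defaultdict(int)
    for sequence_id, _count, count_new in rows:
        totals[sequence_id] += count_new
    return dict(totals)


totals = individuals_per_sequence(ROWS)
naive_seq2 = sum(count for seq, count, _ in ROWS if seq == "seq2")
```

Summing count for seq2 gives 6 and counts the same animals repeatedly, while summing count_new gives 3, which is why file-level and sequence-level counts cannot be mixed directly.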

peterdesmet commented 3 years ago

@ben-norton I think we have discussed this, and the nullable multimedia_id as currently in use in Camtrap DP is the most flexible option. Can we close this issue?

ben-norton commented 3 years ago

Yes