tdwg / camtrap-dp

Camera Trap Data Package (Camtrap DP)
https://camtrap-dp.tdwg.org
MIT License

Restructuring the model #203

Closed peterdesmet closed 1 year ago

peterdesmet commented 2 years ago

In a discussion with @tucotuco on how to better align Camtrap DP with a common model for biodiversity data, a proposal came up on how to better structure sequences in Camtrap DP.

Preamble

  1. For the purpose of this discussion, I want to clarify what we mean by a sequence here:

    1. It is a group of media files
    2. It can be used as the basis of an observation (i.e. the group of image files was assessed as a whole, not unlike the frames of a video). The alternative is image-based observations, which come with some benefits (see point 4).
    3. It is created after the media files were captured, based on a predefined sequence interval: "Maximum number of seconds between timestamps of successive media files to be considered part of a single sequence". As a result, a sequence can contain multiple triggers/bursts. The sequence interval is not a camera setting, but a setting of the program used to manage the images afterwards.
    4. It can be used as an "event" for biological analysis. The downside of sequence-based observations is that you are stuck with the sequence interval settings that were chosen. With image-based observations you can choose yourself how to group images together in logical events based on their timestamp.
    5. Sequences typically don't result in a physical file. If they did, they would be like a GIF/video looping through the originating files.
  2. This proposal is not about whether image-based observations are better than sequence-based observations. Currently both approaches exist (and likely will for a while), and Camtrap DP wants to support both.

  3. The examples show how the data would look for 3 images, using image-based vs sequence-based observations. A wild boar (Sus scrofa) can be seen in the first 2 images.
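
The sequence interval can be illustrated with a minimal sketch of how a management program might group files after capture. It assumes media rows are plain Python dicts with the mediaID, deploymentID and timestamp fields used in the examples; the function name and the seqN identifiers it generates are hypothetical and not part of Camtrap DP.

```python
from datetime import datetime, timedelta


def assign_sequences(media, interval_seconds):
    """Group media into sequences, per deployment, using a sequence interval.

    A file joins the current sequence if it was captured at most
    `interval_seconds` after the previous file of the same deployment;
    otherwise a new sequence starts. Returns {mediaID: sequenceID}.
    """
    gap = timedelta(seconds=interval_seconds)
    rows = sorted(
        media,
        key=lambda m: (m["deploymentID"], datetime.fromisoformat(m["timestamp"])),
    )
    assignment = {}
    counter = 0
    prev_deployment, prev_captured = None, None
    for m in rows:
        captured = datetime.fromisoformat(m["timestamp"])
        # New deployment, or gap larger than the interval: start a new sequence.
        if m["deploymentID"] != prev_deployment or captured - prev_captured > gap:
            counter += 1
        assignment[m["mediaID"]] = f"seq{counter}"  # hypothetical identifiers
        prev_deployment, prev_captured = m["deploymentID"], captured
    return assignment


media = [
    {"mediaID": "med1", "deploymentID": "dep1", "timestamp": "2020-01-01T00:00:00"},
    {"mediaID": "med2", "deploymentID": "dep1", "timestamp": "2020-01-01T00:00:01"},
    {"mediaID": "med3", "deploymentID": "dep1", "timestamp": "2020-01-01T00:00:02"},
    # hypothetical fourth image, five minutes later
    {"mediaID": "med4", "deploymentID": "dep1", "timestamp": "2020-01-01T00:05:00"},
]
print(assign_sequences(media, 120))
# {'med1': 'seq1', 'med2': 'seq1', 'med3': 'seq1', 'med4': 'seq2'}
```

Rerunning with another interval regroups the same files. That freedom is what image-based observations keep and sequence-based observations lose once the interval has been applied.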

Current situation 0

  1. Sequences only exist as identifiers (sequenceID), in both media and observations.
  2. Observations have a sequenceID and mediaID, which are both foreign keys to the media table. Image-based observations need to populate both, sequence-based observations only sequenceID. As a result, joins between observations and media are conditional: you kinda need to know what key to use to make a join that will yield results. That is not great.
  3. Because the join over media is conditional, we added the convenience terms deploymentID and timestamp to observations, so that they can be easily joined with deployments - without having to go over media - to get useful biological data (location, time, species).

Image-based observations

media.csv
mediaID | sequenceID | deploymentID | timestamp           | filePath
------- | ---------- | ------------ | ------------------- | --------
med1    | void_seq1  | dep1         | 2020-01-01T00:00:00 | med1.jpg
med2    | void_seq1  | dep1         | 2020-01-01T00:00:01 | med2.jpg
med3    | void_seq1  | dep1         | 2020-01-01T00:00:02 | med3.jpg

observations.csv
observationID | sequenceID | mediaID | deploymentID | timestamp           | observationType | scientificName | count | countNew
------------- | ---------- | ------- | ------------ | ------------------- | --------------- | -------------- | ----- | --------
obs1          | void_seq1  | med1    | dep1         | 2020-01-01T00:00:00 | animal          | Sus scrofa     | 1     | 1
obs2          | void_seq1  | med2    | dep1         | 2020-01-01T00:00:01 | animal          | Sus scrofa     | 1     | 0
obs3          | void_seq1  | med3    | dep1         | 2020-01-01T00:00:02 | blank           | NULL           | NULL  | NULL

Sequence-based observations

media.csv
mediaID | sequenceID | deploymentID | timestamp           | filePath
------- | ---------- | ------------ | ------------------- | --------
med1    | seq1       | dep1         | 2020-01-01T00:00:00 | med1.jpg
med2    | seq1       | dep1         | 2020-01-01T00:00:01 | med2.jpg
med3    | seq1       | dep1         | 2020-01-01T00:00:02 | med3.jpg

observations.csv
observationID | sequenceID | mediaID | deploymentID | timestamp           | observationType | scientificName | count | countNew
------------- | ---------- | ------- | ------------ | ------------------- | --------------- | -------------- | ----- | --------
obs1          | seq1       | NULL    | dep1         | 2020-01-01T00:00:00 | animal          | Sus scrofa     | 1     | NULL
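
The conditional join described above can be sketched as follows. NULL is represented as None, and the rows are trimmed to the two key columns; this is an illustration of the problem, not code from any Camtrap DP tool.

```python
def media_for_observation(observation, media):
    """Return the mediaIDs an observation is based on (current model).

    Image-based observations populate mediaID; sequence-based ones only
    sequenceID. The join key therefore depends on which field is filled in.
    """
    if observation.get("mediaID") is not None:
        key = "mediaID"
    else:
        key = "sequenceID"
    return [m["mediaID"] for m in media if m[key] == observation[key]]


media = [
    {"mediaID": "med1", "sequenceID": "seq1"},
    {"mediaID": "med2", "sequenceID": "seq1"},
    {"mediaID": "med3", "sequenceID": "seq1"},
]
image_based = {"observationID": "obs2", "sequenceID": "seq1", "mediaID": "med2"}
sequence_based = {"observationID": "obs1", "sequenceID": "seq1", "mediaID": None}
print(media_for_observation(image_based, media))     # ['med2']
print(media_for_observation(sequence_based, media))  # ['med1', 'med2', 'med3']
```

Every consumer has to reproduce this branching, which is why the convenience terms deploymentID and timestamp were added to observations.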

Suggested change 1


In media.csv

  1. Sequences are considered media (not unlike videos), they get their own rows in media.csv. The definition of that table becomes something along the lines of:

    Table with media files associated with a deployment (deploymentID). Media files can be captured by the camera trap (images/videos) or created afterwards by grouping files.

  2. Image and video files are still listed in media.csv. They have an optional parentMediaID to associate them with sequences. That allows joins to find the images that belong to a sequence.
  3. Including sequence rows is entirely optional: there is no need to include them if you only have image-based observations. You could include them if you want to convey what grouping the system "used", but since they are not used as the basis of observations, you can leave the grouping into "events" entirely up to the user. All the information is there to do it.
  4. filePath and fileMediaType become optional fields. They are typically not populated for sequence rows.

In observations.csv

  1. Observations are only linked via mediaID. That media row can be a single image (image-based observations) or a sequence. This is a huge benefit, as it no longer requires conditional joins.
  2. Observations are no longer directly linked to deployments, to make it clear that they are derived from media objects. Since joins with media.csv are no longer conditional, it's quite easy to join observations -> media -> deployments. You do have to download the media.csv to do the join though.
  3. Observations no longer require a timestamp field. That information can be found in media.

Most importantly, we think this model better represents the actual situation with camera traps: deployments → generate media → generate observations

Image-based observations

media.csv
mediaID | parentMediaID | deploymentID | timestamp           | filePath
------- | ------------- | ------------ | ------------------- | --------
med1    | NULL          | dep1         | 2020-01-01T00:00:00 | med1.jpg
med2    | NULL          | dep1         | 2020-01-01T00:00:01 | med2.jpg
med3    | NULL          | dep1         | 2020-01-01T00:00:02 | med3.jpg

observations.csv
observationID | mediaID | observationType | scientificName | count | countNew
------------- | ------- | --------------- | -------------- | ----- | --------
obs1          | med1    | animal          | Sus scrofa     | 1     | 1
obs2          | med2    | animal          | Sus scrofa     | 1     | 0
obs3          | med3    | blank           | NULL           | NULL  | NULL

Sequence-based observations

media.csv
mediaID | parentMediaID | deploymentID | timestamp           | filePath
------- | ------------- | ------------ | ------------------- | --------
seq1    | NULL          | dep1         | 2020-01-01T00:00:00 | NULL      <---- NEW ROW
med1    | seq1          | dep1         | 2020-01-01T00:00:00 | med1.jpg
med2    | seq1          | dep1         | 2020-01-01T00:00:01 | med2.jpg
med3    | seq1          | dep1         | 2020-01-01T00:00:02 | med3.jpg

observations.csv
observationID | mediaID | observationType | scientificName | count | countNew
------------- | ------- | --------------- | -------------- | ----- | --------
obs1          | seq1    | animal          | Sus scrofa     | 1     | NULL
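
Under suggested change 1 the same join works for both cases. The sketch below assumes rows as Python dicts; the locationName value "loc1" is a hypothetical deployment property used only to show that deployment fields come along in the join.

```python
def join_observations(observations, media, deployments):
    """Suggested change 1: one unconditional join observations -> media -> deployments."""
    media_by_id = {m["mediaID"]: m for m in media}
    deployments_by_id = {d["deploymentID"]: d for d in deployments}
    joined = []
    for obs in observations:
        medium = media_by_id[obs["mediaID"]]  # image or sequence row, same key
        deployment = deployments_by_id[medium["deploymentID"]]
        joined.append({**obs, "timestamp": medium["timestamp"], **deployment})
    return joined


def child_media(media_id, media):
    """Files grouped under a sequence row via parentMediaID."""
    return [m["mediaID"] for m in media if m["parentMediaID"] == media_id]


media = [
    {"mediaID": "seq1", "parentMediaID": None, "deploymentID": "dep1", "timestamp": "2020-01-01T00:00:00"},
    {"mediaID": "med1", "parentMediaID": "seq1", "deploymentID": "dep1", "timestamp": "2020-01-01T00:00:00"},
    {"mediaID": "med2", "parentMediaID": "seq1", "deploymentID": "dep1", "timestamp": "2020-01-01T00:00:01"},
    {"mediaID": "med3", "parentMediaID": "seq1", "deploymentID": "dep1", "timestamp": "2020-01-01T00:00:02"},
]
deployments = [{"deploymentID": "dep1", "locationName": "loc1"}]
observations = [{"observationID": "obs1", "mediaID": "seq1", "scientificName": "Sus scrofa"}]

print(join_observations(observations, media, deployments)[0]["locationName"])  # loc1
print(child_media("seq1", media))  # ['med1', 'med2', 'med3']
```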

Suggested change 2 (a less drastic update to the current situation)

This was suggested in https://github.com/tdwg/camtrap-dp/issues/203#issuecomment-1046656754. Comments above that are about suggested change 1 only.

ben-norton commented 2 years ago

I agree.

peterdesmet commented 2 years ago

Implemented at #204. To be discussed.

Open questions and my preference:

yliefting commented 2 years ago

I'm in favor of this change. It simplifies things. Personally I need to get used to the term mediaID, but if you think of sequences as frames that could just as well have been a video, it's easy to understand.

timrobertson100 commented 2 years ago

I also find this intuitive in terms of the class layout and relationships.

Sequences are considered media (not unlike videos), they get their own rows in media.csv

~A video is a piece of media, with a binary serialization in a format, while here it's really just a field to group individual media files. Looking at the possible terms, the only ones you'd anticipate being relevant are the timestamp and possibly the comments and captureMethod. Would they ever exist on the sequence row, or in different ways to the image media rows?~

~What I'm wondering is whether having a row for the sequence brings any benefit compared to, say, keeping the sequenceID column, which is very intuitive.~ (answering my own question: it's needed to simplify the observation join)

Out of curiosity - are images ever manipulated, e.g. cropping out a section and creating a new image? If so, the parentMediaID seems very appropriate and intuitive.

I think it might be useful to add a type to Media (Image, SequenceOfImages, Video) to remove any assumptions. At the moment, you need to infer that because parentMediaID=null then it's a sequence, but if people create sub-images (e.g. cropping, adjusting brightness etc) that may not hold true.

peterdesmet commented 2 years ago

are images ever manipulated, e.g. cropping out a section and creating a new image? If so, the parentMediaID seems very appropriate and intuitive.

@timrobertson100 not necessarily to create a new physical medium. But subsections (bounding boxes) of images are quite common, e.g. produced by AI to indicate where in the image it noticed an animal. That info can currently not be captured in Camtrap DP v1 (needs more thought). Options are to represent those as sub-images (with a parentMediaID), but more likely is adding a bounding box field to the observation.

I think it might be useful to add a type to Media (Image, SequenceOfImages, Video) to remove any assumptions.

I agree. parentMediaID=null is not a good filter, because a dataset with image-based observations (only) would only contain images with parentMediaID=null too. Options:

timrobertson100 commented 2 years ago

Mainly for reasons of keeping things intuitive, and to avoid mixing concepts I'd favor a type (or similarly named) field.

By mixing concepts, I mean that capture is related to what happened in the field to "trigger" the media existing, fileMediaType is about the encoding of the binary stream and sequence is really just a grouping of items largely for data management purposes (i.e. allow you to refer to a grouping of items in an annotation). Those seem like separate concerns to me which warrant their own field.

Aside: this model implies media would only ever exist in a single sequence unless you duplicate media records with e.g. the same filename (meaning observations are based on an image in a particular sequence and not on the image itself). I don't know enough to comment if that is appropriate.

jimcasaer commented 2 years ago

As far as the second remark is concerned: that looks right to me. An image only exists in one single sequence; however, the same image can be the source for two different observations.

For me it is still a little bit confusing that, if I get it right, in the new data model the media.csv contains some records referring to single images and other records referring to sequences that contain images listed in the same media.csv table. It looks to me like two different levels of information are contained within the same table. Not being a data scientist, this is the first time I encounter this kind of mixed-levels table in a data model :-)

tucotuco commented 2 years ago

It is a common modeling pattern to include multiple subtypes of an entity within a single table and to distinguish them with a type field to avoid having to create additional tables or hierarchical structures. Here that pattern seems well justified. Another part of that pattern is to name the type field based on the table it is in and the concept it represents, so that it can stand alone without context in a data dictionary (a glossary of terms). Based on these practices, I would recommend the term be adopted and that it be called "mediaType".
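
Why an explicit type field is safer than inferring sequences from parentMediaID=null can be shown in a few lines. The field name follows the suggestion above; the values "image" and "sequence" are hypothetical, since no vocabulary was agreed in this thread.

```python
def sequences_by_missing_parent(media):
    """Heuristic questioned in the discussion: rows without a parent are sequences."""
    return [m["mediaID"] for m in media if m["parentMediaID"] is None]


def sequences_by_type(media):
    """Explicit type field; "sequence" is a hypothetical vocabulary value."""
    return [m["mediaID"] for m in media if m["mediaType"] == "sequence"]


# Image-based dataset: no sequence rows at all.
image_only = [
    {"mediaID": "med1", "parentMediaID": None, "mediaType": "image"},
    {"mediaID": "med2", "parentMediaID": None, "mediaType": "image"},
    {"mediaID": "med3", "parentMediaID": None, "mediaType": "image"},
]
print(sequences_by_missing_parent(image_only))  # ['med1', 'med2', 'med3'] (wrong)
print(sequences_by_type(image_only))            # []
```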

ben-norton commented 2 years ago

This may be a bit overly cautious, but I'd opt for acquisitionType instead of mediaType to avoid confusion/overlap with the common use of mediaType as a reference to the MIME Media Types.

tucotuco commented 2 years ago

@ben-norton I think this probably arises from the media table serving multiple roles for the sake of simplification. I agree that the mediaType should be limited to media types - digital results. I think that still needs to be there. To me the acquisitionType is a statement about the event (something not explicitly modeled by the Camtrap DP structure) that generated the result. In a model that expresses this activity explicitly, I would indeed include something to specify that. In the GBIF publishing model we're doing in parallel, that would be an eventType.

peterdesmet commented 2 years ago

@tucotuco: It is a common modeling pattern to include multiple subtypes of an entity within a single table and to distinguish them with a type field to avoid having to create additional tables or hierarchical structures.

I'm not sure mixing (sub)types is that common. To me it is the biggest icky factor in an otherwise elegant proposal (cf. comments by @jimcasaer @timrobertson100). I'd therefore like to suggest an approach that deviates less from the current situation. For clarity, I'm also naming the proposals:

  1. Current situation
  2. Suggested change (with parentMediaID)
  3. Suggested change below

Suggested change 2 (a less drastic update to the current situation)


Image-based observations

media.csv
mediaID | sequenceID | deploymentID | timestamp           | filePath
------- | ---------- | ------------ | ------------------- | --------
med1    | NULL       | dep1         | 2020-01-01T00:00:00 | med1.jpg
med2    | NULL       | dep1         | 2020-01-01T00:00:01 | med2.jpg
med3    | NULL       | dep1         | 2020-01-01T00:00:02 | med3.jpg

observations.csv
observationID | mediaID | sequenceID | observationType | scientificName | count | countNew
------------- | ------- | ---------- | --------------- | -------------- | ----- | --------
obs1          | med1    | NULL       | animal          | Sus scrofa     | 1     | 1
obs2          | med2    | NULL       | animal          | Sus scrofa     | 1     | 0
obs3          | med3    | NULL       | blank           | NULL           | NULL  | NULL

Sequence-based observations

media.csv
mediaID | sequenceID | deploymentID | timestamp           | filePath
------- | ---------- | ------------ | ------------------- | --------
med1    | seq1       | dep1         | 2020-01-01T00:00:00 | med1.jpg
med2    | seq1       | dep1         | 2020-01-01T00:00:01 | med2.jpg
med3    | seq1       | dep1         | 2020-01-01T00:00:02 | med3.jpg

observations.csv
observationID | mediaID | sequenceID | observationType | scientificName | count | countNew
------------- | ------- | ---------- | --------------- | -------------- | ----- | --------
obs1          | NULL    | seq1       | animal          | Sus scrofa     | 1     | NULL

tucotuco commented 2 years ago

@peterdesmet I understand what you are trying to do, and even why. It only makes me cringe from a database modeling perspective where in SQL databases one tries to achieve the highest reasonable Normal Form (https://en.wikipedia.org/wiki/Database_normalization#Normal_forms) to protect against redesign problems with changes that might come in the future.

In Suggested Change 2 you are treating sequences as properties (albeit properties of two distinct entities), not as identifiers of an entity to use in the role of a key. The reason you can "get away with that" is that sequences have no non-identifying properties. So the thing that worries me (the "cringe factor") is that you are painting yourself into a corner. If you ever do add non-identifying properties to sequences in the future, you will have to repeat that information in media.csv or observations.csv or both, or add a sequence.csv with relationships to media and observations, and thereby change the structure in a way that will break existing implementations. Suggested change 1 doesn't future-proof sequences either, by the way: it treats them as one of the types of media with no properties of their own.
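
The "painting yourself into a corner" concern can be made concrete. If a sequence gained a non-identifying property that had to be repeated on every media row, nothing in the structure would stop the copies from disagreeing. The sketch checks for such disagreement; sequenceComment is a hypothetical column, not a Camtrap DP term.

```python
from collections import defaultdict


def inconsistent_sequences(media, column):
    """Return sequenceIDs whose repeated sequence-level value differs between rows."""
    values = defaultdict(set)
    for row in media:
        if row["sequenceID"] is not None:
            values[row["sequenceID"]].add(row[column])
    return sorted(seq for seq, found in values.items() if len(found) > 1)


media = [
    {"mediaID": "med1", "sequenceID": "seq1", "sequenceComment": "two boars"},
    {"mediaID": "med2", "sequenceID": "seq1", "sequenceComment": "two boars"},
    {"mediaID": "med3", "sequenceID": "seq1", "sequenceComment": "one boar"},
]
print(inconsistent_sequences(media, "sequenceComment"))  # ['seq1']
```

With a separate sequence table the value would be stored once per sequence, and this kind of check would not be needed.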

For demonstration only, a model that would future-proof sequences (and be in 5th normal form - 5NF) would be something like the following:

sequence.csv
sequenceID | deploymentID | starttimestamp
---------- | ------------ | -------------------
seq1       | dep1         | 2020-01-01T00:00:00
seq2       | dep1         | 2020-02-01T00:00:00

media.csv
mediaID | sequenceID | timestamp           | filePath
------- | ---------- | ------------------- | --------
med1    | seq1       | 2020-01-01T00:00:00 | med1.jpg
med2    | seq1       | 2020-01-01T00:00:01 | med2.jpg
med3    | seq1       | 2020-01-01T00:00:02 | med3.jpg

observations.csv
observationID | observationType | scientificName | count | countNew
------------- | --------------- | -------------- | ----- | --------
obs1          | animal          | Sus scrofa     | 1     | 1
obs2          | animal          | Sus scrofa     | 1     | 0
obs3          | blank           | NULL           | NULL  | NULL
obs4          | animal          | Sus scrofa     | 1     | NULL

mediaobservation.csv
mediaID | observationID
------- | -------------
med1    | obs1
med2    | obs2
med3    | obs3

sequenceobservation.csv
sequenceID | observationID
---------- | -------------
seq2       | obs4
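
In this normalized layout observations carry no foreign keys at all; every relation lives in a link table. A minimal sketch of resolving an observation to its media files, using a subset of the link rows above plus one hypothetical link (seq1 -> obs9) to show the path through a sequence:

```python
def media_for_observation(observation_id, media, mediaobservation, sequenceobservation):
    """Resolve an observation to media files through the two link tables."""
    direct = {r["mediaID"] for r in mediaobservation
              if r["observationID"] == observation_id}
    sequences = {r["sequenceID"] for r in sequenceobservation
                 if r["observationID"] == observation_id}
    via_sequence = {m["mediaID"] for m in media if m["sequenceID"] in sequences}
    return sorted(direct | via_sequence)


media = [
    {"mediaID": "med1", "sequenceID": "seq1"},
    {"mediaID": "med2", "sequenceID": "seq1"},
    {"mediaID": "med3", "sequenceID": "seq1"},
]
mediaobservation = [
    {"mediaID": "med1", "observationID": "obs1"},
    {"mediaID": "med2", "observationID": "obs2"},
]
sequenceobservation = [
    {"sequenceID": "seq2", "observationID": "obs4"},
    {"sequenceID": "seq1", "observationID": "obs9"},  # hypothetical extra link
]
print(media_for_observation("obs1", media, mediaobservation, sequenceobservation))
# ['med1']
print(media_for_observation("obs9", media, mediaobservation, sequenceobservation))
# ['med1', 'med2', 'med3']
```

obs4 resolves to an empty list here only because the example lists no media for seq2.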
jniedballa commented 2 years ago

Commenting here as a relative outsider to the project. Overall I think this goes in the right direction: deployments create media, media lead to observations. In my opinion sequences are an artificial add-on without any real benefits, but I never used them myself and also don't really know how sequences are meant to be used in this standard, so I may be missing important points. Below are some general notes, concerns and questions to consider, and a suggestion for a somewhat different database system that may help accommodate sequences and other things. Apologies for a long post ahead.

Conceptual concerns

Practical concerns

I see three possible cases (with their data relationships):

A: easiest option. No sequences needed at all. If for some compatibility reason it is necessary to always have a sequence table, each media item can be considered a separate sequence and data structure would be identical to B (it would be redundant and a bit silly though).

B: can be created automatically from image-based annotation in A using sequence_interval (see below). It would only introduce an intermediate sequence table and sequence IDs in the observation table. If B is created from A, then B still implies A (as long as observations in B retain their mediaID). Not sure if that is relevant.

C: is this even necessary (can media.csv be missing)? Maybe relevant for old data sets?

The only real differences are:
A: observations refer to mediaID.
B: a sequence table exists; observations refer to sequenceID, and sequences refer to media.csv.
C: observations refer to sequenceID, which refers directly to deployments.

Would it be possible to set a flag in the project metadata as to which case it is (and thus, which key to use)?
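
A metadata flag of this kind could look like the sketch below. The term observationBasis and its values "media" and "sequence" are hypothetical (Camtrap DP defines no such term), and only cases A and B are covered, since in case C there are no media rows to join.

```python
from collections import defaultdict

# Hypothetical metadata term and values; Camtrap DP does not define them.
JOIN_KEYS = {"media": "mediaID", "sequence": "sequenceID"}


def link_observations(observations, media, metadata):
    """Map observationID -> mediaIDs, using the key declared in the metadata."""
    basis = metadata.get("observationBasis")
    if basis not in JOIN_KEYS:
        raise ValueError(f"unknown observation basis: {basis!r}")
    key = JOIN_KEYS[basis]
    index = defaultdict(list)
    for m in media:
        index[m[key]].append(m["mediaID"])
    return {o["observationID"]: index.get(o[key], []) for o in observations}


media = [
    {"mediaID": "med1", "sequenceID": "seq1"},
    {"mediaID": "med2", "sequenceID": "seq1"},
    {"mediaID": "med3", "sequenceID": "seq1"},
]
sequence_obs = [{"observationID": "obs1", "sequenceID": "seq1"}]
image_obs = [{"observationID": "obs2", "mediaID": "med2"}]
print(link_observations(sequence_obs, media, {"observationBasis": "sequence"}))
# {'obs1': ['med1', 'med2', 'med3']}
print(link_observations(image_obs, media, {"observationBasis": "media"}))
# {'obs2': ['med2']}
```

The join is still conditional, but the condition is declared once per dataset instead of being inferred row by row.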

Scope for automation?

Videos

The points above are for images only. Video support in this scheme may lead to additional complications:

Suggestion

I suggest having a look at the database structure of digiKam for inspiration. I find it very clear, logical and extensible, but different from the current cameratrap DP scheme. If you have digiKam installed you can open its database in R with:

camtrapR:::accessDigiKamDatabase(db_directory = "C:/Users/YOURUSERNAME/Pictures", 
                              db_filename = "digikam4.db")

In short, it contains 5 items:

$AlbumRoots [1] "id" "label" "status" "type" "identifier" "specificPath"

$Albums [1] "id" "albumRoot" "relativePath" "date" "caption" "collection" "icon"

$Images [1] "id" "album" "name" "status" "category" "modificationDate" "fileSize" "uniqueHash" "manualOrder"

$Tags [1] "id" "pid" "name" "icon" "iconkde"

$ImageTags [1] "imageid" "tagid"

This scheme can be expanded nicely, e.g. with a separate table for sequences (which assigns sequences to the file ids in the "Images" table; it can maybe be created automatically as mentioned above). This would allow easy gathering of image tags (species IDs etc.) and image information (timestamps etc.) for sequences.

Future proofing for deep learning

It would also allow easy linking to AI / deep learning methods, e.g. with a separate table containing bounding box coordinates for object detection. This would work both for model training and model deployment, and can maybe be based on the COCO camera traps format. It would also remove the need to crop / duplicate images.

Then there can be another table containing the labels and confidence values for these bounding boxes. For model training this second table only needs one label, for predictions it can either contain the top label and probability only, or top k labels, or all labels with their probabilities.
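
Reducing such a labels table to the top k labels per bounding box is straightforward. The sketch assumes one (boxID, label, confidence) row per predicted label; the box identifier and confidence values are made up for illustration.

```python
import heapq
from collections import defaultdict


def top_k_labels(predictions, k):
    """Keep the k most confident labels per bounding box.

    predictions: iterable of (boxID, label, confidence) rows, i.e. one row per
    label in a hypothetical labels table linked to a bounding-box table.
    """
    by_box = defaultdict(list)
    for box_id, label, confidence in predictions:
        by_box[box_id].append((label, confidence))
    return {
        box_id: heapq.nlargest(k, rows, key=lambda row: row[1])
        for box_id, rows in by_box.items()
    }


predictions = [
    ("box1", "Sus scrofa", 0.90),
    ("box1", "Capreolus capreolus", 0.07),
    ("box1", "Vulpes vulpes", 0.03),
]
print(top_k_labels(predictions, 1))  # {'box1': [('Sus scrofa', 0.9)]}
```

For model training, the same table would simply hold a single label per box.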

Also, all these deep learning methods for image classification / object detection that I'm aware of use images, not sequences. Sequences can actually be harmful in this respect, especially for image classification (when the animal walked out of the frame during the sequence, but the entire sequence is labelled as a species). In object detection, bounding boxes for sequences also don't make sense. They need to be image-specific. *

* EDIT: COCO camera trap format allows both image and sequence-specific bounding boxes, which may not be precise at image-level (see link above). I find the statement that 'sequences are the "atom of interest" in most ecological applications' questionable though.

Video annotation at the file level should be no different than image annotation. I don't know how to annotate at the frame level.

peterdesmet commented 2 years ago

Thanks @tucotuco and @jniedballa! I had some time to digest this information and discussed it with @damianooldoni. We think the following suggestion would be a model that answers the issues. It will not solve - but can represent - the fact that some systems make observations at the level of "sequences/groups of images" (which restricts creating smaller events at the analysis stage).

Suggested change 3: 4th table, between observations and media

  1. Add a 4th table that links observations to media. We suggest the name ~evidence~ mediaGroup. For image-based observations, it will contain 1-to-1 relations, for sequence-based observations, it will contain 1-to-many relations.
  2. The mediaGroup table will always be present, so joins can always be made the same way: deployments -> media -> mediaGroup -> observations (+ group by). No conditional joins.
  3. The biggest downside is that this table is superfluous for image-based observations (because it will only contain 1-to-1 relations). On the production side however, it is not that hard to create this table, since identifiers can be reused, e.g. populating mediaGroupID and mediaID with the same identifiers (see examples below). On the user side, it simplifies joins and allows using a single model to represent different use cases. It also avoids the "paint yourself into a corner" problem @tucotuco pointed out with the more succinct representation in suggested change 2.
  4. The mediaGroup table can potentially also represent parts of media files, e.g. bounding boxes or durations (see examples below). ~This is why we prefer the name evidence over e.g. mediaGroup.~ That conflicts somewhat with the name mediaGroup, but I find it still a more intuitive name.
  5. Sequences are not considered media files.
  6. The term sequence is avoided altogether, because it has different meanings. Here we use mediaGroup as the group, media file or part of media file that was used as the basis for an observation.
  7. A column level (see below) could be added (with a controlled vocabulary) to more easily filter certain observations. For easier discovery, the metadata term classificationLevel could be updated to contain a list of all the levels a dataset contains.
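
Producing the mediaGroup table by reusing identifiers, as described in point 3, takes only a few lines. The sketch below emits rows matching the file and group rows of the example that follows; the function name is hypothetical.

```python
def build_media_groups(media_ids, groups=None):
    """Return mediagroups rows as (mediaGroupID, mediaID, level) tuples.

    Every file gets a 1-to-1 group that reuses its mediaID as mediaGroupID;
    `groups` optionally maps a group identifier to the files assessed as a whole.
    """
    rows = [(media_id, media_id, "file") for media_id in media_ids]
    for group_id, members in (groups or {}).items():
        rows.extend((group_id, media_id, "group") for media_id in members)
    return rows


for row in build_media_groups(["med1", "med2", "med3"], {"seq1": ["med1", "med2", "med3"]}):
    print(row)
```

A producer with only image-based observations would call it without `groups` and get a pure 1-to-1 table.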

Example:

  1. obs1, obs2, obs3 are image-based observations. In med3 no animal was seen.
  2. obs4 is a group-based observation. Media files med1, med2, med3 were assessed as a whole (a disadvantage for later analyses, but often occurring).
  3. obs5 is made on a part of med1, i.e. a specific bounding box. It is considered a separate mediaGroup.
  4. obs6 is an observation based on a part of a video, i.e. a specific duration with start and end timestamp.

media.csv
mediaID | deploymentID | timestamp           | filePath
------- | ------------ | ------------------- | --------
med1    | dep1         | 2020-01-01T00:00:00 | med1.jpg
med2    | dep1         | 2020-01-01T00:00:01 | med2.jpg
med3    | dep1         | 2020-01-01T00:00:02 | med3.jpg
med4    | dep1         | 2020-01-04T08:00:00 | med4.mov

mediagroups.csv
mediaGroupID | mediaID | level    | boundingBox        | timeRange
------------ | ------- | -------- | ------------------ | ---------
med1         | med1    | file     |                    |  
med2         | med2    | file     |                    |  
med3         | med3    | file     |                    | 
seq1         | med1    | group    |                    | 
seq1         | med2    | group    |                    | 
seq1         | med3    | group    |                    | 
bbox1        | med1    | bbox     | [x,y,width,height] | 
duration1    | med4    | duration |                    | start/end

observations.csv
observationID | mediaGroupID | observationType | scientificName | count
------------- | ------------ | --------------- | -------------- | -----
obs1          | med1         | animal          | Sus scrofa     | 1
obs2          | med2         | animal          | Sus scrofa     | 1
obs3          | med3         | blank           | NULL           | NULL
obs4          | seq1         | animal          | Sus scrofa     | 1
obs5          | bbox1        | animal          | Sus scrofa     | 1
obs6          | duration1    | animal          | Vulpes vulpes  | 1
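
The always-identical join path (media -> mediaGroup -> observations, then group by observation) can be sketched as below. The output names deploymentIDs, start and mediaCount are hypothetical, and the sketch assumes every mediaGroupID used in observations exists in the mediaGroup table.

```python
from collections import defaultdict


def summarize_observations(observations, media_groups, media):
    """Join media -> mediaGroups -> observations, then group by observation."""
    media_by_id = {m["mediaID"]: m for m in media}
    members = defaultdict(list)
    for row in media_groups:
        members[row["mediaGroupID"]].append(media_by_id[row["mediaID"]])
    summary = {}
    for obs in observations:
        files = members[obs["mediaGroupID"]]  # assumed non-empty
        summary[obs["observationID"]] = {
            "deploymentIDs": sorted({m["deploymentID"] for m in files}),
            # ISO 8601 strings in one fixed format sort chronologically
            "start": min(m["timestamp"] for m in files),
            "mediaCount": len(files),
        }
    return summary


media = [
    {"mediaID": "med1", "deploymentID": "dep1", "timestamp": "2020-01-01T00:00:00"},
    {"mediaID": "med2", "deploymentID": "dep1", "timestamp": "2020-01-01T00:00:01"},
    {"mediaID": "med3", "deploymentID": "dep1", "timestamp": "2020-01-01T00:00:02"},
]
media_groups = [{"mediaGroupID": m, "mediaID": m} for m in ("med1", "med2", "med3")]
media_groups += [{"mediaGroupID": "seq1", "mediaID": m} for m in ("med1", "med2", "med3")]
observations = [
    {"observationID": "obs1", "mediaGroupID": "med1"},
    {"observationID": "obs4", "mediaGroupID": "seq1"},
]
print(summarize_observations(observations, media_groups, media)["obs4"])
# {'deploymentIDs': ['dep1'], 'start': '2020-01-01T00:00:00', 'mediaCount': 3}
```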

@jniedballa sequence_interval is currently saved in the project metadata. But maybe we should allow more flexible ways to indicate how "mediaGroups" were created.

danstowell commented 2 years ago

@peterdesmet as a relative outsider I like the look of this new "suggested change 3" better than previous ones. It seems correct to me that sequences are not considered media files.

Your bounding box example is clear; I can see that the format also allows for an observation which is based on a bounding-box that moves/changes shape over the duration of a sequence (this is one "tricky case" we discuss sometimes). But then would the level be bbox or group? My solution to that would be to forget having bbox as an explicit type: it can be implicit from the fact that the boundingBox column is non-null. (I'm dubious about the need for the level column at all, but I presume you're suggesting it for ease of data consumption.)
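
Dropping the level column and deriving its information from the other columns could look like this. A moving box across the files of a sequence then simply reports both properties, so the "bbox or group?" question does not arise. The group identifier track1 and the box values are hypothetical.

```python
def describe_media_group(rows):
    """Derive properties of one mediaGroup from its rows instead of a level column."""
    return {
        "multipleFiles": len({r["mediaID"] for r in rows}) > 1,
        "hasBoundingBox": any(r.get("boundingBox") for r in rows),
        "hasTimeRange": any(r.get("timeRange") for r in rows),
    }


# A box that moves across the files of a sequence: one row per file.
moving_box = [
    {"mediaGroupID": "track1", "mediaID": "med1", "boundingBox": "[10,20,50,40]"},
    {"mediaGroupID": "track1", "mediaID": "med2", "boundingBox": "[30,22,50,40]"},
]
print(describe_media_group(moving_box))
# {'multipleFiles': True, 'hasBoundingBox': True, 'hasTimeRange': False}
```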

peterdesmet commented 2 years ago

I'm dubious about the need for the level column at all, but I presume you're suggesting it for ease of data consumption.

Yes indeed. It doesn’t necessarily need to be there.

peterdesmet commented 2 years ago

Alternative name for evidence: observationUnit.

peterdesmet commented 2 years ago

@danstowell could we consider that 4th table a "region of interest" (Section 7.11 of https://ac.tdwg.org/termlist/)?

Regions of Interest (ROI) designate specific parts of media items.

Could a region of interest also be larger than a single image file?

danstowell commented 2 years ago

@danstowell could we consider that 4th table a "region of interest" (Section 7.11 of https://ac.tdwg.org/termlist/)?

Regions of Interest (ROI) designate specific parts of media items.

Could a region of interest also be larger than a single image file?

We always intended that an ROI could cover multiple frames, but we have not worked out the details. In practice I think the AC definition of ROI is all about a hyper-rectangular box (e.g. imagine a box confined in the x, y, z and time axes), whereas what's nice about your proposal is that an observation is composed of a sequence of different* ROIs, one per frame. A sequence of different ROIs is not a hyper-rectangular box. Thus: I think the 4th table is not equivalent to an ROI.
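
The difference between a hyper-rectangular ROI and a sequence of per-frame boxes can be checked directly. The sketch collapses per-frame boxes into one box with a frame range only when the box is identical on every consecutive frame; the output field names are hypothetical, not Audubon Core terms.

```python
def as_single_roi(frame_boxes):
    """Try to express per-frame boxes as one box spanning a frame range.

    frame_boxes maps frame index -> (x, y, width, height). Returns None when the
    box moves, changes shape or skips frames, i.e. when no single
    hyper-rectangle (x, y and time) describes the observation exactly.
    """
    if not frame_boxes:
        return None
    shapes = set(frame_boxes.values())
    frames = sorted(frame_boxes)
    if len(shapes) != 1 or frames != list(range(frames[0], frames[-1] + 1)):
        return None
    x, y, width, height = next(iter(shapes))
    return {"x": x, "y": y, "width": width, "height": height,
            "startFrame": frames[0], "endFrame": frames[-1]}


static = {0: (10, 20, 50, 40), 1: (10, 20, 50, 40), 2: (10, 20, 50, 40)}
moving = {0: (10, 20, 50, 40), 1: (30, 20, 50, 40)}
print(as_single_roi(static))  # one box for frames 0-2
print(as_single_roi(moving))  # None
```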

I would say that your columns timeRange and boundingBox are closely tied to AC's notion of ROI.

danstowell commented 2 years ago

FWIW I'm OK with mediaGroups. (I prefer it over observationUnit)

kbubnicki commented 2 years ago

Hi all, and sorry for this late feedback! Great discussion! I have spent some time in recent days thinking about the last proposal and had a meeting with @peterdesmet this morning. Here is the outcome: below you will find two new proposals that (hopefully) still add something to our discussion:

Suggested change 4: 4 tables (similar to the Suggested change 3 with some modifications)

media.csv
| mediaID     | deploymentID | timestamp           | filePath |
|-------------|--------------|---------------------|----------|
| med1        | dep1         | 2020-01-01T00:00:00 | med1.jpg |
| med2        | dep1         | 2020-01-01T00:00:01 | med2.jpg |
| med3        | dep1         | 2020-01-01T00:00:02 | med3.jpg |
| med4        | dep1         | 2020-01-04T08:00:00 | med4.mov |

mediagroups.csv
| mediaGroupID | mediaID |
|--------------|---------|
| med1         | med1    |
| med2         | med2    |
| med3         | med3    |
| med4         | med4    |
| seq1         | med1    |
| seq1         | med2    |
| seq1         | med3    |

observations.csv
| observationID | mediaGroupID | observationLevel | observationType | scientificName | count | individualID | boundingBox                                     | timeRange   |
|---------------|--------------|------------------|-----------------|----------------|-------|--------------|-------------------------------------------------|-------------|
| obs1          | med1         | file             | animal          | Sus scrofa     |     1 |              |                                                 |             |
| obs2          | med2         | file             | animal          | Sus scrofa     |     2 |              | [[x1,y1,width1,height1],[x2,y2,width2,height2]] |             |
| obs2a         | med2         | file             | animal          | Sus scrofa     |     1 | ind1         | [[x1,y1,width1,height1],]                       |             |
| obs2b         | med2         | file             | animal          | Sus scrofa     |     1 | ind2         | [[x2,y2,width2,height2],]                       |             |
| obs3          | med3         | file             | blank           | NULL           |  NULL |              |                                                 |             |
| obs4          | seq1         | sequence         | animal          | Sus scrofa     |     2 |              |                                                 |             |
| obs5          | med4         | file             | animal          | Sus scrofa     |     1 |              | [[x,y,width,height],]                           | start1/end1 |
| obs6          | med4         | file             | animal          | Sus scrofa     |     1 |              | [[x,y,width,height],]                           | start2/end2 |
| obs7          | med4         | file             | animal          | Sus scrofa     |     1 |              |                                                 | start/end   |
| obs8          | med4         | file             | animal          | Sus scrofa     |     1 |              |                                                 |             |

  1. We keep the mediagroups.csv table. The advantages are that we can mix sequence- and file-based observations in one package, and that this table can easily be extended in the future if needed.

  2. The attributes boundingBox (spatial window; now a 2D array) and timeRange (temporal window) are moved to the observations.csv table. I find both attributes more related to the observation than to the media-grouping process. Think of a two-stage observation process: (i) detection of animals (or other objects such as humans, vehicles, etc.) in space (boundingBox) and/or time (timeRange), followed by (ii) classification (observationType, scientificName, etc.). Another advantage of this change is that the mediagroups.csv table becomes more "compressed", as it no longer needs a row for every detected object (bounding box) and/or video frame. Imagine, for example, 10k videos × 60 one-second frames classified by AI, each frame containing 1 to 10 wild boar: in the previous proposal both the mediagroups.csv and observations.csv tables would quickly grow enormous in such scenarios.

  3. In the observations.csv table there is a new attribute, observationLevel. This is just for the user's convenience (e.g. quickly selecting only file-based observations).

  4. This proposal (like the next one) supports the following cases: a) file-level observations: obs1 (image) and obs8 (video); b) file-level, object-based image observations: obs2 (multiple objects of the same type in one image) and obs2a & obs2b (different objects in one image, as separate rows); c) file-level, object-based video observations: obs5 & obs6 (same or different objects detected on separate video frames, with both spatial and temporal windows defined) and obs7 (only the temporal window of the observation defined), noting that similar logic can be applied to audio files; d) sequence-based observations: obs4.

  5. Maybe a trivial comment, but an interesting side effect of having a mediaGroupID for file-based observations is that one can define a mediaGroupID for pairs of images from two-camera deployments, e.g. when monitoring lynx, tigers or other "marked" species, where both cameras typically record media of the same individual (e.g. the left and right sides of an animal passing along a forest path):

| mediaID     | deploymentID | timestamp           | filePath |
|-------------|--------------|---------------------|----------|
| med1a       | dep1a        | 2020-01-01T00:00:00 | med1.jpg |
| med1b       | dep1b        | 2020-01-01T00:00:00 | med1.jpg |

| mediaGroupID | mediaID |
|--------------|---------|
| med1         | med1a   |
| med1         | med1b   |
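
To make Suggestion 4 concrete, here is a minimal Python sketch (standard library only) that resolves each observation to its files by joining observations.csv → mediagroups.csv → media.csv, optionally keeping only one observationLevel. The CSV strings are a trimmed copy of the example tables above, and the helper names `read_csv_text` and `files_per_observation` are hypothetical, not part of Camtrap DP.

```python
import csv
import io
from collections import defaultdict

# Trimmed, in-memory copies of the Suggestion 4 tables (only the columns used here).
MEDIA_CSV = """mediaID,deploymentID,timestamp,filePath
med1,dep1,2020-01-01T00:00:00,med1.jpg
med2,dep1,2020-01-01T00:00:01,med2.jpg
med3,dep1,2020-01-01T00:00:02,med3.jpg
med4,dep1,2020-01-04T08:00:00,med4.mov
"""
MEDIAGROUPS_CSV = """mediaGroupID,mediaID
med1,med1
med2,med2
med3,med3
med4,med4
seq1,med1
seq1,med2
seq1,med3
"""
OBSERVATIONS_CSV = """observationID,mediaGroupID,observationLevel,scientificName
obs1,med1,file,Sus scrofa
obs4,seq1,sequence,Sus scrofa
obs8,med4,file,Sus scrofa
"""

def read_csv_text(text):
    """Parse a CSV string into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

def files_per_observation(media, mediagroups, observations, level=None):
    """Map each observationID to the file paths of its media group.

    If `level` is given (e.g. "file"), only observations with that
    observationLevel are kept.
    """
    path_by_media = {m["mediaID"]: m["filePath"] for m in media}
    media_by_group = defaultdict(list)
    for row in mediagroups:
        media_by_group[row["mediaGroupID"]].append(row["mediaID"])
    result = {}
    for obs in observations:
        if level is not None and obs["observationLevel"] != level:
            continue
        ids = media_by_group[obs["mediaGroupID"]]
        result[obs["observationID"]] = [path_by_media[i] for i in ids]
    return result

media = read_csv_text(MEDIA_CSV)
groups = read_csv_text(MEDIAGROUPS_CSV)
obs = read_csv_text(OBSERVATIONS_CSV)
print(files_per_observation(media, groups, obs))
print(files_per_observation(media, groups, obs, level="file"))
```

The sequence-level observation obs4 resolves to all three images of seq1, while the file-level ones resolve to a single file each.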

Suggested change 5: 3 tables (similar to the original model with some modifications; developed interactively during the meeting with Peter)

Sequence-based example

media.csv
| mediaID | mediaGroupID | deploymentID | timestamp           | filePath |
|---------|--------------|--------------|---------------------|----------|
| med1    | seq1         | dep1         | 2020-01-01T00:00:00 | med1.jpg |
| med2    | seq1         | dep1         | 2020-01-01T00:00:01 | med2.jpg |
| med3    | seq1         | dep1         | 2020-01-01T00:00:02 | med3.jpg |
| med4    | seq2         | dep1         | 2020-01-04T08:00:00 | med4.mov |

observations.csv
| observationID | mediaGroupID | observationType | scientificName | count | individualID | boundingBox | timeRange |
|---------------|--------------|-----------------|----------------|-------|--------------|-------------|-----------|
| obs1          | seq1         | animal          | Sus scrofa     |     2 |              |             |           |
| obs2          | seq2         | animal          | Sus scrofa     |     1 |              |             |           |

File-based example

media.csv
| mediaID | mediaGroupID | deploymentID | timestamp           | filePath |
|---------|--------------|--------------|---------------------|----------|
| med1    | med1         | dep1         | 2020-01-01T00:00:00 | med1.jpg |
| med2    | med2         | dep1         | 2020-01-01T00:00:01 | med2.jpg |
| med3    | med3         | dep1         | 2020-01-01T00:00:02 | med3.jpg |
| med4    | med4         | dep1         | 2020-01-04T08:00:00 | med4.mov |

observations.csv
| observationID | mediaGroupID | observationType | scientificName | count | individualID | boundingBox                                     | timeRange   |
|---------------|--------------|-----------------|----------------|-------|--------------|-------------------------------------------------|-------------|
| obs1          | med1         | animal          | Sus scrofa     |     1 |              | [[x,y,width,height],]                           |             |
| obs2          | med2         | animal          | Sus scrofa     |     2 |              | [[x1,y1,width1,height1],[x2,y2,width2,height2]] |             |
| obs2a         | med2         | animal          | Sus scrofa     |     1 | ind1         | [[x1,y1,width1,height1],]                       |             |
| obs2b         | med2         | animal          | Sus scrofa     |     1 | ind2         | [[x2,y2,width2,height2],]                       |             |
| obs3          | med3         | blank           | NULL           |  NULL |              |                                                 |             |
| obs4          | med4         | animal          | Sus scrofa     |     1 |              | [[x,y,width,height],]                           | start1/end1 |
| obs5          | med4         | animal          | Sus scrofa     |     1 |              | [[x,y,width,height],]                           | start2/end2 |
| obs6          | med4         | animal          | Sus scrofa     |     1 |              |                                                 | start/end   |
| obs7          | med4         | animal          | Sus scrofa     |     1 |              |                                                 |             |

  1. There is no mediagroups.csv table. Basically, we go back to the original model (v0.1.7, https://github.com/tdwg/camtrap-dp/tree/0.1.7), but with some critical differences.
  2. There are new attributes boundingBox and timeRange in the observations.csv table (described above).
  3. There is no deploymentID in the observations.csv table, which makes the entire model more linear.
  4. There is a new attribute mediaGroupID in the media.csv table.
  5. A Camtrap DP package should be either file-based or sequence-based (as indicated in the package-level metadata). This is not necessarily a limitation of the proposal: Camtrap DP has been designed as a standard for data exchange/publishing at the level of a single camera-trapping project, where people typically do not mix both annotation approaches.
  6. The biggest advantage of this proposal, as I see it, is the simplicity of the model (no 4th table) and its human-friendliness. The flexibility is still there: I believe most of the use cases listed above are covered by this design.
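
Point 5 suggests a simple consistency check a data consumer could run: if a package declares itself file-based, every mediaGroupID in media.csv should cover exactly one media file. The Python sketch below assumes that rule; the function names are hypothetical and no such validator is claimed to exist in Camtrap DP tooling. The data are the mediaID/mediaGroupID pairs from the two Suggestion 5 examples above.

```python
from collections import defaultdict

def media_by_group(media):
    """Map each mediaGroupID in media.csv to the mediaIDs that carry it."""
    groups = defaultdict(list)
    for row in media:
        groups[row["mediaGroupID"]].append(row["mediaID"])
    return dict(groups)

def file_based_violations(media):
    """Return the mediaGroupIDs that cover more than one media file.

    In a file-based package every media group should hold exactly one file,
    so an empty list means the media table is consistent with that claim.
    """
    return sorted(g for g, ids in media_by_group(media).items() if len(ids) > 1)

# mediaID/mediaGroupID pairs from the sequence-based and file-based examples.
sequence_media = [
    {"mediaID": "med1", "mediaGroupID": "seq1"},
    {"mediaID": "med2", "mediaGroupID": "seq1"},
    {"mediaID": "med3", "mediaGroupID": "seq1"},
    {"mediaID": "med4", "mediaGroupID": "seq2"},
]
file_media = [{"mediaID": m, "mediaGroupID": m} for m in ("med1", "med2", "med3", "med4")]

print(file_based_violations(file_media))      # → []
print(file_based_violations(sequence_media))  # → ['seq1']
```

Note that a one-image sequence (like seq2) passes the check, so the package-level metadata, not the table alone, has to state which approach was used.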

@peterdesmet Please edit this comment if you find that I have missed something (or if something is not clear enough)!

Best, K

peterdesmet commented 2 years ago

Thanks @kbubnicki, great summary of our discussion. I just want to add that in suggestion 4 the number of records in mediagroups will always equal the number of records in media (given that you never mix a file- and sequence-based approach, which is a good limitation in my opinion). Knowing that, we can simplify things, which resulted in suggestion 5:
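
The simplification can be sketched in Python: when every mediaID belongs to exactly one media group, the separate mediagroups table can be folded into a mediaGroupID column on media, which is what suggestion 5 does. This is a hypothetical conversion helper written for illustration, using the media and sequence groups from the Suggestion 5 sequence-based example; it refuses to fold a mixed package such as the Suggestion 4 example, where med1 sits in both "med1" and "seq1".

```python
def fold_mediagroups_into_media(media, mediagroups):
    """Turn a Suggestion 4 mediagroups table into a mediaGroupID column on media.

    This only works when every mediaID belongs to exactly one media group
    (i.e. the package never mixes file- and sequence-based groupings);
    otherwise a ValueError is raised.
    """
    group_of = {}
    for row in mediagroups:
        media_id = row["mediaID"]
        if media_id in group_of:
            raise ValueError(f"{media_id} belongs to more than one media group")
        group_of[media_id] = row["mediaGroupID"]
    folded = []
    for m in media:
        if m["mediaID"] not in group_of:
            raise ValueError(f"{m['mediaID']} has no media group")
        folded.append({**m, "mediaGroupID": group_of[m["mediaID"]]})
    return folded

media = [{"mediaID": f"med{i}"} for i in range(1, 5)]
sequence_groups = [
    {"mediaGroupID": "seq1", "mediaID": "med1"},
    {"mediaGroupID": "seq1", "mediaID": "med2"},
    {"mediaGroupID": "seq1", "mediaID": "med3"},
    {"mediaGroupID": "seq2", "mediaID": "med4"},
]
print([m["mediaGroupID"] for m in fold_mediagroups_into_media(media, sequence_groups)])
# → ['seq1', 'seq1', 'seq1', 'seq2']

# A mixed package (med1 grouped both on its own and in seq1) cannot be folded.
mixed_groups = sequence_groups + [{"mediaGroupID": "med1", "mediaID": "med1"}]
try:
    fold_mediagroups_into_media(media, mixed_groups)
except ValueError as err:
    print(err)  # → med1 belongs to more than one media group
```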

I’m all in favour of suggestion 5. Feedback welcome, especially from those that commented already @tucotuco @danstowell @jniedballa …

danstowell commented 2 years ago

I'm not so excited by the idea of moving the bboxes into the observations table, because it would then fail to support one of the important use cases we have here: objects detected in image sequences, with a different bbox in each image, and then one overall identification applied to that sequence of bboxes. This is a real example from our insect cameras, and it probably occurs in plenty of other systems that track bboxes over time.

A workaround would be to repeat multiple rows in observations for each frame in this sequence, but that's tricky because we then wouldn't want users to sum the count column and over-count.

I can't comment on the file-size implications.

You write "Think about two-stage observation process" (detect, then identify) but to me that doesn't motivate the change.

A separate and minor comment: I suggest that the arrays-of-bboxes format might be a bit troublesome for data consumers - it's starting to look like structured data inside a CSV cell.
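
To illustrate the concern about structured data inside a CSV cell: a consumer has to parse the boundingBox column itself before it can be used. A minimal Python sketch, assuming the cell holds numeric `[x, y, width, height]` arrays serialised as JSON (the coordinate values are hypothetical stand-ins for the placeholders in the examples). Note that the notation used in the examples, with a trailing comma such as `[[x,y,width,height],]`, is not valid JSON, so it has to be normalised first.

```python
import csv
import io
import json

# A hypothetical observations.csv row whose boundingBox cell holds a 2D array,
# written with the trailing comma used in the example tables.
ROW = 'observationID,boundingBox\nobs2,"[[0.1,0.2,0.3,0.4],[0.5,0.5,0.2,0.2],]"\n'

def parse_bboxes(cell):
    """Parse a boundingBox cell into a list of (x, y, width, height) tuples.

    json.loads rejects trailing commas, so ",]" is normalised to "]" first;
    an empty cell yields an empty list.
    """
    if not cell.strip():
        return []
    cleaned = cell.replace(",]", "]")
    return [tuple(box) for box in json.loads(cleaned)]

row = next(csv.DictReader(io.StringIO(ROW)))
print(parse_bboxes(row["boundingBox"]))
# → [(0.1, 0.2, 0.3, 0.4), (0.5, 0.5, 0.2, 0.2)]
```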

tucotuco commented 2 years ago

I don't have a lot of time to comment in detail (i.e., offer alternative solutions) right now.

tucotuco commented 2 years ago

Just had a chat with @peterdesmet about my most recent comments. If it becomes a rule that datasets must contain either observations from media or observations from mediagroups, but never both, then my second concern doesn't really apply. Similarly, if datasets are never mixed, then the mediaID could act as a mediaGroupID for the sake of practicality (not having to mint another identifier). I cringe in terms of semantics (the idea that mediagroups are just a type of media was rejected), but that shouldn't matter until/unless these data start to be linked semantically.

ben-norton commented 2 years ago

I think the stipulation that a dataset is either sequence-based (observation - mediaGroup) or image-based (observation - media) is a fair one that solves a number of problems. Since most datasets don't use multiple observation techniques (e.g. expert identification and a computer vision model), adoption shouldn't be overly problematic for most providers. Several projects arrived at this same conclusion (after months of debate). To my knowledge, field testing this solution hasn't revealed any significant problems. One important note: aside from the logistics and organization of the model, the impact of this choice lies in the analysis. To combine sequence- and image-based observations for modelling purposes, the technique used to calculate the number of unique individuals over a given period of time is critical. The irony is that image-based observations will be grouped over a specific time interval for modelling purposes anyway. In other words, it's all sequences in the end.
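
The point that image-based observations end up grouped over a time interval can be sketched in a few lines of Python: split the images of one deployment into events wherever the gap between successive timestamps exceeds a chosen interval, then take the maximum count per event as an estimate of group size. Both the 60-second interval and the max-count rule are illustrative assumptions, not something Camtrap DP prescribes; choosing the right calculation is exactly the critical step mentioned above. Timestamps are those of med1 to med4 in the earlier examples, with hypothetical counts.

```python
from datetime import datetime, timedelta

def group_into_events(timestamps, interval_s):
    """Assign an event index to each timestamp (ISO 8601, sorted ascending).

    A new event starts whenever the gap to the previous timestamp exceeds
    interval_s seconds, like the sequence interval described earlier.
    """
    events, previous, current = [], None, -1
    for ts in timestamps:
        t = datetime.fromisoformat(ts)
        if previous is None or t - previous > timedelta(seconds=interval_s):
            current += 1
        events.append(current)
        previous = t
    return events

def max_count_per_event(timestamps, counts, interval_s):
    """Estimate individuals per event as the largest count in any single image."""
    estimate = {}
    for event, count in zip(group_into_events(timestamps, interval_s), counts):
        estimate[event] = max(estimate.get(event, 0), count)
    return estimate

timestamps = ["2020-01-01T00:00:00", "2020-01-01T00:00:01",
              "2020-01-01T00:00:02", "2020-01-04T08:00:00"]
counts = [1, 2, 0, 1]  # med3 is blank
print(group_into_events(timestamps, interval_s=60))            # → [0, 0, 0, 1]
print(max_count_per_event(timestamps, counts, interval_s=60))  # → {0: 2, 1: 1}
```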

kbubnicki commented 1 year ago

> A workaround would be to repeat multiple rows in observations for each frame in this sequence, but that's tricky because we then wouldn't want users to sum the count column and over-count.

@danstowell That's why we have this field in Camtrap DP: https://tdwg.github.io/camtrap-dp/data/#observations.countnew

We use this field when annotating our camera trap records to track information about a "real" group size of animals staying for a while in front of a camera trap (or just passing it by). This applies to image-level annotation and prevents over-counting when aggregating data for analysis.
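
A tiny Python illustration of the over-counting problem and how a countNew-style field avoids it. It assumes, as the comment above suggests, that countNew records only the individuals appearing for the first time in a given row, so that summing it over repeated rows yields the real group size; the three wild boar frames are hypothetical.

```python
# Hypothetical image-level observations of one wild boar group over three frames.
frames = [
    {"mediaID": "med1", "count": 1, "countNew": 1},  # first boar appears
    {"mediaID": "med2", "count": 2, "countNew": 1},  # second boar joins
    {"mediaID": "med3", "count": 2, "countNew": 0},  # same two boar, nothing new
]

naive_total = sum(f["count"] for f in frames)    # each boar counted once per frame
group_size = sum(f["countNew"] for f in frames)  # each boar counted once
print(naive_total, group_size)  # → 5 2
```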

peterdesmet commented 1 year ago

Hi all, I picked up this dormant issue with John Wieczorek (@tucotuco) in an effort to reach a recommendation. We mainly discussed the pros and cons of two of the main proposals suggested above:

I also compared how one would query data using either model, at https://github.com/peterdesmet/camtrap-dp-query-test (repository likely to be deleted at some point).

Recommendation

Our conclusion is that the mediaGroupID approach (Suggested change 5):

And thus a reasonable simplification of the model. It is an improvement over the current model (where information is needlessly repeated) and plays well with the unified common model. It allows bounding boxes to be expressed (at the level of observations). Judging from the comments above, this proposal is something that @kbubnicki @ben-norton @jniedballa and now @tucotuco could get on board with. I will create a pull request with the suggested changes. Thank you all for your patience and for participating in this discussion!

@danstowell you liked the possibilities of the 4th table approach - maybe especially as a model for Audubon Core - but for Camtrap DP we believe it would needlessly complicate things as an exchange format. Hope you understand.

Rename to eventID

One change we suggest is to rename mediaGroupID to eventID. As in: this is the event the data publisher chose to group their observations by. For image-based observations (the recommended approach), the selected events span the duration of each media file (image or video); for sequence-based observations, the selected events are the sequences. In software you can always create larger events (by grouping), but never smaller ones.

Image-based (if we reuse identifiers):

media.csv
mediaID | eventID
------- | -------
med1    | med1
med2    | med2

observations.csv
observationID | eventID
------------- | -------
obs1          | med1
obs2          | med2

Sequence-based:

media.csv
mediaID | eventID
------- | -------
med1    | seq1
med2    | seq1

observations.csv
observationID | eventID
------------- | -------
obs1          | seq1
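
With eventID on both tables, a consumer can use the same join for image-based and sequence-based packages; only the eventID values differ. A minimal Python sketch using the two pairs of tables above; the function name `media_for_observations` is hypothetical.

```python
from collections import defaultdict

def media_for_observations(media, observations):
    """Return, per observationID, the mediaIDs that share its eventID."""
    by_event = defaultdict(list)
    for m in media:
        by_event[m["eventID"]].append(m["mediaID"])
    return {o["observationID"]: by_event[o["eventID"]] for o in observations}

image_based = (
    [{"mediaID": "med1", "eventID": "med1"}, {"mediaID": "med2", "eventID": "med2"}],
    [{"observationID": "obs1", "eventID": "med1"}, {"observationID": "obs2", "eventID": "med2"}],
)
sequence_based = (
    [{"mediaID": "med1", "eventID": "seq1"}, {"mediaID": "med2", "eventID": "seq1"}],
    [{"observationID": "obs1", "eventID": "seq1"}],
)

print(media_for_observations(*image_based))     # → {'obs1': ['med1'], 'obs2': ['med2']}
print(media_for_observations(*sequence_based))  # → {'obs1': ['med1', 'med2']}
```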

peterdesmet commented 1 year ago

Quick update: we are still working on restructuring the model. The current approach is to stop trying to capture image- and event-based annotation in a single observations table, and instead to work with separate eventobservations and imageobservations tables (in addition to the media and deployments tables).

The main advantage is clarity: it is easier for users to understand and easier for us to document. Additionally, it allows both approaches to be exported in a single package, e.g. AI image-level observations that underpin event-level consensus observations.
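
How AI image-level observations might underpin an event-level consensus can be sketched in Python. This is a hypothetical aggregation rule (majority vote on scientificName, then the maximum count among the images that agree), and the mapping from media to events is an assumed input; it is not how Camtrap DP defines the eventobservations and imageobservations tables, only an illustration of deriving one from the other.

```python
from collections import Counter, defaultdict

def consensus_by_event(image_obs, event_of_media):
    """Derive one event-level record per event from image-level observations.

    scientificName is the most frequent name among the event's image-level
    rows (ties go to the first one seen); count is the largest per-image
    count among rows that agree with that name.
    """
    grouped = defaultdict(list)
    for obs in image_obs:
        grouped[event_of_media[obs["mediaID"]]].append(obs)
    consensus = {}
    for event_id, rows in grouped.items():
        name, _ = Counter(r["scientificName"] for r in rows).most_common(1)[0]
        consensus[event_id] = {
            "scientificName": name,
            "count": max(r["count"] for r in rows if r["scientificName"] == name),
        }
    return consensus

# Hypothetical AI output for one sequence, with one misclassified image.
image_obs = [
    {"mediaID": "med1", "scientificName": "Sus scrofa", "count": 1},
    {"mediaID": "med2", "scientificName": "Sus scrofa", "count": 2},
    {"mediaID": "med3", "scientificName": "Capreolus capreolus", "count": 1},
]
event_of_media = {"med1": "seq1", "med2": "seq1", "med3": "seq1"}
print(consensus_by_event(image_obs, event_of_media))
# → {'seq1': {'scientificName': 'Sus scrofa', 'count': 2}}
```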


We are currently testing this approach and hammering out the details.

peterdesmet commented 1 year ago

The suggested change (splitting the observation table) has been implemented in #289. All who participated here are welcome to review the changes.

peterdesmet commented 1 year ago

Fixed in Camtrap DP 0.6 #297.

ben-norton commented 1 year ago

Congrats. That's a very challenging task.