tdwg / camtrap-dp

Camera Trap Data Package (Camtrap DP)
https://camtrap-dp.tdwg.org
MIT License

Handling of multi-camera deployments #328

Closed matobler closed 11 months ago

matobler commented 1 year ago

Going over the updated schema, I realized that there is currently no clear way of handling multi-camera deployments (e.g. paired camera traps, which are widely used). Let's assume we have a camera location with two cameras facing each other. Each camera is programmed to take 3 photos and 1 video. We want an event-based classification where all photos and videos of an animal passing between the two cameras are classified as a single event (as required for most ecological analyses). Since a deployment is by definition a single camera, an "event" is defined as an observation, and an observation is linked to a deployment by a single deploymentID, there is no way to link a single event to more than one deployment. One could use locationID (not a mandatory field) or coordinates to link deployments, but that is more of a hack than a proper solution. Linking by locationID fails if locationID was not defined (it is not a required field). Linking by coordinates requires them to be identical for both cameras, which can't be enforced.

Ideally, locationID would be a required field for all datasets and the deployment-to-observation relationship would be many:many (this would require an extra table, so not ideal). An alternative (and more logical to me) would be to link media to deployment and observation to media, but it would still require a many:many relationship if observations can be either event or media based (event based: each observation can apply to multiple media files; media based: each media file can have multiple observations, such as bounding boxes).

This is relevant to discussions in #314 and #203.

Requiring a locationID would also make it possible to link together multiple deployments at the same location (e.g. when batteries and SD cards are changed multiple times during a survey, each change can create a new deployment).

I am interested in hearing your thoughts on this. Given the large number of multi-camera datasets that exist this is something that needs to be resolved as ignoring it could have significant implications for data analysis (e.g. this could double the number of detections or independent locations that are being used).

peterdesmet commented 1 year ago

Thanks for bringing this up.

  1. You are correct that two paired cameras would be two deployments and that an observation (media or event) is currently scoped to a single deployment.
  2. This would indeed be solved (your alternative) by allowing an observation to be directly linked to multiple media (i.e. a collection of media, not necessarily scoped to a single deployment). But as you write, that would require an extra table, complicating many other use cases.
  3. So, as you suggest, the more pragmatic approach would be to link the two deployments as related, e.g. with locationID.

Given those limitations, would you then suggest to only have 1 observation for a pair or would you duplicate for each deployment?

# deployments
dep1 | location1 | camera1 | pair:1
dep2 | location1 | camera2 | pair:1
dep3 | location1 | camera1 | pair:2
dep4 | location1 | camera2 | pair:2

# observations
obs1 | dep1 | species x
obs2 | dep2 | same information as for obs 1 # Would you have this observation or not?

Requiring locationID might help for paired deployments, but the cardinality of the locationID is more important, i.e. people should be guided in reusing the same locationID for those paired deployments. Note also, that locationID is linked to the location and if the location of the cameras doesn't change between deployments, there might potentially be more than 2 deployments with a shared locationID (see example above). A better field might therefore be deploymentGroups, where you can specifically scope to just 2 deployments with a shared value (e.g. pair:1 in the example above).

I think my advice would therefore be to use deploymentGroups for paired cameras or to add a dedicated pairID to deployments in Camtrap DP (since this is such a common use case). I'm not sure what to advise on the observations (just one per pair or duplicate the info).
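To make the deploymentGroups option concrete, here is a minimal Python sketch of how a consumer might recover pairs from that field. It assumes the pipe-separated `key:value` convention and a `pair` key as used in the example above; neither the key name nor the helper functions (`parse_groups`, `deployments_by_pair`) are part of Camtrap DP.

```python
from collections import defaultdict

def parse_groups(value):
    """Split a deploymentGroups string such as "season:winter 2020 | pair:1" into a dict."""
    groups = {}
    for part in value.split("|"):
        part = part.strip()
        if not part:
            continue
        key, sep, val = part.partition(":")
        # Entries without a "key:value" shape are kept as keys with an empty value.
        groups[key.strip()] = val.strip() if sep else ""
    return groups

def deployments_by_pair(deployments, key="pair"):
    """Map each value of the (assumed) "pair" key to the deploymentIDs that carry it."""
    pairs = defaultdict(list)
    for dep in deployments:
        pair = parse_groups(dep.get("deploymentGroups", "")).get(key)
        if pair:
            pairs[pair].append(dep["deploymentID"])
    return dict(pairs)

# The four deployments from the example above.
deployments = [
    {"deploymentID": "dep1", "locationID": "location1", "deploymentGroups": "pair:1"},
    {"deploymentID": "dep2", "locationID": "location1", "deploymentGroups": "pair:1"},
    {"deploymentID": "dep3", "locationID": "location1", "deploymentGroups": "pair:2"},
    {"deploymentID": "dep4", "locationID": "location1", "deploymentGroups": "pair:2"},
]

pairs = deployments_by_pair(deployments)
print(pairs)  # {'1': ['dep1', 'dep2'], '2': ['dep3', 'dep4']}
```

Note that all four deployments share location1, so locationID alone would not separate pair 1 from pair 2 here.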

kbubnicki commented 1 year ago

This is indeed a very good point @matobler but I think that the current data model supports this particular use case, i.e., multi-camera deployments. Please consider the following example of the observations.csv table:

observationID | deploymentID | mediaID | eventID | observationLevel | start                | end                  | scientificName | count
obs1          | dep1         | med1    | event1  | media            | 2020-08-02T05:00:14Z | 2020-08-02T05:00:14Z | Panthera onca  | 2
obs2          | dep1         | med2    | event1  | media            | 2020-08-02T05:00:16Z | 2020-08-02T05:00:16Z | Panthera onca  | 1
obs3          | dep2         | med3    | event1  | media            | 2020-08-02T05:00:14Z | 2020-08-02T05:00:14Z | Panthera onca  | 2
obs4          | dep2         | med4    | event1  | media            | 2020-08-02T05:00:16Z | 2020-08-02T05:00:16Z | Panthera onca  | 1
obs5          | dep1         | NULL    | event1  | event            | 2020-08-02T05:00:14Z | 2020-08-02T05:00:16Z | Panthera onca  | 2
obs6          | dep2         | NULL    | event1  | event            | 2020-08-02T05:00:14Z | 2020-08-02T05:00:16Z | Panthera onca  | 2

In this example, we have two deployments (dep1 and dep2) with one camera each. We know that the cameras are paired, observing exactly the same location at exactly the same time. Both cameras observed 2 jaguars passing by. To indicate that the two derived event-based observations (observationLevel == event), namely obs5 and obs6, belong to the same (ecological) event, we can use the eventID field (see https://tdwg.github.io/camtrap-dp/data/#observations.eventID).

To prevent over-counting of event-based observations from paired cameras, users should aggregate data at the eventID level using the maximum function (instead of the sum) before running any further ecological analysis/aggregations. The maximum function should be used as one camera can potentially miss some individuals (that is why obs5 and obs6 are not necessarily duplicates) but two cameras observing the same objects at the same time and place obviously cannot observe more than the maximum.
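The aggregation described above can be sketched in a few lines of Python. This is only an illustration using the two event-based rows from the example table; `counts_per_event` is a hypothetical helper, not part of any Camtrap DP tooling.

```python
# Event-based observations from the example table (obs5 and obs6).
observations = [
    {"observationID": "obs5", "deploymentID": "dep1", "eventID": "event1",
     "observationLevel": "event", "scientificName": "Panthera onca", "count": 2},
    {"observationID": "obs6", "deploymentID": "dep2", "eventID": "event1",
     "observationLevel": "event", "scientificName": "Panthera onca", "count": 2},
]

def counts_per_event(observations):
    """Return {(eventID, scientificName): count}, taking the maximum across deployments."""
    best = {}
    for obs in observations:
        if obs["observationLevel"] != "event":
            continue  # media-based rows are not aggregated here
        key = (obs["eventID"], obs["scientificName"])
        best[key] = max(best.get(key, 0), obs["count"])
    return best

per_event = counts_per_event(observations)
print(per_event)  # {('event1', 'Panthera onca'): 2}

# Summing the rows directly would double-count the two jaguars.
naive_total = sum(o["count"] for o in observations if o["observationLevel"] == "event")
print(naive_total, sum(per_event.values()))  # 4 2
```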

I agree that it still would be useful to mark paired cameras in the deployments.csv table using the deploymentGroups field, as suggested by @peterdesmet above.

What do you think? @matobler @peterdesmet

peterdesmet commented 1 year ago

Thanks for the example @kbubnicki! This answers my question on what to advise regarding observations ("just one per pair or duplicate the info?"): Provide two observations (one per deployment), but link with eventID. They are not necessarily duplicates.

My conclusion is:

  1. Observations can have the same eventID across deployments, tying things together.
  2. locationID is not ideal to pair deployments, since its cardinality is higher (see https://github.com/tdwg/camtrap-dp/issues/328#issuecomment-1594307173). There is therefore no need to make this a required field.
  3. The fact that deployments are paired can be indicated in the deployments table. deploymentGroups or deploymentTags would allow that, but are a bit annoying to parse. If paired cameras are a common use case, then I can see a case for adding a dedicated field for that in the deployments table.

matobler commented 1 year ago

Thanks @kbubnicki for the example. I mistakenly assumed (based on #314) that eventID was being removed from the Observations table, but it was only removed from the Media table, which does not affect this use case. A few additional thoughts:

  1. eventID works well for identifying events across different media and deployments. It will store redundant information (one line of data for each deployment) but that is acceptable since the goal here is not data management but rather data exchange. Even with the existence of eventID though, I would be hesitant to use that as the sole way of tying paired cameras together. deploymentGroups or deploymentTags could be used for that purpose, but since those are not standardized fields, one needs additional information to interpret the values, making automated data integration (for import or analysis) impossible. The examples given in the documentation are "season:winter 2020 | grid:A1" which refers to a survey season or a geographic area throughout which certain cameras were deployed. So, in order to use those fields to indicate camera pairs, that would need to be indicated (in a standardized way) somewhere in the metadata, else automated code could pair up all the cameras in the same geographic area or survey.
  2. The example from @peterdesmet in the first post actually makes a lot of sense to me. Yes, there can be multiple deployments at the same location (see below). In my view a locationID should directly translate to a point defined by lat/long coordinates (it seems that the Dublin Core definition of location can be much broader). locationID will solve two things: tying together paired camera traps as well as consecutive deployments (important, for example, for occupancy studies). In the example below we have a camera trap survey starting on 2022-1-1 and ending on 2022-5-31. For location1 (a single camera trap site with paired cameras) we deploy camera1 and camera2 on 2022-1-1. Camera1 runs out of batteries on 2022-3-5 and we check the camera and replace the batteries on 2022-3-15. Camera2 completely fails on 2022-4-8 and we replace it with camera3 on 2022-4-15. An event would be defined as either one of the two cameras or both taking a picture of an animal. For many analyses the above scenario involving four deployments and three different cameras would be handled exactly the same as if there were a single camera continuously active from 2022-1-1 until 2022-5-31 at location1 (although detection probabilities would be higher for paired cameras). This easily expands to setups with more than 2 cameras deployed at the same time at the same location.
  3. There is no reason that the locationID can't be re-used across surveys. If we have repeated surveys using the same camera trap sites, the locationID should be re-used in my opinion, as shown below. This would be crucial for analyses such as dynamic occupancy models where we need to know which data come from the same locations (coordinates could be used for that of course, assuming they are exactly identical).

# deployments
dep1 | location1 | camera1 | 2022-1-1  | 2022-3-5   #2022 survey, paired cameras
dep2 | location1 | camera2 | 2022-1-1  | 2022-4-8
dep3 | location1 | camera1 | 2022-3-15 | 2022-5-31  #batteries for camera 1 were replaced
dep4 | location1 | camera3 | 2022-4-15 | 2022-5-31  #camera2 failed and was replaced by camera3

dep5 | location1 | camera4 | 2023-1-1  | 2023-5-31  #2023 survey, single camera

peterdesmet commented 1 year ago

Hi @matobler, looks like we are in agreement.

> Even with the existence of eventID though, I would be hesitant to use that as the sole way of tying paired cameras together.

Indeed, it only ties observations within the same event together. Those events don't necessarily indicate paired cameras, so as you write, you need more metadata in the deployment table for that.

> The examples given in the documentation are "season:winter 2020 | grid:A1" which refers to a survey season or a geographic area throughout which certain cameras were deployed. So, in order to use those fields to indicate camera pairs, that would need to be indicated (in a standardized way) somewhere in the metadata, else automated code could pair up all the cameras in the same geographic area or survey.

Indeed. The purpose of deploymentGroups is to offer flexibility, but at the detriment of being clear. For example, it could be used in a particular way for a study (or collective of studies) to support a specific analysis, but that study (group) then needs clear rules on how to populate it as well as clarify it in metadata. The same would be true for paired deployments, but it's a much more common use case, so a dedicated field might be good.

> In my view a locationID should directly translate to a point defined by lat/long coordinates

Yes! And I completely agree with the rest of your text and example. That is how locationID is intended. In my example (on the cardinality of locationID) I was only trying to indicate that a locationID is not necessarily only shared by two physical cameras deployed at the same time. It can also be shared by deployments of one camera over time. If that doesn't affect analysis of paired cameras, then I think this is a good field to use.
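As an illustration of treating all deployments that share a locationID as one continuously monitored site, the following Python sketch merges the deployment periods from the example earlier in this thread into coverage spans per location. The tuple layout and the `coverage_by_location` helper are assumptions made for this example only.

```python
from collections import defaultdict
from datetime import date

# Deployments from the example in this thread: (deploymentID, locationID, start, end).
deployments = [
    ("dep1", "location1", date(2022, 1, 1), date(2022, 3, 5)),
    ("dep2", "location1", date(2022, 1, 1), date(2022, 4, 8)),
    ("dep3", "location1", date(2022, 3, 15), date(2022, 5, 31)),
    ("dep4", "location1", date(2022, 4, 15), date(2022, 5, 31)),
    ("dep5", "location1", date(2023, 1, 1), date(2023, 5, 31)),
]

def coverage_by_location(deployments):
    """Merge overlapping deployment periods per locationID into continuous spans."""
    by_location = defaultdict(list)
    for _dep_id, location_id, start, end in deployments:
        by_location[location_id].append((start, end))

    coverage = {}
    for location_id, periods in by_location.items():
        merged = []
        for start, end in sorted(periods):
            if merged and start <= merged[-1][1]:
                # Overlaps the previous span: extend it if this period ends later.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        coverage[location_id] = merged
    return coverage

coverage = coverage_by_location(deployments)
for start, end in coverage["location1"]:
    print(start.isoformat(), end.isoformat())
# 2022-01-01 2022-05-31
# 2023-01-01 2023-05-31
```

The four 2022 deployments collapse into a single span because at least one camera was active at location1 at all times, while the 2023 survey remains a separate span.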


Given all that, do you still see a benefit of having a pairID field in the deployments table? Or is locationID sufficient, in which case we could update the definition to clarify that.

ben-norton commented 1 year ago

I recommend against using locationID. A paired deployment is a specific grouping of deployments based on a design decision, not a location. They happen to be at the same location, but that's not the reason they are paired, nor does it provide any context to the design of the deployment. Using locationID is convenient and pragmatism is important, but I hesitate when that factor comes at the expense of important context, as is the case here. Paired deployments are the most common use case, but there are many deployment group designs. These are thoughtful and intentional designs to answer a specific research question. This context should be preserved in some way since it has a significant impact on analysis. With regard to pairID, that would solve this specific use case and it is the most common, but the scope is very limited. You could add a validation that verifies that each pairID is associated with two and only two deployments, but that's cumbersome. If the field is used for any purpose other than a paired deployment, you lose context, resulting in an unreliable data point. Peter, I think you know where I am going with this. :) I think another table is the only viable way to handle them. It also covers every use case both now and in the future.
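For reference, the validation mentioned above might look roughly like this in Python. Note that pairID is only a proposed field in this discussion, not part of Camtrap DP, and `invalid_pair_ids` is a hypothetical helper.

```python
from collections import Counter

def invalid_pair_ids(deployments, field="pairID"):
    """Return {pairID: n} for every pairID not used by exactly two deployments."""
    counts = Counter(dep[field] for dep in deployments if dep.get(field))
    return {pair_id: n for pair_id, n in counts.items() if n != 2}

# Made-up rows for illustration; "pairID" is the hypothetical field discussed above.
deployments = [
    {"deploymentID": "dep1", "pairID": "pair1"},
    {"deploymentID": "dep2", "pairID": "pair1"},
    {"deploymentID": "dep3", "pairID": "pair2"},  # partner deployment is missing
    {"deploymentID": "dep4", "pairID": None},     # unpaired deployment, ignored
]

print(invalid_pair_ids(deployments))  # {'pair2': 1}
```

A check like this only counts rows. It does not confirm that the two deployments overlap in time or share a location, which is part of why a single field carries limited context.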

peterdesmet commented 11 months ago

I think this use case has been mostly resolved/answered. No changes to the standard are required.