matobler closed 11 months ago
Thanks for bringing this up.
`locationID`

Given those limitations, would you then suggest having only 1 observation for a pair, or would you duplicate it for each deployment?
# deployments
dep1 | location1 | camera1 | pair:1
dep2 | location1 | camera2 | pair:1
dep3 | location1 | camera1 | pair:2
dep4 | location1 | camera2 | pair:2
# observations
obs1 | dep1 | species x
obs2 | dep2 | same information as for obs 1 # Would you have this observation or not?
Requiring `locationID` might help for paired deployments, but the cardinality of `locationID` is more important, i.e. people should be guided in reusing the same `locationID` for those paired deployments. Note also that `locationID` is linked to the location, and if the location of the cameras doesn't change between deployments, there might be more than 2 deployments with a shared `locationID` (see example above). A better field might therefore be `deploymentGroups`, where you can specifically scope to just 2 deployments with a shared value (e.g. `pair:1` in the example above).
I think my advice would therefore be to use `deploymentGroups` for paired cameras or to add a dedicated `pairID` to deployments in Camtrap DP (since this is such a common use case). I'm not sure what to advise on the observations (just one per pair or duplicate the info).
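
As a rough illustration of the `deploymentGroups` option, the sketch below collects deployments that share a `pair:<n>` tag. The `pair:` prefix is only the convention used in the example above, not a standardized value, and the pipe-separated format is assumed from the documentation example (`season:winter 2020 | grid:A1`). The helper name `group_by_pair` and the extra `season` tag are made up for illustration.

```python
from collections import defaultdict

# Hypothetical deployment rows mirroring the example above.
# The "pair:<n>" tag is an assumed, non-standard convention.
deployments = [
    {"deploymentID": "dep1", "locationID": "location1", "deploymentGroups": "pair:1"},
    {"deploymentID": "dep2", "locationID": "location1", "deploymentGroups": "pair:1"},
    {"deploymentID": "dep3", "locationID": "location1", "deploymentGroups": "pair:2 | season:winter 2020"},
    {"deploymentID": "dep4", "locationID": "location1", "deploymentGroups": "pair:2 | season:winter 2020"},
]


def group_by_pair(rows, prefix="pair:"):
    """Map each pair tag to the deployment IDs that carry it."""
    pairs = defaultdict(list)
    for row in rows:
        # Tags are assumed to be separated by "|", with optional whitespace.
        for tag in row["deploymentGroups"].split("|"):
            tag = tag.strip()
            if tag.startswith(prefix):
                pairs[tag].append(row["deploymentID"])
    return dict(pairs)


pairs = group_by_pair(deployments)
for tag, deployment_ids in pairs.items():
    print(tag, deployment_ids)
```

Note that grouping on `locationID` alone would put all four deployments in one group here, which is the cardinality problem described above.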
This is indeed a very good point @matobler, but I think that the current data model supports this particular use case, i.e., multi-camera deployments. Please consider the following example of the `observations.csv` table:
observationID | deploymentID | mediaID | eventID | observationLevel | start | end | scientificName | count |
---|---|---|---|---|---|---|---|---|
obs1 | dep1 | med1 | event1 | media | 2020-08-02T05:00:14Z | 2020-08-02T05:00:14Z | Panthera onca | 2 |
obs2 | dep1 | med2 | event1 | media | 2020-08-02T05:00:16Z | 2020-08-02T05:00:16Z | Panthera onca | 1 |
obs3 | dep2 | med3 | event1 | media | 2020-08-02T05:00:14Z | 2020-08-02T05:00:14Z | Panthera onca | 2 |
obs4 | dep2 | med4 | event1 | media | 2020-08-02T05:00:16Z | 2020-08-02T05:00:16Z | Panthera onca | 1 |
obs5 | dep1 | NULL | event1 | event | 2020-08-02T05:00:14Z | 2020-08-02T05:00:16Z | Panthera onca | 2 |
obs6 | dep2 | NULL | event1 | event | 2020-08-02T05:00:14Z | 2020-08-02T05:00:16Z | Panthera onca | 2 |
In this example, we have two deployments (`dep1` and `dep2`) with one camera each. We know that the cameras are paired, observing exactly the same location at exactly the same time. Both cameras observed 2 jaguars passing by. To indicate that the two derived event-based observations (`observationLevel == event`), namely `obs5` and `obs6`, are linked to the same (ecological) event, we can use the `eventID` field (see https://tdwg.github.io/camtrap-dp/data/#observations.eventID).
To prevent over-counting of event-based observations from paired cameras, users should aggregate data at the `eventID` level using the maximum function (instead of the sum) before running any further ecological analysis or aggregation. The maximum is appropriate because one camera can miss some individuals (which is why `obs5` and `obs6` are not necessarily duplicates), while summing would count the same individuals twice, since both cameras observe the same animals at the same time and place.
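
A minimal sketch of that aggregation, using only the Python standard library. The first six rows are taken from the table above (reduced to the relevant columns); the `event2` rows are hypothetical and added only to show a case where one camera missed an individual. The function name `count_per_event` is illustrative.

```python
from collections import defaultdict

# (observationID, deploymentID, eventID, observationLevel, scientificName, count)
observations = [
    ("obs1", "dep1", "event1", "media", "Panthera onca", 2),
    ("obs2", "dep1", "event1", "media", "Panthera onca", 1),
    ("obs3", "dep2", "event1", "media", "Panthera onca", 2),
    ("obs4", "dep2", "event1", "media", "Panthera onca", 1),
    ("obs5", "dep1", "event1", "event", "Panthera onca", 2),
    ("obs6", "dep2", "event1", "event", "Panthera onca", 2),
    # Hypothetical second event: one camera saw 1 individual, the other saw 2.
    ("obs7", "dep1", "event2", "event", "Panthera onca", 1),
    ("obs8", "dep2", "event2", "event", "Panthera onca", 2),
]


def count_per_event(rows):
    """Return the maximum event-level count per (eventID, scientificName)."""
    best = defaultdict(int)
    for _obs_id, _dep_id, event_id, level, name, count in rows:
        if level != "event":
            continue  # media-level rows are not part of the event summary
        key = (event_id, name)
        best[key] = max(best[key], count)
    return dict(best)


per_event = count_per_event(observations)
# Naive sum over event-level rows, shown only for comparison.
naive_total = sum(row[5] for row in observations if row[3] == "event")
print(per_event)
print(sum(per_event.values()), "individuals vs. naive sum of", naive_total)
```

With these rows the maximum-based total is 4 individuals across the two events, whereas summing the event-level rows gives 7.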
I agree that it would still be useful to mark paired cameras in the `deployments.csv` table using the `deploymentGroups` field, as suggested by @peterdesmet above.
What do you think? @matobler @peterdesmet
Thanks for the example @kbubnicki! This answers my question on what to advise regarding observations ("just one per pair or duplicate the info?"): Provide two observations (one per deployment), but link them with `eventID`. They are not necessarily duplicates.
My conclusion is:

- `eventID` across deployments, tying things together.
- `locationID` is not ideal to pair deployments, since its cardinality is higher (see https://github.com/tdwg/camtrap-dp/issues/328#issuecomment-1594307173). There is therefore no need to make this a required field.
- `deploymentGroups` or `deploymentTags` would allow to do that, but are a bit annoying to parse. If paired cameras are a common use case, then I can see a case for adding a dedicated field for that in the deployments table.

Thanks @kbubnicki for the example. I mistakenly assumed (based on #314) that `eventID` was being removed from the Observations table, but it was only removed from the Media table, which does not affect this use case. A few additional thoughts:
- `eventID` works well for identifying events across different media and deployments. It will store redundant information (one line of data for each deployment), but that is acceptable since the goal here is not data management but rather data exchange. Even with the existence of `eventID` though, I would be hesitant to use that as the sole way of tying paired cameras together. `deploymentGroups` or `deploymentTags` could be used for that purpose, but since those are not standardized fields, one needs additional information to interpret the values, making automated data integration (for import or analysis) impossible. The examples given in the documentation are "season:winter 2020 | grid:A1", which refers to a survey season or a geographic area throughout which certain cameras were deployed. So, in order to use those fields to indicate camera pairs, that would need to be indicated (in a standardized way) somewhere in the metadata, else automated code could pair up all the cameras in the same geographic area or survey.
- In my view a `locationID` should directly translate to a point defined by lat/long coordinates (it seems that the Dublin Core definition of location can be much broader). `locationID` will solve two things: tying together paired camera traps as well as consecutive deployments (important for example for occupancy studies). In the example below we have a camera trap survey starting on 2022-1-1 and ending on 2022-5-31. For location1 (a single camera trap site with paired cameras) we deploy camera1 and camera2 on 2022-1-1. Camera1 runs out of batteries on 2022-3-5, and we check the camera and replace the batteries on 2022-3-15. Camera2 completely fails on 2022-4-8 and we replace it with camera3 on 2022-4-15. An event would be defined as either one of the two cameras or both taking a picture of an animal. For many analyses the above scenario involving four deployments and three different cameras would be handled exactly the same as if there were a single camera continuously active from 2022-1-1 until 2022-5-31 at location1 (although detection probabilities would be higher for paired cameras). This easily expands to setups with more than 2 cameras deployed at the same time at the same location.
- `locationID` can't be re-used across surveys. If we have repeated surveys using the same camera trap sites, the `locationID` should be re-used in my opinion, as shown below. This would be crucial for analyses such as dynamic occupancy models where we need to know which data come from the same locations (coordinates could be used for that of course, assuming they are exactly identical).

# deployments
dep1 | location1 | camera1 | 2022-1-1 | 2022-3-5 #2022 survey, paired cameras
dep2 | location1 | camera2 | 2022-1-1 | 2022-4-8
dep3 | location1 | camera1 | 2022-3-15 | 2022-5-31 #batteries for camera 1 were replaced
dep4 | location1 | camera3 | 2022-4-15 | 2022-5-31 #camera2 failed and was replaced by camera3
dep5 | location1 | camera4 | 2023-1-1 | 2023-5-31 #2023 survey, single camera
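
To make the "single continuously active camera" reading concrete, here is a small sketch that merges overlapping deployment periods per `locationID`. The rows are the ones listed above; the function name `active_periods` and the tuple layout are illustrative assumptions, and dates are treated as whole days.

```python
from collections import defaultdict
from datetime import date

# (deploymentID, locationID, start, end), copied from the example above.
deployments = [
    ("dep1", "location1", date(2022, 1, 1), date(2022, 3, 5)),
    ("dep2", "location1", date(2022, 1, 1), date(2022, 4, 8)),
    ("dep3", "location1", date(2022, 3, 15), date(2022, 5, 31)),
    ("dep4", "location1", date(2022, 4, 15), date(2022, 5, 31)),
    ("dep5", "location1", date(2023, 1, 1), date(2023, 5, 31)),
]


def active_periods(rows):
    """Merge overlapping deployment periods for each locationID."""
    by_location = defaultdict(list)
    for _dep_id, location_id, start, end in rows:
        by_location[location_id].append((start, end))

    merged = {}
    for location_id, spans in by_location.items():
        spans.sort()
        periods = [spans[0]]
        for start, end in spans[1:]:
            last_start, last_end = periods[-1]
            if start <= last_end:
                # Overlaps the current period: extend it if needed.
                periods[-1] = (last_start, max(last_end, end))
            else:
                # Gap with no active camera: start a new period.
                periods.append((start, end))
        merged[location_id] = periods
    return merged


periods = active_periods(deployments)
for location_id, spans in periods.items():
    for start, end in spans:
        print(location_id, start.isoformat(), end.isoformat())
```

For location1 this yields one period covering the whole 2022 survey (the two cameras cover each other's downtime) and a separate period for the 2023 survey.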
Hi @matobler, looks like we are in agreement.
> Even with the existence of `eventID` though, I would be hesitant to use that as the sole way of tying paired cameras together.
Indeed, it only ties observations within the same event together. Those events don't necessarily indicate paired cameras, so as you write, you need more metadata in the deployment table for that.
> The examples given in the documentation are "season:winter 2020 | grid:A1", which refers to a survey season or a geographic area throughout which certain cameras were deployed. So, in order to use those fields to indicate camera pairs, that would need to be indicated (in a standardized way) somewhere in the metadata, else automated code could pair up all the cameras in the same geographic area or survey.
Indeed. The purpose of `deploymentGroups` is to offer flexibility, but at the expense of clarity. For example, it could be used in a particular way for a study (or collective of studies) to support a specific analysis, but that study (group) then needs clear rules on how to populate it and has to document those rules in the metadata. The same would be true for paired deployments, but it's a much more common use case, so a dedicated field might be good.
> In my view a `locationID` should directly translate to a point defined by lat/long coordinates
Yes! And I completely agree with the rest of your text and example. That is how `locationID` is intended. In my example (on the cardinality of `locationID`) I was only trying to indicate that a `locationID` is not necessarily only shared by two physical cameras deployed at the same time. It can also be shared by deployments of one camera over time. If that doesn't affect analysis of paired cameras, then I think this is a good field to use.
Given all that, do you still see a benefit of having a `pairID` field in the deployments table? Or is `locationID` sufficient, in which case we could update the definition to clarify that?
I recommend against using locationID. A paired deployment is a specific grouping of deployments based on a design decision, not a location. They happen to be at the same location, but that's not the reason they are paired, nor does it provide any context to the design of the deployment. Using locationID is convenient and pragmatism is important, but I hesitate when that factor comes at the expense of important context, as is the case here.

Paired deployments are the most common use case, but there are many deployment group designs. These are thoughtful and intentional designs to answer a specific research question. This context should be preserved in some way since it has a significant impact on analysis.

In regards to pairID, that would solve this specific use case, and it is the most common one, but the scope is very limited. You could add a validation that verifies that each pairID is associated with two and only two deployments, but that's cumbersome. If the field is used for any purpose other than a paired deployment, you lose context, resulting in an unreliable data point.

Peter, I think you know where I am going with this. :) I think another table is the only viable way to handle them. It also covers every use case both now and in the future.
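
For reference, the validation mentioned above (each pairID tied to exactly two deployments) could look roughly like the sketch below. `pairID` is the field proposed in this thread, not an existing Camtrap DP field, and the rows and the function name `invalid_pairs` are made up for illustration.

```python
from collections import Counter

# Hypothetical deployment rows; "pairID" is the proposed field, not part of the standard.
deployments = [
    {"deploymentID": "dep1", "pairID": "pair1"},
    {"deploymentID": "dep2", "pairID": "pair1"},
    {"deploymentID": "dep3", "pairID": "pair2"},  # partner deployment is missing
    {"deploymentID": "dep4", "pairID": None},     # unpaired camera, ignored
]


def invalid_pairs(rows):
    """Return pairIDs that are not used by exactly two deployments."""
    counts = Counter(row["pairID"] for row in rows if row.get("pairID"))
    return {pair_id: n for pair_id, n in counts.items() if n != 2}


problems = invalid_pairs(deployments)
print(problems)
```

A check like this only confirms the count; it cannot tell whether the field was used for something other than a true paired design, which is the loss of context described above.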
I think this use case has been mostly resolved/answered. No changes to the standard are required.
Going over the updated schema I realized that there is currently no clear way of handling multi-camera deployments (e.g. paired camera traps, which are widely used). Let's assume we have a camera location where we have two cameras facing each other. Each camera is programmed to take 3 photos and 1 video. We want an event-based classification where all photos and videos of an animal passing between the two cameras are classified as a single event (as required for most ecological analyses). Since a deployment is by definition a single camera, an "event" is defined as an observation, and an observation is linked to a deployment by a single deploymentID, there is no way to link a single event to more than one deployment. One could use locationID or coordinates to link deployments, but that is more of a hack than a proper way of doing things. In the case of locationID this will fail if locationID was not defined (it is not a required field). If coordinates are used, they would have to be identical for both cameras, which can’t be enforced.
Ideally locationID would be a required field for all datasets and the deployment to observation relationship would be a many:many relationship (would require an extra table, so not ideal). An alternative (and more logical to me) would be to link media to deployment and observation to media, but it would still require a many:many relationship if observations can be either event or media based (event based: each observation can apply to multiple media files; media based: each media file can have multiple observations, such as bounding boxes).
This is relevant to discussions in #314 and #203.
Requiring a locationID would also link together multiple deployments at the same location (e.g. when batteries and SD cards are changed multiple times during a survey, each change can create a new deployment).
I am interested in hearing your thoughts on this. Given the large number of multi-camera datasets that exist this is something that needs to be resolved as ignoring it could have significant implications for data analysis (e.g. this could double the number of detections or independent locations that are being used).