inbo / camtrapdp

R package to read and manipulate Camera Trap Data Packages (Camtrap DP)
https://inbo.github.io/camtrapdp/

Mutating deployment duration #107

Open sannegovaert opened 4 months ago

sannegovaert commented 4 months ago

Part of a deployment can be invalid. For example, a storm fells the tree holding the camera trap, and all observations after that event are invalid. This has implications for:

There are two options:

Use case:

library(camtrapdp)

raw_data <- example_dataset()
#> ✔ Updating temporal and spatial scopes in metadata based on data.
deploymentID_storm <- "62c200a9"
new_deploymentEnd <- as.POSIXct("2021-04-02 17:31:00", tz = "Europe/Brussels")

# inspect raw data
deployments(raw_data) %>% 
  dplyr::select(deploymentID, deploymentStart, deploymentEnd)
#> # A tibble: 4 × 3
#>   deploymentID deploymentStart     deploymentEnd      
#>   <chr>        <dttm>              <dttm>             
#> 1 00a2c20d     2020-05-30 02:57:37 2020-07-01 09:41:41
#> 2 29b7d356     2020-07-29 05:29:41 2020-08-08 04:20:40
#> 3 577b543a     2020-06-19 21:00:00 2020-06-28 23:33:22
#> 4 62c200a9     2021-03-27 20:38:18 2021-04-18 21:25:00
observations(raw_data) %>% 
  dplyr::filter(deploymentID == deploymentID_storm) %>% 
  dplyr::summarise(last_eventEnd = max(eventEnd))
#> # A tibble: 1 × 1
#>   last_eventEnd      
#>   <dttm>             
#> 1 2021-04-18 21:25:00
media(raw_data) %>% 
  dplyr::filter(deploymentID == deploymentID_storm) %>% 
  dplyr::summarise(last_timestamp = max(timestamp))
#> # A tibble: 1 × 1
#>   last_timestamp     
#>   <dttm>             
#> 1 2021-04-18 21:25:00

# mutate deploymentEnd
clean_data <- raw_data
camtrapdp::deployments(clean_data) <-
  camtrapdp::deployments(clean_data) %>%
  dplyr::mutate(
    deploymentEnd =
      as.POSIXct(
        dplyr::if_else(
          deploymentID == deploymentID_storm,
          new_deploymentEnd,
          deploymentEnd
        )
      )
  )

# Identify observations and media to remove
to_remove <-
  clean_data %>%
  camtrapdp::filter_observations(deploymentID == deploymentID_storm, eventEnd >= new_deploymentEnd)
#> Warning: There was 1 warning in `dplyr::filter()`.
#> ℹ In argument: `eventEnd >= new_deploymentEnd`.
#> Caused by warning in `.check_tzones()`:
#> ! 'tzone' attributes are inconsistent

observations_to_remove <- observations(to_remove)
media_to_remove <- media(to_remove)

# Update observations and media
observations(clean_data) <- 
  observations(clean_data) %>% 
  dplyr::anti_join(observations_to_remove)
#> Joining with `by = join_by(observationID, deploymentID, mediaID, eventID,
#> eventStart, eventEnd, observationLevel, observationType, cameraSetupType,
#> scientificName, count, lifeStage, sex, behavior, individualID,
#> individualPositionRadius, individualPositionAngle, individualSpeed, bboxX,
#> bboxY, bboxWidth, bboxHeight, classificationMethod, classifiedBy,
#> classificationTimestamp, classificationProbability, observationTags,
#> observationComments, taxon.taxonID, taxon.taxonRank, taxon.vernacularNames.eng,
#> taxon.vernacularNames.nld)`

media(clean_data) <-
  media(clean_data) %>% 
  dplyr::anti_join(media_to_remove)
#> Joining with `by = join_by(mediaID, deploymentID, captureMethod, timestamp,
#> filePath, filePublic, fileName, fileMediatype, exifData, favorite,
#> mediaComments, eventID)`

# inspect clean data
deployments(clean_data) %>% 
  dplyr::select(deploymentID, deploymentStart, deploymentEnd)
#> # A tibble: 4 × 3
#>   deploymentID deploymentStart     deploymentEnd      
#>   <chr>        <dttm>              <dttm>             
#> 1 00a2c20d     2020-05-30 02:57:37 2020-07-01 11:41:41
#> 2 29b7d356     2020-07-29 05:29:41 2020-08-08 06:20:40
#> 3 577b543a     2020-06-19 21:00:00 2020-06-29 01:33:22
#> 4 62c200a9     2021-03-27 20:38:18 2021-04-02 17:31:00

observations(clean_data) %>% 
  dplyr::filter(deploymentID == deploymentID_storm) %>% 
  dplyr::summarise(last_eventEnd = max(eventEnd))
#> # A tibble: 1 × 1
#>   last_eventEnd      
#>   <dttm>             
#> 1 2021-03-31 22:59:21

media(clean_data) %>% 
  dplyr::filter(deploymentID == deploymentID_storm) %>% 
  dplyr::summarise(last_timestamp = max(timestamp))
#> # A tibble: 1 × 1
#>   last_timestamp     
#>   <dttm>             
#> 1 2021-03-31 22:59:21

# The last observation is a couple of days before the storm (new deploymentEnd)

# (Updating metadata in the assignment functions is not merged into the main branch yet)
clean_data <- clean_data %>% 
  camtrapdp:::update_temporal() %>% 
  camtrapdp:::update_taxonomic()

Created on 2024-10-15 with reprex v2.1.1
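
The warning about inconsistent 'tzone' attributes and the two-hour shift in the untouched deploymentEnd values above both appear to come from mixing a Europe/Brussels cutoff with timestamps shown in another time zone (UTC, judging by the shift; that is an assumption, not something stated in this thread). Below is a minimal sketch of the same truncation on toy tibbles rather than on a real Camtrap DP object, with the cutoff converted to the data's time zone first and the anti-join restricted to the key column. The table contents and the IDs `dep_a`, `dep_b`, `obs_1` and so on are made up for illustration.

```r
library(dplyr)

# Toy stand-ins for the deployments and observations tables (hypothetical values)
deployments <- tibble(
  deploymentID = c("dep_a", "dep_b"),
  deploymentEnd = as.POSIXct(
    c("2020-07-01 09:41:41", "2021-04-18 21:25:00"),
    tz = "UTC"
  )
)
observations <- tibble(
  observationID = c("obs_1", "obs_2", "obs_3"),
  deploymentID = c("dep_a", "dep_b", "dep_b"),
  eventEnd = as.POSIXct(
    c("2020-06-30 12:00:00", "2021-03-31 22:59:21", "2021-04-18 21:25:00"),
    tz = "UTC"
  )
)

# Express the cutoff in local time, then switch its display time zone to UTC.
# The instant stays the same; only the tzone attribute changes.
cutoff <- as.POSIXct("2021-04-02 17:31:00", tz = "Europe/Brussels")
attr(cutoff, "tzone") <- "UTC"

# Shorten only the affected deployment
deployments <- deployments %>%
  mutate(deploymentEnd = if_else(deploymentID == "dep_b", cutoff, deploymentEnd))

# Drop observations at or after the cutoff, matching on the key column only
to_remove <- observations %>%
  filter(deploymentID == "dep_b", eventEnd >= cutoff)
observations <- observations %>%
  anti_join(to_remove, by = "observationID")
```

With the cutoff in the same time zone as the data, the unaffected deploymentEnd values should print unchanged, and `by = "observationID"` avoids the long "Joining with" message. The same idea would presumably apply to media with `by = "mediaID"`.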

peterdesmet commented 2 months ago

I understand this as a theoretical use case, but is it also one provided by a user? I wonder whether such data cleaning should be resolved in a data management system like Agouti or in camtrapdp.

sannegovaert commented 2 months ago

It is a real use case from @bramdhondt.

bramdhondt commented 2 months ago

If this is referring to the deployment that I think it is, the real-world scenario was even "worse": after the tree and the camera went down, the camera was taken to a nearby office desk, where it kept filming the employee sitting at his computer for some days :-)

sannegovaert commented 4 weeks ago

A feature request about this issue has been logged in the Agouti repository by @peterdesmet.