inbo / bird-tracking

🛰🐦 Bird tracking - GPS tracking network for large birds
MIT License

Automatically mark outliers based on speed #146

Closed: milotictanja closed this issue 4 years ago

milotictanja commented 4 years ago

There are some obvious outliers in the Western marsh harrier datasets. Following the recent paper by Vansteelant et al. (in press), a fix is marked as an outlier when its speed exceeds 30 m/s. Outliers are marked in 3 iterative rounds:

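  1. Calculate the speed of each observation relative to the previous one, from the latitude, longitude and timestamp of the 2 consecutive observations.
  2. Flag all observations with a speed > 30 m/s and remove them from the track.
  3. To catch successive outliers, repeat steps 1 and 2 twice more on the remaining observations.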
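
The three rounds can also be written as a single loop. The sketch below is a minimal base-R version under stated assumptions: the function names `step_distance_m`, `step_speed` and `flag_speed_outliers` and the `max_rounds` parameter are hypothetical, and distances use the haversine formula on a 6371 km sphere rather than `trip::trackDistance()`, so values differ slightly from the script in this issue. It handles one individual whose fixes are already sorted by time, and stops early when a round flags nothing (further rounds would see the same track and flag nothing either).

```r
# Hypothetical helper: great-circle distance (m) between consecutive fixes,
# using the haversine formula on a sphere of radius 6371 km. ASSUMPTION:
# trip::trackDistance() (used in the issue) works on an ellipsoid instead.
step_distance_m <- function(lon, lat, radius_m = 6371000) {
  n <- length(lon)
  to_rad <- pi / 180
  phi1 <- lat[-n] * to_rad
  phi2 <- lat[-1] * to_rad
  dlambda <- (lon[-1] - lon[-n]) * to_rad
  a <- sin((phi2 - phi1) / 2)^2 + cos(phi1) * cos(phi2) * sin(dlambda / 2)^2
  2 * radius_m * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: speed (m/s) of each fix relative to the previous fix.
# The first fix has no predecessor and gets NA. Expects n >= 2, POSIXct times.
step_speed <- function(lon, lat, time) {
  n <- length(lon)
  secs <- as.numeric(difftime(time[-1], time[-n], units = "secs"))
  c(NA, step_distance_m(lon, lat) / secs)
}

# Flags fixes whose speed from the previous still-kept fix exceeds max_speed.
# Flagged fixes are dropped before the next round recomputes speeds.
# Returns the round in which each fix was flagged (NA = never flagged).
flag_speed_outliers <- function(lon, lat, time, max_speed = 30, max_rounds = 3) {
  flagged_in <- rep(NA_integer_, length(lon))
  n_round <- 0L
  while (n_round < max_rounds) {
    keep <- which(is.na(flagged_in))   # fixes still in the track
    if (length(keep) < 2) break
    v <- step_speed(lon[keep], lat[keep], time[keep])
    hits <- keep[!is.na(v) & v > max_speed]
    if (length(hits) == 0) break       # nothing new flagged: converged
    n_round <- n_round + 1L
    flagged_in[hits] <- n_round
  }
  flagged_in
}

# Toy track: fixes 60 s apart along the equator, with one spike at 1 degree E.
t0 <- as.POSIXct("2018-05-01 00:00:00", tz = "UTC")
flag_speed_outliers(lon = c(0, 0.001, 1, 0.003, 0.004), lat = rep(0, 5),
                    time = t0 + 60 * (0:4))
# returns NA NA 1 1 NA
```

Passing `max_rounds = Inf` keeps repeating until a round flags nothing. Because a fix's speed is measured from the previous fix, a single spike also flags the genuine fix right after it (the fourth fix in the toy track); the `speed()` function in this issue behaves the same way.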

The following code was used to flag outliers in the Western marsh harrier datasets:

library(tidyverse)
library(lubridate)
library(sp)
library(trip)

# speed (m/s) of each fix relative to the previous fix;
# the first fix gets 0 km / 0 s = NaN, which the is.na(speed) filters keep
speed <- function(x){
  trip.matrix <- data.matrix(x[, c("location.long", "location.lat")])
  between.point.distances <- trackDistance(trip.matrix, longlat = TRUE)  # km between consecutive fixes
  point.dist <- c(0, between.point.distances)                            # dist in km
  time.elapsed <- c(0, as.numeric(diff(x$timestamp), units = "secs"))    # seconds since previous fix
  (point.dist * 1000) / time.elapsed   # speed = dist/time, in m/s
}

MH_Waterland_GPS <- read.csv("MH_WATERLAND - Western marsh harriers (Circus aeruginosus, Accipitridae) breeding near the Belgium-Netherlands border.csv")

# read.csv() no longer converts strings to factors by default (R >= 4.0), so levels() would return NULL
individuals <- sort(unique(MH_Waterland_GPS$individual.local.identifier))

# I did not manage to put the iterative speed calculation in 1 function, so I used a stepwise approach
# select an individual (to reduce calculating time), sort by timestamp and calculate speed
MH_W1_step1 <- MH_Waterland_GPS %>%
  filter(individual.local.identifier == individuals[1]) %>%
  mutate(timestamp = as_datetime(timestamp)) %>%
  arrange(timestamp)
MH_W1_step1$speed <- speed(MH_W1_step1)
# create a dataframe for the outliers
MH_W1_outliers1 <- MH_W1_step1 %>%
  filter(speed > 30) %>%
  select(event.id) %>%
  mutate(removal.round = "step1")
# filter the outliers from the dataset, sort by timestamp and calculate speed with the remaining observations
MH_W1_step2 <- MH_W1_step1 %>%
  filter(speed <= 30 | is.na(speed)) %>%
  arrange(timestamp)
MH_W1_step2$speed <- speed(MH_W1_step2)
MH_W1_outliers2 <- MH_W1_step2 %>%
  filter(speed > 30) %>%
  select(event.id) %>%
  mutate(removal.round = "step2")
# round 3 of outlier detection
MH_W1_step3 <- MH_W1_step2 %>%
  filter(speed <= 30 | is.na(speed)) %>%
  arrange(timestamp)
MH_W1_step3$speed <- speed(MH_W1_step3)
MH_W1_outliers3 <- MH_W1_step3 %>%
  filter(speed > 30) %>%
  select(event.id) %>%
  mutate(removal.round = "step3")

# create a dataframe with all observations marked as outliers and the step in which it was marked
MH_W1_outliers <- rbind(MH_W1_outliers1, MH_W1_outliers2, MH_W1_outliers3) 

# join the outlier dataframe with the full observations dataframe and mark outliers as 'TRUE' in the 'algorithm.marked.outlier' field
MH_W1 <- left_join(MH_W1_step1, MH_W1_outliers, by = "event.id") %>%
  mutate(algorithm.marked.outlier = !is.na(removal.round))
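
The final marking step only needs to know which `event.id` values were flagged and in which round. A base-R sketch of the same lookup, on hypothetical toy data standing in for `MH_W1_step1` and `MH_W1_outliers`, is shown here. Like `left_join()`, it relies on each `event.id` appearing at most once in the outlier table, which holds because flagged fixes are removed before the next round.

```r
# Hypothetical toy data standing in for MH_W1_step1 and MH_W1_outliers.
fixes <- data.frame(event.id = c(101, 102, 103, 104, 105))
outliers <- data.frame(event.id = c(103, 105),
                       removal.round = c("step1", "step2"),
                       stringsAsFactors = FALSE)

# Look up the round in which each fix was flagged (NA if never flagged),
# then mark every fix whose event.id appears in the outlier table.
fixes$removal.round <- outliers$removal.round[match(fixes$event.id, outliers$event.id)]
fixes$algorithm.marked.outlier <- fixes$event.id %in% outliers$event.id
fixes
```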

Using this approach,

peterdesmet commented 4 years ago

MH_WATERLAND

Confirmed: 65 import-marked outliers.

Previously, no fixes were marked as outliers in the database; now there are many, and they don't always overlap with the import-marked outliers. Regarding #145: Almut (H185298) no longer has data, so I used the Movebank data instead.

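| animal-id | records | manual-outlier | import-outlier (runs) | total-outlier | remaining records | remark |
|---|---:|---:|---:|---:|---:|---|
| H173481 | 13209 | 0 | 4 (2) | 4 | 13205 | |
| H185298 | 475 | 0 | 0 (1) | 0 | 475 | |
| L143451 | 183985 | 20 | 16 (2) | 28 | 183957 | |
| L143457 | 62297 | 11 | 8 (2) | 15 | 62282 | affects start timestamp |
| L143467 | 31070 | 4 | 7 (3) | 8 | 31062 | |
| L143472 | 85924 | 27 | 30 (3) | 42 | 85882 | |
| L143473 | 950 | 0 | 0 (1) | 0 | 950 | |
| total | 377910 | 62 | 65 | 97 | 377813 | |
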
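
The MH_WATERLAND counts can be cross-checked with a few lines of R. The sketch assumes that total-outlier counts fixes flagged manually or by the import (or both); under that assumption, manual + import - total gives the number of fixes flagged by both methods, which quantifies how far the two sets overlap. The column names in the data frame are hypothetical shorthands for the table headers.

```r
# Counts copied from the MH_WATERLAND table.
waterland <- data.frame(
  animal_id = c("H173481", "H185298", "L143451", "L143457",
                "L143467", "L143472", "L143473"),
  records   = c(13209, 475, 183985, 62297, 31070, 85924, 950),
  manual    = c(0, 0, 20, 11, 4, 27, 0),
  import    = c(4, 0, 16, 8, 7, 30, 0),
  total     = c(4, 0, 28, 15, 8, 42, 0),
  remaining = c(13205, 475, 183957, 62282, 31062, 85882, 950)
)

# Per animal: remaining records = records - total outliers.
stopifnot(all(waterland$remaining == waterland$records - waterland$total))

# ASSUMPTION: total-outlier counts fixes flagged manually OR by the import,
# so fixes flagged by both = manual + import - total.
waterland$both <- waterland$manual + waterland$import - waterland$total
waterland[, c("animal_id", "manual", "import", "both")]

# Column totals, to compare with the "total" row.
colSums(waterland[, c("records", "manual", "import", "total", "remaining")])
```

Under that assumption, 30 of the 65 import-marked outliers were also marked manually.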
peterdesmet commented 4 years ago

MH_ANTWERPEN

Confirmed: 14 import-marked outliers.

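| animal-id | records | manual-outlier | import-outlier (runs) | total-outlier | remaining records | remark |
|---|---:|---:|---:|---:|---:|---|
| H171693 | 2690 | 2 | 2 (2) | 4 | 2686 | |
| H197169 | 28181 | 0 | 8 (2) | 8 | 28173 | |
| L177801 | 17046 | 0 | 4 (2) | 4 | 17042 | |
| total | 47917 | 2 | 14 | 16 | 47901 | |
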
peterdesmet commented 4 years ago

H_GRONINGEN

Did more runs and got 316 import-marked outliers (instead of 306).

| animal-id | records | manual-outlier | import-outlier (runs) | total-outlier | remaining records | remark |
|---|---:|---:|---:|---:|---:|---|
| 5325667 | 781906 | 72 | 289 (7) | 327 | 781579 | had to split into 8 chunks |
| 5327085 | 21337 | 19 | 8 (2) | 23 | 21314 | |
| 5336455 | 5420 | 0 | 0 (2) | 0 | 5420 | |
| 5446465 | 178830 | 9 | 19 (5?) | 26 | 178804 | |
| total | 987493 | 100 | 316 | 376 | 987117 | |

peterdesmet commented 4 years ago

Done in #147

peterdesmet commented 4 years ago

All data uploaded to Movebank and Zenodo.