psolymos / bSims

Bird Point Count Simulator
https://peter.solymos.org/bSims/

Fuzzy individual clustering #7

Closed: psolymos closed this issue 5 years ago

psolymos commented 5 years ago

Observers might perceive the same individual as multiple ones, and multiple individuals as the same one, depending on the spatial distinctness (clustering) of detection events.

One possible implementation uses ADPclust:

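## fit ADP clustering
library(bSims)
library(ADPclust)
## simulate a landscape with movement (same settings as the alpha hull example)
x <- bsims_all(list(movement=0.2, move_rate=1))$new()
## get coords: event x/y shifted by the nest location of each individual
xy <- do.call(rbind, lapply(seq_along(x$events), function(j) {
  cbind(x$events[[j]]$x + x$nests$x[j], x$events[[j]]$y + x$nests$y[j])
}))
## get individual id for each event
i <- do.call(c, lapply(seq_along(x$events), function(j)
  rep(j, nrow(x$events[[j]]))))
ad <- adpclust(xy, dmethod = "euclidean")
## number of clusters found
ad$nclust
## classification: actual individuals vs. clusters
tab <- table(inds=i, clust=ad$clusters)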
psolymos commented 5 years ago

It could also be just a maximum distance (in the same units as the extent): nearest neighbors within this distance would be perceived as the same individual. This would make the setting easier to change and the results more reproducible (i.e. not dependent on the realized movement trajectories).

Implementation:

Use get_detections and manipulate $i before calling get_events, e.g. if individuals 1 and 2 are perceived as the same, both get the id min(1, 2).
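A minimal sketch of the max-distance idea in base R. The helper merge_ids() is hypothetical (not part of bSims), and it assumes that merging should chain: with single-linkage clustering, locations connected by steps of at most dmax all become one perceived individual. Each group keeps the smallest original id, following the min(1, 2) rule above.

```r
## hypothetical helper (not in bSims): merge individuals whose locations are
## within dmax of each other (single-linkage chaining); each merged group
## keeps the smallest original id
merge_ids <- function(x, y, ids, dmax) {
  if (length(ids) < 2L)
    return(ids)
  hc <- hclust(dist(cbind(x, y)), method = "single")
  ## groups joined at heights <= dmax
  grp <- cutree(hc, h = dmax)
  ## replace every id by the minimum id within its group
  ave(ids, grp, FUN = min)
}

## toy data: locations 1 and 2 are 0.3 apart, location 3 is far away
merge_ids(c(0, 0.3, 5), c(0, 0, 5), ids = 1:3, dmax = 0.5)
## [1] 1 1 3
```

With a bSims object, x, y and ids could be the x, y and i columns of get_detections(x), and the result would replace $i. Complete linkage (method = "complete") would be the stricter alternative, requiring every pair within a group to be within dmax.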

psolymos commented 5 years ago

This relates to #5

psolymos commented 5 years ago

Alpha hulls are also an option, but they need fine tuning:

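library(alphahull)
library(bSims)

## simulate a landscape with movement
s <- list(
  movement=0.2,
  move_rate=1
)
x <- bsims_all(s)$new()
## movement events
e <- get_events(x, event_type = "move")

## alpha hull of the event locations (alpha = 0.2)
a <- ahull(e$x, e$y, 0.2)

## events, tessellation, and alpha hull
plot(y ~ x, e)
plot(x$tess, add=TRUE, "tess", "none")
plot(a, col=2, add=TRUE)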

It is still not clear how this can be implemented effectively.

Here are some thoughts:

  1. When there is no under- or over-counting, it is simple: just use $i.
  2. For under-counting, we can merge individuals with the nearest nest locations, assigning the smaller individual id to both. This can be governed by a number between 0 and 1 giving the ratio of the perceived to the actual number of individuals in the landscape; the perceived number can also be drawn from a binomial distribution (rbinom(1, total, p)). This is straightforward because we don't have to check events. We can pick individuals at random, using Voronoi area as the weight in sample(). This has to be done iteratively.
  3. For double counting, we have to look at events to be able to split them between 2 perceived individuals; a proposed iterative algorithm is described below. The argument would be a p> number, thus 1-p giving the proportion of all individuals that will be double counted.
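The under-counting idea in item 2 could be sketched in base R as follows. merge_nearest() is a hypothetical helper: starting from the actual ids, it repeatedly picks an individual at random with sample.int() weighted by w (e.g. Voronoi cell areas computed elsewhere; both the weights and their direction are assumptions here), merges it with the nearest nest among the other perceived individuals, and keeps the smaller id, until n_target perceived individuals remain.

```r
## hypothetical helper (not in bSims): iteratively merge nearest nests until
## n_target perceived individuals remain; merged groups keep the smaller id
merge_nearest <- function(x, y, n_target, w = rep(1, length(x))) {
  ids <- seq_along(x)                  # perceived ids start as actual ids
  n_target <- max(1, n_target)         # at least 1 perceived individual
  D <- as.matrix(dist(cbind(x, y)))    # nest-to-nest distances
  while (length(unique(ids)) > n_target) {
    ## pick one individual at random, weighted by w
    k <- sample.int(length(ids), 1L, prob = w)
    ## nearest nest that is currently perceived as a different individual
    other <- which(ids != ids[k])
    j <- other[which.min(D[k, other])]
    ## merge the two perceived individuals, keeping the smaller id
    old <- c(ids[k], ids[j])
    ids[ids %in% old] <- min(old)
  }
  ids
}

## toy nests on a line
set.seed(1)
nx <- c(0, 0.1, 5, 5.1, 10)
ny <- rep(0, 5)
## perceived number drawn as in item 2, here with an arbitrary p = 0.6
M <- rbinom(1, length(nx), 0.6)
merge_nearest(nx, ny, M)
```

Each pass removes exactly one perceived individual, so the loop stops after N - n_target merges; no events need to be checked, as noted in item 2.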

Algorithm for over-counting: from a mechanistic perspective, we want to identify individuals with the following characteristics:

These 3 measures can be combined to rank individuals. Then the sign of the PC1 scores can be used to divide an individual into 2, and the process is repeated iteratively.
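For the splitting step, one reading of "the sign of the PC1 scores" is a principal component analysis of a single individual's event (or detection) coordinates, which splits them at their centroid along the main axis of spread. The helper split_by_pc1() is a hypothetical sketch of that reading using prcomp(); it does not compute the 3 ranking measures, which are not spelled out here.

```r
## hypothetical helper (not in bSims): split one individual's locations into
## 2 perceived individuals by the sign of their PC1 scores
split_by_pc1 <- function(x, y) {
  xy <- cbind(x, y)
  if (nrow(xy) < 2L)
    return(rep(1L, nrow(xy)))          # nothing to split
  ## scores on the first principal component of the centered coordinates
  pc1 <- prcomp(xy, center = TRUE)$x[, 1]
  ## locations on either side of the centroid along PC1
  ifelse(pc1 < 0, 1L, 2L)
}

## toy data: two tight groups of locations along the x axis
ex <- c(-2, -1.8, -2.1, 2, 2.2, 1.9)
ey <- c(0.1, -0.1, 0, 0.05, -0.05, 0)
split_by_pc1(ex, ey)
```

The sign of PC1 is arbitrary, so the labels 1 and 2 only identify the two halves; one half would keep the original id and the other would get a new one before the next iteration.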

We might need add1 and drop1 methods.

psolymos commented 5 years ago

A slight complication: this process relates to the actual detections, not to events. So N can be anywhere between 0 and the total number of detections, and pooling needs to be based on the xy coordinates of the detections.

Using hclust it becomes damn simple:

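library(bSims)

set.seed(1)
s <- list(
  movement=0.1,
  move_rate=2,
  tau=2
)
x <- bsims_all(s)$new()

## all detections (not just the first one per individual)
d <- get_detections(x, condition="alldet")
## events, used only for plotting
e <- get_events(x)

## number of detected individuals
(N <- length(unique(d$i)))
## hierarchical clustering of the detection locations
hc <- hclust(dist(cbind(d$x, d$y)))

## h: ratio of perceived to actual number of individuals (h > 1: over-counting)
h <- 2
## cut into round(N*h) clusters, kept between 1 and the number of detections
ct <- cutree(hc, k=min(nrow(d), max(1, round(N*h))))

plot(y ~ x, e, pch=3, cex=0.6, col="grey")
## observer at the origin
points(0,0,pch=3, cex=2, col=2)
plot(x$tess, add=TRUE, "tess", "none", col="grey")
## detections coloured by perceived individual
points(y ~ x, d, pch=19, col=ct)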