rethomics / behavr

Data structure to store and manipulate high throughput behavioural data in R
http://rethomics.github.io

rejoin returns empty data table #8

Closed pepelisu closed 7 years ago

pepelisu commented 7 years ago

After creating some data with toy_*_data and summarizing it, calling rejoin on the summary gives back an empty data.table.

Steps to reproduce:

library(behavr)
query <- data.frame(experiment_id="toy_experiment",
                   region_id=1:32, # 32 animals
                   condition=c("A","B"), # conditions A,B,A,B ...
                   # drift is a coefficient that stretches the time so that
                   # we get two slightly different periods, see below
                   drift=c(1.001,1.000)
 )
toy_data <- toy_activity_data(query, duration = days(10))
toy_data[, t := as.integer(t/60)] #new time in minutes
summary_toy_data <- toy_data[, .(counts=sum(moving)), by=.(id,t)]
summary_all<-rejoin(summary_toy_data)

summary_all is then an empty data.table.

qgeissmann commented 7 years ago

Thanks! So the problem starts when doing toy_data[, .(counts=sum(moving)), by=.(id,t)]. This changes the key:

data.table::key(summary_toy_data)
#> [1] "id" "t" 

The default behaviour of behavr (pun intended ;)) is to coerce the returned table to a plain data.table when the key changes. Then rejoin fails.

So we should:

qgeissmann commented 7 years ago

A simple and efficient way to do it. We group by ID, then, within each group, we resample/aggregate:

summary_toy_data <- toy_data[, 
                             .SD[, .(count=sum(moving)), by="t"], 
                             by="id"]
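
To see that the nested form computes the same summary as the original by=.(id,t) call, here is a minimal sketch on a plain data.table. The table toy and its values are made up for illustration, and it is only a stand-in for a behavr table: it shows nothing about how behavr itself handles metadata, only that the two aggregations agree and how to inspect the resulting keys.

```r
library(data.table)

# Hypothetical stand-in data: 2 animals, 4 readings each,
# t in seconds and a logical `moving` column.
toy <- data.table(
  id     = rep(c("a", "b"), each = 4),
  t      = rep(c(0, 30, 60, 90), times = 2),
  moving = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)
)
setkey(toy, id)
toy[, t := as.integer(t / 60)]  # new time in minutes, as in the report

# Flat aggregation, as in the original report
flat <- toy[, .(count = sum(moving)), by = .(id, t)]

# Nested aggregation: group by id, then aggregate by t within each group
nested <- toy[,
              .SD[, .(count = sum(moving)), by = "t"],
              by = "id"]

# Inspect the key of each result
print(key(flat))
print(key(nested))

# Both forms give the same counts per (id, t)
print(identical(flat$count, nested$count))
```

With a real behavr table, the same check on data.table::key(summary_toy_data) before calling rejoin shows whether the key is still "id".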