Closed matiasandina closed 1 year ago
This seems to address #12 and #6. Maybe there's an issue with the handling of factors/datetimes in different versions of ggplot2
(and I might be running different version in different machines). I can confirm this is working on version 3.4.0
while errors where experienced using 3.3.5
.
A remaining issue in this PR is the strategy of using split
on the y
axis. When using anything other than wombat
in the y
(e.g., behaviour
, trial
), this would group wombats together (unless using facets).
Still, when we separate using facet_wrap()
, the behaviors are bound together from min
to max
.
ggplot(aes(y=behaviour,
color=factor(behaviour),
behaviour=factor(behaviour))) +
geom_ethogram() +
facet_wrap(wombat~trial)
I think this is tied to the way the data is split and the function calls. I will show what I think it's problematic with a simpler example from gimli
dataset.
gimli <- wombats |>
filter(wombat == "gimli")
lapply(split(gimli, gimli$behaviour),
vctrs::vec_identify_runs)
$digging
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
[19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
[37] 37 38 39 40 41 42 43 44 45 46 47 48
attr(,"n")
[1] 48
$pondering
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
[19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
[37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
[55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
[73] 73 74 75 76 77 78 79
attr(,"n")
[1] 79
$snuffling
[1] 1 2 3 4 5
attr(,"n")
[1] 5
We later call min
max
, which will give us something akin to calling
lapply(split(gimli, gimli$behaviour),
function(tt) range(vctrs::vec_identify_runs(tt)))
$digging
[1] 1 48
$pondering
[1] 1 79
$snuffling
[1] 1 5
And this is exactly what we get with the geom_ethogram()
call
ggplot(gimli, aes(y=behaviour, behaviour=behaviour, color=behaviour)) + geom_ethogram()
I think we should probably consider switching to using vec_identify_runs()
or something like data.table::rleid
or the new dplyr::consecutive_id()
with this approach.
gimli %>%
mutate(run_id = vctrs::vec_identify_runs(behaviour)) %>%
group_by(run_id) %>%
summarise(behaviour=unique(behaviour), x=first(frame), xend=last(frame)) %>%
ggplot(aes(x=x, xend=xend, y=behaviour, behaviour=behaviour)) + geom_ethogram()
# Will produce same as above
gimli %>%
mutate(run_id = data.table::rleid(behaviour)) %>%
group_by(run_id) %>%
summarise(behaviour=unique(behaviour), x=first(frame), xend=last(frame)) %>%
ggplot(aes(x=x, xend=xend, y=behaviour, behaviour=behaviour)) + geom_ethogram()
There are other issues with this approach (i.e., we have to be cautious about grouping the data correctly before calling summarise
), but I think that the math being explicitly done is less prone to unexpected behavior.
I will give it a try soon to see if we could have a working alternative solution.
@wilkox pushed a dplyr branch, could you please check it? I know it adds dependencies but I think it might be more robust help us move forward with a release. Tests are failing because I didn't change the files, I might run it later on merge if you think the changes are OK.
The dplyr branch is currently missing xend = last(x) + sampling_interval
to account for the period between the last observation and the first observation of the next behavior, when computing ethograms.
data <- data %>% dplyr::mutate(x = seq_along(group)) %>%
dplyr::mutate(run_id = vctrs::vec_identify_runs(behaviour)) %>%
dplyr::group_by(group, run_id) %>% dplyr::summarise(behaviour = unique(behaviour),
xend = last(x), x = first(x), y = unique(y), yend = unique(yend),
PANEL = unique(PANEL), colour = unique(colour),
.groups = "keep")
It produces "cute" behavior by leaving black the space between the last observation of one behavior and the next one
But I think we should be accurate and add the sampling interval to the last
Given #16, I think this is behaving now and we should move forward :smile:
from
?rle
This is potentially a problem with aggregating
NA
s. We don't see this most of the time because the default is to filterNA
s out.Potential options are:
scale_y_discrete
(see here) hasna.translate
andna.value
If na.translate = TRUE, what aesthetic value should the missing values be displayed as? Does not apply to position scales where NA is always placed at the far right. Maybe we can tap into that and coerce values into something usablerle(is.na(c("A", "A", NA_character_, NA_character_, "R")))
to reconstruct the instances where whe haveNAs
. Could solve it, but likely to add more buggy indexingrle
function instead https://rdrr.io/github/earowang/mists/man/na-rle.htmlggethos::calculate_ethogram(data) %>% ggplot(aes(...)) + geom_segment()