Open frederic-mahe opened 4 years ago
From the README:
If the number of samples where the 'potential parent' has a positive presence is below the minimum_relative_cooccurence (default 95%, meaning that 1 in 20 samples are allowed to have no parent presence), the 'potential parent' is rejected.
Contrary to what is indicated in the README, it seems that when using the default minimum_ratio_type = min
, cases when the parent is missing in one or more samples are always rejected by lulu
(no merging). A parent missing in one sample means that the minimum_ratio
value is null
, which leads to rejection. In consequence, changing the minimum_relative_cooccurence` value has no effect.
I wrote a toy-example to demonstrate this behavior:
library(lulu)
library(dplyr) # lulu requires dplyr
matchlist_name <- data.frame(x = "B" , y = "A", z = 99.0)
otutable_name <- data.frame(
row.names = c("A", "B"),
s01 = c(0, 1), # <= 'B' present, 'A' absent
s02 = c(9, 1),
s03 = c(9, 1),
s04 = c(9, 1),
s05 = c(9, 1),
s06 = c(9, 1),
s07 = c(9, 1),
s08 = c(9, 1),
s09 = c(9, 1),
s10 = c(9, 1),
s11 = c(9, 1),
s12 = c(9, 1),
s13 = c(9, 1),
s14 = c(9, 1),
s15 = c(9, 1),
s16 = c(9, 1),
s17 = c(9, 1),
s18 = c(9, 1),
s19 = c(9, 1),
s20 = c(9, 1),
s21 = c(9, 1),
s22 = c(9, 0)) # <= 'A' present, 'B' absent
## Note that parent 'A' is absent in one sample where 'B is present',
## and present in one sample where 'B' is absent (same total
## spread). The relative co-occurence is 20 / 21 = 0.95238, which is
## greater than 0.95, the default threshold value.
## bug: no merging with default parameters
## (minimum_ratio = 0)
lulu::lulu(otutable_name, matchlist_name)
## with minimum_ratio_type = "avg": merging as expected
## (minimum_ratio (average) is 8.57)
lulu::lulu(otutable_name, matchlist_name, minimum_ratio_type = "avg")
quit(save = "no")
Dear @tobiasgf
I think there is a undesired behavior with
minimum_ratio
whenminimum_ratio_type == "min"
, and to some degree whenminimum_ratio_type == "avg"
. It happens in the code below:When
minimum_relative_cooccurence
is smaller than 1.0 (default is 0.95), the potential parent OTU can be absent from some samples where the daughter OTU is present. A missing parent means that the ratio for that particular sample will be zero. This is were the issue is, the wayminimum_ratio
is computed, it will identify the samples where the father is missing as the ones with the smallest ratio (zero). This is belowminimum_ratio
and the father OTU is rejected, despite usingminimum_relative_cooccurence
.To solve that, the minimum_ratio should be searched among the non-null ratio values, not all ratio values.
When
minimum_ratio_type == "avg"
, I think the average should be calculated on non-zeros ratio values too.