mwpennell / arbutus

Assessing the adequacy of phylogenetic models of quantitative trait evolution
7 stars 2 forks source link

Bug in compare_pic_stat when observed test statistic is NA #21

Closed mwpennell closed 9 years ago

mwpennell commented 9 years ago

There may be cases where the observed test statistic is not able to be calculated (i.e. returns NA) because the model is so poor (for example, fitting lm to the abs(contrasts) ~ node height chokes on some datasets).

Currently, this results in a p-value than goes towards 2 as more sims are added

MWE

library(arbutus)
library(diversitree)

t <- tree.bd(pars=c(1,0), max.taxa=100)
d <- sim.character(t, 1)

## build unit tree
uu <- make_unit_tree(t, data=d)

## calculate a simple test statistic (doesn't matter what it is)
foo <- function(x) mean(x$pics[,"contrasts"])

## Calculate the test statistics
stat_obs <- calculate_pic_stat(uu, stats=list(mean=foo))

## Simulate new datasets
sims <- simulate_char_unit(uu, nsim=1000)

## Calculate test statistic on simulated data
stat_sim <- calculate_pic_stat(sims, stats=list(mean=foo))

## Compare the two
compare_pic_stat(stat_obs, stat_sim)

## Everything works fine

## Make the value of the test stat equal to NA
stat_obs[1] <- NA

## Now compare
compare_pic_stat(stat_obs, stat_sim)

if the test statistic cannot even be calculated on the observed data, the data is clearly outside of the bounds of what we expect if our model was adequate. I.e., the p-value should automatically be 0 when this is the case

What is the safest way to handle this case in compare_pic_stat

richfitz commented 9 years ago

The issue here comes from these lines:

            pr <- length(s[s > o])/(length(s) + 1)
            pl <- length(s[s <= o])/(length(s) + 1)
            p <- min(pr, pl)
            p_values[i] <- p*2