lamho86 / phylolm

GNU General Public License v2.0
30 stars 12 forks source link

phylolm() obscure errors when a predictor has unused levels #72

Open AMBarbosa opened 2 weeks ago

AMBarbosa commented 2 weeks ago

phylolm() fails with obscure error messages (very hard to debug) when a predictor has no variation in the provided (sub)dataset, or even when a predictor is a factor with fewer than all possible levels. Here's some reproducible code based on help file examples:

# create example data:

set.seed(16)
tre = rcoal(60)
taxa = sort(tre$tip.label)
b0=0; b1=1
(x <- rTrait(n=1, phy=tre, model="BM",
            parameters=list(ancestral.state=0, sigma2=10)))
(xd <- rTraitDisc(phy=tre, model="ER", k = 3))
unique(xd)  # 2 levels out of 3
y <- b0 + b1*x + rTrait(n=1, phy=tre, model="lambda",
                        parameters=list(
    ancestral.state=0, sigma2=1, lambda=0.5))
dat = data.frame(trait=y[taxa], x=x[taxa], xd=xd[taxa])
str(dat)

# model:

fit = phylolm(trait ~ x + xd, data=dat, phy=tre, model="lambda")

# Error in solve.default(comp$XX) :
# Lapack routine dgesv: system is exactly singular: U[3,3] = 0

# or some other times:
# Error in solve.default(comp$XX) : 
# system is computationally singular: reciprocal condition number = 2.12258e-17
# [or another number]

Below are some possible solutions that I've been using in my scripts:

# drop unused factor levels:

dat <- droplevels(dat)
str(dat)
fit1 = phylolm(trait ~ x + xd, data=dat, phy=tre, model="lambda")

# character instead of factor for discrete predictors:

dat$xd <- as.character(dat$xd)
str(dat)
fit2 = phylolm(trait ~ x + xd, data=dat, phy=tre, model="lambda")

all.equal(fit1, fit2)

Incorporating these (or equivalent) solutions in the function, or at least a more informative error message, could spare the users from countless hours of debugging.

Weirdly, I found some cases where I still get a phylostep() error after using droplevels(), but not after using as.character() for factors instead -- let me know if you want me to send you my data for reproducing this.

Regards,