lamho86 / phylolm

GNU General Public License v2.0
30 stars 12 forks source link

Residuals check for phyloglm () #27

Closed rosbakh closed 7 months ago

rosbakh commented 4 years ago

Dear all,

I am afraid it is an off-topic question, but I failed to find any other documentation on phylolm elsewhere.

Suppose you fit a phylogenetic logistic regression to your binary data (presence/absence of a trait) - how do you check the model fit afterwards, e.g. normality of residuals? I usually use the DHARMa package to check logistic regression, but it is not supported by your package. Thanks in advance!

Best, Sergey

florianhartig commented 4 years ago

Hi, I am the developer of DHARMa. Sergey contacted me about the possibility to do DHARMa residual checks for the phyloglm function.

I have looked into this, and believe that the current implementation of phyloglm does not allow interfacing with DHARMa, for lack of a simulate and predict function in phyloglm. I have created an issue in DHARMa here https://github.com/florianhartig/DHARMa/issues/129, which also explains which functions would have to be provided to allow residual checks with DHARMa.

Best, Florian

cecileane commented 4 years ago

Ideally, one would want to output independent (and i.d.) residuals that map to the nodes of the trees, just like Felsenstein's independent contrasts do. That would make it possible to look at how residuals may violate assumptions: to find out if there is a trend over time, or if variances might differ from one clade to another, etc.

We can de-correlate the residuals by multiplying them by any square-root-inverse of the correlation matrix, but the standard version of this square root inverse (which is symmetric d.p.) does not provide any mapping to the nodes of the tree. The square root inverse that leads to Felsenstein's independent contrasts (extended to non-BM covariance matrices) is implemented in function sqrt_OU_covariance of package ℓ1ou. In case anyone has the time to use it to return independent residuals mapping to internal nodes...

For a simulation approach to checking assumptions, the phylolm package has rbinTrait to simulate binary trait matching the model used in phyloglm, and rTrait to simulate continuous traits matching the model used in phylolm. These simulation functions are used within phyloglm and phylolm for their bootstrap option: the code in there could serve as a basis to define simulate and predict functions.

florianhartig commented 4 years ago

Hi Cecile,

I find your comments regarding the independent contrasts extremely interesting. I wonder if this could also be applied to simulated residuals, as a way to check if the simulations adhere to the same correlation structure as assumed (if that worked, the same should work for temporal / spatial autocorrelation). I have to think more about this, but if you have any ideas, feel free to comment here or contact me.

About simulate() and predict() - it sounds as if this would be relatively straightforward to implement. As I wrote in the comments to DHARMa, I strongly prefer if the package developers do that, because they usually know the workings of their package best, but if you don't want to do this, I can also provide a wrapper in DHARMa.

Once simulate and predict functions (ideally also the other standard functions such as family(), see https://github.com/florianhartig/DHARMa/wiki/Adding-new-R-packages-to-DHARMA) are available, I can integrate this into DHARMa.

Madeleinecg commented 1 year ago

Hi, is there any update on this?

florianhartig commented 1 year ago

In the meantime, I have added some rotation options in DHARMa similar to what @cecileane suggested above (see simulateResiduals, option "rotation") that would likely allow to account for the correlation structure in the residuals and test if the phylogenetic signal is completely removed by the model.

However, I still don't have any functions (in particular simulate) on the phylolm side to couple to DHARMa.

@cecileane - I suppose your bootstrap is a parametric bootstrap? I would basically need access to the simulated response to couple this to DHARMa! Detailed guide in https://github.com/florianhartig/DHARMa/wiki/Adding-new-R-packages-to-DHARMA

cecileane commented 1 year ago

Unfortunately I'm working on way too many projects, and I don't have the time to dive back into this.

@lamho86 : would you have the time to code the necessary functions (simulate, predict, family) with API as needed for DHARMa, based on the simulation functions that already exist to run parametric bootstrap (rTrait, rbinTrait)?

PRs are welcome!

lamho86 commented 1 year ago

I cannot do it immediately but will look into it.

florianhartig commented 1 year ago

Sorry, just one more question: are the simulations that are done for the parametric bootstrap recorded, so that I could access them? In this case, a separate simulate function would not be needed, I just do not want to write code for simulating!

lamho86 commented 1 year ago

The simulated data are currently not part of the outputs, but can be added. If they are all you need to make DHARMa works, then it can be done quickly.

florianhartig commented 1 year ago

Yes, I would mainly need access to the simulated response.

I'm not familiar with the way you compute the likelihood, but I suppose the residual phylogenetic correlation is not estimated explicitly, so that it would be possible to condition on it, or can that be done? If that is possible, it would be ideal if simulations could be switched between:

a) simulating the entire model b) simulating conditional on the estimated phylogenetic RE (or whatever you call that)

Otherwise I can work just with a)

Cheers, Florian

lamho86 commented 1 year ago

Hello @florianhartig

I have exposed the bootstrap data (simulated from the model) to users (save = TRUE). The data are part of the output ($bootdata) where each column is a dataset. Please let me know if you need anything else to make it works.

Best, Lam

florianhartig commented 1 year ago

Hello Lam,

thanks, I will check it out!

Cheers, Florian

florianhartig commented 1 year ago

Hi Lam,

I made a few tests, a base implementation in branch https://github.com/florianhartig/DHARMa/tree/129-phylolm seems to be working, at least in principle. If you want to try this out, run

set.seed(123456)
tre = rtree(50)
x = rTrait(n=1,phy=tre)
X = cbind(rep(1,50),x)
y = rbinTrait(n=1,phy=tre, beta=c(-1,0.5), alpha=1 ,X=X)
dat = data.frame(trait01 = y, predictor = x)
fit = phyloglm(trait01~predictor,phy=tre,data=dat,boot=100, save = F)

summary(fit)
coef(fit)
vcov(fit)

DHARMa::simulateResiduals(fit, plot = T)

A few comments:

lamho86 commented 1 year ago
florianhartig commented 1 year ago

Great, thanks! About point 2) As a principle, I don't want to hard-code specific calculations for packages because then I have to track changes and make sure everything still works. However, as the rotation argument is not automatically applied by DHARMa anyway, I will probably just leave a note in the help and leave this up to the user.

florianhartig commented 2 weeks ago

Hi all,

I have more user requests for this - are there any plans to push a CRAN update so that this functionality is available if the package is installed from CRAN?

I'm hesitant to add this functionality to DHARMa unless it works directly with the CRAN version.

Best Florian

MorceletL commented 2 weeks ago

@lamho86, in your previous comment to @florianhartig, you mentionned doing V^{-1/2} using the package https://github.com/khabbazian/l1ou to implemente the rotation of the residuals in order to check if the model actually correct for phylogenetic auto-correlation.

But when you are refering to V are you refering to model$vcov, or the one from the tree used in the modelisation ?

florianhartig commented 2 weeks ago

Hi @lamho86,

for my purposes, the rotation is implemented in the DHARMa package, see example here https://github.com/florianhartig/DHARMa/issues/129

So, from my side, all I need is the option to simulate (which I have now), everything else is implemented in DHARMa, so you could lean back and suggest to use this to look at the residuals.

Best Florian

EDIT sorry, I mean @MorceletL, not @lamho86 - from my perspective this problem is solved as soon as the update is on CRAN. I will probably nevertheless push a DHARMa update soon that checks for the existence of the exposed update and throws an error otherwise!

lamho86 commented 2 weeks ago

@florianhartig I will push the new version to CRAN. Thank you very much to make DHARMa works with phylolm.

@MorceletL V is the covariance (up to a constant) matrix of observations which is from the tree and the model.

MorceletL commented 2 weeks ago

@lamho86 thank you for your answer.

Then I have another question reagarding the matrix. Indeed @cecileane mentionned earlier this one should be symetrical, however mine is not and has negative values (minimum being: -1.199319) which raises issues when applying it to the rotation argument. @florianhartig

I was wondering if one of you had a clean solution to solve this 2 issues. Thanks

dat_dh<- breeding_RRH[c("log_tRRH","PC2")]
fit = phylolm(log_tRRH ~ PC2, phy = tree, data = dat_dh, model ="BM")
res_raw = DHARMa::simulateResiduals(fit, plot = F) # raw residuals
res_ind <- DHARMa::simulateResiduals(fit, plot = F, rotation = "estimated") # independant
test<-sqrt_OU_covariance(tree)$sqrtInvSigma
res_bis <- DHARMa::simulateResiduals(fit, plot = F, rotation = test) # independant
lamho86 commented 2 weeks ago

Hello @MorceletL,

Why isn't your V symmetric. On the other hand, the V^{-1/2} does not need to be symmetric and can have negative values.

MorceletL commented 2 weeks ago

Hello @lamho86 I have no idea why V is not symmetric. I got my tree from a collegue based on 193 species of birds.

isSymmetric(l1ou::sqrt_OU_covariance(tree)$sqrtSigma)
[1] FALSE

Regarding the inverse squared root matrix, it raises an issue to complete the rotation explained by @florianhartig. Indeed, the function behind the rotation is chol() allowing to "compute the Cholesky factorization of a real symmetric positive-definite square matrix" (from CRAN reading). Thus, as V^{-1/2} is not symmetric and has negative values I can not perform the rotation and check if (1) the model actually removed all phylogenetic signal, and (2) distribution and homoscedasticity of the residuals.

lamho86 commented 2 weeks ago