Open DuyDN opened 7 years ago
Hi @DuyDN
Thanks for moving this issue over here. Yes, we are aware that the GTR fitting in phangorn does not scale well to datasets with thousands (or more) different sequences. The recommended alternative right now is to use RaxML via the ips package within R.
This was discussed in an earlier issue (https://github.com/benjjneb/dada2/issues/88), in particular see the very helpful comments from @giriarteS
Also for my co-authors @spholmes @krisrs1128 @jfukuyama @joey711 -- this is something that we should consider updating if there is a new version of the F1000 workflow, as I've gotten several emails/opened issues in the dada2 repository from people hitting this same roadblock.
Hi, I have a question that how long does it take to construct the phen tree. I mean that run the optim.pml command took me more than 3 days and not done so far
fitGTR <- optim.pml(fitGTR, model="GTR", optInv=TRUE, optGamma=TRUE, rearrangement = "stochastic", control = pml.control(trace = 0))
I found some trick on internet but not sure if it can be apply with the dada2 workflow which help me to complete the construct tree
fitGTR <- optim.pml(fitGTR, rearrangement = "stochastic", ratchet.par = list(iter = 5L, maxit = 5L, prop = 1/3))
data can be download here and the code i run below
(https://drive.google.com/file/d/0B5ombffhUkLFNGQwLWpBblJpdGM/view?usp=sharing)
Remove chimeras
seqtab.nochim <- removeBimeraDenovo(seqtab, method="consensus", multithread=TRUE, verbose=TRUE) dim(seqtab.nochim) sum(seqtab.nochim)/sum(seqtab) seqs <- getSequences(seqtab.nochim) names(seqs) <- seqs # This propagates to the tip labels of the tree alignment <- AlignSeqs(DNAStringSet(seqs), anchor=NA) save(seqs, ps,taxa, file = "dada.phylo.Rdata") length(seqtab.nochim) dim(seqtab.nochim) library(phangorn) phang.align <- phyDat(as(alignment, "matrix"), type="DNA") dim(alignment) dm <- dist.ml(phang.align) compact.align <- unique(phang.align) length(compact.align) # ~5000 length(phang.align) treeNJ <- NJ(dm) # Note, tip order != sequence order fit = pml(treeNJ, data=phang.align)
negative edges length changed to 0!
fitGTR <- update(fit, k=4, inv=0.2)
this one did not complete
fitGTR <- optim.pml(fitGTR, model="GTR", optInv=TRUE, optGamma=TRUE, rearrangement = "stochastic", control = pml.control(trace = 0))
this one complete on a day
fitGTR <- optim.pml(fitGTR, rearrangement = "stochastic", ratchet.par = list(iter = 5L, maxit = 5L, prop = 1/3))