Open lejecrs opened 8 months ago
Which part of the pipeline is slow? With that many sequences, I would expect the dnapars command to take quite a while, especially if there are many parsimony trees. How many trees are reported in the output file outfile that dnapars produces? If there are many, then it's likely that the gctree infer command will also take a while, but it's difficult to be sure.
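One way to check the tree count without reading through outfile by hand is a small script like the sketch below. It assumes the summary line in outfile reads either "One most parsimonious tree found" or "N trees in all found", which is the wording I would expect from PHYLIP; the function name count_dnapars_trees is made up for illustration, so compare against your own outfile before relying on it.

```python
import re


def count_dnapars_trees(outfile_text: str) -> int:
    """Return the number of most parsimonious trees reported in dnapars output.

    Assumption: the summary line looks like "One most parsimonious tree found"
    or "   12 trees in all found". Returns 0 if neither pattern is present.
    """
    if re.search(r"One most parsimonious tree found", outfile_text):
        return 1
    match = re.search(r"(\d+)\s+trees in all found", outfile_text)
    return int(match.group(1)) if match else 0


if __name__ == "__main__":
    # "outfile" is the default output name mentioned above; adjust the path if needed.
    try:
        with open("outfile") as handle:
            print(count_dnapars_trees(handle.read()))
    except FileNotFoundError:
        print("no outfile in the current directory")
```

A result of 0 means the assumed summary wording was not found, not that there are no trees.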
I don't really have any suggestions to make inference faster, although there's a small chance that using a gctree version before v4.0.0 might work better.
Thanks for the reply. Yes! dnapars is super slow. For the 1000-sequence dataset, it still hasn't finished after 2 days on the server. For 300 sequences on my laptop (MacBook Pro 2020), I timed the whole pipeline and it takes 15000-24000 seconds (roughly 4-7 hrs) depending on the data. For the 300-sequence data, GCTree infers at most 2 trees.
I think dnapars is the bottleneck, because I timed gctree infer on its own and it finished in a few seconds.
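For anyone who wants to repeat this kind of per-stage measurement, here is a minimal timing sketch. The helper name time_stage is hypothetical, and the command shown in the usage section is only a placeholder; substitute whatever dnapars and gctree infer invocations your pipeline actually uses.

```python
import subprocess
import sys
import time


def time_stage(name, argv, stdin_path=None):
    """Run one pipeline stage and return its wall-clock time in seconds.

    argv is the command as a list of strings. stdin_path is an optional file
    fed to the command on standard input, for tools that read their
    configuration from stdin.
    """
    stdin = open(stdin_path, "rb") if stdin_path else None
    try:
        start = time.perf_counter()
        subprocess.run(argv, stdin=stdin, check=True)
        elapsed = time.perf_counter() - start
    finally:
        if stdin is not None:
            stdin.close()
    print(f"{name}: {elapsed:.1f} s")
    return elapsed


if __name__ == "__main__":
    # Placeholder command so the sketch runs anywhere; replace it with the
    # real stages of your pipeline.
    time_stage("placeholder", [sys.executable, "-c", "pass"])
```

Timing each stage separately makes it clear whether the tree search or the ranking step dominates the runtime.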
The only suggestion I have is that you could do a less thorough tree search with dnapars, by providing the --quick argument to the mkconfig command. Of course, the quality of the final inferred trees may decrease. Besides this, I have no recommendation; phylogenetic inference on thousands of sequences tends to be quite slow. It's possible that iqtree will give you a tree in a more reasonable amount of time than dnapars does, in which case you could consider using that tool instead of gctree.
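As a rough sketch of how the --quick option could be wired into a scripted pipeline: the code below assumes mkconfig takes the PHYLIP alignment path and the program name dnapars as positional arguments and writes the config to standard output. Only the --quick flag itself comes from the suggestion above; the argument order, the file names, and the helper names mkconfig_argv and write_dnapars_config are assumptions, so check mkconfig --help first.

```python
import subprocess


def mkconfig_argv(phylip_path, quick=True):
    """Build the mkconfig command line (assumed argument order, see above)."""
    argv = ["mkconfig"]
    if quick:
        # Request a less thorough dnapars tree search.
        argv.append("--quick")
    argv += [phylip_path, "dnapars"]
    return argv


def write_dnapars_config(phylip_path, cfg_path, quick=True):
    """Run mkconfig and save its standard output as the dnapars config file."""
    with open(cfg_path, "w") as cfg:
        subprocess.run(mkconfig_argv(phylip_path, quick), stdout=cfg, check=True)


if __name__ == "__main__":
    # Hypothetical file name; this only prints the command, it does not run it.
    print(" ".join(mkconfig_argv("deduplicated.phylip")))
```

Running the same alignment with and without --quick on a small dataset is a cheap way to see how much tree quality you give up for the speedup.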
Thank you! Is there any way to distribute the computation? For example, could we manually adjust the number of threads used by GCTree?
Hi! I was using GCTree for phylogenetic inference on about 1000 sequences, and it is super slow. Do you have any suggestions for speeding it up?