iqtree / iqtree2

NEW location of IQ-TREE software for efficient phylogenomic software by maximum likelihood http://www.iqtree.org
GNU General Public License v2.0

Speed up, Iteration times #84

Closed: Wenwen012345 closed this issue 2 years ago

Wenwen012345 commented 2 years ago

Hello @thomaskf,

I used IQ-TREE 2 to build a phylogenetic tree, but it is taking far too long: the run has now been going for more than ten days. I am building a tree of the RT domains of LTR retrotransposons, with over 4000 sequences from five species. The command is: iqtree2 -s ty1.mafft -bb 1000 -nt AUTO -m Q.mammal+F+R5. I copied this command from someone else, so I don't fully understand it. The most critical point is that dozens of sequences were missed because of a problem with the software I used to identify the LTR retrotransposons, so I may have to rerun IQ-TREE anyway. Do you have any suggestions? I looked at some options that might speed things up, but I was afraid to change them:

ULTRAFAST BOOTSTRAP/JACKKNIFE:
  -B, --ufboot NUM      Replicates for ultrafast bootstrap (>=1000)
  -J, --ufjack NUM      Replicates for ultrafast jackknife (>=1000)
  --jack-prop NUM       Subsampling proportion for jackknife (default: 0.5)
  --sampling STRING     GENE|GENESITE resampling for partitions (default: SITE)
  --boot-trees          Write bootstrap trees to .ufboot file (default: none)
  --wbtl                Like --boot-trees but also writing branch lengths
  --nmax NUM            Maximum number of iterations (default: 1000)
  --nstep NUM           Iterations for UFBoot stopping rule (default: 100)
  --bcor NUM            Minimum correlation coefficient (default: 0.99)
  --beps NUM            RELL epsilon to break tie (default: 0.5)
  --bnni                Optimize UFBoot trees by NNI on bootstrap alignment

What do you suggest?

Log file: ty1.mafft(1).log

roblanf commented 2 years ago

Hi @wensulin93,

Can you move this to the Google group? That way we can use it to build up a knowledge base for other users too. We try to keep GitHub issues for problems with the software itself, and use the Google group for discussions about how to use it.

Rob

Wenwen012345 commented 2 years ago

Hello @roblanf

That's a good suggestion. Unfortunately, I am in mainland China, where the firewall blocks access to Google Groups, and this problem will not be solved any time soon.

Could you give me some advice here, if it's convenient for you? For example, would changing "-bb 1000" to "-bb 100000" help?

roblanf commented 2 years ago

Ah, good point! To advise on how, or whether, you should speed up your inference, I'd really need to understand what you are trying to do. For example, would you be happy with a quick tree without bootstrap support? How much do you care about using the best model, or the same model as the people who gave you the command?

The more detail you can provide, the better.

(And no, increasing -bb will not speed it up; it will slow it down. You could speed it up by removing the -bb option altogether, though.)
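To make the trade-off concrete, here is a minimal Python sketch that assembles the iqtree2 argument list with or without ultrafast bootstrap, using only flags that already appear in this thread (-s, -m, -nt, -bb). build_iqtree_cmd is a hypothetical helper, not part of IQ-TREE; it only builds the list and does not run anything.

```python
from typing import List, Optional


def build_iqtree_cmd(alignment: str, model: str, threads: int,
                     ufboot: Optional[int] = None) -> List[str]:
    """Assemble an iqtree2 command line (hypothetical helper, not part of IQ-TREE).

    ufboot=None leaves out -bb entirely, i.e. no ultrafast bootstrap:
    the fastest choice when branch support values are not needed.
    """
    cmd = ["iqtree2", "-s", alignment, "-m", model, "-nt", str(threads)]
    if ufboot is not None:
        if ufboot < 1000:
            # The help text quoted earlier asks for >= 1000 UFBoot replicates.
            raise ValueError("ultrafast bootstrap needs >= 1000 replicates")
        cmd += ["-bb", str(ufboot)]
    return cmd


# Tree only, no branch support:
print(" ".join(build_iqtree_cmd("ty1.mafft", "Q.mammal+F+R5", 30)))
# -> iqtree2 -s ty1.mafft -m Q.mammal+F+R5 -nt 30

# Tree plus 1000 ultrafast bootstrap replicates (slower):
print(" ".join(build_iqtree_cmd("ty1.mafft", "Q.mammal+F+R5", 30, ufboot=1000)))
# -> iqtree2 -s ty1.mafft -m Q.mammal+F+R5 -nt 30 -bb 1000
```

The returned list could be passed to subprocess.run(cmd, check=True) to start the actual run.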

Wenwen012345 commented 2 years ago

Hello @roblanf, thank you for your prompt reply. What I need is an LTR-retrotransposon phylogeny that is reasonably accurate but not too complicated or time-consuming; it doesn't need to be perfect. It should look something like the tree below. What matters most is that the reviewers accept it.

[Image: 123(1)]

Earlier you advised me to build a neighbour-joining tree (https://github.com/iqtree/iqtree2/issues/79#issuecomment-1112797836), but I can't find the setting for building a neighbour-joining tree. I'm also not sure which parts of the model can be dropped. Can you give me some advice?

I took the command from this paper: https://onlinelibrary.wiley.com/doi/10.1111/jse.12850. The relevant text reads: "Based on the sequence alignment file of the RT domain described above, IQtree V2.1.4 (Minh et al., 2020) with the following parameters '-bb 1000, -nt AUTO' was used to construct a maximum likelihood with bootstrap analysis phylogeny tree for Copia and Gypsy. The substitution model employed by IQtree for Copia and Gypsy is Q.mammal +F+R5 and Q.mammal +F+G4 respectively."

I am not sure which parts of the model the authors used are necessary and which are not. Can you give me some advice? In short, my goal is to build a reasonably accurate (though not necessarily perfect) tree that distinguishes the LTR-retrotransposon groups, so that the evolutionary relationships among the different LTR-RTs can be seen.
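To see what each part of a model string such as "Q.mammal+F+R5" asks IQ-TREE to do, a small Python sketch can split it into the substitution matrix and its add-ons. Q.mammal is an empirical amino-acid replacement matrix; the "+" suffixes add state frequencies and rate variation across sites. The descriptions below are my own summary of the IQ-TREE model documentation, and explain_model is a hypothetical helper. Generally, fewer rate categories and simpler rate models (such as the +I+G that roblanf suggests below, instead of +R5) mean fewer parameters to estimate.

```python
import re
from typing import Dict, List

# Suffixes that appear in the models discussed in this thread. The wording is a
# summary of the IQ-TREE model documentation, not an exhaustive list.
SUFFIXES = {
    "F": "empirical state frequencies counted from the alignment",
    "I": "proportion of invariable sites",
    "G": "discrete Gamma model of rate heterogeneity",
    "R": "FreeRate model of rate heterogeneity",
}


def explain_model(model: str) -> Dict[str, object]:
    """Split a model string such as 'Q.mammal+F+R5' into matrix and add-ons."""
    base, *extras = model.replace(" ", "").split("+")
    parts: List[str] = []
    for extra in extras:
        m = re.fullmatch(r"([A-Z])(\d*)", extra)
        if m is None or m.group(1) not in SUFFIXES:
            parts.append(f"+{extra}: not covered by this sketch")
            continue
        letter, ncat = m.groups()
        text = SUFFIXES[letter]
        if ncat:
            text += f" with {ncat} rate categories"
        parts.append(f"+{extra}: {text}")
    return {"matrix": base, "components": parts}


for model in ("Q.mammal+F+R5", "Q.mammal+F+G4", "Q.mammal+F+I+G"):
    print(explain_model(model))
```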

thomaskf commented 2 years ago

Hi @wensulin93

From your log file, I noticed that you are using the option -nt AUTO and your machine has 40 CPU cores. I remember that we talked about this before. "-nt AUTO" lets IQ-TREE determine the number of CPU threads that gives the highest efficiency for your data set on your machine. However, that does not mean it will always use all the CPU threads on your machine. In your case, if you want to use all the CPU power, it is better to specify the exact number of threads, e.g. "-nt 40" (if this is the only job running on your machine).

By the way, the option "-bb" sets the number of bootstrap replicates. If you do not need bootstrapping, removing this option will speed up the program massively. Bootstrapping reports a number for each internal branch showing the reliability of that branch. If you only need to show the tree, and not the reliability of each branch, you may not need bootstrapping.
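The number reported on each branch is, roughly, the percentage of bootstrap replicate trees that also contain that branch. UFBoot obtains its replicate trees more cheaply than the standard bootstrap (using RELL resampling rather than a full tree search per replicate), so this Python sketch only illustrates the final counting step. The four replicate trees are made up for illustration, and make_split and branch_support are hypothetical helpers; each branch is stored as the set of taxa on one side of it.

```python
from typing import Dict, FrozenSet, Iterable, List, Set

Split = FrozenSet[str]  # one side of a bipartition (branch) of the taxa


def make_split(side: Iterable[str], taxa: FrozenSet[str]) -> Split:
    """Canonical form of a branch: store the side that excludes the first taxon."""
    s = frozenset(side)
    return s if min(taxa) not in s else taxa - s


def branch_support(ml_splits: Iterable[Split],
                   replicates: List[Set[Split]]) -> Dict[Split, float]:
    """Percentage of replicate trees that contain each branch of the ML tree."""
    n = len(replicates)
    return {s: 100.0 * sum(s in rep for rep in replicates) / n for s in ml_splits}


taxa = frozenset("ABCDE")
# ML tree ((A,B),C,(D,E)) has two internal branches: AB|CDE and DE|ABC.
ab, de = make_split("AB", taxa), make_split("DE", taxa)
# Four made-up replicate trees, each written as its two internal branches.
replicates = [
    {ab, de},
    {ab, make_split("CD", taxa)},
    {ab, de},
    {make_split("AC", taxa), make_split("BD", taxa)},
]
support = branch_support([ab, de], replicates)
print(support[ab], support[de])  # 75.0 50.0
```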

thomaskf commented 2 years ago

Hi @wensulin93

Just a quick note: from your log file, I saw that your command was "iqtree2 -s ty1.mafft -v -T 30 -bb 1000 -nt AUTO -m Q.mammal+F+R5". In fact, the option "-nt AUTO" overrode "-T 30", and the option "-v" turns on verbose mode, which is not needed. Your command should therefore be changed to "iqtree2 -s ty1.mafft -bb 1000 -nt 30 -m Q.mammal+F+R5".
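The two problems pointed out above (the thread count given twice, and an unnecessary -v) are easy to check mechanically before starting a multi-day run. A minimal sketch using Python's standard shlex module; check_command is a hypothetical helper, and it assumes -T and -nt are two spellings of the same thread setting, as the note above implies.

```python
import shlex
from typing import List

# Options that set the number of CPU threads (assumed equivalent here).
THREAD_FLAGS = {"-T", "-nt"}


def check_command(command: str) -> List[str]:
    """Return warnings for the issues found in this thread (hypothetical helper)."""
    args = shlex.split(command)
    warnings: List[str] = []
    # Collect every (flag, value) pair that sets the thread count.
    thread_opts = [(a, args[i + 1]) for i, a in enumerate(args[:-1])
                   if a in THREAD_FLAGS]
    if len(thread_opts) > 1:
        given = ", ".join(f"{flag} {val}" for flag, val in thread_opts)
        warnings.append(f"thread count given more than once ({given}); keep only one")
    if "-v" in args:
        warnings.append("-v turns on verbose mode, which is not needed")
    return warnings


print(check_command("iqtree2 -s ty1.mafft -v -T 30 -bb 1000 -nt AUTO -m Q.mammal+F+R5"))
print(check_command("iqtree2 -s ty1.mafft -bb 1000 -nt 30 -m Q.mammal+F+R5"))  # []
```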

In your log file, I also saw the message "BEST NUMBER OF THREADS: 5", so the program may have used only 5 CPU threads.
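To check how many threads "-nt AUTO" actually settled on, the message quoted above can be pulled out of the log with a short regular expression. A sketch only: best_thread_count is a hypothetical helper, it assumes the message appears in exactly the quoted form, and it returns the last occurrence in case a log contains more than one run.

```python
import re
from typing import Optional


def best_thread_count(log_text: str) -> Optional[int]:
    """Return N from the last 'BEST NUMBER OF THREADS: N' line, or None."""
    matches = re.findall(r"BEST NUMBER OF THREADS:\s*(\d+)", log_text)
    return int(matches[-1]) if matches else None


print(best_thread_count("BEST NUMBER OF THREADS: 5\n"))  # 5
```

With a real run you would read the log file first, e.g. best_thread_count(open("ty1.mafft.log").read()), assuming IQ-TREE's default of naming the log after the alignment file.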

If necessary, please refer to https://github.com/iqtree/iqtree2/wiki/Command-Reference#general-options for the meaning of the options.

roblanf commented 2 years ago

@wensulin93, if all you need is a relatively accurate tree, and if the current command is too slow for you, my suggestions would be:

  1. Try neighbour joining (e.g. DecentTree, which we just released)
  2. Try FastTree2
  3. Try RAxML-NG (which can be quicker for some datasets)
  4. Try a simpler model in IQ-TREE, e.g. something like -m Q.mammal+F+I+G
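For reference, the neighbour-joining idea behind suggestion 1 is small enough to sketch. This is a textbook Saitou and Nei implementation in Python, not DecentTree's code, and for 4000 sequences you would use DecentTree itself rather than this O(n³) toy. It takes a pairwise distance matrix (for real data, distances estimated from the alignment); the 5-taxon matrix in the example is a standard textbook example, not data from this thread, and neighbour_joining is a hypothetical name.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def neighbour_joining(names: List[str], dist: List[List[float]]) -> str:
    """Return a Newick string for the neighbour-joining tree of a distance matrix."""
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    label: Dict[int, str] = dict(enumerate(names))  # node id -> Newick fragment
    d: Dict[Tuple[int, int], float] = {}
    for i, j in combinations(range(len(names)), 2):
        d[i, j] = d[j, i] = dist[i][j]
    active = list(range(len(names)))
    next_id = len(names)

    while len(active) > 3:
        n = len(active)
        r = {i: sum(d[i, k] for k in active if k != i) for i in active}
        # Join the pair minimising Q(i, j) = (n - 2) * d(i, j) - r(i) - r(j).
        i, j = min(combinations(active, 2),
                   key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]])
        # Branch lengths from the new internal node u to i and j.
        li = 0.5 * d[i, j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i, j] - li
        u, next_id = next_id, next_id + 1
        label[u] = f"({label[i]}:{li:g},{label[j]}:{lj:g})"
        # Distance from u to every remaining node.
        for k in active:
            if k not in (i, j):
                d[u, k] = d[k, u] = 0.5 * (d[i, k] + d[j, k] - d[i, j])
        active = [k for k in active if k not in (i, j)] + [u]

    # Three nodes left: join them at a central node (unrooted tree).
    a, b, c = active
    la = 0.5 * (d[a, b] + d[a, c] - d[b, c])
    lb = 0.5 * (d[a, b] + d[b, c] - d[a, c])
    lc = 0.5 * (d[a, c] + d[b, c] - d[a, b])
    return f"({label[a]}:{la:g},{label[b]}:{lb:g},{label[c]}:{lc:g});"


names = ["a", "b", "c", "d", "e"]
dist = [[0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0]]
print(neighbour_joining(names, dist))  # (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

Each step joins the pair that minimises the Q criterion rather than simply the closest pair, which is what lets neighbour joining recover the correct tree from additive distances even when evolutionary rates differ between lineages.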

One of those should help!

Wenwen012345 commented 2 years ago

Hello @roblanf @thomaskf

Thank you for your careful answers. I started experimenting with the new parameters two days ago, and so far it seems to be working well. I really appreciate your contributions to my research. Thank you!

Wenwen012345 commented 2 years ago


Regarding "-v": that was silly of me. I thought this option just printed more detailed information and didn't realize it would slow things down...