rcedgar / muscle

Multiple sequence and structure alignment with top benchmark scores scalable to thousands of sequences. Generates replicate alignments, enabling assessment of downstream analyses such as trees and predicted structures.
https://drive5.com/muscle
GNU General Public License v3.0

Specifying threads #66

Closed (bioinfo007 closed 5 days ago)

bioinfo007 commented 10 months ago

Is it possible to specify the number of threads for parallel processing? I have a large data set that I want to submit via sbatch on an HPC cluster, but there is no clear help for specifying threads. Please respond with a suitable solution. Processing a single file with around 600 sequences takes a long time, even when I request 80 cores; it seems muscle5 is not able to use all 80 cores. Thank you.

rcedgar commented 10 months ago

The -threads N option specifies the number of threads, default is min(20, number_of_CPU_cores).
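As a minimal sketch of how this could look in an sbatch submission, the script below passes the CPU count allocated by Slurm to -threads. The file names seqs.fa and aln.afa, the job name, and the request for 20 CPUs are placeholders chosen for illustration, not values from this thread.

```shell
#!/bin/bash
#SBATCH --job-name=muscle_align
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=20

# Hypothetical file names; replace with your own.
INPUT="${INPUT:-seqs.fa}"
OUTPUT="${OUTPUT:-aln.afa}"

# Slurm sets SLURM_CPUS_PER_TASK when --cpus-per-task is given;
# fall back to 1 when the script is run outside Slurm.
THREADS="${SLURM_CPUS_PER_TASK:-1}"

echo "muscle -align $INPUT -output $OUTPUT -threads $THREADS"

# Run the alignment only if muscle and the input file are actually present.
if command -v muscle >/dev/null 2>&1 && [ -f "$INPUT" ]; then
    muscle -align "$INPUT" -output "$OUTPUT" -threads "$THREADS"
else
    echo "muscle or $INPUT not found; command printed only" >&2
fi
```

Tying -threads to the Slurm allocation keeps the thread count and the reserved cores in step, so the job neither oversubscribes the node nor leaves reserved cores idle.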

You can verify that muscle is using multiple threads with the top command, e.g. it will show 2000% CPU if it is using 20 cores.
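Besides watching top interactively, the thread count of a running process can be read from /proc on Linux. The helper below is a sketch under that assumption; thread_count is a hypothetical name, and the lookup with pgrep assumes the process is simply called muscle.

```shell
# Count the threads of a running process on Linux by reading /proc.
thread_count() {
    awk '/^Threads:/ { print $2 }' "/proc/$1/status"
}

# Look up the newest process named muscle, if any, and report on it.
pid=$(pgrep -n muscle 2>/dev/null || true)
if [ -n "$pid" ]; then
    echo "muscle pid $pid has $(thread_count "$pid") threads"
fi
```

A thread that exists is not necessarily busy, so the CPU percentage in top remains the better indicator of actual use; `top -H -p PID` lists the threads of one process individually.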

How long are the sequences? Time and memory scale like L^2, where L is the sequence length. If the sequences are much longer than around 1,000 letters, then it may be slow.
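To see where a data set stands relative to that figure, the sketch below reports the number of sequences in a FASTA file, the longest sequence, and the quadratic factor (L/1000)^2. The function name fasta_length_report is hypothetical, and using the longest sequence as L with 1,000 letters as the reference point is an assumption made for illustration.

```shell
# Print the sequence count, longest sequence, and (max_len/1000)^2 for a FASTA file.
fasta_length_report() {
    awk '
        # Header line: close the previous record and start a new one.
        /^>/ { if (len > max) max = len; len = 0; n++; next }
        # Sequence line: strip whitespace and add to the current record length.
        { gsub(/[ \t\r]/, ""); len += length($0) }
        END {
            if (len > max) max = len
            r = max / 1000
            printf "sequences=%d max_len=%d relative_cost=%.2f\n", n, max, r * r
        }
    ' "$1"
}
```

Calling `fasta_length_report seqs.fa` on a file whose longest sequence has 2,000 letters gives relative_cost=4.00, meaning roughly four times the time and memory of the 1,000-letter case if the L^2 scaling holds.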

zersalsion commented 1 month ago

It seems that -super5 ignores the -threads option.

rcedgar commented 1 month ago

It should support the -threads option; I have tested this extensively. Please double-check. If you still believe there is a problem, please provide test data and a script to reproduce the symptom.

zersalsion commented 1 month ago

You are right. muscle just has a relatively long startup phase that is single-threaded. After a couple of minutes, it starts to use the specified number of cores.