rcedgar / muscle

Multiple sequence and structure alignment with top benchmark scores scalable to thousands of sequences. Generates replicate alignments, enabling assessment of downstream analyses such as trees and predicted structures.
https://drive5.com/muscle
GNU General Public License v3.0

Improve multi-threading efficiency? #42

Closed lukaszsobala closed 1 year ago

lukaszsobala commented 1 year ago

Hello,

I use MUSCLE quite often, but I think the speed of the program suffers a bit from the multi-threading algorithm.

It is quite common for me that after some time (e.g. 50% of the total time needed for one pass of the Consistency step) more than half of the threads have finished, and it then takes a long time for the last 2-3 threads to finish their work. I attach a btop screenshot showing the pattern of core usage during a muscle (5.1.linux64) run (screenshot from 2022-10-27 11-37-57). The two large "blocks" of activity on the right are the Consistency steps.

For short jobs it does not matter that much, but for alignments that take hours, 2 h vs 4 h makes a difference. And the longer the job, the larger the proportion of time spent waiting for the trailing threads seems to be. Is there a way to dynamically reassign parts of the work to cores that sit idle, or, before assigning the work, to use some method to assess the "difficulty" of the alignment and weight the work accordingly?

Thank you, Lukasz

rcedgar commented 1 year ago

It might be possible, but it is not trivial. The main loop for consistency is at consflat.cpp line 11. You can see that it is load-balanced by OMP, so if some threads are idle there must be individual calls to ConsPair() which take a long time. The inner loops in ConsPair() are the RelaxFlat_xx_xx() functions in relaxflat.cpp, and it's not obvious how these could be parallelized within the overall consistency loop in any simple way.
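
For readers less familiar with OpenMP, here is a minimal sketch of the general pattern under discussion. It is not MUSCLE's code: `PairTask`, `EstimatedCost`, `ProcessPair`, `RunAll`, `SortLongestFirst` and `SimulateMakespan` are hypothetical names, the `schedule(dynamic, 1)` clause is an assumption about how such a loop might be balanced, and the length-product cost model is made up for illustration. It shows a dynamically scheduled loop over pairs, plus the "assess difficulty first" idea from the question, implemented as longest-first ordering.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical work item standing in for one ConsPair()-style call.
struct PairTask {
    std::size_t lenA;
    std::size_t lenB;
};

// Assumed cost model (not taken from MUSCLE): work grows with lenA * lenB.
std::size_t EstimatedCost(const PairTask &t) { return t.lenA * t.lenB; }

// Placeholder for the real per-pair work; here it only returns the estimate.
std::size_t ProcessPair(const PairTask &t) { return EstimatedCost(t); }

// Dynamically scheduled loop: an idle thread picks up the next unstarted task.
// Without -fopenmp the pragma is ignored and the loop simply runs serially.
std::vector<std::size_t> RunAll(const std::vector<PairTask> &tasks) {
    std::vector<std::size_t> results(tasks.size());
    const long n = static_cast<long>(tasks.size());
    #pragma omp parallel for schedule(dynamic, 1)
    for (long i = 0; i < n; ++i) {
        const std::size_t k = static_cast<std::size_t>(i);
        results[k] = ProcessPair(tasks[k]);
    }
    return results;
}

// Reorder tasks so that the ones estimated to be most expensive start first.
void SortLongestFirst(std::vector<PairTask> &tasks) {
    std::stable_sort(tasks.begin(), tasks.end(),
                     [](const PairTask &a, const PairTask &b) {
                         return EstimatedCost(a) > EstimatedCost(b);
                     });
}

// Toy model of dynamic scheduling: each task, in order, goes to the worker
// that becomes free earliest. Returns the finish time of the last worker.
std::size_t SimulateMakespan(const std::vector<PairTask> &tasks,
                             std::size_t workers) {
    assert(workers > 0);
    std::vector<std::size_t> busyUntil(workers, 0);
    for (const PairTask &t : tasks) {
        auto earliest = std::min_element(busyUntil.begin(), busyUntil.end());
        *earliest += EstimatedCost(t);
    }
    return *std::max_element(busyUntil.begin(), busyUntil.end());
}
```

In this toy model, four cheap tasks followed by one expensive task leave one worker idle while the other finishes the expensive task alone; starting the expensive task first shortens the total time. The total can never drop below the cost of the single longest task, though, which is why a few very slow ConsPair() calls would still leave threads waiting unless the work inside each call were itself split up.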

lukaszsobala commented 1 year ago

Thanks for the quick reply. Forgive me, but I know next to nothing about cpp programming, so your explanation went over my head. But I understand that it would be difficult to implement.