rcedgar / muscle

Multiple sequence and structure alignment with top benchmark scores scalable to thousands of sequences. Generates replicate alignments, enabling assessment of downstream analyses such as trees and predicted structures.
https://drive5.com/muscle
GNU General Public License v3.0

Request: don't rearrange sequence order #45

Open SethMusker opened 1 year ago

SethMusker commented 1 year ago

Hi Robert,

Is it possible to keep the order of the sequences in the output file the same as in the input? (I've been seeing reordered output with the latest Windows and Linux pre-compiled binaries, using -align; each produces a different, seemingly random order.)

Cheers, Seth

p.s. I've been really happy with the quality of alignments from muscle5!

rcedgar commented 1 year ago

Thanks for the feedback. I'm aware of the "seemingly random" order issue. I was thinking the output should be sorted to bring similar sequences together (e.g. this can be done with a post-order traversal of the guide tree), but preserving the input order also makes sense as an option. It's on my list, but I'm not sure when I will be able to work on it. In the meantime, it takes only a few lines of Python to rearrange the sequences to follow the order of the input FASTA or of a text file listing sequence identifiers.
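For readers who want that workaround now, here is a minimal sketch of such a script. It is not from the muscle codebase. It assumes that the first whitespace-delimited token of each FASTA header is a unique identifier and that it appears unchanged in the aligned output. The function names (read_fasta, seq_id, reorder_alignment, write_fasta) are illustrative only.

```python
def read_fasta(lines):
    """Parse an iterable of FASTA lines into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def seq_id(header):
    """Return the first whitespace-delimited token of a header.

    Assumption: this token is unique per sequence and is preserved
    in the aligned output.
    """
    parts = header.split()
    return parts[0] if parts else ""


def reorder_alignment(input_records, aligned_records):
    """Return the aligned records in the order of the input records."""
    aligned_by_id = {seq_id(h): (h, s) for h, s in aligned_records}
    ordered = []
    for header, _ in input_records:
        key = seq_id(header)
        if key not in aligned_by_id:
            raise KeyError(f"sequence {key!r} is missing from the alignment")
        ordered.append(aligned_by_id[key])
    return ordered


def write_fasta(records, out, width=60):
    """Write records to a file-like object, wrapping sequences at `width` columns."""
    for header, seq in records:
        out.write(f">{header}\n")
        for i in range(0, len(seq), width):
            out.write(seq[i:i + width] + "\n")
```

To use it, parse both files with read_fasta (for example, from open file handles for the unaligned input and for the muscle output), pass the two record lists to reorder_alignment, and write the result with write_fasta. To follow a plain list of identifiers instead, build input_records as (identifier, "") pairs, since only the header of each input record is consulted.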

permia commented 6 months ago

esl-alimanip (distributed with the HMMER software) can handle both situations easily. Its --tree option can be used to bring similar sequences together, and its --reorder option can be used to keep the same order as the input FASTA.