rcedgar / muscle

Multiple sequence and structure alignment with top benchmark scores scalable to thousands of sequences. Generates replicate alignments, enabling assessment of downstream analyses such as trees and predicted structures.
https://drive5.com/muscle
GNU General Public License v3.0

Segmentation fault while running muscle #49

Closed: navkahlon240 closed this issue 1 year ago

navkahlon240 commented 1 year ago

muscle -in combined.fasta -out align.fasta

MUSCLE v3.8.1551 by Robert C. Edgar

http://www.drive5.com/muscle
This software is donated to the public domain.
Please cite: Edgar, R.C. Nucleic Acids Res 32(5), 1792-97.

combined 74665 seqs, lengths min 38, max 11532, avg 307
Segmentation fault

Can you help me out with this?

rcedgar commented 1 year ago

This repo is v5; you're using v3, and this set is too large for v3. At a guess, v5 should be able to handle it, though it is a large set and the sequence lengths are very variable (min 38, max 11532). Are you sure these sequences are globally alignable?
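A quick way to see how variable the lengths in a file like combined.fasta are, before picking an aligner, is to summarize them yourself. The following is a minimal Python sketch, assuming a plain-text FASTA file; it reports the same count/min/max/avg figures MUSCLE printed, plus the median and the max/min ratio. The `length_outliers` helper and its factor of 2 are hypothetical, arbitrary thresholds for illustration, not anything MUSCLE itself checks.

```python
import statistics
import sys
from typing import Dict, Iterable, List, Tuple


def read_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA text into (name, sequence) pairs; name is the first word of the header."""
    records: List[Tuple[str, str]] = []
    name = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def length_summary(records: List[Tuple[str, str]]) -> Dict[str, float]:
    """Count, min, max, mean and median length, plus the max/min length ratio."""
    lengths = [len(seq) for _, seq in records]
    if not lengths:
        raise ValueError("no sequences found")
    lo, hi = min(lengths), max(lengths)
    return {
        "seqs": len(lengths),
        "min": lo,
        "max": hi,
        "avg": statistics.mean(lengths),
        "median": statistics.median(lengths),
        "max_min_ratio": float("inf") if lo == 0 else hi / lo,
    }


def length_outliers(records: List[Tuple[str, str]], factor: float = 2.0) -> List[str]:
    """Hypothetical heuristic: names whose length lies strictly outside
    [median / factor, median * factor]."""
    median = statistics.median(len(seq) for _, seq in records)
    return [name for name, seq in records
            if len(seq) < median / factor or len(seq) > median * factor]


if __name__ == "__main__":
    with open(sys.argv[1]) as fh:
        recs = read_fasta(fh)
    s = length_summary(recs)
    print(f"{s['seqs']} seqs, lengths min {s['min']}, max {s['max']}, avg {s['avg']:.0f}")
    print(f"median {s['median']}, max/min ratio {s['max_min_ratio']:.1f}")
    print(f"{len(length_outliers(recs))} sequences outside 2x of the median length")
```

Run it as `python fasta_lengths.py combined.fasta` (the script name is arbitrary). A large max/min ratio or many outliers suggests the set mixes fragments with much longer sequences, which is the situation the question above is probing.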

navkahlon240 commented 1 year ago

What kind of data is globally alignable? How can we check whether data is globally alignable or not?

rcedgar commented 1 year ago

Global alignment means that all letters from all sequences are included, in the order they appear in each sequence. If there are any of the following between two sequences: transpositions, inversions, or long duplication, insertion or deletion events, then a global alignment will be somewhere between (a) aligning some sequences incorrectly and (b) totally wrong.
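To make the "all letters, in order" definition concrete, here is a textbook pairwise Needleman-Wunsch global alignment in Python. It is only an illustration of the definition, not how MUSCLE aligns or scores sequences; the match/mismatch/gap values are hypothetical toy scores, and `invert_segment` is a hypothetical helper that simulates an inversion by replacing a segment with its reverse complement.

```python
from typing import List, Tuple

# Hypothetical toy scores for illustration; not MUSCLE's scoring scheme.
MATCH, MISMATCH, GAP = 1, -1, -1

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def pair_score(x: str, y: str) -> int:
    return MATCH if x == y else MISMATCH


def global_align(a: str, b: str) -> Tuple[int, str, str]:
    """Needleman-Wunsch with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # (mis)match column
                score[i - 1][j] + GAP,  # letter of a against a gap
                score[i][j - 1] + GAP,  # letter of b against a gap
            )
    # Trace back from the bottom-right corner, so every letter of both
    # sequences ends up in the alignment, in its original order.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def invert_segment(seq: str, start: int, end: int) -> str:
    """Hypothetical helper: replace seq[start:end] with its reverse complement (a DNA inversion)."""
    return seq[:start] + seq[start:end].translate(COMPLEMENT)[::-1] + seq[end:]


if __name__ == "__main__":
    ref = "AAAAAAAACCCCCCCC"
    inv = invert_segment(ref, 0, 8)
    for other in (ref, inv):
        s, x, y = global_align(ref, other)
        print(s)
        print(x)
        print(y)
```

Because the traceback must run from the end of both sequences to the start, the inverted segment cannot be matched back to its original position: the alignment can only place it against the same stretch as mismatches or push it into gaps, so its score drops below that of the unrearranged pair.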