Segmentation fault issue while running Muscle 5.1.linux_intel64

AlexShein commented 2 years ago

Hello!

I downloaded muscle version 5.1.linux_intel64 and ran an alignment on this sequence file.

Command: ./muscle5.1.linux_intel64 -align corr_analysis/data/35_sequences.fasta -output corr_analysis/aligned_aligned_sequences.fa -refineiters 2

Output:

❯ ./muscle5.1.linux_intel64 -align corr_analysis/data/35_sequences.fasta -output corr_analysis/aligned_aligned_sequences.fa -refineiters 2

muscle 5.1.linux64 [12f0e2]  32.6Gb RAM, 12 cores
Built Jan 13 2022 23:17:13
(C) Copyright 2004-2021 Robert C. Edgar.
https://drive5.com

Input: 35 seqs, avg length 29834, max 31491

00:00 5.1Mb  CPU has 12 cores, running 12 threads
[1]    22676 segmentation fault (core dumped)  ./muscle5.1.linux_intel64 -align corr_analysis/data/35_sequences.fasta -outpu

The same issue happens with muscle_v5.0.1428_linux.

Is there any known workaround for the issue?

rcedgar commented 2 years ago

SARS-CoV-2 genomes? I've seen this before: the sequences are too long. I'm not sure exactly why it segfaults, since the code is designed to fail with an informative error message, but I'm pretty sure it runs out of memory. I don't know of a workaround, sorry.

AlexShein commented 2 years ago

Hello! Thank you for your response. Correct, those are genomes of various coronaviruses. Approximately what are the memory requirements for a relatively large alignment?

P.S. I managed to get the alignment using a muscle 3.8 version.

rcedgar commented 2 years ago

Memory scales something like O(N^2 L^2) for N sequences of length L, so this is not a memory-efficient algorithm. IMO well worth it for the improved accuracy. Hopefully I can add some heuristics to reduce memory, something to think about for later versions.
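To get a feel for that scaling, here is a rough sketch in Python. It only evaluates the N^2 L^2 growth term quoted above; the constant factor is not given in this thread, so it cannot predict actual bytes, only how the requirement grows relative to another input. The function names are made up for illustration and are not part of muscle.

```python
def scaling_term(n_seqs: int, seq_len: int) -> int:
    """Return N^2 * L^2, the growth term quoted above (constant factor unknown)."""
    return n_seqs ** 2 * seq_len ** 2


def relative_memory(n_a: int, len_a: int, n_b: int, len_b: int) -> float:
    """How many times larger input A's growth term is than input B's."""
    return scaling_term(n_a, len_a) / scaling_term(n_b, len_b)


if __name__ == "__main__":
    # Figures from the log in this thread: 35 sequences, average length 29834.
    print(f"N^2 * L^2 = {scaling_term(35, 29834):.3e}")

    # Same number of sequences at half the length: the term shrinks 4x.
    print(relative_memory(35, 29834, 35, 14917))
```

Under this assumption, halving the sequence length or halving the number of sequences each reduces the growth term by a factor of four.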

AlexShein commented 2 years ago

Understood, thank you for replying! :) P.S. Huge thanks for the tool itself.

rcedgar / muscle

Segmentation fault issue while running Muscle 5.1.linux_intel64 #29