rcedgar / muscle

Multiple sequence and structure alignment with top benchmark scores scalable to thousands of sequences. Generates replicate alignments, enabling assessment of downstream analyses such as trees and predicted structures.
https://drive5.com/muscle
GNU General Public License v3.0

allocflat.cpp(15) assert failed: uint64(Size) == Size64 #44

Closed: NicMAlexandre closed this issue 1 year ago

NicMAlexandre commented 1 year ago

I have installed the most recent version of muscle both with conda and from source, and I run into the same issue with each.

I have a couple hundred thousand sequences, so I will likely need to run with the -super5 option. However, when I do, I get the following error:

muscle 5.1.linux64 [] 132Gb RAM, 24 cores
Input: 374249 seqs, length avg 1935 max 102417
WARNING: Sequence length >5k may require excessive memory
00:13 1.4Gb 100.0% Derep 335593 uniques, 38655 dupes
00:14 1.5Gb CPU has 24 cores, defaulting to 20 threads

muscle -super5 combined_seq.fasta -output super5_iter2.afa
Elapsed time 00:14
Max memory 1.5Gb
---Fatal error---
allocflat.cpp(15) assert failed: uint64(Size) == Size64

I should note that I am also running the same input with -align, which has not produced any errors so far.

rcedgar commented 1 year ago

> Input: 374249 seqs, length avg 1935 max 102417
> WARNING: Sequence length >5k may require excessive memory

102k is too long; this isn't going to work in muscle any time soon. Regardless, with a dataset like this I would look for ways to trim the sequences to a well-defined region before making a multiple alignment. I doubt any MSA algorithm can do a good job on a large dataset with such variable fragment lengths.
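
As a rough first step toward the suggestion above, the sketch below splits a FASTA file into sequences at or below a length cutoff and sequences above it, so the long outliers can be inspected or trimmed separately. It is not the region-based trimming itself, only a way to find which records need attention. The default cutoff of 5000 is taken from the ">5k" warning in the log; the function names and the output file names are hypothetical, and the parser assumes plain FASTA with ">" header lines.

```python
import sys


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def write_record(handle, header, seq, width=80):
    """Write one FASTA record, wrapping the sequence at `width` columns."""
    handle.write(f">{header}\n")
    for i in range(0, len(seq), width):
        handle.write(seq[i:i + width] + "\n")


def filter_by_length(src, kept_out, dropped_out, max_len=5000):
    """Stream records from `src`, routing each by sequence length.

    Records with len(seq) <= max_len go to `kept_out`, the rest to
    `dropped_out`. Returns (number kept, number dropped).
    """
    n_kept = n_dropped = 0
    for header, seq in read_fasta(src):
        if len(seq) <= max_len:
            write_record(kept_out, header, seq)
            n_kept += 1
        else:
            write_record(dropped_out, header, seq)
            n_dropped += 1
    return n_kept, n_dropped


if __name__ == "__main__":
    # Input name is from the report above; output names are hypothetical.
    in_path = sys.argv[1] if len(sys.argv) > 1 else "combined_seq.fasta"
    with open(in_path) as src, \
            open("kept.fasta", "w") as kept, \
            open("too_long.fasta", "w") as dropped:
        n_kept, n_dropped = filter_by_length(src, kept, dropped)
    print(f"kept {n_kept}, set aside {n_dropped} over the length cutoff")
```

The records are streamed rather than loaded into memory, which matters for an input of this size. Whether the remaining sequences then align well still depends on trimming them to a comparable region, as noted above.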

NicMAlexandre commented 1 year ago

Hi Robert,

Thank you, I will try and find a solution.

Nicolas


--
Best,

Nicolas Alexandre
PhD Candidate, Integrative Biology
Whiteman Lab
University of California - Berkeley