rcedgar / muscle

Multiple sequence and structure alignment with top benchmark scores scalable to thousands of sequences. Generates replicate alignments, enabling assessment of downstream analyses such as trees and predicted structures.
https://drive5.com/muscle
GNU General Public License v3.0

Memory Object >4Gb error #60

Closed by czhou135 1 year ago

czhou135 commented 1 year ago

Hi, I'm not very familiar with super5, but I ran an alignment of 12k sequences averaging ~900 aa each. I gave the program 20 GB of memory to work with and got the following error:

Elapsed time 10:14:33 Max memory 6.1Gb

---Fatal error--- Memory object >4Gb, probably due to long sequences

Does this error mean that super5 is unable to run an alignment on this dataset or is there a fix? Should I switch to a different alignment software? Thanks!

rcedgar commented 1 year ago

Note "...probably due to long sequences".

What is the maximum length?

czhou135 commented 1 year ago

The maximum length was 2868 aa. Would that be too long?

rcedgar commented 1 year ago

Could be. I can check if you post the sequences. What are the sequences? Can't you trim them to a globally alignable region? The length variation is too large to get a good MSA from any algorithm.
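
For readers who want to check length variation before running an alignment, here is a minimal sketch in Python. It is not part of muscle; the helper names `read_fasta` and `length_summary` are made up for illustration, and it assumes plain FASTA input with `>` header lines.

```python
from statistics import median


def read_fasta(lines):
    """Parse an iterable of FASTA lines into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def length_summary(records):
    """Return count, min, median and max sequence length (hypothetical helper)."""
    if not records:
        raise ValueError("no sequences found")
    lengths = sorted(len(seq) for _, seq in records)
    return {
        "count": len(lengths),
        "min": lengths[0],
        "median": median(lengths),
        "max": lengths[-1],
    }


# Toy input standing in for a real file; with a real file, pass the open
# file handle to read_fasta instead of this list.
example = [">a", "MKV", "LLA", ">b", "MK"]
print(length_summary(read_fasta(example)))
```

A maximum far above the median, as with 2868 aa against an average of ~900 aa in this thread, is a hint that the set may need trimming before alignment.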

czhou135 commented 1 year ago

These are representative tRNA synthetases from across the tree of life, but I need the alignment to construct a phylogenetic tree. sequences.txt

rcedgar commented 1 year ago

The memory allocation fails because the alignment exceeds 50,000 columns at an intermediate step. The final alignment would probably have far more columns. These sequences are too far diverged to make a reasonable alignment; there is no chance of building a good tree from them.

czhou135 commented 1 year ago

Thanks for the responses. Just wanted to update that we've narrowed the study down to a specific domain with less length variation, and muscle works for us now!
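
Narrowing to a domain, as described above, amounts to cutting each sequence down to a region before alignment. A rough sketch in Python follows. It assumes the domain coordinates are already known from some other source. The function `trim_to_region` and the `regions` mapping are hypothetical and are not something muscle provides.

```python
def trim_to_region(records, regions):
    """Cut each sequence down to its region.

    records: list of (header, sequence) pairs.
    regions: dict mapping header -> (start, end), 1-based inclusive
             (an assumed convention for this sketch).
    Sequences without a region entry are dropped.
    """
    trimmed = []
    for header, seq in records:
        if header not in regions:
            continue
        start, end = regions[header]
        # Convert 1-based inclusive coordinates to a Python slice.
        trimmed.append((header, seq[start - 1:end]))
    return trimmed


# Toy data; real coordinates would come from a domain annotation step.
records = [("a", "MKVLLAGG"), ("b", "MK")]
regions = {"a": (2, 5)}
for header, seq in trim_to_region(records, regions):
    print(f">{header}\n{seq}")
```

The trimmed records can then be written back out as FASTA and passed to the aligner.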