AshSies closed 2 years ago
I have the same issue with the same error flag. Large bacterial contigs, about 5.4 Mb each. Everything works up until the MSA merging step.
@kclambi1 asked me to look at this. Root cause appears to be muscle crashing on some chunks. Is muscle running out of memory?
$ muscle -align 000000000000.fasta -output test_msa.fasta
muscle 5.1.linux64 [] 8.2Gb RAM, 6 cores
Built Feb 24 2022 03:16:15
(C) Copyright 2004-2021 Robert C. Edgar.
https://drive5.com
Input: 3 seqs, avg length 151666, max 231004
00:01 8.1Mb CPU has 6 cores, running 6 threads
00:01 20.8Gb 100.0% Calc posteriors
Segmentation fault
$ echo $?
139
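For anyone reading the exit status above: shells conventionally report 128 plus the signal number when a process is killed by a signal, so 139 corresponds to signal 11 (SIGSEGV). Here is a minimal sketch for decoding such a status; `describe_exit_status` is a hypothetical helper written for illustration, not part of Trycycler or MUSCLE.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Explain an exit status as printed by `echo $?` or subprocess.returncode.

    Hypothetical helper for illustration only.
    """
    if status == 0:
        return "success"
    # Python's subprocess reports signal deaths as a negative returncode,
    # while shells conventionally report 128 + signal number.
    if status < 0:
        signum = -status
    elif status > 128:
        signum = status - 128
    else:
        return f"exited with error code {status}"
    try:
        name = signal.Signals(signum).name
    except ValueError:
        name = f"signal {signum}"
    return f"killed by {name}"


print(describe_exit_status(139))
```

Note that the 128 + N rule is only a shell convention; a program could in principle exit with status 139 on its own, so treat the decoded result as a strong hint rather than proof.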
I believe I am also getting this error:
File "/home/user/Data/miniconda3/envs/trycycler/bin/trycycler", line 10, in
@kclambi1 @AshSies @sgblanch were any of you able to resolve this issue?
FYI Downgrading Muscle from 5.1 to 3.8 resolved this issue 👍
Using Muscle 3.8 also resolved the issue on my end.
This is strange, as version 0.5.2 added support for muscle versions 3.x and 5.x.
Muscle crashes when the sequences to align are too long and it runs out of memory. The difference between versions 3 and 5 is that Muscle 5 creates an empty MSA file even when it crashes. Trycycler checks this step only by looking for missing MSA files, so with version 5 everything looks fine. It should also check whether the files are empty. If you want to avoid running out of memory, try changing --lookahead or --step to a smaller value. This should reduce the maximum piece size in the partitioning step and therefore Muscle's memory usage.
https://github.com/rrwick/Trycycler/blob/9cc62a521e14a264ae8397277e2f8b09c2988c66/trycycler/msa.py#L121-L131
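The kind of check being suggested here (treat an empty MSA file as a failure, not just a missing one) could look roughly like the sketch below. This is only an illustration of the idea; `find_bad_msa_files` is a hypothetical name and this is not the code in Trycycler's msa.py.

```python
from pathlib import Path


def find_bad_msa_files(paths):
    """Split expected MSA output files into missing and empty ones.

    Hypothetical helper: an empty file is treated as a failed alignment,
    since a crashed aligner may still leave a zero-byte output behind.
    """
    missing, empty = [], []
    for p in map(Path, paths):
        if not p.is_file():
            missing.append(str(p))
        elif p.stat().st_size == 0:
            empty.append(str(p))
    return missing, empty
```

A caller could then report both lists in the error message, so a crash that leaves a zero-byte file is flagged at the alignment step instead of surfacing later during merging.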
Thanks to everyone here for the investigation!
I've just pushed (42be6d7) a small fix to the problem that @abomba pointed out, so Trycycler will now recognise empty MUSCLE files as being problematic. This should at least result in better error messages.
I've also made a note on the Software Requirements page of the wiki to say that MUSCLE v3 is preferred.
@marade in #42 pointed out that Muscle v5 defaults to lots of threads, and Trycycler runs many instances of Muscle, so this may explain the out-of-memory issues. I've just pushed a fix (886fdd5) which limits Muscle v5 to one thread per instance to help with this, but I still recommend Muscle v3 for speed.
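To illustrate the one-thread-per-instance idea, here is a rough sketch of building a version-dependent MUSCLE command. It is not the code from the commit above. The `-align` and `-output` flags are the ones used in the command earlier in this thread; the `-threads` flag for v5 and the `-in`/`-out` flags for v3 are my understanding of the MUSCLE command line and should be checked against `muscle -h` for your installed version. `build_muscle_command` is a hypothetical name.

```python
def build_muscle_command(major_version, in_fasta, out_fasta):
    """Return an argument list for running MUSCLE on one piece.

    Hypothetical helper; flag names are assumptions to verify locally.
    """
    if major_version >= 5:
        # v5 is multi-threaded by default, so pin each instance to one
        # thread when many instances run in parallel.
        return ["muscle", "-align", in_fasta, "-output", out_fasta,
                "-threads", "1"]
    # v3-style invocation (assumed flags).
    return ["muscle", "-in", in_fasta, "-out", out_fasta]


print(build_muscle_command(5, "piece.fasta", "piece_msa.fasta"))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting problems with file names.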
Hi there,
I am running Trycycler 0.5.3, and near the end of the msa step in the pipeline, I am getting a KeyError thrown with certain clusters. In the environment I have set up, the msa makes use of MUSCLE v5.1. The traceback directs me to line 175 in msa.py:
line 175, in merge_pieces
    aligned_seq_parts[n].append(parts[n].upper())
KeyError: 'A_contig1'
This has only been an issue with clusters including larger (> 2 Mb) contigs. Other clusters with smaller tigs, produced from the same 'trycycler cluster' step and assemblers, have worked just fine.
Any input would be appreciated - thank you for your time!