steineggerlab / Metabuli

Metabuli: specific and sensitive metagenomic classification via joint analysis of DNA and amino acid.
GNU General Public License v3.0

More segmentation faults #15

Closed sean-workman closed 1 year ago

sean-workman commented 1 year ago

Hi there,

As with #10 I am experiencing segmentation faults at the stage of "Extracting query metamers ...".

I get these errors when I build from source with:

git clone https://github.com/steineggerlab/Metabuli.git 
module load gcc cmake  # I am on a Digital Research Alliance of Canada cluster and need to load these modules to build
cd Metabuli
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release .. -DCMAKE_INSTALL_PREFIX=/home/sdwork/software/metabuli
make -j4
make install

I ran the command:

metabuli classify OCH16_1_val_1.fq OCH16_2_val_2.fq /home/sdwork/scratch/metagenomics/gtdb fq_och16 fq_och16 --threads 32

When I tried looking at the core dump with gdb I saw:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00000000004568d3 in SeqIterator::fillQueryKmerBuffer(char const*, int, QueryKmerBuffer&, unsigned long&, unsigned int, unsigned int) ()

I tried just using a pre-compiled binary on the cluster and saw the same error.

I also installed Metabuli with conda on one of our local machines and encountered the exact same problem. Changing the permissions as suggested in #10 did not help. I downloaded the GTDB database locally with:

metabuli databases GTDB207 gtdb tmp

and am trying to run the command:

metabuli classify OCH16_1.fq OCH16_2.fq gtdb och16_out och16 --threads 14 --max-ram 50

The output I see is:

Number of threads: 14
Query file 1: OCH16_1.fq
Query file 2: OCH16_2.fq
Database directory: gtdb
Output directory: och16_out
Job ID: och16
Loading nodes file ... Done, got 406311 nodes
Loading merged file ... Done, added 0 merged nodes.
Loading names file ... Done
Init RMQ ...Done
The rest RAM: 51808043008
Indexing query file ...Done
Total number of sequences: 75074832
Total read length: 22393292176nt
Extracting query metamers ...
Segmentation fault (core dumped)

jaebeom-kim commented 1 year ago

Thank you for providing such great detail. We have found an error related to how query files end, and it produces similar logs. Could you send us the results of tail -n 8 OCH16_1.fq > OCH16_1_8.fq and tail -n 8 OCH16_2.fq > OCH16_2_8.fq? Then we can check whether you are facing the same error.

sean-workman commented 1 year ago

Please find the output of the two tail commands below:

tail -n 8 OCH16_1.fq > OCH16_1_8.fq

@NOVASEQ1:462:HJ2CNDSX5:4:2678:30083:37059 1:N:0:CGGTTACG+CTATAGTC
TTCCCAAGCAGACTAAGCAGAAAAGAGACAGAGAGCCAAGAGAGGAAGAGGGCATAAATTACCAATATCAGAAATGAAAGGGACATTCCTACAGATCCTACAGATATTAAGCGGGTAACAAAGCACTATAAGGAACTGAATGCCAAG
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:F:FFFFFFFFFFFFFFFFF
@NOVASEQ1:462:HJ2CNDSX5:4:2678:31584:37059 1:N:0:CGGTTACG+CTATAGTC
GGGTATAGGCAAATGAGAAACAGTGCTCTGTTATAGTTACTAGGTATTAAAAATAAACTTGACCAAGGCTAACGCTGTCTCTTATACACATCTCCGAGCCCACGAGACCGGTTACGGCAACGCGTATGCCGGCGTCGGCTGGAAAAGGGG
+
FFFFF:F:FF:FF,,F:FFFF,FF:F:F,:,FF,F,FFF::F,FFF:FFFFFF,FFF:,F,FFFFF,:FFFF,:FFFF:,F:FFFF:FFF,,,:F,F:FFF,F,F,FFF:,FFFF,FF:,F,,F,FF,,F:,,,F,:,F,,,,,,:,,FF


tail -n 8 OCH16_2.fq > OCH16_2_8.fq

@NOVASEQ1:462:HJ2CNDSX5:4:2678:30083:37059 2:N:0:CGGTTACG+CTATAGTC
TCTTTGTATGTCAGTTTTGGTAGCTTGTGTTTGTGAAAAATTTGTCTGTTTCATCTACATTTTCTCTTGGCATTCAGTTCCTTATAGTGCTTTGTTACCCGCTTAATATCTGTAGGATCTGTAGGAATGTCCCTTTCATTTCTGATATTG
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFF:FFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFF
@NOVASEQ1:462:HJ2CNDSX5:4:2678:31584:37059 2:N:0:CGGTTACG+CTATAGTC
CGTTAGCCTTGGTCAAGTTTATTTTTAATACCTAGTAACTATAACAGAGCACAGTTTCTCATTTGCCTATACCCCCGTCTCTTATACACATCTGACGCTGCCGACGACTATAGTCTTGTGTAGATCTCGGTGGTCGCCGTATCATTAAAA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFF:FFFFF:FFFFFFF:

I have a set of files from the same run that have all behaved the same way, but this one is still fairly large. I am decompressing the smallest pair right now to ensure that I get the same behaviour and could likely share that. I just want to double check with my PI before I share complete sets of raw unpublished data around the globe. :)

jaebeom-kim commented 1 year ago

Thanks! Could you attach the OCH16_1_8.fq and OCH16_2_8.fq here? I want to check if EOF is located right after the last character.
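For anyone who wants to run the same check locally, here is a minimal sketch of testing whether a file's last byte is a newline. The helper name `ends_with_newline` is hypothetical and is not part of Metabuli; it only illustrates the kind of end-of-file check being discussed.

```python
import os


def ends_with_newline(path):
    """Return True if the last byte of the file is a newline.

    Hypothetical helper for illustration only; empty files return False.
    """
    with open(path, "rb") as handle:
        handle.seek(0, os.SEEK_END)
        if handle.tell() == 0:
            return False  # empty file: there is no last byte to inspect
        handle.seek(-1, os.SEEK_END)
        return handle.read(1) == b"\n"
```

Calling `ends_with_newline("OCH16_1_8.fq")` would report whether EOF sits directly after the last quality character or after a trailing newline.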

sean-workman commented 1 year ago

Ah sorry about that! Here they are.

I've gotten the go ahead to share the input files as well - what would be easiest to get them to you? R1/R2 are about 5GB each.

OCH16_2_8.fq.txt OCH16_1_8.fq.txt

jaebeom-kim commented 1 year ago

Thank you! I have checked the two files, and both end properly.

For sharing the files, we can use any way you are familiar with. However, before sharing the whole files: if you can reproduce the same error with small subsets, you can just upload those here. So, could you run head -n 80000 OCH16_1.fq > OCH16_1_80000.fq and head -n 80000 OCH16_2.fq > OCH16_2_80000.fq and test with the two small query files?

sean-workman commented 1 year ago

Here they are! I hope this helps. Please let me know if there is anything else I can do on my end.

OCH16_1_80000.fq.txt

OCH16_2_80000.fq.txt

jaebeom-kim commented 1 year ago

Thanks again! I was able to reproduce the SegFault. Let me inspect the error over the weekend. I think you have provided everything I need to solve the problem! So, please just wait for me :)

sean-workman commented 1 year ago

Great to hear! Good luck, I look forward to trying out Metabuli once the bugs are fixed. :)

jaebeom-kim commented 1 year ago

With your help, I was able to find the problem! Very short query sequences were causing it. There are sequences of length about 20 in OCH16_2_80000.fq.txt (one case is at line 72346), and I found a bug in the function that filters out such cases. I'll fix it soon and post an updated version. Thank you again :)
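To locate such reads in your own files, a small scan like the one below may help. It is only a sketch: it assumes plain four-line FASTQ records (no wrapped sequences), and the function name `find_short_reads` and the `min_length` cutoff are hypothetical, not taken from Metabuli. In a four-line FASTQ, line 72346 is a sequence line (72346 mod 4 = 2), which matches the case mentioned above.

```python
def find_short_reads(path, min_length):
    """Yield (line_number, read_id, length) for reads shorter than min_length.

    Assumes four-line FASTQ records: header, sequence, '+', quality.
    """
    read_id = None
    with open(path) as handle:
        for line_number, line in enumerate(handle, start=1):
            position = line_number % 4
            if position == 1:
                read_id = line.rstrip("\n")  # header line of the record
            elif position == 2:
                length = len(line.strip())  # sequence line of the record
                if length < min_length:
                    yield line_number, read_id, length
```

For example, `list(find_short_reads("OCH16_2_80000.fq", 24))` would list every read shorter than 24 nt together with the line on which its sequence appears.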

jaebeom-kim commented 1 year ago

I think I solved the issue related to reads that are too short to perform a six-frame translation. Please compile the latest Metabuli and test it.
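To illustrate why very short reads are a special case, the sketch below counts how many amino-acid k-mers each reading frame of a read can yield. It assumes a k-mer length of 8 amino acids (24 nt); that value and the function name `kmers_per_frame` are assumptions made for illustration, and this is not Metabuli's actual filtering code.

```python
def kmers_per_frame(read_length, k_aa=8):
    """Return the number of amino-acid k-mers for frame offsets 0, 1 and 2.

    The reverse strand gives the same three counts, because its frames are
    offset by 0, 1 and 2 nt from the other end of the read.
    k_aa=8 is an assumed k-mer length, used here only as an example.
    """
    counts = []
    for offset in range(3):
        codons = max(0, (read_length - offset) // 3)  # complete codons in this frame
        counts.append(max(0, codons - k_aa + 1))      # sliding windows of k_aa codons
    return counts


print(kmers_per_frame(20))   # [0, 0, 0]
print(kmers_per_frame(24))   # [1, 0, 0]
print(kmers_per_frame(150))  # [43, 42, 42]
```

Under this assumption, a read of about 20 nt produces no k-mer in any of the six frames, so code that expects at least one k-mer per read needs to skip it explicitly.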

sean-workman commented 1 year ago

Hi there,

I am running this now (did not get a chance last week) and I am indeed seeing overflow!!! in the printed log. I can't imagine an overflow is a good thing, but it sounds expected at least! I had to restart the run because my allocated resources on the cluster I'm using were going to expire before the job finished, but it is going again now and I will keep you informed about how it goes! :)

sean-workman commented 1 year ago

I'm now running with some reads that were not adapter trimmed and I see no overflow!!!, which I think is the expected behaviour.

I am wondering if the alignment used in Metabuli is sensitive to adapter content or not?

jaebeom-kim commented 1 year ago

Metabuli extracts k-mers from the whole region of query reads. So, k-mers from the adaptors are also extracted and compared to reference k-mers. If matches are found between them, the adaptor sequence region can affect the classification. Thus, it is recommended to trim your sequences before running Metabuli :)

About the overflow!!! message: it appears when there are too many matches between query and reference metamers, which happens with low-complexity sequences. I checked OCH16_1_80000.fq.txt and found that some reads have very low complexity.

Here are some examples:

@NOVASEQ1:462:HJ2CNDSX5:1:1101:31656:35180 1:N:0:CGGTTACG+CTATAGTC
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFF:FFFF,FFF::FF:F::F,FFF,FFFFFFFFFF,::F:F,:::F:F

@NOVASEQ1:462:HJ2CNDSX5:1:1101:20356:2550 1:N:0:CGGTTACG+CTATAGTC
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGGGGTTGTGG
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFF::FFFFF,::F:FFFFFF::F::,,FFFFFF,F:,F,,:,,F::,,,:,,,,,,:,,,,,,,,,,,F

@NOVASEQ1:462:HJ2CNDSX5:1:1101:24849:35227 1:N:0:CGGTTACG+CTATAGTC
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGTTTAAAAAAAAAAACACCCCCCCCCCCCGAGAAAAAAAAAAAAGTGTGAAGGAATGGGGTGAAAGAATAGGTGGGGGGGGGGGGGGGGGGGGGGGGG
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:,,,,,,,,,FFFFFF,,:,:,:::,,,,,,,,,,,,:,,,,,,,,,,,F,,,,,,,,,,,,,,,,,,,,::,,,,,,F:,,FFFFFFFFFFFFFFFFFFF:

I think you can avoid the overflow signal if you remove the low-complexity sequences.
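One simple way to flag reads like the ones above is to measure how many distinct short k-mers a read contains relative to how many it could contain. The sketch below is a rough heuristic, not what Metabuli does internally; the function names, `k=4`, and the `threshold=0.3` cutoff are all hypothetical and would need tuning on real data.

```python
def distinct_kmer_ratio(seq, k=4):
    """Return distinct k-mers divided by total k-mers in seq (0.0 if seq is shorter than k)."""
    total = len(seq) - k + 1
    if total <= 0:
        return 0.0
    kmers = {seq[i:i + k] for i in range(total)}
    return len(kmers) / total


def is_low_complexity(seq, k=4, threshold=0.3):
    """Flag a read whose distinct k-mer ratio falls below an assumed threshold.

    Reads shorter than k have ratio 0.0 and are therefore flagged as well.
    """
    return distinct_kmer_ratio(seq, k) < threshold
```

A poly-A read has a single distinct 4-mer, so its ratio is close to zero and it is flagged, while a read with mostly unique 4-mers has a ratio near 1 and passes.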

However, Metabuli should be able to handle such cases instead of just printing the overflow!!! message. So, we will update Metabuli to do that soon.

Let me close this issue because the segmentation fault error is solved, and I will open another issue for the overflow!!!

Thank you so much for testing our tool! It is helping us a lot 👍