jtamames / SqueezeMeta

A complete pipeline for metagenomic analysis
GNU General Public License v3.0

Illegal division by zero #654

Closed blaizereal closed 1 year ago

blaizereal commented 1 year ago

First of all, let me congratulate you on the pipeline: it lets you work very efficiently and has practically all the features you need for daily metagenomics analysis. Yesterday I installed the latest version for testing purposes, but every sample I load into it (including the bundled example raw files) dies at step 10 with the following error message:

 Reading contig length from /media/ngs_lab/6.4Tb-NVRAM/SqueezeMeta-1.6.2/scripts/Hadza/intermediate/01.Hadza.lon
Illegal division by zero at /media/ngs_lab/6.4Tb-NVRAM/SqueezeMeta-1.6.2/scripts/10.mapsamples.pl line 444.
Stopping in STEP10 -> 10.mapsamples.pl. Program finished abnormally

Died at ./SqueezeMeta.pl line 941.

Syslog:

[3 minutes, 6 seconds]: STEP2 -> 02.rnas.pl

[3 minutes, 19 seconds]: STEP3 -> 03.run_prodigal.pl

[3 minutes, 30 seconds]: STEP4 -> 04.rundiamond.pl

[16 minutes, 23 seconds]: STEP5 -> 05.run_hmmer.pl

[23 minutes, 28 seconds]: STEP6 -> 06.lca.pl

[24 minutes, 58 seconds]: STEP7 -> 07.fun3assign.pl

[24 minutes, 58 seconds]: STEP9 -> 09.summarycontigs3.pl

[25 minutes, 9 seconds]: STEP10 -> 10.mapsamples.pl
Stopping in STEP10 -> 10.mapsamples.pl. Program finished abnormally

The issue occurs in sequential and coassembly modes, too. The pipeline is installed perfectly fine according to the test script, and it works nicely until the mentioned step (Illumina readsets, default Trimmomatic filtering). OS: Ubuntu 22.04.2 LTS, on dual Epyc CPUs (192 cores), 1 TB RAM.

Thanks in advance for any help: Blaize

fpusan commented 1 year ago

How many cores are you using in the analysis? In case you are telling SqueezeMeta to use the 192 (!) cores, try running it with only 12.

fpusan commented 1 year ago

Also can you send us the syslog file inside the project's directory?

blaizereal commented 1 year ago

> How many cores are you using in the analysis? In case you are telling SqueezeMeta to use the 192 (!) cores, try running it with only 12.

32 cores was set.

blaizereal commented 1 year ago

> Also can you send us the syslog file inside the project's directory?

I will send it ASAP, thank you.

fpusan commented 1 year ago

Also, what version were you using? I think there was a bug with Trimmomatic that we fixed not so long ago.

blaizereal commented 1 year ago

I'm running the latest version. Syslog from the default sample files project is attached now. Thank you! syslog.zip

jtamames commented 1 year ago

Hello! What happens if you run: `/media/ngs_lab/6.4Tb-NVRAM/SqueezeMeta-1.6.2/bin/bowtie2/bowtie2 -x /media/ngs_lab/6.4Tb-NVRAM/SqueezeMeta-1.6.2/scripts/Hadza/data/Hadza.bowtie -1 /media/ngs_lab/6.4Tb-NVRAM/SqueezeMeta-1.6.2/scripts/Hadza/temp/Hadza.SRR1927149.current_1.gz -2 /media/ngs_lab/6.4Tb-NVRAM/SqueezeMeta-1.6.2/scripts/Hadza/temp/Hadza.SRR1927149.current_2.gz --quiet -p 32 -S /media/ngs_lab/6.4Tb-NVRAM/SqueezeMeta-1.6.2/scripts/Hadza/data/bam/Hadza.SRR1927149.sam --very-sensitive-local`

blaizereal commented 1 year ago

The bowtie2 completes the task without any errors and creates the Hadza.SRR1927149.sam file!

jtamames commented 1 year ago

Hum... and what happens if you restart the project now? `SqueezeMeta.pl -p Hadza --restart -step 10`

blaizereal commented 1 year ago

Here are the details (I had already tried it without success, but gave it another shot): :(

```
./SqueezeMeta.pl -p Hadza --restart -step 10

SqueezeMeta v1.6.2, March 2023 - (c) J. Tamames, F. Puente-Sánchez CNB-CSIC, Madrid, SPAIN
Please cite: Tamames & Puente-Sanchez, Frontiers in Microbiology 9, 3349 (2019). doi: https://doi.org/10.3389/fmicb.2018.03349

Run started Mon Mar 27 12:57:21 2023 in coassembly mode
2 metagenomes found: SRR1927149 SRR1929485

[0 seconds]: STEP10 -> MAPPING READS: 10.mapsamples.pl
  Reading samples from /media/ngs_lab/6.4Tb-NVRAM/SqueezeMeta-1.6.2/scripts/Hadza/data/00.Hadza.samples
  Metagenomes found: 2
  Mapping with Bowtie2 (Langmead and Salzberg 2012, Nat Methods 9(4), 357-9)
  Creating reference from contigs
  Working with sample 1: SRR1927149
    Getting raw reads
    Aligning to reference with bowtie
[bam_sort_core] merging from 0 files and 32 in-memory blocks...
    Calculating contig coverage
    Reading contig length from /media/ngs_lab/6.4Tb-NVRAM/SqueezeMeta-1.6.2/scripts/Hadza/intermediate/01.Hadza.lon
Illegal division by zero at /media/ngs_lab/6.4Tb-NVRAM/SqueezeMeta-1.6.2/scripts/10.mapsamples.pl line 444.
Stopping in STEP10 -> 10.mapsamples.pl. Program finished abnormally

If you don't know what went wrong or want further advice, please look for similar issues in https://github.com/jtamames/SqueezeMeta/issues
Feel free to open a new issue if you don't find the answer there. Please add a brief description of the problem and upload the /media/ngs_lab/6.4Tb-NVRAM/SqueezeMeta-1.6.2/scripts/Hadza/syslog file (zip it first)
Died at ./SqueezeMeta.pl line 941.
```

fpusan commented 1 year ago

Do you have a Hadza.SRR1927149.bam file? what is its content? (you can check with samtools view)
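A quick way to check whether any reads mapped at all is to test bit 0x4 of the SAM FLAG field (set means "read unmapped"). The sketch below fakes two SAM records with `printf` so it runs anywhere; the record fields are illustrative, not taken from this thread. On the real data you would pipe `samtools view Hadza.SRR1927149.bam` into the same `awk` instead:

```shell
# Count mapped vs. unmapped records in a SAM stream by testing FLAG bit 4
# (0x4 = read unmapped). The printf fakes two records: one unmapped (flag 77)
# and one mapped (flag 83). With real data, replace the printf with
# `samtools view Hadza.SRR1927149.bam`.
printf 'r1\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\nr2\t83\tmegahit_20485\t20095\t44\t4M\t=\t19938\t-257\tACGT\tIIII\n' |
awk -F'\t' '{ if (int($2 / 4) % 2) unmapped++; else mapped++ }
            END { printf "mapped=%d unmapped=%d\n", mapped, unmapped }'
# prints: mapped=1 unmapped=1
```

Flag pairs like 77/141 (both mates unmapped) land entirely in the `unmapped` bucket, while 83/163 are properly paired mapped mates.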

blaizereal commented 1 year ago

Yes, I do. Looking more closely at the BAM file, I see that there are 17666 megahit contigs out of 238563 to which Bowtie2 could not map any reads, hence the division-by-zero error. I have attached the detailed statistics. Hadza.SRR1927149.BAM.stat.zip
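For reference, a per-contig zero-coverage count can be pulled from `samtools idxstats`, which prints one line per reference sequence: name, length, number of mapped reads, number of unmapped reads. The sketch below substitutes a fabricated three-contig table for the real idxstats output so it is self-contained:

```shell
# Count contigs that recruited zero reads. Column 3 of `samtools idxstats`
# output is the number of mapped reads per reference. The printf fakes a
# three-contig table; on real data you would run
# `samtools idxstats Hadza.SRR1927149.bam` instead.
printf 'megahit_1\t1500\t120\t0\nmegahit_2\t900\t0\t0\nmegahit_3\t2100\t35\t0\n' |
awk -F'\t' '$3 == 0 { zero++ } END { printf "contigs_without_reads=%d\n", zero+0 }'
# prints: contigs_without_reads=1
```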

fpusan commented 1 year ago

But I think having 0 reads mapped in a few contigs shouldn't trigger that error. It rather looks like there are no reads mapped at all for the whole sample (or the script is somehow not finding them / parsing the files properly).

blaizereal commented 1 year ago

I think 17666 contigs with zero remapping coverage is a bit much, but you may be right, it must be a script/parser error, because any custom samples (iSeq, NextSeq, etc.) I give as input always produce the same error. :(

jtamames commented 1 year ago

I agree, that should not trigger an error. Indeed, it is usual that MinION contigs do not recruit any reads. Please check the SAM file: do you see any mapped reads? Just paste the result of: `tail /media/ngs_lab/6.4Tb-NVRAM/SqueezeMeta-1.6.2/scripts/Hadza/data/bam/Hadza.SRR1927149.sam`

fpusan commented 1 year ago

The thing is that this script works for the test dataset (and others) on my computer (and others). So it must somehow be related to your platform. To be sure, can you share the whole project directory with us? I will make some time to run it on my Ubuntu 20 and also in a 22.04 VM.

blaizereal commented 1 year ago

Checking the SAM gives exactly the same stats as the BAM (not surprisingly).

> I agree, that should not trigger an error. Indeed it is usual that minION contigs do not recruit any reads. Please check the SAM file, do you see any mapped reads? Just paste the result of: tail /media/ngs_lab/6.4Tb-NVRAM/SqueezeMeta-1.6.2/scripts/Hadza/data/bam/Hadza.SRR1927149.sam

```
$ tail /media/ngs_lab/6.4Tb-NVRAM/SqueezeMeta-1.6.2/scripts/Hadza/data/bam/Hadza.SRR1927149.sam
SRR1927149.14834740 83 megahit_20485 20095 44 100M = 19938 -257 CGACACAGAAGAAGGCTCTACTATTTATGGATATCTGAAAGAAAAGGTTAACAACGACGCTTACTACATGGTTGGTAGCGACAATAGCGACGGAACTCTC FGDGDGGHEGCGGEGGEHGEHGIGHGIHGHIGGEIGIHHIHIIIIHHHIIHIIHIIHIIIHIHIIHIIIIIIIIIIII>IIGIIIIIIIIIIIIIIHIII AS:i:200 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:100 YS:i:200 YT:Z:CP
SRR1927149.14834740 163 megahit_20485 19938 44 100M = 20095 257 TATGGATGTCGGGCAAGACTTTTGCACCAGATAATGCTGCGCAAAATGCTGCAGACGGCAATATAAACCCTTGTTTTGCAAATACTAAAGTTGTGCTACA IIIIIIIIIIIIIGIIGEGHIIIHIGHIHHIHIIIIIHIIIHIIIIHIHHIHGIEHGI@DEEEHHHHEHFEHHBEIHHEDFFHEDHEEBE@BBBEBDCEB AS:i:200 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:100 YS:i:200 YT:Z:CP
SRR1927149.14834741 99 megahit_147342 9557 44 100M = 9862 396 GGAGTCGGAAACACGTATTATCTCATCATAAAAAAGCGGCAGTGATTCTTCTTCGTTATAGCAGGGGACAACAACTGAAATAAGCTCCATTTTATCAATA HGGHHGGGDGDGGGGG8EGGHHHDHEHHHHHEHHHHHHG>GGCGGA?CCCE8>CC@GDDDDGDAB>GDGGEGEGEEHFFEBFFC8@ADDBEECECHEGEG AS:i:200 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:100 YS:i:182 YT:Z:CP
SRR1927149.14834741 147 megahit_147342 9862 44 91M = 9557 -396 TAGTGTAAGCACCTCGTGTTTTGACTTCCTTTACGTTGCTTACGGTCTTGTCATAGATATCCTCCTGAGTTGCGACAGCGATAACGGGCAT ?8?::86018'7;E>DEDBBDD,DFCEEGDEEBBBEEGDG?AA>CA8EEDE>EGDEDBE@GGGDBDAG@G@EAEB8HDIGIGIIHHIII AS:i:182 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:91 YS:i:200 YT:Z:CP
SRR1927149.14834742 77 * 0 0 * * 0 0 CGATAAGGCCGAAAGTTTCGTCCATATCCTTGCCGGTGCCGTAAATGCCGTGGTTCGTCCATACGACAAGACGATGTTCTTTCATTTTTTCGGCGGTAGC HHHHHHHEHBGDGGGHHHEHHDGGHHHDGHB@HGHDDEBEF=EBEEFEEE>GE8AEECECFHHHHHHDEGEGDEB71;17>?=>?EBEBD=,>@>D=<BD YT:Z:UP
SRR1927149.14834742 141 * 0 0 * * 0 0 GGGAAAGTTCACTTCCGAACTGCCTGCGCATCTTATGACGCACACCGTCCGCCTTGCTCAGGATCCCGATCACAGAATCGTTATGCACACTCA GBHIHHBIIGIGIBIIIHDGIIBFIIIIIGIIGG@DGBGEGGEBB>BBB@@+EBECCA3CA@<FEFIHDDECEBC;C@<83);26;?@+?A YT:Z:UP
SRR1927149.14834743 77 * 0 0 * * 0 0 AATTGACAGTCCTCTAGCGAGTACTGCTTGGACGAAAAGTGGTCCATGATAATCGGCCACCTCAGGTCAAATCTACACATCTGAGATTGTGATGCTTTCG E?>EGEGGGGGGGBGGEG88B?DB?>A1>BE3BE?EEA5D???B<BB???>>B>FD2D>?.?>>9.<:9<88;?AA#G?EGDEDBEG@E=G@GGGGGG YT:Z:UP
SRR1927149.14834743 141 * 0 0 * * 0 0 ATTCTGCGATCTACGTATACTGCTGTTGTGCCCAATAACTGGACGAAGGAGCAAACGGACGAGTTGTTCGAGCTTTGT BIIDBIHBBIIFIGGBIIDG@GGGE>EB>A2CC?CCI@FE@EEEG>EFCE88@?:*??;??CEBBE?@AD4B>@8@>@ YT:Z:UP
SRR1927149.14834744 83 megahit_204204 387 44 100M = 167 -320 GCCTCCTTGCTGTGCGCGAGTTCCATCATCTTCTTCTCGATAGCAGGGAAATCCGCATCAGTAAGCTGTCTGTCACCAAGGTCAATGTCATAATAGAATC @ADDDBA>BE@E>AABEEBEB@DBBBE@BEB>EBBHBHFFEECECHFEHHHFHHBHEHHHHHHHHGGGFGGGGEBHHHHHGGGGGBGEGDGHHHHHHHHH AS:i:200 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:100 YS:i:200 YT:Z:CP
SRR1927149.14834744 163 megahit_204204 167 44 100M = 387 320 ACCTCTCCAGTATGCGCCTGCAAGTGAAGTCAGTTTGACAGCCTTTATCTGCTCAGCGCTTCTGAGGTGAGGTCCGCGGCACAAATCGGTGAAGTTGCCG BIIGIDGIGIGHIIEGDIDGEIIIGIIDHHIHHGGGGGIHGIIGHDIIIIHG>IBHIIIDFHIIHHH@C8>A<@CCAAEFBDB@5A9@8:907A=@@@BE AS:i:200 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:100 YS:i:200 YT:Z:CP
```

jtamames commented 1 year ago

Ok, let's do a test. Edit `10.mapsamples.pl` in the scripts directory (if you don't know the path to this script, try `conda info --envs` to find out the path to your conda installation; the script should be in its scripts directory). Around line 420 you will see the following:

    open infile4,"samtools view $bamfile |" || die "Can't open bam file $bamfile\n";
    while(<infile4>) {
        chomp;
        my @t=split(/\t/,$_);
        next if($_=~/^\@/); # not really needed anymore since we don't read the headers with samtools view

Add the following line just after it: `print ">$_\n";`. Then, at line 439, where it reads `$totalreadcount++;`, change it to: `$totalreadcount++; print "*$_\n";`

And see if that produces any output on screen. Best, J
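For what it's worth, the crash mode itself is easy to reproduce in miniature: a mean-coverage computation of the form mapped_bases / contig_length dies with "Illegal division by zero" the moment a length lookup comes back as zero (for instance, a contig name mismatch against the `01.Hadza.lon` file). A toy `awk` illustration, not SqueezeMeta's actual code, showing a guarded divisor:

```shell
# Toy illustration (hypothetical values, not SqueezeMeta code): dividing by an
# unguarded contig length crashes when the length lookup fails; guarding the
# divisor turns the crash into a diagnosable message.
awk 'BEGIN {
  bases = 12000; length_bp = 0;   # simulate a missing/zero length entry
  if (length_bp > 0) printf "coverage=%.2f\n", bases / length_bp;
  else               print  "coverage=NA (zero-length contig: check the .lon file)";
}'
# prints: coverage=NA (zero-length contig: check the .lon file)
```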

blaizereal commented 1 year ago

This problem has finally been solved! I didn't actually do anything; I just pulled some older real metagenome samples and they were successfully processed by the pipeline. The result is not perfect, though, because there were problems with binning steps 14-18, but I'll open a new topic for that. Thank you for your help!

fpusan commented 1 year ago

Just confirmed that the pipeline works OK in a clean Ubuntu 22.04.2 VM, with and without Trimmomatic cleaning.