jtamames / SqueezeMeta

A complete pipeline for metagenomic analysis
GNU General Public License v3.0

restart at step 10 question #703

Closed bresyd closed 1 year ago

bresyd commented 1 year ago

Hi, thanks a lot for the latest SQM release, so far it has been working really well.

I am currently running a SqueezeMeta job using one large coassembly and around 30 samples for the mapping. During step 10, the mapping and counting worked without issue for all but the last sample. For this last sample, the read files are much larger than those of all other samples (~210 MB for each of the paired-end fq.gz files, compared to ~5 MB for the others). The last lines in the syslog files are these:

Aligning with bowtie: /home/btschits/miniconda3/envs/SqueezeMeta_1.6.2post1/SqueezeMeta/bin/bowtie2/bowtie2 -x /scratch/btschits/sqm_1.6/abc_reseq/data/abc_reseq.bowtie  -1 /scratch/btschits/sqm_1.6/abc_reseq/temp/abc_reseq.5760.current_1.gz -2 /scratch/btschits/sqm_1.6/abc_reseq/temp/abc_reseq.5760.current_2.gz --quiet -p 60 -S /scratch/btschits/sqm_1.6/abc_reseq/data/bam/abc_reseq.5760.sam --very-sensitive-local
Stopping in STEP10 -> 10.mapsamples.pl. Program finished abnormally

During the mapping, a sam file was created and growing in size, but when I checked this morning, the sam file was gone. Does that indicate that the mapping actually finished and that the error occurred afterwards during samtools? I saw the line in the mapping script that removes the sam file after conversion.

I am currently repeating the mapping of this last sample using the same command as above. In case my error was related to samtools, I would try repeating the samtools indexing with fewer CPUs (I run my project with 60 CPUs, and I had similar issues with samtools in the past when I used too many threads).

In case I manage to get an indexed bam file, how can I continue my SQM run? Can I just restart at step 10? Will SQM recognize that the mapping and counting were already done for all previous samples and that there is a bam file for the last sample, and then continue with the counting for the last sample?

All the best

fpusan commented 1 year ago

It is a bit weird. If you look at L188 and below, samtools is run right before the sam file is removed, but if samtools had failed you should have gotten a message (I guess via stderr) saying "Error running samtools". Did you get such a message?

bresyd commented 1 year ago

This is what I got printed onto the terminal

  Working with sample 37: 5760
  Getting raw reads
  Aligning to reference with bowtie
[bam_sort_core] merging from 1020 files and 60 in-memory blocks...
[E::hts_open_format] Failed to open file /scratch/btschits/sqm_1.6/abc_reseq/data/bam/abc_reseq.5760.bam
samtools sort: failed to create "/scratch/btschits/sqm_1.6/abc_reseq/data/bam/abc_reseq.5760.bam": Too many open files
  Calculating contig coverage
  Reading contig length from /scratch/btschits/sqm_1.6/abc_reseq/intermediate/01.abc_reseq.lon
[E::hts_open_format] Failed to open file "/scratch/btschits/sqm_1.6/abc_reseq/data/bam/abc_reseq.5760.bam" : No such file or directory
samtools view: failed to open "/scratch/btschits/sqm_1.6/abc_reseq/data/bam/abc_reseq.5760.bam" for reading: No such file or directory
Illegal division by zero at /home/btschits/miniconda3/envs/SqueezeMeta_1.6.2post1/SqueezeMeta/scripts/10.mapsamples.pl line 444.
Stopping in STEP10 -> 10.mapsamples.pl. Program finished abnormally

fpusan commented 1 year ago

samtools sort: failed to create "/scratch/btschits/sqm_1.6/abc_reseq/data/bam/abc_reseq.5760.bam": Too many open files

That one is new. What is the output of cat /proc/sys/fs/file-max?

bresyd commented 1 year ago

cat /proc/sys/fs/file-max
315986072

bresyd commented 1 year ago

I also just realized that I already have a sorted and indexed bam file for this particular sample, which I generated a while ago using bbmap. Relating to the last part of my original comment, can I just move the .bam and .bam.bai files into the data/bam/ directory and restart at step 10? Will SQM then repeat the counting for all the previous samples?

fpusan commented 1 year ago

It won't repeat the mapping, but it will repeat the counting. Regarding the issue, it may be related to the high number of threads. There are several suggested solutions here: https://www.biostars.org/p/496790/
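
For context: /proc/sys/fs/file-max is the system-wide ceiling, whereas "Too many open files" typically refers to the per-process limit (the value `ulimit -n` reports). Below is a minimal sketch, assuming a Linux/Unix host with Python, that reads that per-process limit and compares it with the 1020 temporary files reported in the log above. The function name and the headroom of 16 descriptors are illustrative assumptions, not part of SqueezeMeta or samtools.

```python
import resource


def merge_fits_in_fd_limit(n_temp_files, soft_limit, headroom=16):
    """Return True if opening n_temp_files at once stays under soft_limit.

    headroom is a hypothetical allowance for stdin/stdout/stderr, the output
    BAM and any other descriptors the process already holds.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return True  # no per-process cap configured
    return n_temp_files + headroom <= soft_limit


# Limits of the current process (same numbers as `ulimit -n` and `ulimit -Hn`).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit: {soft}, hard limit: {hard}")

# 1020 is the temp-file count reported by [bam_sort_core] in the log above.
print("merge of 1020 files fits:", merge_fits_in_fd_limit(1020, soft))
```

If the check fails, one common remedy is to raise the soft limit (up to the hard limit) in the shell that launches the pipeline, for example with `ulimit -n`.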

bresyd commented 1 year ago

Yes, the high number of threads was also my prime suspect here, thanks for the link. If the issue was really with samtools and not bowtie, it is a bit unfortunate that the sam file got removed even though the conversion to bam did not work. Thank you again for your help.

fpusan commented 1 year ago

You are right, I just committed cc32a96 so that the SAM is not removed unless conversion to BAM has worked.
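
The idea behind that change can be sketched as follows. This is not the code from commit cc32a96 (10.mapsamples.pl is a Perl script); it is a hypothetical Python helper, with assumed names, that deletes the SAM only when the conversion command exits with status 0 and a non-empty BAM exists.

```python
import os
import subprocess


def convert_then_cleanup(cmd, sam_path, bam_path):
    """Run a SAM-to-BAM command; delete the SAM only if a BAM was produced.

    Hypothetical helper for illustration. Returns True on success.
    """
    result = subprocess.run(cmd)
    ok = (
        result.returncode == 0
        and os.path.isfile(bam_path)
        and os.path.getsize(bam_path) > 0
    )
    if ok:
        os.remove(sam_path)  # the BAM exists, so the SAM is now redundant
    else:
        print(f"Conversion failed; keeping {sam_path} for a retry")
    return ok
```

A call might look like `convert_then_cleanup(["samtools", "sort", "-@", "8", "-o", bam, sam], sam, bam)`, where the thread count of 8 is an arbitrary example. Keeping the SAM on failure means only the conversion has to be repeated, not the alignment.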