t-neumann / slamdunk

Streamlining SLAM-seq analysis with ultra-high sensitivity
GNU Affero General Public License v3.0
39 stars 23 forks

alleyoop dedup, samtools failed to create index because reads are not sorted #142

Closed: algaebrown closed this issue 9 months ago

algaebrown commented 9 months ago

Hi t-neumann,

Thanks for creating this awesome tool.

I was trying to run alleyoop after slamdunk all.

command:

# How I run slamdunk all
REF_FA=/tscc/nfs/home/hsher/gencode_coords/GRCh38.primary_assembly.genome.fa
singularity exec --bind /tscc \
slamdunk_0_4_3.sif \
slamdunk all -r $REF_FA \
    -b SlamSeq_3UTR.bed \
    -o . \
    -t 64 \
    -5 12 \
    -n 100 \
    -m \
    -rl 100 SlamSeqManifest.csv

# How I run alleyoop dedup
for f in map/*mapped.bam
do
singularity exec --bind /tscc \
slamdunk_0_4_3.sif \
alleyoop dedup -o map/ \
-t 24 \
$f
done

environment: docker://tobneu/slamdunk:v0.4.3

error:

Running alleyoop dedup for 1 files (24 threads)
[E::hts_idx_push] NO_COOR reads not in a single block at the end 2 -1
joblib.externals.loky.process_executor._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 418, in _process_worker
    r = call_item()
  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 272, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 600, in __call__
    return self.func(*args, **kwargs)
  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/joblib/parallel.py", line 256, in __call__
    for func, args, kwargs in self.items]
  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/joblib/parallel.py", line 256, in <listcomp>
    for func, args, kwargs in self.items]
  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/slamdunk/alleyoop.py", line 90, in runDedup
    deduplicator.Dedup(bam, outputBAM, tcMutations, log)
  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/slamdunk/dunks/deduplicator.py", line 91, in Dedup
    pysamIndex(outputBAM)
  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/slamdunk/utils/misc.py", line 214, in pysamIndex
    pysam.index(outputBam)  # @UndefinedVariable
  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/pysam/utils.py", line 75, in __call__
    stderr))
pysam.utils.SamtoolsError: 'samtools returned with error 1: stdout=, stderr=samtools index: failed to create index for "map/Ngn2-8-3_S6_L001_R1_001.fastq_slamdunk_mapped_dedup.bam"\n'
"""

I noticed that not all BAMs fail, though: some do and some don't. The `NO_COOR reads not in a single block at the end` message suggests the dedup BAM is not coordinate-sorted (samtools can only index a coordinate-sorted BAM, with unmapped reads grouped in one block at the end).
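To see which dedup BAMs are affected, one can check the `SO:` field declared in each header. This is a diagnostic sketch; the `map/` path and the `_mapped_dedup.bam` filename pattern are taken from the commands above:

```shell
# Report the sort order (SO tag) declared in each dedup BAM header.
# samtools can only index BAMs whose header says SO:coordinate.
for f in map/*_mapped_dedup.bam; do
    order=$(samtools view -H "$f" | grep -m1 '^@HD' | sed 's/.*SO://')
    echo "$f: ${order:-unknown}"
done
```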

Could you help me understand why this happens?

Thank you

algaebrown commented 9 months ago

When I try to sort and index manually, it seems some files are truncated. Command:

module load samtools
for f in map/*_mapped_dedup.bam
do
samtools sort $f -o ${f%.bam}.sorted.bam
samtools index ${f%.bam}.sorted.bam
done
module unload samtools

error:

[W::bam_hdr_read] EOF marker is absent. The input is probably truncated
[E::bgzf_read_block] Failed to read BGZF block data at offset 32497415 expected 16344 bytes; hread returned 8423
[E::bgzf_read] Read block operation failed with error 4 after 0 of 4 bytes
samtools sort: truncated file. Aborting

Maybe this suggests that running `alleyoop dedup` with too many threads is running out of memory?
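Truncated outputs like this can be detected in bulk with `samtools quickcheck`, which verifies the header and the BGZF EOF marker without reading the whole file. A sketch, assuming samtools 1.3 or later and the same `map/` layout as above:

```shell
# Flag any dedup BAM with a bad header or a missing BGZF EOF marker.
for f in map/*_mapped_dedup.bam; do
    samtools quickcheck "$f" || echo "$f: failed quickcheck (likely truncated)"
done
```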

algaebrown commented 9 months ago

Reducing the number of threads and running on sorted BAMs fixed it. Thanks!
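For reference, the workaround can be sketched as follows: coordinate-sort and index each mapped BAM first, then run `alleyoop dedup` on the sorted file with a lower thread count. The thread counts here are illustrative reduced values, not the exact ones used in this thread:

```shell
# Coordinate-sort and index each mapped BAM, then dedup with fewer threads.
for f in map/*mapped.bam; do
    sorted=${f%.bam}.sorted.bam        # map/x_mapped.bam -> map/x_mapped.sorted.bam
    samtools sort -@ 4 -o "$sorted" "$f"
    samtools index "$sorted"
    singularity exec --bind /tscc slamdunk_0_4_3.sif \
        alleyoop dedup -o map/ -t 8 "$sorted"
done
```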