yyoshiaki / VIRTUS2

A bioinformatics pipeline for viral transcriptome detection and quantification considering splicing.

Strange samview fatal error in some samples #37

Closed xapple closed 1 month ago

xapple commented 5 months ago

I've been using VIRTUS2 on a couple of datasets, and for one batch of samples it suddenly stopped working for an unknown reason and now produces many errors such as this one:

  ...
INFO [job star_mapping_pe_human] completed success
INFO [step star_mapping_pe_human] completed success
INFO [workflow ] starting step samtools_view
INFO [step samtools_view] start
INFO ['singularity', 'pull', '--force', '--name', 'yyasumizu_bam_filter_polyx:1.3.sif', 'docker://yyasumizu/bam_filter_polyx:1.3']
INFO:    Using cached SIF image
INFO [job samtools_view] /tmp/s534n_63$ singularity \
    --quiet \
    exec \
    --contain \
    --ipc \
    --cleanenv \
    --userns \
    --home \
    /tmp/s534n_63:/TiwXYE \
    --bind \
    /tmp/c6y9bwed:/tmp \
    --bind \
    /tmp/gv1ovtfx/humanAligned.sortedByCoord.out.bam:/var/lib/cwl/stg8737f712-74a6-4776-8dde-0a685a105743/humanAligned.sortedByCoord.out.bam:ro \
    --bind \
    /tmp/gv1ovtfx/humanLog.final.out:/var/lib/cwl/stg8737f712-74a6-4776-8dde-0a685a105743/humanLog.final.out:ro \
    --bind \
    /tmp/gv1ovtfx/humanSJ.out.tab:/var/lib/cwl/stg8737f712-74a6-4776-8dde-0a685a105743/humanSJ.out.tab:ro \
    --bind \
    /tmp/gv1ovtfx/humanLog.out:/var/lib/cwl/stg8737f712-74a6-4776-8dde-0a685a105743/humanLog.out:ro \
    --pwd \
    /TiwXYE \
    /long_path_in_the_file_system/virtus/yyasumizu_bam_filter_polyx:1.3.sif \
    /bin/sh \
    -c \
    'samtools_view_removemulti.sh'  16  4 /var/lib/cwl/stg8737f712-74a6-4776-8dde-0a685a105743/humanAligned.sortedByCoord.out.bam > /tmp/s534n_63/human.unmapped.bam
[main_samview] fail to read the header from "-".
WARNING [job samtools_view] exited with status: 1
WARNING [job samtools_view] completed permanentFail
WARNING [step samtools_view] completed permanentFail
INFO [workflow ] completed permanentFail
{
    "Log.out_human": {
    ...

On my end, though, I supplied paired-end FASTA files as usual, so it's hard to say what the issue could be.

yyoshiaki commented 5 months ago

Sorry, I'm not sure how to deal with it. The message [main_samview] fail to read the header from "-". at least tells us that the problem lies in the header of the BAM file humanAligned.sortedByCoord.out.bam.

sirrgang commented 5 months ago

I am encountering exactly the same problem on half of my samples (produced by the same laboratory).

STAR --runMode alignReads --genomeDir /SryAMm/STAR_index_human --runThreadN 36 --outFileNamePrefix cfRNA-22 --outSAMtype BAM SortedByCoordinate --outSAMunmapped Within --readFilesIn /var/lib/cwl/stg12899fac-3788-4100-8e7b-521a4b61bf62/S8433Nr8_R1.fastp.fastq /var/lib/cwl/stg706f04d3-fb6e-4c29-826c-426a4ef36f65/S8433Nr8_R2.fastp.fastq
Jan 26 21:50:29 ..... started STAR run
Jan 26 21:50:29 ..... loading genome
Jan 26 21:50:39 ..... started mapping
Jan 26 22:09:31 ..... finished mapping
Jan 26 22:09:32 ..... started sorting BAM
Jan 26 22:13:45 ..... finished successfully
INFO [job star_mapping_pe_human] Max memory used: 0MiB
INFO [job star_mapping_pe_human] completed success
INFO [step star_mapping_pe_human] completed success
INFO [workflow ] starting step samtools_view
INFO [step samtools_view] start
INFO [job samtools_view] /tmp/7rwe5td_$ docker \
    run \
    -i \
    --mount=type=bind,source=/tmp/7rwe5td_,target=/SryAMm \
    --mount=type=bind,source=/tmp/qcmxxcuw,target=/tmp \
    --mount=type=bind,source=/tmp/6ybi01hv/cfRNA-22Aligned.sortedByCoord.out.bam,target=/var/lib/cwl/stg04f5946e-d659-4b42-b4d5-941124e54608/cfRNA-22Aligned.sortedByCoord.out.bam,readonly \
    --mount=type=bind,source=/tmp/6ybi01hv/cfRNA-22Log.final.out,target=/var/lib/cwl/stg04f5946e-d659-4b42-b4d5-941124e54608/cfRNA-22Log.final.out,readonly \
    --mount=type=bind,source=/tmp/6ybi01hv/cfRNA-22SJ.out.tab,target=/var/lib/cwl/stg04f5946e-d659-4b42-b4d5-941124e54608/cfRNA-22SJ.out.tab,readonly \
    --mount=type=bind,source=/tmp/6ybi01hv/cfRNA-22Log.out,target=/var/lib/cwl/stg04f5946e-d659-4b42-b4d5-941124e54608/cfRNA-22Log.out,readonly \
    --workdir=/SryAMm \
    --read-only=true \
    --log-driver=none \
    --user=1000:1000 \
    --rm \
    --cidfile=/tmp/eov6q01j/20240126231409-615788.cid \
    --env=TMPDIR=/tmp \
    --env=HOME=/SryAMm \
    yyasumizu/bam_filter_polyx:1.3 \
    /bin/sh \
    -c \
    samtools_view_removemulti.sh 36 4 /var/lib/cwl/stg04f5946e-d659-4b42-b4d5-941124e54608/cfRNA-22Aligned.sortedByCoord.out.bam > /tmp/7rwe5td_/human.unmapped.bam
[W::bam_hdr_read] EOF marker is absent. The input is probably truncated
samtools view: error reading file "/var/lib/cwl/stg04f5946e-d659-4b42-b4d5-941124e54608/cfRNA-22Aligned.sortedByCoord.out.bam"
samtools view: error closing "/var/lib/cwl/stg04f5946e-d659-4b42-b4d5-941124e54608/cfRNA-22Aligned.sortedByCoord.out.bam": -1
[main_samview] fail to read the header from "-".
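
The "EOF marker is absent" warning above suggests the BAM file written by STAR is truncated. A rough way to confirm this outside the pipeline might be to check whether the file ends with the 28-byte empty BGZF block that the SAM/BAM specification defines as the end-of-file marker. The sketch below is not part of VIRTUS2; the function name has_bgzf_eof is made up for illustration, and it only detects a missing marker, not other kinds of corruption.

```python
import os

# 28-byte empty BGZF block that a complete BAM file ends with
# (value taken from the SAM/BAM specification).
BGZF_EOF = bytes.fromhex(
    "1f8b0804 00000000 00ff 0600 4243 0200 1b00 0300 00000000 00000000"
)


def has_bgzf_eof(path):
    """Return True if the file at `path` ends with the BGZF EOF marker.

    Hypothetical helper for illustration; a False result means the file
    is probably truncated, which matches the samtools warning.
    """
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        # Jump to the last 28 bytes and compare them with the marker.
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF
```

Calling has_bgzf_eof on the STAR output (for example a copy of cfRNA-22Aligned.sortedByCoord.out.bam) before the samtools_view step would show whether the file was already incomplete when STAR reported success.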

sirrgang commented 5 months ago

The FASTQ files are too big (9 GB), and that is causing the BAM file to be corrupted (I don't know why), but if you split the files in two, it works :)
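
The comment above does not say how the files were split, so the following is only one possible sketch. It assumes uncompressed FASTQ files with exactly four lines per record and distributes records round-robin over the output parts; applying it with the same n_parts to both the R1 and R2 files should keep mates in matching parts. The function name split_fastq and the ".partN.fastq" naming are hypothetical.

```python
from itertools import islice


def split_fastq(path, out_prefix, n_parts=2):
    """Split a FASTQ file into `n_parts` files, record by record.

    Assumes uncompressed input with 4 lines per record. Record i goes to
    part (i % n_parts), so running this on R1 and R2 with the same
    n_parts keeps paired reads in the same part number.
    """
    out_paths = [f"{out_prefix}.part{i + 1}.fastq" for i in range(n_parts)]
    outputs = [open(p, "w") for p in out_paths]
    try:
        with open(path) as handle:
            index = 0
            while True:
                record = list(islice(handle, 4))  # one FASTQ record
                if not record:
                    break
                if len(record) != 4:
                    raise ValueError(f"incomplete FASTQ record in {path}")
                outputs[index % n_parts].writelines(record)
                index += 1
    finally:
        for out in outputs:
            out.close()
    return out_paths
```

For example, split_fastq("S8433Nr8_R1.fastp.fastq", "S8433Nr8_R1") and the same call for the R2 file would give two smaller pairs of files that could then be run separately.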