maxplanck-ie / snakepipes

Customizable workflows based on snakemake and python for the analysis of NGS data
http://snakepipes.readthedocs.io

Error with DNA mapping workflow: [main_samview] fail to read the header from "-" #886

Closed chondammab closed 1 year ago

chondammab commented 1 year ago

Hello everyone,

I'm using snakePipes to analyse ChIP-seq data. However, I get an error in the DNA-mapping pipeline that I'm unable to fix. For DNA mapping, I have paired-end fastq files with the extensions 'R1_fastq.gz' and 'R2_fastq.gz'. I also have a GRCm38 (mm10) indexed genome. The command I use is as follows.

DNA-mapping -i /home/cbollachettira/fastq -o /home/cbollachettira/data/test --mapq 30 --j 4 --dedup GRCm38_gencode_release19

I get the following error message in the Bowtie2.err file.

**
rule Bowtie2:
    input: FASTQ/CB1_001_R1.fastq.gz, FASTQ/CB1_001_R2.fastq.gz
    output: Bowtie2/CB1_001.Bowtie2_summary.txt, Bowtie2/CB1_001.sorted.bam
    log: Bowtie2/logs/CB1_001.sort.log
    jobid: 0
    benchmark: Bowtie2/.benchmark/Bowtie2.CB1_001.benchmark
    reason: Missing output files: Bowtie2/.benchmark/Bowtie2.CB1_001.benchmark, Bowtie2/CB1_001.sorted.bam, Bowtie2/CB1_001.Bowtie2_summary.txt
    wildcards: sample=CB1_001
    threads: 24
    resources: mem_mb=2730, disk_mb=2730, tmpdir=/home/cbollachettira/temp

Activating conda environment: ../../miniconda3/envs/e60637fa9c311b97b1a1dd66f7de1b98
[main_samview] fail to read the header from "-".

Error in rule Bowtie2:
    jobid: 0
    input: FASTQ/CB1_001_R1.fastq.gz, FASTQ/CB1_001_R2.fastq.gz
    output: Bowtie2/CB1_001.Bowtie2_summary.txt, Bowtie2/CB1_001.sorted.bam
    log: Bowtie2/logs/CB1_001.sort.log (check log file(s) for error message)
    conda-env: /home/cbollachettira/miniconda3/envs/e60637fa9c311b97b1a1dd66f7de1b98
    shell:

        TMPDIR=/home/cbollachettira/temp/
        MYTEMP=$(mktemp -d ${TMPDIR:-/tmp}/snakepipes.XXXXXXXXXX);
        bowtie2             -X 1000             -x /data/processing2/sikora/GRCm38_gencode_release19/BowtieIndex/genome -1 FASTQ/CB1_001_R1.fastq.gz -2 FASTQ/CB1_001_R2.fastq.gz               --fr             --rg-id CB1_001             --rg DS:CB1_001 --rg PL:ILLUMINA --rg SM:CB1_001             -p 24             2> Bowtie2/CB1_001.Bowtie2_summary.txt |             samtools view -Sb - |             samtools sort -m 2G -T $MYTEMP/CB1_001 -@ 2 -O bam - > Bowtie2/CB1_001.sorted.bam 2> Bowtie2/logs/CB1_001.sort.log;
        rm -rf $MYTEMP

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job Bowtie2 since they might be corrupted:
Bowtie2/CB1_001.Bowtie2_summary.txt, Bowtie2/CB1_001.sorted.bam
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message

**

Could you please let me know if you have any suggestions to fix this error?

Thanks a lot!

Best, Chondamma

katsikora commented 1 year ago

Hi Chondamma,

This error means that the Bowtie2 mapping step has failed. There can be various reasons for this, for example empty fastq files. Can you run e.g. zcat FASTQ/CB1_001_R1.fastq.gz | head and check whether there are indeed sequences in there? And do the same for the other read file?

Best regards, Katarzyna
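
For anyone who prefers to script the check suggested above, here is a minimal Python sketch that looks at the first records of a gzipped FASTQ file and counts its reads. The function names peek_fastq and count_reads are made up for this illustration and are not part of snakePipes. The sketch assumes plain four-line FASTQ records (no wrapped sequence lines).

```python
import gzip
from itertools import islice


def peek_fastq(path, n_records=2):
    """Return up to n_records (header, sequence) tuples from a gzipped FASTQ.

    Hypothetical helper for illustration; roughly what `zcat file | head` shows.
    """
    records = []
    with gzip.open(path, "rt") as handle:
        while len(records) < n_records:
            block = [line.rstrip("\n") for line in islice(handle, 4)]
            if len(block) < 4:
                break  # end of file, or a truncated final record
            header, seq, plus, _qual = block
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"{path}: malformed record near {header!r}")
            records.append((header, seq))
    return records


def count_reads(path):
    """Count reads, assuming exactly four lines per record."""
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4
```

Calling count_reads on both FASTQ/CB1_001_R1.fastq.gz and FASTQ/CB1_001_R2.fastq.gz should give the same non-zero number for a healthy paired-end sample. A count of zero, a mismatch between the two files, or a ValueError would point at the input files rather than at the mapping step.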

katsikora commented 1 year ago

Closing due to no activity. Feel free to reopen if your issue persists and you still need help with it.

Best,

Katarzyna

jtxyz16 commented 1 year ago

Hi @katsikora, I encountered the same problem, and I checked that there are sequences in my fastq files. Can you guide me on the next steps?

Here is my specific error message:


[Mon Jun 26 16:55:47 2023]
rule Bowtie2:
    input: FASTQ/230614Gal_D23-8138_1_sequence.fastq.gz, FASTQ/230614Gal_D23-8138_2_sequence.fastq.gz
    output: Bowtie2/230614Gal_D23-8138.Bowtie2_summary.txt, Bowtie2/230614Gal_D23-8138.sorted.bam
    log: Bowtie2/logs/230614Gal_D23-8138.sort.log
    jobid: 0
    benchmark: Bowtie2/.benchmark/Bowtie2.230614Gal_D23-8138.benchmark
    reason: Missing output files: Bowtie2/230614Gal_D23-8138.Bowtie2_summary.txt, Bowtie2/.benchmark/Bowtie2.230614Gal_D23-8138.benchmark, Bowtie2/230614Gal_D23-8138.sorted.bam
    wildcards: sample=230614Gal_D23-8138
    threads: 24
    resources: mem_mb=1000, disk_mb=1000, tmpdir=/state/partition1/slurm_tmp/23231156.4294967291.0

            TMPDIR=/data1/groups/galloway/data/NGS_DATA/ATAC/tmp
            MYTEMP=$(mktemp -d ${TMPDIR:-/tmp}/snakepipes.XXXXXXXXXX);
            bowtie2             -X 1000             -x /data/repository/organisms/GRCm38_ensembl/BowtieIndex/genome -1 FASTQ/230614Gal_D23-8138_1_sequence.fastq.gz -2 FASTQ/230614Gal_D23-8138_2_sequence.fastq.gz               --fr             --rg-id 230614Gal_D23-8138             --rg DS:230614Gal_D23-8138 --rg PL:ILLUMINA --rg SM:230614Gal_D23-8138             -p 24             2> Bowtie2/230614Gal_D23-8138.Bowtie2_summary.txt |             samtools view -Sb - |             samtools sort -m 2G -T $MYTEMP/230614Gal_D23-8138 -@ 2 -O bam - > Bowtie2/230614Gal_D23-8138.sorted.bam 2> Bowtie2/logs/230614Gal_D23-8138.sort.log;
            rm -rf $MYTEMP

Activating conda environment: ../../../../conda_envs/snakePipes_envs/5e5fac085be1c53aa94048b787f7dfb2
[main_samview] fail to read the header from "-".
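
Both logs end with the same samtools message, which only says that samtools view received no SAM header on standard input, i.e. bowtie2 wrote nothing into the pipe. In the shell command, bowtie2's own stderr is redirected to the Bowtie2_summary.txt file, and in the first report Snakemake removes that file after the failure, so the underlying bowtie2 message is not visible here. One possibility the thread does not rule out is that the index prefix passed to -x does not exist on the machine running the job. The sketch below checks for the standard Bowtie2 index files next to a prefix. The function name missing_bowtie2_index_files is invented for this example; the suffixes are the usual Bowtie2 ones (.bt2, or .bt2l for large indexes).

```python
from pathlib import Path

# File suffixes that bowtie2-build writes for a small index;
# large indexes use the same names with a trailing "l" (.bt2l).
BT2_SUFFIXES = [".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2"]


def missing_bowtie2_index_files(prefix):
    """Return the index files missing for `prefix` (hypothetical helper).

    An empty list means a complete small or large index was found.
    """
    prefix = str(prefix)
    small = [prefix + suffix for suffix in BT2_SUFFIXES]
    large = [name + "l" for name in small]
    missing_small = [name for name in small if not Path(name).is_file()]
    missing_large = [name for name in large if not Path(name).is_file()]
    if not missing_small or not missing_large:
        return []
    # Neither set is complete: report whichever is closer to complete.
    return min(missing_small, missing_large, key=len)
```

For the second report, the prefix to test would be /data/repository/organisms/GRCm38_ensembl/BowtieIndex/genome, taken from the -x argument in the log. If the list comes back empty, the index is present and the next step would be to run the bowtie2 part of the command on its own, without the redirect and the pipe, to read its error message directly.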