maxplanck-ie / snakepipes

Customizable workflows based on snakemake and python for the analysis of NGS data
http://snakepipes.readthedocs.io
383 stars 85 forks source link

Error in ATAC-seq pipeline: /usr/bin/bash: line 22: printf: MACS2_QC/logs/BL611.MACS2_peak_qc.log: invalid number #862

Closed jurummel closed 1 year ago

jurummel commented 1 year ago

Hi everyone,

I run into an error during the ATAC-seq pipeline.

/home/jrummel/anaconda3/envs/snakePipes/bin/ATAC-seq -d /projects/allelespecificchromatinmouse/work/MouseSeqData/results/DNA-Mapping_trim/BL611_filtered --local BL6_108

TMPDIR=/projects/allelespecificchromatinmouse/work/MouseSeqData/temp PYTHONNOUSERSITE=True /home/jrummel/anaconda3/envs/snakePipes/bin/snakemake --use-conda --conda-prefix /home/jrummel/anaconda3/envs --latency-wait 300 --snakefile /home/jrummel/anaconda3/envs/snakePipes/lib/python3.10/site-packages/snakePipes/workflows/ATAC-seq/Snakefile --jobs 5 --directory /projects/allelespecificchromatinmouse/work/MouseSeqData/results/DNA-Mapping_trim/BL611_filtered --configfile /projects/allelespecificchromatinmouse/work/MouseSeqData/results/DNA-Mapping_trim/BL611_filtered/ATAC-seq.config.yaml --keep-going

---- This analysis has been done using snakePipes version 2.1.1 ----
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 5
Rules claiming more threads will be scaled down.
Job stats:
job                           count    min threads    max threads
--------------------------  -------  -------------  -------------
MACS2_peak_qc                     1              1              1
all                               1              1              1
bamCompare_subtract               1              5              5
callOpenChromatin                 1              5              5
filterCoveragePerScaffolds        1              5              5
filterFragments                   1              5              5
filterMetricsToHtml               1              1              1
plotFingerprint                   1              5              5
total                             8              1              5

Select jobs to execute...

[Fri Nov 25 13:51:19 2022]
rule plotFingerprint:
    input: filtered_bam/BL611.filtered.bam, filtered_bam/BL611.filtered.bam.bai
    output: deepTools_ATAC/plotFingerprint/plotFingerprint.metrics.txt
    log: deepTools_ATAC/logs/plotFingerprint.out, deepTools_ATAC/logs/plotFingerprint.err
    jobid: 5
    benchmark: deepTools_ATAC/.benchmark/plotFingerprint.benchmark
    threads: 5
    resources: tmpdir=/projects/allelespecificchromatinmouse/work/MouseSeqData/temp

Activating conda environment: /home/jrummel/anaconda3/envs/c0ef7d14915637604625f129db7eb75f
[Fri Nov 25 13:56:09 2022]
Finished job 5.
1 of 8 steps (12%) done
Select jobs to execute...

[Fri Nov 25 13:56:09 2022]
rule filterFragments:
    input: filtered_bam/BL611.filtered.bam
    output: short_bams/BL611.short.bam, short_bams/BL611.short.metrics
    log: short_bams/logs/BL611.filterFragments.log
    jobid: 3
    wildcards: sample=BL611
    threads: 5
    resources: tmpdir=/projects/allelespecificchromatinmouse/work/MouseSeqData/temp

Activating conda environment: /home/jrummel/anaconda3/envs/c0ef7d14915637604625f129db7eb75f
[Fri Nov 25 14:19:55 2022]
Finished job 3.
2 of 8 steps (25%) done
Select jobs to execute...

[Fri Nov 25 14:19:55 2022]
rule filterCoveragePerScaffolds:
    input: short_bams/BL611.short.bam
    output: short_bams/BL611.chrom.whitelist, short_bams/BL611.short.bam.bai, short_bams/BL611.short.cleaned.bam, short_bams/BL611.short.cleaned.bam.bai
    log: short_bams/logs/BL611.filterCoveragePerScaffolds.log
    jobid: 2
    wildcards: sample=BL611
    threads: 5
    resources: tmpdir=/projects/allelespecificchromatinmouse/work/MouseSeqData/temp

Activating conda environment: /home/jrummel/anaconda3/envs/c0ef7d14915637604625f129db7eb75f
Removing temporary output file short_bams/BL611.short.bam.
Removing temporary output file short_bams/BL611.short.bam.bai.
[Fri Nov 25 14:21:25 2022]
Finished job 2.
3 of 8 steps (38%) done
Select jobs to execute...

[Fri Nov 25 14:21:26 2022]
rule bamCompare_subtract:
    input: short_bams/BL611.short.cleaned.bam, short_bams/BL611.short.cleaned.bam.bai
    output: deepTools_ATAC/bamCompare/BL611.filtered.bw
    log: deepTools_ATAC/logs/bamCompare.BL611.filtered.out, deepTools_ATAC/logs/bamCompare.BL611.filtered.out
    jobid: 6
    benchmark: deepTools_ATAC/.benchmark/deepTools_ATAC/logs/bamCompare.BL611.filtered.benchmark
    wildcards: sample=BL611
    threads: 5
    resources: tmpdir=/projects/allelespecificchromatinmouse/work/MouseSeqData/temp

Activating conda environment: /home/jrummel/anaconda3/envs/c0ef7d14915637604625f129db7eb75f
[Fri Nov 25 14:25:37 2022]
Finished job 6.
4 of 8 steps (50%) done
Select jobs to execute...

[Fri Nov 25 14:25:37 2022]
rule callOpenChromatin:
    input: short_bams/BL611.short.cleaned.bam
    output: MACS2/BL611.filtered.short.BAM_peaks.narrowPeak, MACS2/BL611.filtered.short.BAM_peaks.xls
    log: MACS2/logs/callOpenChromatin/BL611_macs2.out, MACS2/logs/callOpenChromatin/BL611_macs2.err
    jobid: 1
    wildcards: sample=BL611
    threads: 5
    resources: tmpdir=/projects/allelespecificchromatinmouse/work/MouseSeqData/temp

Activating conda environment: /home/jrummel/anaconda3/envs/5add684f4e6b1634166e15ef698a1897
[Fri Nov 25 14:29:15 2022]
Finished job 1.
5 of 8 steps (62%) done
Select jobs to execute...

[Fri Nov 25 14:29:15 2022]
rule filterMetricsToHtml:
    input: short_bams/BL611.short.metrics
    output: Filtering_metrics/Filtering_report.html
    log: Filtering_metrics/logs/produce_report.err, Filtering_metrics/logs/produce_report.out
    jobid: 7
    resources: tmpdir=/projects/allelespecificchromatinmouse/work/MouseSeqData/temp

[Fri Nov 25 14:29:15 2022]
rule MACS2_peak_qc:
    input: short_bams/BL611.short.cleaned.bam, short_bams/BL611.short.cleaned.bam.bai, MACS2/BL611.filtered.short.BAM_peaks.xls
    output: MACS2_QC/BL611.filtered.BAM_peaks.qc.txt
    log: MACS2_QC/logs/BL611.MACS2_peak_qc.log
    jobid: 4
    benchmark: MACS2_QC/.benchmark/ATAC_qc.BL611.filtered.benchmark
    wildcards: sample=BL611
    resources: tmpdir=/projects/allelespecificchromatinmouse/work/MouseSeqData/temp

Activating conda environment: /home/jrummel/anaconda3/envs/5add684f4e6b1634166e15ef698a1897
Activating conda environment: /home/jrummel/anaconda3/envs/1499c078167105eb0838781b319f325f
[Fri Nov 25 14:29:27 2022]
Finished job 7.
6 of 8 steps (75%) done
/usr/bin/bash: line 22: printf: MACS2_QC/logs/BL611.MACS2_peak_qc.log: invalid number
[Fri Nov 25 14:29:33 2022]
Error in rule MACS2_peak_qc:
    jobid: 4
    output: MACS2_QC/BL611.filtered.BAM_peaks.qc.txt
    log: MACS2_QC/logs/BL611.MACS2_peak_qc.log (check log file(s) for error message)
    conda-env: /home/jrummel/anaconda3/envs/5add684f4e6b1634166e15ef698a1897
    shell:

        # get the number of peaks
        peak_count=`cat MACS2/BL611.filtered.short.BAM_peaks.narrowPeak | wc -l`

        # get the number of mapped reads
        mapped_reads=`samtools view -c -F 4 short_bams/BL611.short.cleaned.bam`

        # calculate the number of alignments overlapping the peaks
        # exclude reads flagged as unmapped (unmapped reads will be reported when using -L)
        reads_in_peaks=`samtools view -c -F 4 -L MACS2/BL611.filtered.short.BAM_peaks.narrowPeak short_bams/BL611.short.cleaned.bam`

        # calculate Fraction of Reads In Peaks
        frip=`bc -l <<< "$reads_in_peaks/$mapped_reads"`

        # compute peak genome coverage
        peak_len=`awk '{total+=$3-$2}END{print total}' MACS2/BL611.filtered.short.BAM_peaks.narrowPeak`
        genome_size=`awk '{total+=$3-$2}END{print total}' /projects/allelespecificchromatinmouse/work/MouseSeqData/Indices_BL6/genome_fasta/genome.fa.fai`
        genomecov=`bc -l <<< "$peak_len/$genome_size"`

        # write peak-based QC metrics to output file
        printf "peak_count      FRiP    peak_genome_coverage
%d      %5.3f   %6.4f
" $peak_count $frip $genomecov > MACS2_QC/BL611.filtered.BAM_peaks.qc.txt >2 MACS2_QC/logs/BL611.MACS2_peak_qc.log

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job MACS2_peak_qc since they might be corrupted:
MACS2_QC/BL611.filtered.BAM_peaks.qc.txt
Job failed, going on with independent jobs.
Exiting because a job execution failed. Look above for error message
Complete log: /projects/allelespecificchromatinmouse/work/MouseSeqData/results/DNA-Mapping_trim/BL611_filtered/.snakemake/log/2022-11-25T135114.141689.snakemake.log

 !!! ERROR in the ATAC-seq open chromatin workflow! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

The file BL611.MACS2_peak_qc.log is empty.

I previously ran the DNA mapping workflow without error using the following command: DNA-mapping -i /projects/allelespecificchromatinmouse/work/ATAC/X204SC21092819-Z01-F001_02/raw_data/BL611/ -o /projects/allelespecificchromatinmouse/work/MouseSeqData/results/DNA-Mapping_trim/BL611_filtered/ --local --ext .fq.gz --reads '_1' '_2' BL6_108 --fastqc --trim --trimmer 'trimgalore' --dedup

If you need more information let me know.

Thanks a lot.

Best, Julian

jurummel commented 1 year ago

Hello :)

This line creates the error:

# write peak-based QC metrics to output file
        printf "peak_count      FRiP    peak_genome_coverage
%d      %5.3f   %6.4f
" $peak_count $frip $genomecov > MACS2_QC/BL611.filtered.BAM_peaks.qc.txt >2 MACS2_QC/logs/BL611.MACS2_peak_qc.log

Is it possible that it should be more like this?:

# write peak-based QC metrics to output file
        printf "peak_count      FRiP    peak_genome_coverage
%d      %5.3f   %6.4f
" $peak_count $frip $genomecov > MACS2_QC/BL611.filtered.BAM_peaks.qc.txt >> MACS2_QC/logs/BL611.MACS2_peak_qc.log

Because a file 2 gets created with the following input:

peak_count      FRiP    peak_genome_coverage
9610    0.562   0.0000
peak_count      FRiP    peak_genome_coverage
0       0.000   0.0000

Thanks again!

Best, Julian

jurummel commented 1 year ago

or more like:

 write peak-based QC metrics to output file
        printf "peak_count      FRiP    peak_genome_coverage
%d      %5.3f   %6.4f
" $peak_count $frip $genomecov > MACS2_QC/BL611.filtered.BAM_peaks.qc.txt 2> MACS2_QC/logs/BL611.MACS2_peak_qc.log
WardDeb commented 1 year ago

Hi,

Could you try a later version of snakePipes ? This should be fixed.

Kind regards,

Warddeb

jurummel commented 1 year ago

yes, this is fixed. Thanks a lot :)