maxplanck-ie / snakepipes

Customizable workflows based on snakemake and python for the analysis of NGS data
http://snakepipes.readthedocs.io

Error during DNA-Mapping workflow: bowtie2.index.err: No output file specified #861

Closed jurummel closed 1 month ago

jurummel commented 2 years ago

Hi everyone,

I am trying to analyse ATAC-seq data with snakepipes and am currently hitting an error in the DNA-mapping workflow. Beforehand, I created an index for GRCm38. My samples come from a cross between CAST_EiJ and C57BL_6NJ, so I use the allelic-mapping mode and pass a SNP file to the pipeline:

DNA-mapping -i /projects/allelespecificchromatinmouse/work/ATAC/X204SC21092819-Z01-F001_01/raw_data/F2022YB/ -o /projects/allelespecificchromatinmouse/work/MouseSeqData/results/DNA-Mapping/F2022YB/ --local --mode allelic-mapping --VCFfile /projects/allelespecificchromatinmouse/work/MouseSeqData/help_data/mgp.v5.merged.snps_all.dbSNP142.vcf --strains 'CAST_EiJ,C57BL_6NJ' --ext .fq.gz --reads '_1' '_2' GRCm38_105_Mapping

I get the following error message:

rule bowtie2_index:
    input: snp_genome/CAST_EiJ_C57BL_6NJ_dual_hybrid.based_on_GRCm38_105_Mapping_N-masked
    output: snp_genome/bowtie2_Nmasked/Genome.1.bt2
    log: snp_genome/bowtie2_Nmasked/bowtie2.index.out, snp_genome/bowtie2_Nmasked/bowtie2.index.err
    jobid: 7
    threads: 5
    resources: tmpdir=/projects/allelespecificchromatinmouse/work/MouseSeqData/temp

Activating conda environment: /home/jrummel/anaconda3/envs/97bf6a3bf6520594fcbd63a07735fa20
[Tue Oct 18 17:05:16 2022]
Error in rule bowtie2_index:
    jobid: 7
    output: snp_genome/bowtie2_Nmasked/Genome.1.bt2
    log: snp_genome/bowtie2_Nmasked/bowtie2.index.out, snp_genome/bowtie2_Nmasked/bowtie2.index.err (check log file(s) for error message)
    conda-env: /home/jrummel/anaconda3/envs/97bf6a3bf6520594fcbd63a07735fa20
    shell:
        bowtie2-build --threads 5  snp_genome/bowtie2_Nmasked/Genome > snp_genome/bowtie2_Nmasked/bowtie2.index.out 2> snp_genome/bowtie2_Nmasked/bowtie2.index.err
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Job failed, going on with independent jobs.
Exiting because a job execution failed. Look above for error message
Complete log: /projects/allelespecificchromatinmouse/work/MouseSeqData/results/DNA-Mapping/F2022YB/.snakemake/log/2022-10-18T160358.600796.snakemake.log

 !!! ERROR in DNA mapping workflow! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

bowtie2.index.err:

No output file specified!
      Bowtie 2 version 2.3.5.1 by Ben Langmead (langmea@cs.jhu.edu, www.cs.jhu.edu/~langmea)
      Usage: bowtie2-build [options]* <reference_in> <bt2_index_base>
          reference_in            comma-separated list of files with ref sequences
          bt2_index_base          write bt2 data to files with this dir/basename
      *** Bowtie 2 indexes work only with v2 (not v1).  Likewise for v1 indexes. ***
      Options:
          -f                      reference files are Fasta (default)
          -c                      reference sequences given on cmd line (as
                                  <reference_in>)
          --large-index           force generated index to be 'large', even if ref
                                  has fewer than 4 billion nucleotides
          --debug                 use the debug binary; slower, assertions enabled
          --sanitized             use sanitized binary; slower, uses ASan and/or UBSan
          --verbose               log the issued command
          -a/--noauto             disable automatic -p/--bmax/--dcv memory-fitting
          -p/--packed             use packed strings internally; slower, less memory
          --bmax <int>            max bucket sz for blockwise suffix-array builder
          --bmaxdivn <int>        max bucket sz as divisor of ref len (default: 4)
          --dcv <int>             diff-cover period for blockwise (default: 1024)
          --nodc                  disable diff-cover (algorithm becomes quadratic)
          -r/--noref              don't build .3/.4 index files
          -3/--justref            just build .3/.4 index files
          -o/--offrate <int>      SA is sampled every 2^<int> BWT chars (default: 5)
          -t/--ftabchars <int>    # of chars consumed in initial lookup (default: 10)
          --threads <int>         # of threads
          --seed <int>            seed for random number generator
          -q/--quiet              verbose output (for debugging)
          -h/--help               print detailed description of tool and its options
          --usage                 print this usage message
          --version               print version information and quit

bowtie2.index.out is empty.

Do you have any idea how to fix this? If you need more information just let me know.

Thanks a lot for your help :)

(I have seen issue #517 regarding the same problem. Unfortunately that did not help.)

Best Julian
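
Reading the usage line against the logged shell command: bowtie2-build expects two positional arguments, <reference_in> and <bt2_index_base>, but the command above seems to contain only one path before the redirections (note the double space after `--threads 5`, where a reference list would normally sit). The Python sketch below counts the positional arguments in that command string. The helper name `positional_args` is hypothetical, and the set of value-taking options is copied from the usage text above rather than from snakepipes itself.

```python
import shlex

# Options that take a value, copied from the bowtie2-build usage text above.
VALUE_OPTIONS = {"--bmax", "--bmaxdivn", "--dcv", "-o", "--offrate",
                 "-t", "--ftabchars", "--threads", "--seed"}
# Shell redirection tokens; the token after each one is a file name, not an argument.
REDIRECTS = {">", ">>", "2>", "2>>", "<"}


def positional_args(command):
    """Return the positional arguments of a whitespace-separated command line."""
    tokens = shlex.split(command)[1:]  # drop the program name
    positionals = []
    skip_next = False
    for tok in tokens:
        if skip_next:                  # value of an option or redirection target
            skip_next = False
        elif tok in VALUE_OPTIONS or tok in REDIRECTS:
            skip_next = True
        elif tok.startswith("-"):      # flag without a value
            continue
        else:
            positionals.append(tok)
    return positionals


# The command exactly as logged by snakemake in the error above.
failed = ("bowtie2-build --threads 5  snp_genome/bowtie2_Nmasked/Genome "
          "> snp_genome/bowtie2_Nmasked/bowtie2.index.out "
          "2> snp_genome/bowtie2_Nmasked/bowtie2.index.err")

args = positional_args(failed)
print(len(args), args)  # 1 ['snp_genome/bowtie2_Nmasked/Genome']
```

With a single positional argument, bowtie2-build would treat the index basename as the reference and find no output name, which would match the "No output file specified!" message.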

katsikora commented 1 year ago

Hi Julian,

Thanks for reporting this issue. It looks like the N-masked FASTA file required to build the bowtie2 index is missing; it should have been generated in the previous step. Did rule create_snpgenome produce any errors?

Best,

Katarzyna

jurummel commented 1 year ago

Hi Katarzyna,

Thanks for the quick reply. I don't get an error message for rule create_snpgenome.

rule create_snpgenome:
    input: /projects/allelespecificchromatinmouse/work/MouseSeqData/Indices_Mapping/genome_fasta
    output: snp_genome/CAST_EiJ_SNP_filtering_report.txt, snp_genome/C57BL_6NJ_SNP_filtering_report.txt, snp_genome/CAST_EiJ_C57BL_6NJ_dual_hybrid.based_on_GRCm38_105_Mapping_N-masked, snp_genome/all_C57BL_6NJ_SNPs_CAST_EiJ_reference.based_on_GRCm38_105_Mapping.txt
    log: SNPsplit_createSNPgenome.out, SNPsplit_createSNPgenome.err
    jobid: 8
    resources: tmpdir=/projects/allelespecificchromatinmouse/work/MouseSeqData/temp

Both log files, SNPsplit_createSNPgenome.out and SNPsplit_createSNPgenome.err, are empty. The directory snp_genome/CAST_EiJ_C57BL_6NJ_dual_hybrid.based_on_GRCm38_105_Mapping_N-masked contains N-masked FASTA files for all chromosomes.

Thanks again :)

Best, Julian
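
Since the N-masked directory exists but the reference slot in the bowtie2-build command came out blank, one thing that may be worth checking is which file extensions the directory actually contains. The sketch below is only a diagnostic under an assumption: that the reference list is assembled by matching files with a particular suffix. How the bowtie2_index rule really collects its inputs is not shown in this thread. The function names and the demo file names are hypothetical.

```python
import tempfile
from collections import Counter
from pathlib import Path


def suffix_counts(directory):
    """Count the files in `directory` by their final suffix (e.g. '.fa')."""
    return Counter(p.suffix or "(none)"
                   for p in Path(directory).iterdir() if p.is_file())


def reference_list(directory, suffixes=(".fa", ".fasta")):
    """Join all files ending in one of `suffixes` into a comma-separated string."""
    matches = sorted(
        str(p) for p in Path(directory).iterdir()
        if p.is_file() and p.name.endswith(tuple(suffixes))
    )
    return ",".join(matches)


# Demo on a throwaway directory with made-up file names (not the real data).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("chr1.N-masked.fa", "chr2.N-masked.fa", "report.txt"):
        (Path(tmp) / name).touch()
    print(suffix_counts(tmp))
    print(reference_list(tmp))                              # the two .fa paths
    print(repr(reference_list(tmp, suffixes=(".fasta",))))  # empty: no match
```

On the real run, `directory` would be snp_genome/CAST_EiJ_C57BL_6NJ_dual_hybrid.based_on_GRCm38_105_Mapping_N-masked. An empty string from `reference_list` for the suffix the workflow expects would be consistent with the blank argument in the failed command, though it would not by itself prove that this is the cause.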