nf-core / modules

Repository to host tool-specific module files for the Nextflow DSL2 community!
MIT License

new module: dragen #4026

Open marrip opened 9 months ago

marrip commented 9 months ago

Is there an existing module for this?

Is there an open PR for this?

Is there an open issue for this?

Are you going to work on this?


### Tasks
- [x] Provide ped for trio @marrip
- [x] Run trio analysis on HG002, HG003, HG004 @xuyangyuio
- [x] Add remaining output files to Dragen module @marrip
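
For reference, a minimal sketch of what a ped file for the HG002/HG003/HG004 trio could look like, assuming the usual six-column layout (family, individual, father, mother, sex, phenotype) and the GIAB roles (HG002 son, HG003 father, HG004 mother). The family ID `FAM1`, the helper `ped_lines`, and the phenotype value `0` (unknown) are placeholders, not something agreed in this issue.

```python
def ped_lines(family_id, child, father, mother, child_sex=1, phenotype=0):
    """Return tab-separated PED rows for one trio.

    Sex coding: 1 = male, 2 = female. Founders get 0 for both parents.
    `phenotype` is a placeholder (0 = unknown) and is an assumption.
    """
    rows = [
        (family_id, father, 0, 0, 1, phenotype),
        (family_id, mother, 0, 0, 2, phenotype),
        (family_id, child, father, mother, child_sex, phenotype),
    ]
    return ["\t".join(str(value) for value in row) for row in rows]


if __name__ == "__main__":
    # "FAM1" is a hypothetical family ID.
    print("\n".join(ped_lines("FAM1", "HG002", "HG003", "HG004")))
```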


### Tasks
- [x] Check if we already have a list of output files @asr081
- [ ] Check if `.gtf` file is present in Dragen references folder @xuyangyuio
- [x] Find RNA samples @marrip
- [ ] Get access to dragen @marrip
- [ ] run test data @marrip


### Tasks
- [x] Evaluate test data @asr081
- [ ] run test data @marrip


### Tasks

Single Cell

### Tasks
marrip commented 9 months ago

Options we might not need:

  --fastq-list arg                                        CSV file specifying list of FASTQs for input
  --fastq-list-sample-id arg                              Only process entries whose 'RGSM' entry matches the given Sample ID parameter (for fastq-list.csv input)
  --fastq-list-all-samples arg                            Process all samples in fastq-list file, even when there are multiple 'RGSM' (Sample ID) values
  --tumor-fastq-list arg                                  CSV file specifying list of tumor FASTQs for input
  --tumor-fastq-list-sample-id arg                        Only process entries in the tumor fastq list input whose 'RGSM' entry matches the given Sample ID parameter (for fastq-list.csv input)
  --tumor-fastq-list-all-samples arg                      Process all samples in tumor-fastq-list file, even when there are multiple 'RGSM' (Sample ID) values
  --variant-list arg                                      file specifying list of Variants for input
  --output-file-prefix arg                                Output filename prefix
  --run-info arg                                          Path to RunInfo.xml file (default root of BCL input
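
To keep track of which option ends up in which list, the pasted help text could be parsed into a lookup table. This is only a rough sketch: the regular expression is a guess based on the lines pasted in this issue (`--flag arg  description` and `-x [ --flag ] arg  description`), and `parse_help` is a hypothetical helper, not part of any module.

```python
import re

# Matches "  --fastq-list arg   text" and "  -1 [ --fastq-file1 ] arg   text".
OPTION_RE = re.compile(
    r"^\s*(?:-(?P<short>\w)\s+\[\s*)?--(?P<long>[\w-]+)(?:\s*\])?\s+arg\s+(?P<desc>.*)$"
)


def parse_help(text):
    """Return {long_flag: description} for every option line in `text`."""
    options = {}
    last = None
    for line in text.splitlines():
        match = OPTION_RE.match(line)
        if match:
            last = "--" + match.group("long")
            options[last] = match.group("desc").strip()
        elif last and line.strip():
            # Not an option line: treat it as a wrapped description.
            options[last] += " " + line.strip()
    return options
```

Comparing the key sets of two parsed lists (for example with `set(a) & set(b)`) would then show options that were pasted into more than one group.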
marrip commented 9 months ago


  -1 [ --fastq-file1 ] arg                                FASTQ file to send to card (may be gzipped)
  -2 [ --fastq-file2 ] arg                                Second FASTQ file with paired-end reads (may be gzipped)
  --tumor-fastq1 arg                                      FASTQ file of tumor reads for somatic mode
  --tumor-fastq2 arg                                      Second FASTQ file of tumor reads for somatic mode
  -b [ --bam-input ] arg                                  Input BAM file to send to card
  --tumor-bam-input arg                                   Input BAM file of tumor reads for somatic mode
  --ml-recalibration-input-vcf arg                        A VCF or gVCF file containing small variant calls to recalibrate
  --bcl-input-directory arg                               Input BCL directory for BCL conversion (must be specified for BCL input)
  --sample-sheet arg                                      For BCL input, path to SampleSheet.csv file (default searched for in --bcl-input-directory)
  -a [ --annotation-file ] arg                            Transcript annotation file (RNA)
  --enable-rna-gene-fusion arg                            Enable the RNA gene fusion detection algorithm
  --rna-gf-input-file arg                                 Input chimeric junctions file, for standalone RNA gene fusion
  --rna-gf-restrict-genes arg                             Ignore genes with biotype other than protein coding or lncRNA for gene fusions
  --amplicon-target-bed arg                               The DNA amplicon target regions in bed format (required 4th column is amplicon name, optional 5th column is GeneID)
  --repeat-genotype-specs arg                             Repeat variant catalog file
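
One possible way to map staged input files onto the input flags above, as a sketch only. The flag names are taken from the help text, but the helper `input_args`, the extension checks, and the decision to pick the flag from the file name are all assumptions about how a module could do it.

```python
FASTQ_SUFFIXES = (".fastq", ".fastq.gz", ".fq", ".fq.gz")


def input_args(files, tumor=False):
    """Return dragen input arguments for one BAM or one/two FASTQ files."""
    files = [str(path) for path in files]
    if len(files) == 1 and files[0].endswith(".bam"):
        flag = "--tumor-bam-input" if tumor else "--bam-input"
        return [flag, files[0]]
    if 1 <= len(files) <= 2 and all(f.endswith(FASTQ_SUFFIXES) for f in files):
        if tumor:
            flags = ["--tumor-fastq1", "--tumor-fastq2"]
        else:
            flags = ["--fastq-file1", "--fastq-file2"]
        args = []
        # zip() stops at the shorter list, so single-end input gets one flag.
        for flag, path in zip(flags, files):
            args += [flag, path]
        return args
    raise ValueError(f"unsupported input files: {files}")
```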
marrip commented 9 months ago


  -p [ --pair-by-name ] arg                               Whether to use read names to identify read pairs. Valid only for BAM input.
  --append-read-index-to-name arg                         Whether to append /1 or /2 to read names for paired-end
  --pair-suffix-delimiter arg                             Character that delimits paired-end suffixes, e.g. / for /1 and /2
  --aws-s3-region arg                                     Specify the geographical region of AWS S3 buckets
  --strip-input-qname-suffixes arg                        Whether to strip /1 or /2 from input read names
  --enable-vcf-compression arg                            Enable compression of VCF output files (Default=true)
  --RGID arg                                              Read group ID
  --RGLB arg                                              Read group library
  --RGPL arg                                              Read group sequencing technology
  --RGPU arg                                              Read group platform unit
  --RGSM arg                                              Read group sample name
  --RGCN arg                                              Read group sequencing center name
  --RGDS arg                                              Read group description
  --RGDT arg                                              Read group run date
  --RGPI arg                                              Read group predicted insert size
  --RGID-tumor arg                                        Read group ID for tumor input
  --RGLB-tumor arg                                        Read group library for tumor input
  --RGPL-tumor arg                                        Read group sequencing technology for tumor input
  --RGPU-tumor arg                                        Read group platform unit for tumor input
  --RGSM-tumor arg                                        Read group sample name for tumor input
  --RGCN-tumor arg                                        Read group sequencing center name for tumor input
  --RGDS-tumor arg                                        Read group description for tumor input
  --RGDT-tumor arg                                        Read group run date for tumor input
  --RGPI-tumor arg                                        Read group predicted insert size for tumor input
  --prepend-filename-to-rgid arg                          Internally prepend the file name to the RGID tag in cases of having the same RGID for different read groups across multiple bams
  --bcl-only-lane arg                                     For BCL input, convert only specified lane number (default all lanes)
  --strict-mode arg                                       For BCL input, abort if any files are missing (false by default)
  --first-tile-only arg                                   For BCL conversion, only convert first tile of input (for testing & debugging)
  --tiles arg                                             For BCL conversion, process only a subset of tiles by a regular expression
  --exclude-tiles arg                                     For BCL conversion, exclude set of tiles by a regular expression
  --bcl-sampleproject-subdirectories arg                  For BCL conversion, output to subdirectories based upon sample sheet 'Sample_Project' column
  --sample-name-column-enabled arg                        Use sample sheet 'Sample_Name' column when naming fastq files & subdirectories
  --fastq-gzip-compression-level arg                      For BCL input, set fastq output compression level 0-9 (default 1)
  --shared-thread-odirect-output arg                      Use linux native asynchronous io (io_submit) for file output (Default=false)
  --bcl-num-parallel-tiles arg                            For pure BCL conversion to FASTQ, # of tiles to process in parallel (default 1)
  --bcl-num-conversion-threads arg                        For pure BCL conversion to FASTQ, # of threads for conversion (per tile, default # cpu threads)
  --bcl-num-compression-threads arg                       For pure BCL conversion to FASTQ, # of threads for fastq.gz output compression (per tile, default # cpu threads, or HW+12)
  --bcl-num-decompression-threads arg                     For pure BCL conversion to FASTQ, # of threads for bcl/cbcl input decompression (per tile, default half # cpu threads, or HW+8. Only applies when preloading files)
  --bcl-only-matched-reads arg                            For pure BCL conversion, do not output files for 'Undetermined' [unmatched] reads (output by default)
  --no-lane-splitting arg                                 For pure BCL conversion to FASTQ, do not split FASTQ file by lane (false by default)
  --num-unknown-barcodes-reported arg                     For pure BCL conversion to FASTQ, # of Top Unknown Barcodes to output (1000 by default)
  --bcl-validate-sample-sheet-only arg                    For BCL conversion, only validate RunInfo.xml & SampleSheet files
  --bcl-num-ora-compression-threads-per-file arg          # of threads for ora compression per file (default 10)
  --bcl-num-ora-compression-parallel-files arg            # of files to process in parallel for ora compression (default 6)
  --output-legacy-stats arg                               For BCL conversion, also output stats in legacy (bcl2fastq2) format (false by default)
  --no-sample-sheet arg                                   BCL: Enable legacy no-sample-sheet operation (No demux or trimming. No settings supported. False by default, not recommended
  --enable-map-align arg                                  Enable the mapper/aligner (Default=true)
  --enable-map-align-output arg                           Enable the output from mapper/aligner
  --enable-rna arg                                        Enable the mapper/aligner RNA pipeline
  --rna-gf-restrict-genes arg                             Ignore genes with biotype other than protein coding or lncRNA for gene fusions
  --enable-auto-multifile arg                             Import subsequent segments of *_001.fastq files (Default=true)
  --combine-samples-by-name arg                           Import all fastq files with same sample name as given file (even across lanes) (Default=false)
  --enable-bam-indexing arg                               Output a .bai index file along with the output .bam
  --enable-sort arg                                       Enable sorting after mapping/alignment   (Default=true)
  --enable-duplicate-marking arg                          Enable marking or removal of duplicate alignment records (Default=false)
  --remove-duplicates arg                                 Remove duplicates instead of marking them with flag 0x400 (Default=false)
  --fastq-offset arg                                      FASTQ quality offset value. Set to 33 or 64 (Default=33)
  --fastq-n-quality arg                                   FASTQ quality to output for N base calls
  --ref-sequence-filter arg                               Output only reads mapping to this reference sequence
  --generate-md-tags arg                                  Whether to generate MD tags for alignment output records
  --generate-zs-tags arg                                  Whether to generate ZS tags for alignment output records
  --generate-xq-tags arg                                  Whether to generate xq:i tags (extended MAPQ) for alignment output records
  --preserve-bqsr-tags arg                                If true, pass through BI/BD tags (default=true)
  --methylation-protocol arg                              Library protocol for methylation analysis. (none|directional|non-directional|directional-complement|pbat)
  --methylation-match-bismark arg                         When running methyl-seq analysis, try to match Bismark output
  --methylation-TAPS arg                                  Set to true if input assays are generated by TAPS, rather than typical bisulfite-conversion-based methylation assays.
  --methylation-keep-ref-cytosine arg                     Set to true to keep all reference cytosines in the CX_report, even if they don't appear in the input reads. (Default=False)
  --methylation-compress-cx-report arg                    Set to true to enable compression of the CX_report. (Default=False)
  --enable-methylation-calling arg                        If true, merge methyl-seq runs and add tags. If false, methyl-seq just writes a BAM file per aligner run
  --methylation-generate-cytosine-report arg              Whether to generate a genome-wide cytosine methylation report
  --methylation-generate-mbias-report arg                 Whether to generate a per-sequencer-cycle methylation bias report
  --methylation-reports-only arg                          Skip methylation analysis and generates reports. Requires dragen methylated BAM input
  --methylation-mapping-implementation arg                What implementation to use during methylation mapping. (single-pass|multi-pass)
  --preserve-map-align-order arg                          Preserve the order of mapper/aligner output to produce deterministic results.  Impacts performance
  --filter-flags-from-output arg                          Filter output alignments with any bits set in 'val' present in the flags field.  Hex & decimal values accepted
  --umi-library-type arg                                  Batch option for read collapsing [random-duplex, random-simplex, nonrandom-duplex, non-umi]
  --umi-enable arg                                        Enable UMI-based read processing
  --umi-min-supporting-reads arg                          Minimum number of supporting reads required for a family. Applied independently to read1 and read2
  --umi-emit-multiplicity arg                             Consensus read output type: both or duplex only or simplex only: [both, duplex, simplex], Default: both
  --enable-positional-collapsing arg                      Enable positional collapsing. (Default = false)
  --enable-pgx arg                                        Batch option for enabling all PGx callers (e.g. Star Allele, CYP2D6, CYP2B6). VC will be enabled.
  --enable-dna-amplicon arg                               Enable DNA amplicon mode for alignment and variant calling
  --enable-rna-amplicon arg                               Enable RNA amplicon mode (Default=false)
  --repeat-genotype-enable arg                            Enable calling of repeat-expansion variants
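
The read group options in this list follow a regular pattern (`--RG<field>` and `--RG<field>-tumor`), so they could be generated from sample metadata instead of being spelled out one by one. A minimal sketch, assuming the metadata arrives as a dict keyed by the two-letter field; `read_group_args` is a hypothetical helper.

```python
# Two-letter read group fields that appear in the help text above.
RG_FIELDS = ("ID", "LB", "PL", "PU", "SM", "CN", "DS", "DT", "PI")


def read_group_args(read_group, tumor=False):
    """Turn {"ID": ..., "SM": ...} into --RGID/--RGSM style arguments."""
    unknown = set(read_group) - set(RG_FIELDS)
    if unknown:
        raise ValueError(f"unknown read group fields: {sorted(unknown)}")
    suffix = "-tumor" if tumor else ""
    args = []
    for field in RG_FIELDS:  # fixed order keeps the command reproducible
        if field in read_group:
            args += [f"--RG{field}{suffix}", str(read_group[field])]
    return args
```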
marrip commented 9 months ago


  -r [ --ref-dir ] arg                                    Directory with reference and hash tables
  -c [ --config-file ] arg                                Configuration file
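
Putting it together, a command line could be assembled roughly like this. This is a sketch under assumptions: the executable is assumed to be called `dragen`, boolean values are assumed to be passed as `true`/`false` (going by the `Default=true` hints in the help text), and `dragen_command` is a hypothetical helper. Only flags listed in this issue are used.

```python
import shlex


def dragen_command(ref_dir, options, executable="dragen"):
    """Build an argument list from a reference directory and {flag: value}."""
    cmd = [executable, "--ref-dir", str(ref_dir)]
    for flag, value in options.items():
        if isinstance(value, bool):
            # Assumption: booleans are written as lowercase true/false.
            value = "true" if value else "false"
        cmd += [flag, str(value)]
    return cmd


if __name__ == "__main__":
    cmd = dragen_command(
        "/ref",  # placeholder path
        {"--enable-map-align": True, "--output-file-prefix": "HG002"},
    )
    print(shlex.join(cmd))
```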
marrip commented 9 months ago

license stuff:

  --sse-key arg                                           Set server-side encryption [AES256]
  --lic-server arg                                        set license server for cloud sites: http://<base64_user>:<base64_pass>@<path>
  --lic-credentials arg                                   License configuration file.
  --lic-instance-id-location arg                          set cloud instance ID location
marrip commented 9 months ago

Running `docker run -v /var/run/docker.sock:/var/run/docker.sock --rm alpine/dfimage -sV=1.36 etycksen/dragen4:4.2.4` gives:

CMD ["/bin/bash"]
RUN RUN yum install unzip wget -y # buildkit
RUN RUN yum install which -y # buildkit
RUN RUN yum config-manager --enable ol8_codeready_builder # buildkit
RUN RUN yum install oracle-epel-release-el8 -y # buildkit
RUN RUN yum install git -y # buildkit
RUN RUN yum install perl -y # buildkit
RUN RUN yum install R -y # buildkit
RUN RUN yum install bc dkms gdb rsync smartmontools sos time -y # buildkit
RUN RUN yum install kernel kernel-devel -y # buildkit
RUN RUN yum install hostname -y # buildkit
COPY ./ /usr/bin/uname # buildkit

COPY . # buildkit

RUN RUN /bin/sh; rm -rf  \
        && rm -rf /dragen_software # buildkit
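
To get a quick overview of what the image installs, the `yum install` lines in the output above could be collected with a few lines of Python. This is a sketch that assumes the history lines always have the `yum install <packages> -y # buildkit` shape seen here; `yum_packages` is a hypothetical helper.

```python
import re


def yum_packages(history):
    """Return package names from 'yum install ... # buildkit' lines, in order."""
    packages = []
    for line in history.splitlines():
        match = re.search(r"yum install (.+?) # buildkit", line)
        if match:
            # Drop switches such as -y, keep everything else as a package name.
            packages += [tok for tok in match.group(1).split() if not tok.startswith("-")]
    return packages
```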