A pipeline to process Nanopore reads and transfer the results to the end users.
git clone git@github.com:maxplanck-ie/nanoporeReads_dataTransfer.git
cd nanoporeReads_dataTransfer
mamba env create -n ont -f env.yaml
mamba activate ont
pip install .
For Apple M1/M2 (arm64) many conda packages are not yet available. Use instead:
CONDA_SUBDIR=osx-64 mamba env create -n ont -f env.yaml
The key functionality is implemented as snakemake workflows. From version 2.0.0, two snakemake rule sets are supported, each centered around a different basecaller:
rules_dorado: a dorado-based workflow.
A wrapper Python script (ont.py) implements
The main configuration file (config.yaml) specifies the rule set to use (rulesPath: rules or rules_dorado) and generic parameters (basecalling, mapping).
Note that the generic configuration defined by this file is expanded by project-specific entries for each incoming flowcell.
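The expansion of the generic configuration can be pictured as a recursive dictionary merge. The following is a minimal sketch, not the pipeline's actual code: the helper `expand_config` and every key except `rulesPath` are hypothetical and only serve as illustration.

```python
import copy


def expand_config(generic, flowcell_entries):
    """Return a new config: generic settings plus flowcell-specific entries.

    Hypothetical helper for illustration. Nested dictionaries are merged
    recursively; on conflict the flowcell-specific value wins.
    """
    merged = copy.deepcopy(generic)
    for key, value in flowcell_entries.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = expand_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical example values; only "rulesPath" appears in this README.
generic = {"rulesPath": "rules_dorado", "basecalling": {"model": "placeholder"}}
flowcell = {"flowcell_id": "PAS33554", "basecalling": {"extra": "value"}}
config = expand_config(generic, flowcell)
```

The generic dictionary is left untouched, so the same defaults can be reused for every incoming flowcell.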
Additional configuration files are:
env.yaml (for conda installation of all dependencies)
multiqc_config.yaml (to customize multiqc output)
ont -c config.yaml
The workflow connects and relies on three main data locations:
offloadDir is screened for the arrival of new and unprocessed flowcells.
outputDir is used for various processing steps (merging, basecalling, demultiplexing, alignment, quality controls).
groupDir receives the analysis results in a project-wise manner.
The details are rule-set dependent. Annotated examples for rules_dorado are given below.
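How the screening of offloadDir could work is sketched below. This is an assumption-laden illustration rather than the code in ont.py: it assumes that each flowcell is a directory directly under offloadDir, that outputDir mirrors the flowcell directory names, and that the `analysis.done` flag marks a fully processed flowcell. The function name `find_unprocessed` is hypothetical.

```python
from pathlib import Path


def find_unprocessed(offload_dir, output_dir):
    """Return names of flowcell directories without an analysis.done flag.

    Hypothetical helper: a flowcell counts as processed only if
    <output_dir>/<flowcell>/analysis.done exists.
    """
    pending = []
    for entry in sorted(Path(offload_dir).iterdir()):
        if not entry.is_dir():
            continue  # ignore stray files in the offload directory
        done_flag = Path(output_dir) / entry.name / "analysis.done"
        if not done_flag.exists():
            pending.append(entry.name)
    return pending
```

A monitoring loop would call such a function periodically and start the workflow for every name it returns.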
offloadDir
This directory is generated by the sequencing machine and its layout may change in response to technological developments.
../path/to/flowcell/
.
├── bam_pass # from fast basecalling
├── barcode_alignment_PAS33554_6b0029ab_a0fbcf5b.tsv
├── fastq_pass # from fast basecalling
├── final_summary_PAS33554_6b0029ab_a0fbcf5b.txt
├── other_reports
├── pod5_pass # pod5 format
├── pore_activity_PAS33554_6b0029ab_a0fbcf5b.csv
├── report_PAS33554_20230928_1016_6b0029ab.html
├── report_PAS33554_20230928_1016_6b0029ab.json
├── report_PAS33554_20230928_1016_6b0029ab.md
├── SampleSheet.csv # sample sheet information
├── sample_sheet_PAS33554_20230928_1016_6b0029ab.csv
├── sequencing_summary_PAS33554_6b0029ab_a0fbcf5b.txt
└── throughput_PAS33554_6b0029ab_a0fbcf5b.csv
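Because the layout of this directory may change, it can help to check an offloaded flowcell before processing starts. The sketch below assumes that `pod5_pass` and `SampleSheet.csv` are the entries the workflow needs; this is inferred from the listings in this document, not taken from the pipeline code, and the names `REQUIRED_ENTRIES` and `missing_inputs` are hypothetical.

```python
from pathlib import Path

# Assumed minimal inputs, inferred from the directory listings in this README.
REQUIRED_ENTRIES = ("pod5_pass", "SampleSheet.csv")


def missing_inputs(flowcell_dir, required=REQUIRED_ENTRIES):
    """Return the required entries that are absent from a flowcell directory."""
    root = Path(flowcell_dir)
    return [name for name in required if not (root / name).exists()]
```

An empty list means that all assumed inputs are present.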
outputDir
../path/to/flowcell
.
├── analysis.done # flag to signal that this flowcell has been fully processed
├── bam # output from basecalling in bam format (including modification calls)
├── bam_demux # demultiplexed samples (empty if no barcoding)
├── benchmarks # benchmarks for each rule
├── benchmarks_combined.tsv # combined benchmark file
├── flags # directory with flags from snakemake rules
├── log # log files (rule-specific)
├── pipeline_config.yaml # configfile (snakemake & more)
├── pod5 # directory with merged pod5 file (from offloadDir)
├── reports # directory with reports and SampleSheet.csv (from offloadDir)
├── summary # summary files (DAG, disk status)
└── transfer # analysis output that will be transferred
transfer/
└── Project_projectID_User_Group
    ├── Analysis_mouse_dna # analysis directory (exists only if genome is known)
    │   ├── 23L000329_WT_rep1.align.bam # alignment
    │   ├── 23L000329_WT_rep1.align.bam.bai # index
    │   └── 23L000329_WT_rep1.align.bed.gz # modification calls
    ├── Data
    │   ├── 23L000329_WT_rep1.bam # basecalled sequences
    │   ├── 23L000329_WT_rep1.fastq.gz # basecalled sequences (fastq - deprecated)
    │   ├── 23L000329_WT_rep1_porechop.fastq.gz # adaptors, barcodes trimmed
    │   └── 23L000329_WT_rep1.seqsum # sequencing summaries (for pycoQC etc.)
    └── QC
        ├── multiqc
        │   ├── multiqc_data
        │   └── multiqc_report.html # multiqc report
        ├── sample_names.tsv # dictionary sampleID-sampleName
        └── Samples # sample-wise quality controls
            ├── 23L000329_WT_rep1.align.flagstat
            ├── 23L000329_WT_rep1.align_pycoqc.html
            ├── 23L000329_WT_rep1.align_pycoqc.json
            ├── 23L000329_WT_rep1_fastqc.html
            ├── 23L000329_WT_rep1_fastqc.zip
            ├── 23L000329_WT_rep1_kraken.report
            ├── 23L000329_WT_rep1_porechop.info
            ├── 23L000329_WT_rep1_pycoqc.html
            ├── 23L000329_WT_rep1_pycoqc.json
            ├── all_porechop.best_end
            ├── all_porechop.best_start
            └── all_porechop.trimmed
groupDir
../user_path/to/flowcell/ (identical to outputDir/transfer)
.
├── metadata.yaml
└── Project_projectID_User_Group
    ├── Analysis_mouse_dna
    ├── Data
    └── QC
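The final transfer step can be pictured as copying every project directory from the transfer folder to the flowcell folder under groupDir. The sketch below is an illustration under assumptions, not the pipeline's implementation: the function name `transfer_projects` is hypothetical, project directories are assumed to match `Project_*`, and a plain recursive copy stands in for whatever transfer mechanism is actually used.

```python
import shutil
from pathlib import Path


def transfer_projects(transfer_dir, group_flowcell_dir):
    """Copy all Project_* directories and return the list of target paths.

    Hypothetical helper; requires Python 3.8+ for dirs_exist_ok.
    """
    copied = []
    for project in sorted(Path(transfer_dir).glob("Project_*")):
        if not project.is_dir():
            continue
        target = Path(group_flowcell_dir) / project.name
        # Missing parent directories are created; existing files are overwritten.
        shutil.copytree(project, target, dirs_exist_ok=True)
        copied.append(target)
    return copied
```

Re-running the copy is harmless in this sketch, which is convenient when a transfer was interrupted.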