ON-rep-seq is a molecular method where bacterial (or yeast) selective intragenomic fragments generated with Rep-PCR are sequenced using Oxford Nanopore Technologies.
This apporoch allows for species and sub-species level identification but also often strain level discrimination of bacterial and yeast isolates at very low cost.
Current version of ON-rep-seq allows for analysis of up to 192 isolates in one R9 flow cell but will give most cost effective results by using flongle <https://nanoporetech.com/products/flongle>
_ for which it wass initially designed.
You can follow the installation guide <https://docs.anaconda.com/anaconda/install/>
_ .
Clone github repo and enter directory::
git clone https://github.com/lauramilena3/On-rep-seq cd On-rep-seq
Create On-rep-seq virtual environment and activate it::
conda env create -n On-rep-seq -f On-rep-seq.yaml source activate On-rep-seq
Go into On-rep-seq directory and create variables to your basecalled data and the results directory of your choice::
fastqDir="/path/to/your/basecalled/data" reusultDir="/path/to/your/desired/results/dir"
If you are using os then you need to edit the config file to set a new directory for canu::
sed -i'.bak' -e 's/Linux-amd64/Darwin-amd64/g' config.yaml
View the number of avaliable cores in your machine and set a number::
nproc nCores="n"
If you are using your laptop we suggest you to leave 2 free cores for other system tasks.
Download kraken database. Notice this step can take up to 48 hours (!needs to be done only once)::
kraken2-build --download-taxonomy --db db/NCBI-bacteria --threads $nCores #4h kraken2-build --download-library bacteria --db db/NCBI-bacteria --threads $nCores #33h kraken2-build --build --db db/NCBI-bacteria --threads $nCores #4h
ON-rep-seq is under regular updates. For better results, please keep your local installation up to date::
cd On-rep-seq git pull
The input data is basecalled fastq files. Please check Guppy basecaller <https://community.nanoporetech.com/downloads>
_
For best performance we strongly recommend basecalling on GPU (tested on GTX 1080Ti and RTX 2080).
Run the snakemake pipeline with the desired number of cores::
snakemake -j $nCores --use-conda --config basecalled_dir=$fastqDir results_dir=$reusultDir
Limiting memory ...............
You can limit the memory resources (in Megabytes) used per core by using the resources directive as follows::
snakemake -j $nCores --use-conda --config basecalled_dir=$fastqDir results_dir=$reusultDir --resources mem_mb=$max_mem
View dag of jobs to visualize the workflow ++++++++++++++++++++++++++++++++++++++++++
To view the dag run::
snakemake --dag | dot -Tpdf > dag.pdf
All results are stored in the Results folder as follows::
Results
├── 01_porechopped_data
│ └── {barcode}_demultiplexed.fastq # Demultiplexed fastq per barcode
├── 02_LCPs
│ ├── LCP_clustering_heatmaps.ipynb # Clustering jupyter notebook
│ ├── LCP_plots.pdf # Plots
│ ├── {barcode}.txt # All LCPs
│ └── LCPsClusteringData
│ └── {barcode}.txt # LCPs used for clustering
├── 03_LCPs_peaks
│ ├── 00_peakconsensus
│ │ └── fixed{barcode}_{peak}.fasta # Corrected consensus fasta of peaks
│ ├── 01_taxonomic_assignments
│ │ ├── taxonomyassignments.txt # Taxonomy of all barcodes
│ │ └── taxonomy{barcode}.txt # Taxonomy per Barcode
│ └── peaks_{barcode}.txt # File with the peaks of each barcode
└── check.txt # Final file "On-rep-seq succesfuly executed"
bioRxiv <https://www.biorxiv.org/content/10.1101/402156v1>
_