replikation / poreCov

SARS-CoV-2 workflow for nanopore sequence data
https://case-group.github.io/
GNU General Public License v3.0
artic basecalling bioinformatics nanopore nanopore-data sars-cov-2 workflow

poreCov | SARS-CoV-2 Workflow for nanopore sequencing data


Citation:

poreCov - an easy to use, fast, and robust workflow for SARS-CoV-2 genome reconstruction via nanopore sequencing
Christian Brandt, Sebastian Krautwurst, Riccardo Spott, Mara Lohde, Mateusz Jundzill, Mike Marquet, Martin Hölzer
https://www.frontiersin.org/articles/10.3389/fgene.2021.711437/full

What is this Repo?

Table of Contents

1. Quick Setup (Ubuntu)

1.1 Nextflow (the workflow manager)

Note that with Singularity, the following environment variables are automatically passed to the container to ensure execution on HPCs: HTTPS_PROXY, HTTP_PROXY, http_proxy, https_proxy, FTP_PROXY, and ftp_proxy.
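To see which of these variables a run would inherit, you can list the ones exported in your current shell. The sketch below is only an illustration: `list_proxy_vars` is a hypothetical helper name and the proxy URL is a placeholder; neither is part of poreCov.

```shell
# Hypothetical helper (not part of poreCov): print every proxy variable
# from the list above that is exported in the current shell.
list_proxy_vars() {
    for name in HTTPS_PROXY HTTP_PROXY http_proxy https_proxy FTP_PROXY ftp_proxy; do
        # printenv only reports exported variables and fails for unset ones
        if value=$(printenv "$name"); then
            printf '%s=%s\n' "$name" "$value"
        fi
    done
}

# Placeholder proxy address (an assumption); replace it with your cluster's.
export HTTPS_PROXY="http://proxy.example.org:3128"
list_proxy_vars
```

Variables that are set but not exported do not show up here, and they would not reach a child process such as a container runtime either.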

Conda (not recommended)

2. Run poreCov

2.1 Test run

# for a Docker installation
nextflow run replikation/poreCov -profile test_fastq,local,docker -r 1.1.0 --update

# or for Singularity or conda installation
nextflow run replikation/poreCov -profile test_fastq,local,singularity -r 1.1.0 --update
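Before starting the test run, it can help to confirm that the required commands are on your PATH. This is a minimal sketch; `missing_tools` is a hypothetical helper, not something poreCov ships.

```shell
# Hypothetical pre-flight helper (not part of poreCov): print each command
# from the argument list that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the tools needed for the Docker profile; swap docker for singularity
# when using the Singularity profile.
missing=$(missing_tools nextflow docker)
if [ -n "$missing" ]; then
    echo "Please install: $missing" >&2
fi
```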

2.2 Quick run examples

nextflow run replikation/poreCov --fastq_pass 'fastq_pass/' -r 1.1.0 \
    --cores 32  -profile local,docker --update --primerV primers.bed
# rename barcodes automatically by providing an input file, also using another primer scheme
nextflow run replikation/poreCov --fast5 fast5_dir/ --samples sample_names.csv \
   --primerV V1200 --output results -profile local,docker --update

2.3 Extended Usage

Important input flags (choose one)

Custom primer bed files

MN908947.3  30  54  nCoV-2019_1_LEFT    nCoV-2019_1 +
MN908947.3  1183    1205    nCoV-2019_1_RIGHT   nCoV-2019_1 -
MN908947.3  1100    1128    nCoV-2019_2_LEFT    nCoV-2019_2 +
MN908947.3  2244    2266    nCoV-2019_2_RIGHT   nCoV-2019_2 -
MN908947.3  2153    2179    nCoV-2019_3_LEFT    nCoV-2019_1 +
MN908947.3  3235    3257    nCoV-2019_3_RIGHT   nCoV-2019_1 -
MN908947.3  3144    3166    nCoV-2019_4_LEFT    nCoV-2019_2 +
MN908947.3  4240    4262    nCoV-2019_4_RIGHT   nCoV-2019_2 -
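The example above has six whitespace-separated columns per line, read here as reference, start, end, primer name, pool, and strand. A quick sanity check before passing a custom file to `--primerV` could look like the sketch below. `check_primer_bed` is a hypothetical helper, and the rules it enforces are assumptions derived from the example, not an official specification.

```shell
# Hypothetical helper (not part of poreCov): basic sanity checks for a primer
# BED file laid out like the example above. Prints one message per problem
# and returns non-zero if any line fails.
# Usage: check_primer_bed primers.bed
check_primer_bed() {
    awk '
        NF == 0 { next }                       # skip blank lines
        NF != 6 {
            printf "line %d: expected 6 columns, found %d\n", NR, NF
            bad = 1; next
        }
        $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ {
            printf "line %d: start and end must be integers\n", NR
            bad = 1; next
        }
        $2 + 0 >= $3 + 0 {
            printf "line %d: start must be smaller than end\n", NR
            bad = 1; next
        }
        $6 != "+" && $6 != "-" {
            printf "line %d: strand must be + or -\n", NR
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}
```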

Sample sheet

Example comma-separated file (do not change the header):

_id,Status,Description
Sample_2021,barcode01,good
2ndSample,BC02,bad
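A sample sheet can be checked against this layout before it is passed to `--samples`. The sketch below uses a hypothetical helper, `check_sample_sheet`. It assumes that every row has exactly three fields and that sample names and barcodes must be unique; both rules are inferred from the example, not taken from the poreCov source.

```shell
# Hypothetical helper (not part of poreCov): check a sample sheet against the
# layout of the example above. Returns non-zero if any check fails.
# Usage: check_sample_sheet sample_names.csv
check_sample_sheet() {
    awk -F',' '
        { sub(/\r$/, "") }                     # tolerate Windows line endings
        NR == 1 {
            if ($0 != "_id,Status,Description") {
                print "line 1: header must be _id,Status,Description"
                bad = 1
            }
            next
        }
        NF == 0 { next }                       # skip blank lines
        NF != 3 {
            printf "line %d: expected 3 fields, found %d\n", NR, NF
            bad = 1; next
        }
        $1 == "" || $2 == "" {
            printf "line %d: _id and Status must not be empty\n", NR
            bad = 1; next
        }
        seen_id[$1]++     { printf "line %d: duplicate _id %s\n", NR, $1; bad = 1 }
        seen_status[$2]++ { printf "line %d: duplicate Status %s\n", NR, $2; bad = 1 }
        END { exit bad + 0 }
    ' "$1"
}
```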

Pangolin Lineage definitions

3. Quality Metrics (default)

4. Workflow

5. Literature / References to cite

If you use poreCov, please also cite the software it relies on:

6. Troubleshooting

Singularity

7. Time to results

Table 1: Execution speed of poreCov on different Ubuntu 20 systems using a single sample file with 167,929 reads. Command used: nextflow run replikation/poreCov -r 0.9.4 -profile test_fastq,local,docker.

| Hardware | First time with download (DB + container)¹ | Default settings |
|---|---|---|
| 2 CPUs, 4 GB RAM | 1 h 2 min | 32 min 30 s ² |
| 2 CPUs, 8 GB RAM | 46 min | 21 min 20 s |
| 4 CPUs, 16 GB RAM | 40 min | 12 min 48 s |
| 8 CPUs, 32 GB RAM | 35 min | 11 min 39 s |
| 16 CPUs, 64 GB RAM | 30 min | 9 min 39 s |

¹ time depends mostly on available internet speed
² read classification could not be executed on this hardware, but SARS-CoV-2 genomes were still generated and classified

Table 2: Execution speed of poreCov on different Ubuntu 20 systems using 24 fastq samples. Command used: nextflow run replikation/poreCov -r 0.9.4 --fastq "*.fastq.gz" --primerV V1200 --samples samplenames.csv -profile local,docker. Time measured from the start of the workflow.

| Hardware | Default settings |
|---|---|
| 2 CPUs, 4 GB RAM | 13 h 33 min ¹ |
| 2 CPUs, 8 GB RAM | 7 h 56 min ¹ |
| 4 CPUs, 16 GB RAM | 4 h 10 min |
| 8 CPUs, 32 GB RAM | 2 h 15 min |
| 16 CPUs, 64 GB RAM | 1 h 25 min |

¹ read classification could not be executed on this hardware, but SARS-CoV-2 genomes were still generated and classified
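To compare the rows of Table 2 on a per-sample basis, divide the total runtime by the 24 samples. The sketch below does this arithmetic; `minutes_per_sample` is a hypothetical helper name introduced only for this illustration.

```shell
# Hypothetical helper: average minutes per sample for a run that took
# $1 hours and $2 minutes in total and processed $3 samples.
minutes_per_sample() {
    awk -v h="$1" -v m="$2" -v n="$3" \
        'BEGIN { printf "%.1f\n", (h * 60 + m) / n }'
}

minutes_per_sample 1 25 24    # 16 CPUs, 64 GB RAM row of Table 2
minutes_per_sample 13 33 24   # 2 CPUs, 4 GB RAM row of Table 2
```

For these two rows the result is about 3.5 and 33.9 minutes per sample, respectively.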

8. Credits

The key steps of poreCov are carried out using the ARTIC Network field bioinformatics pipeline. Kudos to all amazing developers for your incredible efforts during this pandemic! Many thanks to all others who have helped out and contributed to poreCov as well.