mgimenez720 / plaSquid

Nextflow pipeline for plasmid detection and classification from metagenomic data
GNU General Public License v3.0

plaSquid: Plasmid Sequences Identification in metagenomic assemblies.

Description

Pipeline overview

Installation

git clone https://github.com/mgimenez720/plaSquid/
cd plaSquid/

Nextflow must be installed to run plaSquid; see the Nextflow documentation for installation instructions.

plaSquid can be run with Docker or conda.

To create a permanent conda environment (recommended), run:

conda env create -f environments/plaSquid.yml

To pull a permanent Docker image, run:

docker pull mgimenez720/plasquid:latest

Dependencies

All dependencies are provided by the conda environment and the Docker image described above. Manual installation is discouraged.

hmmer 3.3.1, infernal 1.1.3, minimap2 2.17, prodigal 2.6.3, and the R packages dplyr 1.0.4, tidyverse 1.3.0, seqinr 4.2.5, and Biostrings 2.58.0.

Usage

nextflow run main.nf --contigs {testdata/test.fasta} --outdir {plaSquid_result} 

arguments:

--contigs       Path to the input assemblies.
--mmi           Path to the PLSDB database, either as a Minimap2 index (.mmi) or as FASTA (.fasta/.fna).
--outdir        Path to the output directory where results are written.
--help          Print the help message and exit.

subworkflows:

--minidist      Map contigs against the PLSDB database.
--repsearch     Search for and classify RIP and MOB (Rel) genes.
--ripextract    Extract replication initiator protein (RIP) sequences.
--mobextract    Extract relaxase sequences.

profiles:

-profile conda  Installs dependencies using a conda environment.
-profile docker Installs dependencies using a Docker image.
-profile server Runs using 15 CPUs and 50 GB of memory.
-profile test   Tests the dependencies and normal functioning of the pipeline.
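
The arguments and profiles listed above combine on a single command line. The following is a minimal sketch that assembles such a command and prints it for review before running anything. The variable names and the `RUN` switch are illustrative assumptions, not part of plaSquid; the paths are the ones shown in the usage line.

```shell
#!/bin/sh
# Sketch only: build a plaSquid command from the documented options.
# contigs/outdir/profile and RUN are hypothetical helper names.
contigs="testdata/test.fasta"   # path from the usage line above
outdir="plaSquid_result"        # output directory from the usage line above
profile="conda"                 # or: docker

# Store the command as positional parameters so each argument stays intact.
set -- nextflow run main.nf --contigs "$contigs" --outdir "$outdir" -profile "$profile"

# Print the command for inspection.
echo "$*"

# Execute only when explicitly requested, e.g. RUN=1 sh run_plasquid.sh
if [ "${RUN:-0}" = "1" ]; then
    "$@"
fi
```

Printing the command first makes it easy to confirm the profile and paths before launching a long pipeline run.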

Authors:

Matías Giménez
Ignacio Ferrés
Gregorio Iraola

Microbial Genomics Laboratory
Institut Pasteur Montevideo (Uruguay)

Output

"Contig": contig ID assigned by plaSquid
"name": contig name in the assembly file
"Sim-dist": S value obtained by the Minidist workflow
"plsdb_match": plasmid matched in the PLSDB database
"Match_length": length of the plasmid matched in PLSDB
"RIP_domain": RIP domain found in the contig
"MOB_group": MOB group classification of the relaxase found in the contig
"Rep_type": Rep-type classification of the detected contig
"Contig_length": length of the detected contig
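
For downstream work it can help to pull selected fields out of the results by name. The sketch below assumes, without confirmation from this README, that the results are written as a tab-separated table whose header row carries the field names listed above (possibly wrapped in double quotes). The function name `select_cols` and the file name in the usage line are hypothetical.

```shell
# select_cols FILE COL1 [COL2 ...]
# Print the named columns of a tab-separated table that has a header row.
# Assumption: header names may be wrapped in double quotes, which are
# stripped for matching only; output lines are printed unchanged.
select_cols() {
    file=$1; shift
    awk -F'\t' -v OFS='\t' -v cols="$*" '
        NR == 1 {
            n = split(cols, want, " ")
            # Map each header name (without quotes) to its column index.
            for (i = 1; i <= NF; i++) { h = $i; gsub(/"/, "", h); idx[h] = i }
            # Stop early if a requested column is not in the header.
            for (j = 1; j <= n; j++) {
                if (!(want[j] in idx)) {
                    print "missing column: " want[j] > "/dev/stderr"
                    exit 1
                }
            }
        }
        {
            # Emit the requested columns in the order given.
            for (j = 1; j <= n; j++)
                printf "%s%s", $(idx[want[j]]), (j < n ? OFS : ORS)
        }
    ' "$file"
}
```

A call such as `select_cols results.tsv name MOB_group Contig_length` would then print those three columns, header included; replace `results.tsv` with the actual file found in the output directory.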

Citation

A preprint is available at https://doi.org/10.1101/2022.08.04.502827

If you use plaSquid results in further analyses, please also consider citing the following tools:

MOBscan (https://castillo.dicom.unican.es/mobscan_about/)
RepliconFinder (https://journals.asm.org/doi/pdf/10.1128/aac.02412-14)
PLSDB (https://ccb-microbe.cs.uni-saarland.de/plsdb/)
GTDB-tk (https://github.com/Ecogenomics/GTDBTk)

Note

This is a beta version; please report any bugs or malfunctions you find.