rkajitani / MetaPlatanus

De novo metagenome assembler
GNU General Public License v3.0

MetaPlatanus README.md

Description

MetaPlatanus is a de novo assembler for metagenomes (microbiomes). Its features are as follows: (1) It can utilize various types of long-range information, such as Oxford Nanopore/PacBio long reads, mate-pairs (jumping libraries), and 10x linked reads (experimental). (2) Coverage depths, k-mer frequencies and the results of the binning tool are also used to extend sequences and correct misassemblies, which reduces inter-species misassemblies. (3) Contig assembly, scaffolding, gap closing and binning are executed automatically in a single run. (4) MetaPlatanus requires at least one short-read paired-end library.

Version

v1.3.1

Web site

https://github.com/rkajitani/MetaPlatanus
http://platanus.bio.titech.ac.jp

Author

Rei Kajitani at Tokyo Institute of Technology wrote the key source code.
Address for this tool: platanus@bio.titech.ac.jp

Installation

Currently, MetaPlatanus runs on Linux. There are three ways to install it.

1. Using Miniconda or Anaconda

MetaPlatanus is registered in the Bioconda channel (Linux only).

conda install -c conda-forge -c bioconda metaplatanus

2. Using Docker

This works on any OS where Docker is available.

docker pull rkajitani/metaplatanus
# Start a container interactively,
docker run -it -v $(pwd):/work -w /work rkajitani/metaplatanus
# or run metaplatanus in one command
docker run -v $(pwd):/work -w /work --rm rkajitani/metaplatanus metaplatanus ...

3. Build from source

Install the dependencies listed in the Dependency section. If the following commands are available, you will be able to run metaplatanus.

Compile (make), and copy metaplatanus and sub_bin to a directory listed in PATH (e.g. $HOME/bin).

make
cp -r sub_bin metaplatanus $HOME/bin

The main program is "metaplatanus". Note that the directory "sub_bin", which contains Perl scripts and other tools, must either be specified with the -sub_bin option or placed in the same directory as metaplatanus.
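The layout requirement above can be checked before a first run. The following is a minimal sketch, not part of MetaPlatanus: the function name check_metaplatanus_layout is hypothetical, and it only tests that the executable and the sub_bin directory sit side by side in the given directory.

```shell
# Hypothetical helper (not shipped with MetaPlatanus): verify the install layout.
# Usage: check_metaplatanus_layout DIR
check_metaplatanus_layout() {
    dir=$1
    # The main program must be present and executable.
    if [ ! -x "$dir/metaplatanus" ]; then
        echo "missing executable: $dir/metaplatanus" >&2
        return 1
    fi
    # sub_bin must sit next to it, unless -sub_bin DIR is passed at run time.
    if [ ! -d "$dir/sub_bin" ]; then
        echo "missing directory: $dir/sub_bin (or pass -sub_bin DIR)" >&2
        return 1
    fi
    echo "ok: $dir"
}

# Example: check the directory used in the copy step (assumed to be $HOME/bin).
check_metaplatanus_layout "$HOME/bin" || true
```

If sub_bin is kept elsewhere, the check on the directory will fail by design; in that case pass its location with the -sub_bin option instead.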

Synopsis

Inputs

Commands

metaplatanus -t 8 -m 64 -IP1 PE_1.fq PE_2.fq -ont ONT.fq >log.txt 2>&1
# num_threads, 8; memory_limit, 64GB

Output

The output files are written to the out_result directory. The prefix "out" can be changed using the "-o" option.

Dependency

Usage

Command

metaplatanus -IP1 short_R1.fastq(a) short_R2.fastq(a) [Options] ...

Options

-IP{INT} FWD1 REV1 [FWD2 REV2 ...] : lib_id inward_pair_files (reads in 2 files, fasta or fastq; at least one library required)
-OP{INT} FWD1 REV1 [FWD2 REV2 ...] : lib_id outward_pair_files (reads in 2 files, fasta or fastq; aka mate-pairs or jumping-library)
-binning_IP{INT} FWD1 REV1 ...     : lib_id inward_pair_files for binning process. (reads in 2 files, fasta or fastq; the data are usually from another sample)
-p FILE1 [FILE2 ...]               : PacBio long-read file (fasta or fastq)
-ont FILE1 [FILE2 ...]             : Oxford Nanopore long-read file (fasta or fastq)
-x PAIR1 [PAIR2 ...]               : barcoded_pair_files (10x Genomics) (reads in 1 file, interleaved, fasta or fastq)
-X FWD1 REV1 [FWD2 REV2 ...]       : barcoded_pair_files (10x Genomics) (reads in 2 files, fasta or fastq)
-t INT                             : number of threads (>= 1; default, 1)
-m INT                             : memory limit for making kmer distribution (unit, GB; default, 64)
-o STR                             : prefix of output files (default "out")
-tmp DIR                           : directory for temporary files (default, ".")
-sub_bin DIR                       : directory for sub-executables, such as meta_platanus and minimap2 (default, directory-of-this-script/sub_bin)
-min_cov_contig INT                : k-mer coverage cutoff for contig-assembly of MetaPlatanus (default, 4 with MEGAHIT, 2 otherwise)
-min_map_idt_binning FLOAT         : minimum identity (%) in read mapping for binning (default, 97)
-no_megahit                        : do not perform MEGAHIT (default, off)
-no_binning                        : do not perform binning (default, off)
-no_re_scaffold                    : do not perform re-scaffolding (default, off)
-no_tgsgapcloser                   : do not use TGS-GapCloser and NextPolish (default, off)
-no_nextpolish                     : do not use NextPolish (default, off)
-overwrite                         : overwrite the previous results, not re-start (default, off)
-h, -help                          : display usage
-v, -version                       : display version 

Outputs:

PREFIX_result (directory)

PREFIX is specified by -o

Publication

Kajitani, R., Noguchi, H., Gotoh, Y., Ogura, Y., Yoshimura, D., Okuno, M., Toyoda, A., Kuwahara, T., Hayashi, T. and Itoh, T. (2021) MetaPlatanus: a metagenome assembler that combines long-range sequence links and species-specific features. Nucleic Acids Research, gkab831. https://doi.org/10.1093/nar/gkab831

Notes

Inward-pair (usually called "paired-end", accepted in options "-IP" or "-ip"):

FWD --->
    5' -------------------- 3'
    3' -------------------- 5'
                    <--- REV 

Outward-pair (usually called "mate-pair", accepted in options "-OP" or "-op"):

                    ---> REV 
    5' -------------------- 3'
    3' -------------------- 5'
FWD <---

Example inputs:

Inward-pair (separate, insert=300)   : PE300_1.fq PE300_2.fq
Outward-pair (separate, insert=2k)   : MP2k_1.fq MP2k_2.fq

Corresponding options:

-IP1 PE300_1.fq PE300_2.fq \
-OP2 MP2k_1.fq MP2k_2.fq
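
As an illustration, the library options above can be combined with the general options into one command line. This is a sketch only: the thread count, memory limit and output prefix are assumed values, and the command is printed rather than executed so that it can be inspected first.

```shell
# Sketch: assemble the argument list step by step, then print the command.
# -t 8, -m 64 and -o out are illustrative values, not recommendations.
set -- -t 8 -m 64 -o out
set -- "$@" -IP1 PE300_1.fq PE300_2.fq   # inward pairs (paired-end), library 1
set -- "$@" -OP2 MP2k_1.fq MP2k_2.fq     # outward pairs (mate-pair), library 2

cmd="metaplatanus $*"
echo "$cmd"

# To run it for real (the input files must exist):
# metaplatanus "$@" >log.txt 2>&1
```

Each library gets its own numeric ID after -IP or -OP, so the paired-end and mate-pair data are kept apart.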

To utilize data from multiple samples, MetaPlatanus can accept short reads that are used exclusively for the binning process through the -binning_IP# options, e.g.,

metaplatanus \
    -IP1 sample1_R1.fq sample1_R2.fq \
    -binning_IP1 sample1_R1.fq sample1_R2.fq \
    -binning_IP2 sample2_R1.fq sample2_R2.fq \
    -binning_IP3 sample3_R1.fq sample3_R2.fq \
...
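
With many samples, the -binning_IP# options can be generated in a loop instead of being typed by hand. The sketch below assumes hypothetical sample names and a <sample>_R1.fq / <sample>_R2.fq naming scheme; it prints the resulting command without running it.

```shell
# Hypothetical sample names; files are assumed to be named <sample>_R1.fq and <sample>_R2.fq.
samples="sample1 sample2 sample3"

# The first sample is assembled; every sample (including the first) is used for binning.
set -- -IP1 sample1_R1.fq sample1_R2.fq
i=1
for s in $samples; do
    set -- "$@" "-binning_IP$i" "${s}_R1.fq" "${s}_R2.fq"
    i=$((i + 1))
done

binning_cmd="metaplatanus $*"
echo "$binning_cmd"

# To run it for real (the input files must exist):
# metaplatanus "$@" >log.txt 2>&1
```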