MetaPONT runs Centrifuge and Minimap in a docker container. For the installation a working Docker installation is required (https://docs.docker.com/get-docker/).
git clone
cd metapont
docker build -t metapont .
docker run metapont --help
A prebuild library (from the 2018 procaryotic centrifuge library) can be downloaded from here: https://owncloud.gwdg.de/index.php/s/QFWoe44UKYGrbGQ
For better results we suggest building a custom library, depending on you usecase.
To build a usable database, you need a Centrifuge database. Please follow the instructions on https://github.com/DaehwanKimLab/centrifuge
To get the relationship SeqID->taxID run:
centrifuge-inspect --conversion-table <base> > conversionTable.txt
This file can be very redundant, so let's go ahead and uniq it:
cat conversionTable.txt | sort | uniq > conversionTable_tmp.txt
mv conversionTable_tmp.txt conversionTable.txt
nohup centrifuge-inspect <base>
Base is the short name of your database (it's the equivalent of p+h+v, p_compressed, etc. on the Centrifuge GitHub). We use nohup here because this can take some time. You can of course use alternatives. In the case of nohup be sure to rename nohup.out to for example p+h+v_all_seqs.fasta.
mv nohup.out <base>.fasta
nohup cdbfasta <base>.fasta
In the next step we will build the taxID-fasta-files and gzip them. Please be aware that this will need a lot of disk
space and run for a long time, depending on the database size. The skript mmpDBbuilder.py
will generate the fasta
files.
nohup python3 mmpDBbuilder.py </path/to/conversiontable> </path/to/fasta_index> <processes>
</path/to/conversiontable>
absolute path to output of 1.</path/to/fasta_index>
absolute path to output of 3.<processes>
is the number of subprocesses you want to start (how many files should be build at the same time)If you are done and sure, that you do not want to redo certain steps of the procedure you can remove the output of step 2 and 3 (i.e. the big fasta file, the cdbfasta index, and all nohup.out files). MeTaPONT does not need these files to work.
Run docker run metapont --help
to get the Information below.
Docker run -v /path/to/database:/database -v /path/to/input:/input -v /path/to/output/directory:/output ContainerName [Parameters]
Change /path/to/database
to the absolute path of the database directory you want to use.
Change /path/to/input
to your absolute path of the input fastq file or a directory that contains fastq files (all
subdirectories will be searched for *.fastq-files and they will be the input).
Change /path/to/output
to the absolute path of the directory you want the output in.
-c
just Centrifuge will be used [Default: Centrifuge and Minimap2 will be used].
-k <Int>
max number of TaxIDs returned by Centrifuge, only important if -c [Default: 5].
-t <Int>
max number of threads used by Centrifuge and MMP2 [Default: 1].
--threshold <Int>
taxIDs in centrifuge_out.tsv are omitted when score is lower than threshold [Default: 150].
--filter <Int:Int>
thresholds for minimap2 output. First int is the normalized Smith-Waterman-AS threshold, second
is coverage threshold (0-100). [Default: 1000:50]
--filter-benchmarking <int,...,int:int,int:...>
two lists of minimap cutoff values of type int. Every combination
will be saved.
--normalize < 0 | 1 >
decides if Smith Waterman score should be normalized with the alignment length (1) or not (
0) [Default: 1].
--redundance <Int>
readIDs with centrifuge hits to at least redundance
taxIDs will be dropped, only if not
-c [Default: 50].
--verbose
gives more files as output (mostly for debugging purposes).
In our published article "Comprehensive wet-bench and bioinformatics workflow for complex microbiota using Oxford Nanopore Technologies" in mSystems (2021) we present a comprehensive analysis pipeline with sampling, storage, DNA extraction, library preparation and bioinformatical evaluation for complex microbiomes sequenced with ONT (DOI:10.1128/mSystems.00750-21).
This study consists of 6 different experiments. Here we provide download links for fastq-files and metadata (zipped together). During classification all fastq files will be merged/concatenated. To reorder the trimmed reads to their belonging samples we provide barcode.txt files for every experiment. With the following commands these files can be downloaded:
Fastq-files and metadata:
wget --content-disposition "https://owncloud.gwdg.de/index.php/s/U7SJM6NlrZNgmkz/download"
barcodes_swabfinding_16S:
wget --content-disposition "https://owncloud.gwdg.de/index.php/s/humpM6O2hJ1AWob/download"
Fastq-files and metadata:
wget --content-disposition "https://owncloud.gwdg.de/index.php/s/6aGux5csVk7nzpg/download"
barcodes_dailyprofiles_16S:
wget --content-disposition "https://owncloud.gwdg.de/index.php/s/NTW7jDQCJV6QEew/download"
Fastq-files and metadata:
wget --content-disposition "https://owncloud.gwdg.de/index.php/s/tHQkhM4SkrDXTvi/download"
barcodes_gradeofstool_metagenomics:
wget --content-disposition "https://owncloud.gwdg.de/index.php/s/RxVC2AgMabTl5yy/download"
Fastq-files and metadata:
wget --content-disposition "https://owncloud.gwdg.de/index.php/s/H4HIweMfZwPUdQc/download"
barcodes_alpha_16S:
wget --content-disposition "https://owncloud.gwdg.de/index.php/s/II2ZNtg9R8F2Dfh/download"
Fastq-files and metadata:
wget --content-disposition "https://owncloud.gwdg.de/index.php/s/yW2o3kAarT42j8Y/download"
barcodes_alpha_metagenomics:
wget --content-disposition "https://owncloud.gwdg.de/index.php/s/5qxAlNX8nCTonvk/download"
Fastq-files:
wget --content-disposition "https://owncloud.gwdg.de/index.php/s/VprOlYE8ak232Ou/download"
The simulated datasets used in the paper can be downloaded with:
wget --content-disposition "https://owncloud.gwdg.de/index.php/s/vfaBobqMOP6Zwkj/download"
wget --content-disposition "https://owncloud.gwdg.de/index.php/s/Delfe5tur3ke51Q/download"
wget --content-disposition "https://owncloud.gwdg.de/index.php/s/2jRIHIHTGCkvLe4/download"
wget --content-disposition "https://owncloud.gwdg.de/index.php/s/bRI2vLSsnMXjwZ0/download"
More datasets with corresponding metadata are publicly available in Qiita (study number: 13720).