moritzbuck / mOTUlizer

Utility to analyse a group of closely related MAGs/Genomes/bins/SUBs of more or less dubious origin
GNU General Public License v3.0

mOTUlizer

DISCLAIMER: there is another tool out there called mOTUs that creates OTU-tables directly from reads. If you are looking for that tool, this is the wrong page, you want to go 'here'. But while you are on my page, why don't you check out mOTUlizer? It's cool, I swear.

Utility to analyse a group of closely related MAGs/Genomes/bins/SUBs of more or less dubious origin. Right now it is composed of a number of programs:

A number of example files can be found in the example_files-folder. The fasta- and gff-files there are the ones used to produce all the other files; they were generated by the always fantastic prokka. There is also some reading material in mOTUlizer/doc (a poster, a presentation and a very early paper draft, but at least it has the maths in it). The paper will eventually be available there!

INSTALL

With conda:

conda install -c bioconda motulizer

With pip:

pip install mOTUlizer

Manually:

git clone https://github.com/moritzbuck/mOTUlizer.git
cd mOTUlizer
python setup.py install

USAGE

mOTUlize

Makes OTUs and reports some stats. It needs fastANI in the PATH unless you provide a file with --similarities. To work around fastANI's memory-greedy nature, it runs fastANI in blocks if needed.

Simply run with:

mOTUlize.py --fnas example_files/fnas/*.fna -o output.tsv

Loads of little options are listed if you run: mOTUlize.py -h
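The two points above (fastANI must be on the PATH, and comparisons are run in blocks to keep memory down) can be illustrated with a small sketch. This is not mOTUlizer's actual code: the function names `fastani_available` and `block_pairs`, the block size and the file names are all made up for illustration, and the real blocking logic may differ.

```python
import shutil
from itertools import product


def fastani_available(executable="fastANI"):
    """Return True if the given executable can be found on the PATH."""
    return shutil.which(executable) is not None


def block_pairs(genomes, block_size):
    """Yield (query_block, reference_block) pairs that together cover
    every genome-vs-genome comparison, one small block pair at a time.

    Hypothetical helper: it only illustrates the idea of blocking.
    """
    blocks = [genomes[i:i + block_size]
              for i in range(0, len(genomes), block_size)]
    for query_block, reference_block in product(blocks, repeat=2):
        yield query_block, reference_block


# Made-up file names, only to show the shape of the result.
genomes = [f"genome_{i}.fna" for i in range(5)]
jobs = list(block_pairs(genomes, block_size=2))

print("fastANI on PATH:", fastani_available())
print("number of block-vs-block jobs:", len(jobs))
```

Each job compares only a few genomes at once, so peak memory depends on the block size rather than on the total number of genomes.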

Key options:

mOTUpan

An intro video here:

mOTUpan for beginners

mOTUpan.py -h

The simplest command to run (needs mmseqs2 installed); there are many more options:

mOTUpan.py --faas *.faa -o output.tsv

Key options:

Check all flags with --help, but here are some key ones explained a bit more

anvi-run-motupan

You need an anvi'o pangenome-database. If you also have the genome-storage (for completenesses), great; otherwise simply:

# if you want just a tsv :

anvi-run-motupan.py -p MYPANGENOME-PAN.db -g MYGENOMES.db -o MY_OUTPUT.tsv

# if you want to update the db, so it shows up in anvi-display-pan

anvi-run-motupan.py -p MYPANGENOME-PAN.db -g MYGENOMES.db --store-in-db

mOTUconvert

A small program that generates appropriate input files for mOTUpan.py from the output of some of my favorite, or the public's favorite, programs. It assumes the IDs in your protein fasta-file follow the pattern ${genome_name}_[0-9]*, i.e. the genome name separated from a number by an underscore. The gene name could have an underscore in it... but that might be risky, I did not code this very cleanly...
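To check whether your protein IDs fit that convention before converting, something like the following sketch may help. It is not taken from mOTUconvert: the regular expression and the function name `split_protein_id` are assumptions, and it splits on the last underscore that is followed only by digits, requiring at least one digit.

```python
import re

# Assumed reading of the ${genome_name}_[0-9]* convention:
# anything, then an underscore, then one or more digits at the end.
ID_PATTERN = re.compile(r"^(?P<genome>.+)_(?P<number>[0-9]+)$")


def split_protein_id(protein_id):
    """Split a protein ID into (genome_name, number).

    Hypothetical helper; raises ValueError if the ID does not
    end in an underscore followed by digits.
    """
    match = ID_PATTERN.match(protein_id)
    if match is None:
        raise ValueError(f"ID does not look like genome_NUMBER: {protein_id!r}")
    return match.group("genome"), match.group("number")


# Made-up IDs: the second one has an underscore inside the name.
for example_id in ["genomeA_00012", "my_genome_00042"]:
    print(example_id, "->", split_protein_id(example_id))
```

Because the split happens at the last underscore, underscores inside the name are tolerated here, but a name that itself ends in `_` plus digits would be ambiguous.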

Run it as:

# check possible input file-types within

mOTUconvert.py --list

# running it
mOTUconvert.py --in_type INFILE_TYPE INFILE > OUTPUT
# or
mOTUconvert.py --in_type INFILE_TYPE -o OUTPUT INFILE

# you can then run mOTUpan as

mOTUpan.py --cog_file OUTPUT

In the example_files-folder a number of example input and output files are available.

Citing and additional doc

Preprint for mOTUpan available on bioRxiv:

mOTUpan: a robust Bayesian approach to leverage metagenome assembled genomes for core-genome estimation Moritz Buck, Maliheh Mehrshad, and Stefan Bertilsson bioRxiv 2021.06.25.449606; doi: https://doi.org/10.1101/2021.06.25.449606

A draft of a release note for mOTUlize is in the doc-folder, as well as the source of the previously mentioned mOTUpan paper and some slides.