mtisza1 / Cenote-Taker2

Cenote-Taker2: Discover and Annotate Divergent Viral Contigs (Please use Cenote-Taker 3 instead)
MIT License

DEPRECATED

This repo is deprecated.

If you need help finishing a project using Cenote-Taker 2, I will still field questions/troubleshoot (open an issue).

Otherwise:

Please use Cenote-Taker 3. It's great!!

######### ######### ######### ######### ######### ######### ######### ######### #########


Cenote-Taker 2

Cenote-Taker 2 is a dual-function bioinformatics tool. On the one hand, it can discover/predict virus sequences from any kind of genome or metagenomic assembly. On the other hand, it can annotate virus sequences/genomes (perhaps predicted by another tool) with a variety of sequence features, genes, and taxonomy. The discovery and annotation modules can each be used independently.

+ The code is currently functional. Feel free to use Cenote-Taker 2 at will.
+ Major update on May 6th 2022: Version 2.1.5
+ Cenote-Taker 2.1.5 has an easier, more reliable installation and database downloads. 
+ Some packages that have given many users issues have been replaced. Taxonomy is more flexible. See release notes.
+ "virion" is now the default database

If you just want to discover/predict virus sequences and get a report on those sequences, use Cenote Unlimited Breadsticks, also provided in the Cenote-Taker 2 repo.

If you just want to annotate your virus sequences and make genome maps, run Cenote-Taker 2 using -am True.

An ulterior motive for creating and distributing Cenote-Taker 2 is to facilitate annotation and deposition of viral genomes into GenBank where they can be used by the scientific public. Therefore, I hope you consider depositing the submittable outputs (.sqn) after reviewing them. I am not affiliated with GenBank.

See "Use Cases" below, and read the Cenote-Taker 2 wiki for useful information on using the pipeline (e.g. expected outputs) and screeds on myriad topics. Using an HPC with at least 16 CPUs and 16 GB of dedicated memory is recommended for most runs. (Annotation of a few selected genomes, or virus discovery on smaller databases, can be done with less memory/CPU in a reasonable amount of time.)

To update from v2.1.3 (note that biopython and bedtools are now required):

conda activate cenote-taker2_env
pip install phanotate
conda install -c conda-forge -c bioconda hhsuite last=1282 seqkit
cd Cenote-Taker2
git pull
#Then update the BLAST database (see instructions below).

Update to HMM databases (hallmark genes) occurred on June 16th, 2021. Update to the BLAST (taxonomy) database occurred on May 6th, 2022. See instructions below to update your database.

Read the manuscript in Virus Evolution

If you cannot or do not want to install and run this on the command line, Cenote-Taker 2 v2.1.3 is freely available to run with a point-and-click interface on the CyVerse Discovery Environment.


Install Using Conda


** Databases will require between 8GB (most basic) and 75GB (all the optional databases) of storage.
** Don't install without checking conda version first.
** Install on machine running on Linux (with a reasonably new OS). Support for MacOS is forthcoming.

If you just want a lightweight (7GB), faster, NON-ANNOTATING virus discovery tool, use Cenote Unlimited Breadsticks. The Unlimited Breadsticks module is included in the Cenote-Taker 2 repo, so there is no need to install it separately if you already have Cenote-Taker 2 (you may need to update if you have an older version of Cenote-Taker 2).

- ALERT *** If you choose to install all optional databases for HHsuite, 
- installation will take about 2 hours due to slow download speeds for pdb70
- AND require about 75GB of storage space. 
  1. Change to the directory you'd like to be the parent to the install directory

  2. Ensure Conda is installed and working (required for installation and execution of Cenote-Taker 2). Use version 4.10 or better. Note: instructions for installing Conda are probably specific to your university's/organization's requirements, so it is always best to ask your IT professional or HPC administrator. Generally, you will want to install Miniconda in your data directory.

    conda -V
  3. Clone the Cenote-Taker 2 GitHub repo.

git clone https://github.com/mtisza1/Cenote-Taker2.git
  4. Install the conda environment (phanotate, last, and hhsuite don't play nice with the .yml file, so they need special commands)
conda env create --file cenote-taker2_env.yml
# follow conda prompts to allow install

conda activate cenote-taker2_env

pip install phanotate

conda install -c conda-forge -c bioconda hhsuite last=1282
  5. Change to the Cenote-Taker2 repo directory OR a different location where you want the databases to be stored. (NOTE: if you install the databases in a custom location, you will need to specify this directory with `--cenote-dbs` each time you run the tool.) Download the databases.
conda activate cenote-taker2_env
cd Cenote-Taker2

# choose one of the following:

# with all the options (75GB). The PDB database (--hhPDB) takes about 2 hours to download.
python update_ct2_databases.py --hmm True --protein True --rps True --taxdump True --hhCDD True --hhPFAM True --hhPDB True

# substantially smaller but with some hhsuite DBs (20GB). I recommend this if you are unsure which you want.
python update_ct2_databases.py --hmm True --protein True --rps True --taxdump True --hhCDD True --hhPFAM True

# only the required DBs, No hhsuite (8GB)
python update_ct2_databases.py --hmm True --protein True --rps True --taxdump True
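Two prerequisites above are easy to check before installing: Conda 4.10 or newer, and enough free disk for the database set you pick (about 8, 20, or 75 GB). The sketch below is a minimal, hypothetical helper, not part of Cenote-Taker 2. It assumes `conda -V` prints something like `conda 4.10.3`, and the tier names are made-up labels for the three download commands shown above.

```python
import re
import shutil
import subprocess

# Approximate database sizes quoted in this README (GB); tier names are hypothetical labels.
DB_TIERS_GB = {
    "required_only": 8,    # --hmm --protein --rps --taxdump
    "with_cdd_pfam": 20,   # + --hhCDD --hhPFAM
    "all_optional": 75,    # + --hhPDB
}


def parse_conda_version(text):
    """Extract (major, minor) from output such as 'conda 4.10.3'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"could not find a version in {text!r}")
    return int(match.group(1)), int(match.group(2))


def conda_is_recent_enough(text, minimum=(4, 10)):
    # Compare integer tuples: as strings, "4.9" would wrongly sort after "4.10".
    return parse_conda_version(text) >= minimum


def largest_tier_that_fits(free_gb):
    """Return the biggest database tier that fits in free_gb, or None."""
    fitting = [name for name, size in DB_TIERS_GB.items() if size <= free_gb]
    return max(fitting, key=DB_TIERS_GB.get) if fitting else None


def check_environment(path="."):
    """Print whether conda is new enough and which database tier fits at path."""
    if shutil.which("conda") is None:
        print("conda not found on PATH")
    else:
        out = subprocess.run(["conda", "-V"], capture_output=True, text=True)
        # Combine both streams in case the version is written to stderr.
        print("conda >= 4.10:", conda_is_recent_enough(out.stdout + out.stderr))
    free_gb = shutil.disk_usage(path).free / 1e9
    print(f"{free_gb:.0f} GB free; largest tier that fits:", largest_tier_that_fits(free_gb))
```

Call `check_environment("/path/to/parent_dir")` with the directory you chose in step 1.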

Bioconda installation

A user has packaged Cenote-Taker 2 in Bioconda for use by their institute, and anyone can install it from that package with a few commands. All the above alerts, requirements, and warnings still apply. This will also require 32GB of storage in your default conda environment directory.

Commands:

conda create -n cenote-taker2 -c hcc -c conda-forge -c bioconda -c defaults cenote-taker2=2020.04.01

conda activate cenote-taker2

download-db.sh

The Krona database directory will then need to be manually downloaded and set up. This should work:

CT2_DIR=$PWD
KRONA_DIR=$( which python | sed 's/bin\/python/opt\/krona/g' )
cd ${KRONA_DIR}
sh updateTaxonomy.sh
cd ${KRONA_DIR}
sh updateAccessions.sh
cd ${CT2_DIR}

Discussion: LINK

Updating databases

The HMM database was last updated on June 16th, 2021, and the BLAST database on May 6th, 2022. The update should only take a minute or two. Here's how to update (modify the commands if your conda environment differs from the example below):

# load your conda environment:
conda activate cenote-taker2_env

# change to the Cenote-Taker2 repo directory:
cd Cenote-Taker2

# update Cenote-Taker 2:
git pull

# run the update script:
python update_ct2_databases.py --hmm True --protein True

Schematic


Running Cenote-Taker 2

Cenote-Taker 2 currently runs in a python wrapper.

  1. Activate the Conda environment.

Check environments:

conda info --envs

#Default:

conda activate cenote-taker2_env

#Or if you've put your conda environment in a custom location:

conda activate /path/to/better/directory/cenote-taker2_env
  2. Run the python script to get the help menu (see options below).
# quick help menu
python /path/to/Cenote-Taker2/run_cenote-taker2.py

# full help menu
python /path/to/Cenote-Taker2/run_cenote-taker2.py -h
  3. Run some contigs. For example:
    
    python /path/to/Cenote-Taker2/run_cenote-taker2.py -c MY_CONTIGS.fasta -r my_contigs1_ct -m 32 -t 32 -p true -db virion

Or, if you want to save a log of the run, add "2>&1 | tee output.log" to the end of the command:

python /path/to/Cenote-Taker2/run_cenote-taker2.py -c MY_CONTIGS.fasta -r my_contigs1_ct -m 32 -t 32 -p true -db virion 2>&1 | tee output.log


### Use Case Suggestions/Settings
#### *Annotation*

If you just want to annotate your pre-selected virus sequences and make genome maps, run Cenote-Taker 2 using `-am True`.

Example:

clip and wrap circular sequences

python /path/to/Cenote-Taker2/run_cenote-taker2.py -c MY_VIRUSES.fasta -r viruses_am_ct -m 32 -t 32 -p False -am True

do not wrap circular sequences, but label DTR regions

python /path/to/Cenote-Taker2/run_cenote-taker2.py -c MY_VIRUSES.fasta -r viruses_am_ct -m 32 -t 32 -p False -am True --wrap False


For very divergent genomes, setting `-hh hhsearch` will marginally improve the number of genes that are annotated, but it increases the run time quite a bit. On the other hand, setting `-hh no_hhsuite_tool` skips the time-consuming hhblits step. You'll still get pretty good genome maps, so this may be most appropriate for very large virus genome databases, or for runs where you just want a quick check.

#### *Discovery + Annotation*

**Virus-like particle (VLP) prep assembly:**

`-p False -db standard`

You might apply a size cutoff for linear contigs as well, e.g. `--minimum_length_linear 3000` OR `--minimum_length_linear 5000`. Changing length minima does not affect false positive rates, but short linear contigs may not be useful, depending on your goals.

Example:

python /path/to/Cenote-Taker2/run_cenote-taker2.py -c MY_VLP_ASSEMBLY.fasta -r my_VLP1_ct -m 32 -t 32 -p False -db standard --minimum_length_linear 3000


**Whole genome shotgun (WGS) metagenomic assembly:**

`-p True -db virion --minimum_length_linear 3000 --lin_minimum_hallmark_genes 2`

You should definitely ***definitely*** prune virus sequences from WGS datasets. That said, [CheckV](https://bitbucket.org/berkeleylab/checkv/src/master/) also does a very good job (I'm still formally comparing these approaches), so if you prefer, you could use `--prune_prophage False` on a metagenome assembly and feed the unpruned contigs from Unlimited Breadsticks into `checkv end_to_end`. My suggestion is to prune with `Cenote-Taker 2`, then run `CheckV`.

Example with prune:

python /path/to/Cenote-Taker2/run_cenote-taker2.py -c MY_WGS_ASSEMBLY.fasta -r my_WGS1_ct -m 32 -t 32 -p True -db virion --minimum_length_linear 3000 --lin_minimum_hallmark_genes 2


**Bacterial isolate genome or MAG**

`-p True -db virion --minimum_length_linear 3000 --lin_minimum_hallmark_genes 2`

Using `--lin_minimum_hallmark_genes 1 -db virion` with WGS or bacterial genome data will (in my experience) yield very few sequences that appear to be false positives. However, these sequencing sets contain lots of "degraded" prophage sequences, i.e. prophages that have lost some or most of their genes, which is why `--lin_minimum_hallmark_genes 2` is suggested above. That said, a sequence with just 1 hallmark gene is not guaranteed to be a degraded phage (especially in the case of ssDNA viruses), nor is a sequence with 2+ hallmark genes guaranteed to be a complete phage.

Example:

python /path/to/Cenote-Taker2/run_cenote-taker2.py -c MY_BACTERIAL_GENOME.fasta -r my_genome1_ct -m 32 -t 32 -p True -db virion --minimum_length_linear 3000 --lin_minimum_hallmark_genes 2
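To see how the length and hallmark settings interact, here is a simplified Python model of the documented thresholds: linear and circular contigs use separate minimum hallmark counts, contigs below the length minimum are not considered, and with `--filter_out_plasmids True` (the default) plasmid hallmark genes do not count. This is an illustrative sketch of the options as described in the help text, NOT Cenote-Taker 2's actual implementation; the `Contig` record and `passes_hallmark_filter` function are hypothetical, and treating both length options as plain length gates is a simplification.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    # Hypothetical record; these fields are illustrative, not Cenote-Taker 2 internals.
    name: str
    length: int
    circular: bool
    virus_hallmarks: int
    plasmid_hallmarks: int = 0


def passes_hallmark_filter(contig, lin_min=1, circ_min=1,
                           min_len_linear=1000, min_len_circular=1000,
                           filter_out_plasmids=True):
    """Simplified model of the documented thresholds (defaults mirror the help text)."""
    min_len = min_len_circular if contig.circular else min_len_linear
    if contig.length < min_len:
        return False  # too short to be checked at all
    hallmarks = contig.virus_hallmarks
    if not filter_out_plasmids:
        hallmarks += contig.plasmid_hallmarks  # plasmid hallmarks count only when not filtered
    required = circ_min if contig.circular else lin_min
    return hallmarks >= required


# A 12 kb linear contig with a single hallmark gene (possibly a degraded prophage):
degraded = Contig("contig_1", 12000, circular=False, virus_hallmarks=1)
print(passes_hallmark_filter(degraded))                                    # default settings
print(passes_hallmark_filter(degraded, lin_min=2, min_len_linear=3000))   # WGS/bacterial settings
```

With the defaults the contig is kept; with the stricter settings recommended for WGS and bacterial genomes it is not.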


**RNAseq assembly of any kind (if you only want RNA viruses)**

`-p False -db rna_virus`

If you also want DNA virus transcripts, or if your data is mixed RNA/DNA sequencing, you might do a run with `-db rna_virus`, then, from this run, take the file "other_contigs/non_viral_domains_contigs.fna" and use it as input for another run with `-db virion`.

Example:

python /path/to/Cenote-Taker2/run_cenote-taker2.py -c MY_METATRANSCRIPTOME.fasta -r my_metatrans1_ct -m 32 -t 32 -p False -db rna_virus
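If you script many runs, the settings in this section can be kept as presets. The minimal sketch below only builds the command line as a list (it runs nothing); the preset names and helper functions are hypothetical, while the flags and values are the ones recommended above. The second-pass helper assumes `other_contigs/non_viral_domains_contigs.fna` sits inside the first run's output directory (named after `-r`) and uses `-p False` as in the RNAseq preset; check both against your own output before relying on them.

```python
import shlex

# Flags recommended above for each data type (preset names are hypothetical).
PRESETS = {
    "annotation_only": ["-p", "False", "-am", "True"],
    "vlp": ["-p", "False", "-db", "standard", "--minimum_length_linear", "3000"],
    "wgs": ["-p", "True", "-db", "virion", "--minimum_length_linear", "3000",
            "--lin_minimum_hallmark_genes", "2"],
    "bacterial_genome": ["-p", "True", "-db", "virion", "--minimum_length_linear", "3000",
                         "--lin_minimum_hallmark_genes", "2"],
    "rna_virus": ["-p", "False", "-db", "rna_virus"],
}


def build_command(script, contigs, run_title, preset, mem=32, cpu=32):
    """Return the argv list for run_cenote-taker2.py with the chosen preset."""
    return (["python", script, "-c", contigs, "-r", run_title,
             "-m", str(mem), "-t", str(cpu)] + PRESETS[preset])


def second_pass_dna_command(script, first_run_title, run_title, mem=32, cpu=32):
    """Re-run the leftover contigs of an rna_virus run against the virion database."""
    # ASSUMPTION: the file lives under the first run's output directory.
    leftovers = f"{first_run_title}/other_contigs/non_viral_domains_contigs.fna"
    # ASSUMPTION: -p False, mirroring the RNAseq preset (-p is a required argument).
    return (["python", script, "-c", leftovers, "-r", run_title,
             "-m", str(mem), "-t", str(cpu), "-p", "False", "-db", "virion"])


# Example: reproduces the RNAseq command shown above.
print(shlex.join(build_command(
    "/path/to/Cenote-Taker2/run_cenote-taker2.py",
    "MY_METATRANSCRIPTOME.fasta", "my_metatrans1_ct", "rna_virus")))
```

Passing the list to `subprocess.run(cmd, check=True)` would launch the run without any shell quoting issues.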


## Prepare files for Vcontact2

[Vcontact2](https://bitbucket.org/MAVERICLab/vcontact2/src/master/) is a popular downstream tool for clustering phage genomes into genus-level bins. Here's an example of how to prepare files from `Cenote-Taker 2`. 

change directory to a Cenote-Taker 2 output directory

list the summary file, then set SUMMARY to its name (the prefix is based on your run title):

ls *_CONTIG_SUMMARY.tsv
SUMMARY="cenote_out_CONTIG_SUMMARY.tsv"

make files for VContact2

if [ -s vcontact2_gene_to_genome1.csv ] || [ -s vcontact2_all_proteins.faa ] ; then
    echo "vcontact2 files already exist. NOT overwriting."
else
    echo "protein_id,contig_id,keywords" > vcontact2_gene_to_genome1.csv
    # VIRUS = column 2, END = column 4 of the summary (header line skipped)
    tail -n+2 "$SUMMARY" | cut -f2,4 | while read VIRUS END ; do
        # DTR contigs were rotated, so their proteins are in the .rotate. file
        if [[ "$END" == "DTR" ]] ; then
            AA=$( find . -type f -name "${VIRUS}.rotate.AA.sorted.fasta" )
        else
            AA=$( find . -type f -name "${VIRUS}.AA.sorted.fasta" )
        fi
        # quoting "$AA" stops grep from reading stdin if no file was found
        grep -F ">" "$AA" | cut -d " " -f1 | sed 's/>//g' | while read LINE ; do
            echo "${LINE},${VIRUS}"
        done >> vcontact2_gene_to_genome1.csv
        cat "$AA" >> vcontact2_all_proteins.faa
    done
fi
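If you would rather do this step in Python, the sketch below builds the same two files. It makes the same assumptions as the shell commands: column 2 of the summary holds the contig name, column 4 is `DTR` for rotated contigs, and protein files are named `<contig>.rotate.AA.sorted.fasta` or `<contig>.AA.sorted.fasta` somewhere under the output directory. The function name is hypothetical. Unlike the shell version, it writes the outputs into the given directory and skips contigs whose protein file is not found.

```python
from pathlib import Path


def prepare_vcontact2(out_dir, summary_name,
                      g2g_name="vcontact2_gene_to_genome1.csv",
                      faa_name="vcontact2_all_proteins.faa"):
    """Write vConTACT2 input files from a Cenote-Taker 2 output directory."""
    out_dir = Path(out_dir)
    g2g_path, faa_path = out_dir / g2g_name, out_dir / faa_name
    # Mirror the shell version: never overwrite existing, non-empty outputs.
    if any(p.exists() and p.stat().st_size > 0 for p in (g2g_path, faa_path)):
        print("vcontact2 files already exist. NOT overwriting.")
        return False

    summary_lines = (out_dir / summary_name).read_text().splitlines()
    rows = ["protein_id,contig_id,keywords"]
    proteins = []
    for line in summary_lines[1:]:  # skip the header row, like `tail -n+2`
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        virus, end = fields[1], fields[3]  # same columns as `cut -f2,4`
        # DTR contigs were rotated, so their proteins are in the .rotate. file
        suffix = ".rotate.AA.sorted.fasta" if end == "DTR" else ".AA.sorted.fasta"
        # Note: names containing glob characters such as [ or * would need escaping.
        matches = sorted(out_dir.rglob(virus + suffix))
        if not matches:
            continue  # no protein file for this contig
        text = matches[0].read_text()  # use the first match
        for header in (ln for ln in text.splitlines() if ln.startswith(">")):
            # protein ID = header text after '>' up to the first space
            rows.append(f"{header[1:].split(' ')[0]},{virus}")
        proteins.append(text if text.endswith("\n") else text + "\n")

    g2g_path.write_text("\n".join(rows) + "\n")
    faa_path.write_text("".join(proteins))
    return True
```

For example, `prepare_vcontact2(".", "cenote_out_CONTIG_SUMMARY.tsv")` from inside the output directory.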


## All arguments:

usage: run_cenote-taker2.py [-h] -c ORIGINAL_CONTIGS -r RUN_TITLE -p PROPHAGE -m MEM -t CPU

                        [-am ANNOTATION_MODE]
                        [--template_file TEMPLATE_FILE] 
                        [--reads1 F_READS]
                        [--reads2 R_READS]
                        [--minimum_length_circular CIRC_LENGTH_CUTOFF]
                        [--minimum_length_linear LINEAR_LENGTH_CUTOFF]
                        [-db VIRUS_DOMAIN_DB]
                        [--lin_minimum_hallmark_genes LIN_MINIMUM_DOMAINS]
                        [--circ_minimum_hallmark_genes CIRC_MINIMUM_DOMAINS]
                        [--known_strains HANDLE_KNOWNS]
                        [--blastn_db BLASTN_DB]
                        [--enforce_start_codon ENFORCE_START_CODON]
                        [-hh HHSUITE_TOOL] 
                        [--crispr_file CRISPR_FILE]
                        [--isolation_source ISOLATION_SOURCE]
                        [--Environmental_sample ENVIRONMENTAL_SAMPLE]
                        [--collection_date COLLECTION_DATE]
                        [--metagenome_type METAGENOME_TYPE]
                        [--srr_number SRR_NUMBER]
                        [--srx_number SRX_NUMBER] 
                        [--biosample BIOSAMPLE]
                        [--bioproject BIOPROJECT] 
                        [--assembler ASSEMBLER]
                        [--molecule_type MOLECULE_TYPE]
                        [--data_source DATA_SOURCE]
                        [--filter_out_plasmids FILTER_PLASMIDS]
                        [--scratch_directory SCRATCH_DIR]
                        [--blastp BLASTP] 
                        [--orf-within-orf ORF_WITHIN]
                        [--cenote-dbs CENOTE_DBS] [--wrap WRAP]
                        [--hallmark_taxonomy HALLMARK_TAX]

Cenote-Taker 2 is a pipeline for virus discovery and thorough annotation of viral contigs and genomes. Visit https://github.com/mtisza1/Cenote-Taker2#use-case-suggestionssettings for suggestions about how to run different data types and https://github.com/mtisza1/Cenote-Taker2/wiki to read more. Version 2.1.5

optional arguments:

-h, --help show this help message and exit

REQUIRED ARGUMENTS for Cenote-Taker2 :

-c ORIGINAL_CONTIGS, --contigs ORIGINAL_CONTIGS

                    Contig file with .fasta extension in fasta format - OR

                    - assembly graph with .fastg extension. Each header

                    must be unique before the first space character

-r RUN_TITLE, --run_title RUN_TITLE

                    Name of this run. A directory of this name will be

                    created. Must be unique from older runs or older run

                    will be renamed. Must be less than 18 characters,

                    using ONLY letters, numbers and underscores (_)

-p PROPHAGE, --prune_prophage PROPHAGE

                    True or False. Attempt to identify and remove flanking

                    chromosomal regions from non-circular contigs with

                    viral hallmarks (True is highly recommended for

                    sequenced material not enriched for viruses. Virus

                    enriched samples probably should be False (you might

                    check with ViromeQC). Also, please use False if

                    --lin_minimum_hallmark_genes is set to 0)

-m MEM, --mem MEM example: 56 -- Gigabytes of memory available for

                    Cenote-Taker2. Typically, 16 to 32 should be used.

                    Lower memory will work in theory, but could extend the

                    length of the run

-t CPU, --cpu CPU Example: 32 -- Number of CPUs available for

                    Cenote-Taker2. Approximately 32 CPUs should be used for

                    moderately sized metagenomic assemblies. For large

                    datasets, increased performance can be seen up to 120

                    CPUs. Fewer than 16 CPUs will work in theory, but

                    could extend the length of the run. See GitHub repo

                    for suggestions.

OPTIONAL ARGUMENTS for Cenote-Taker2. Most of these are important to consider!!! GenBank typically only accepts genome submissions with ample metadata. See https://www.ncbi.nlm.nih.gov/Sequin/sequin.hlp.html#ModifiersPage for more information on GenBank metadata fields:

-am ANNOTATION_MODE, --annotation_mode ANNOTATION_MODE

                    Default: False -- Annotate sequences only (skip

                    discovery). Only use if you believe each provided

                    sequence is viral

--template_file TEMPLATE_FILE

                    Template file with some metadata. Real one required

                    for GenBank submission. Takes a couple minutes to

                    generate: https://submit.ncbi.nlm.nih.gov/genbank/template/submission/

--reads1 F_READS Default: no_reads -- ILLUMINA READS ONLY: First Read

                    file in paired read set - OR - read file in unpaired

                    read set - OR - read file of interleaved reads. Used

                    for coverage depth determination.

--reads2 R_READS Default: no_reads -- ILLUMINA READS ONLY: Second Read

                    file in paired read set. Disregard if not using paired

                    reads. Used for coverage depth determination.

--minimum_length_circular CIRC_LENGTH_CUTOFF

                    Default: 1000 -- Minimum length of contigs to be

                    checked for circularity. Bare minimum is 1000 nts

--minimum_length_linear LINEAR_LENGTH_CUTOFF

                    Default: 1000 -- Minimum length of non-circular

                    contigs to be checked for viral hallmark genes.

-db VIRUS_DOMAIN_DB, --virus_domain_db VIRUS_DOMAIN_DB

                    default: virion -- 'standard' database: all virus (DNA

                    and RNA) hallmark genes (i.e. genes with known

                    function as virion structural, packaging, replication,

                    or maturation proteins specifically encoded by virus

                    genomes) with low false discovery rate. 'virion'

                    database: subset of 'standard', hallmark genes

                    encoding virion structural proteins, packaging

                    proteins, or capsid maturation proteins (DNA and RNA

                    genomes) with LOWEST false discovery rate. 'rna_virus'

                    database: For RNA virus hallmarks only. Includes RdRp

                    and capsid genes of RNA viruses. Low false discovery

                    rate.

--lin_minimum_hallmark_genes LIN_MINIMUM_DOMAINS

                    Default: 1 -- Number of detected viral hallmark genes

                    on a non-circular contig to be considered viral and

                    receive full annotation. WARNING: Only choose '0' if

                    you have prefiltered the contig file to only contain

                    putative viral contigs (using another method such as

                    VirSorter or DeepVirFinder), or you are very confident

                    you have physically enriched for virus particles very

                    well (you might check with ViromeQC). Otherwise, the

                    duration of the run will be extended many many times

                    over, largely annotating non-viral contigs, which is

                    not what Cenote-Taker2 is meant for. For unenriched

                    samples, '2' might be more suitable, yielding a false

                    positive rate near 0.

--circ_minimum_hallmark_genes CIRC_MINIMUM_DOMAINS

                    Default: 1 -- Number of detected viral hallmark genes

                    on a circular contig to be considered viral and

                    receive full annotation. For samples physically

                    enriched for virus particles, '0' can be used, but

                    please treat circular contigs without known viral

                    domains cautiously. For unenriched samples, '1' might

                    be more suitable.

--known_strains HANDLE_KNOWNS

                    Default: do_not_check_knowns -- do not check if

                    putatively viral contigs are highly related to known

                    sequences (via MEGABLAST). 'blast_knowns': REQUIRES

                    '--blastn_db' option to function correctly.

--blastn_db BLASTN_DB

                    Default: none -- Set a database if using

                    '--known_strains' option. Specify BLAST-formatted

                    nucleotide database. Probably, use only GenBank 'nt'

                    database downloaded from ftp://ftp.ncbi.nlm.nih.gov/

                    or another GenBank formatted .fasta file to make

                    database

--enforce_start_codon ENFORCE_START_CODON

                    Default: False -- For final genome maps, require ORFs

                    to be initiated by a typical start codon? GenBank

                    submissions containing ORFs without start codons can

                    be rejected. However, if True, important but

                    incomplete genes could be culled from the final

                    output. This is relevant mainly to contigs of

                    incomplete genomes

-hh HHSUITE_TOOL, --hhsuite_tool HHSUITE_TOOL

                    default: hhblits -- hhblits will query PDB, pfam, and

                    CDD to annotate ORFs escaping identification via

                    upstream methods. 'hhsearch': hhsearch, a more

                    sensitive tool, will query PDB, pfam, and CDD to

                    annotate ORFs escaping identification via upstream

                    methods. (WARNING: hhsearch takes much, much longer

                    than hhblits and can extend the duration of the run

                    many times over. Do not use on large input contig

                    files). 'no_hhsuite_tool': forgoes annotation of ORFs

                    with hhsuite. Fastest way to complete a run.

--crispr_file CRISPR_FILE

                    Tab-separated file with CRISPR hits in the following

                    format: CONTIG_NAME HOST_NAME NUMBER_OF_MATCHES. You

                    could use this tool:

                    https://github.com/edzuf/CrisprOpenDB. Then reformat

                    for Cenote-Taker 2

--isolation_source ISOLATION_SOURCE

                    Default: unknown -- Describes the local geographical

                    source of the organism from which the sequence was

                    derived

--Environmental_sample ENVIRONMENTAL_SAMPLE

                    Default: False -- True or False, Identifies sequence

                    derived by direct molecular isolation from an

                    unidentified organism

--collection_date COLLECTION_DATE

                    Default: unknown -- Date of collection. this format:

                    01-Jan-2019, i.e. DD-Mmm-YYYY

--metagenome_type METAGENOME_TYPE

                    Default: unknown -- a.k.a. metagenome_source

--srr_number SRR_NUMBER

                    Default: unknown -- For read data on SRA, run number,

                    usually beginning with 'SRR' or 'ERR'

--srx_number SRX_NUMBER

                    Default: unknown -- For read data on SRA, experiment

                    number, usually beginning with 'SRX' or 'ERX'

--biosample BIOSAMPLE

                    Default: unknown -- For read data on SRA, sample

                    number, usually beginning with 'SAMN' or 'SAMEA' or

                    'SRS'

--bioproject BIOPROJECT

                    Default: unknown -- For read data on SRA, project

                    number, usually beginning with 'PRJNA' or 'PRJEB'

--assembler ASSEMBLER

                    Default: unknown_assembler -- Assembler used to

                    generate contigs, if applicable. Specify version of

                    assembler software, if possible.

--molecule_type MOLECULE_TYPE

                    Default: DNA -- viable options are DNA - OR - RNA

--data_source DATA_SOURCE

                    default: original -- original data is not taken from

                    other researchers' public or private database.

                    'tpa_assembly': data is taken from other researchers'

                    public or private database. Please be sure to specify

                    SRA metadata.

--filter_out_plasmids FILTER_PLASMIDS

                    Default: True -- True - OR - False. If True, hallmark

                    genes of plasmids will not count toward the minimum

                    hallmark gene parameters. If False, hallmark genes of

                    plasmids will count. Plasmid hallmark gene set is not

                    necessarily comprehensive at this time.

--scratch_directory SCRATCH_DIR

                    Default: none -- When running many instances of

                    Cenote-Taker2, it seems to run more quickly if you

                    copy the hhsuite databases to a scratch space

                    temporarily. Use this argument to set a scratch

                    directory that the databases will be copied to (at

                    least 100GB of scratch space are required for copying

                    the databases)

--blastp BLASTP Do not use this argument as of now.

--orf-within-orf ORF_WITHIN

                    Default: False -- Remove called ORFs without HMMSCAN

                    or RPS-BLAST hits that begin and end within other

                    ORFs? True or False

--cenote-dbs CENOTE_DBS

                    Default: cenote_script_path -- If you downloaded and

                    setup the databases in a non-standard location,

                    specify path

--wrap WRAP Default: True -- Wrap/rotate DTR/circular contigs so

                    the start codon of an ORF is the first nucleotide in

                    the contig/genome

--hallmark_taxonomy HALLMARK_TAX

                    Default: False -- Get hierarchical taxonomy

                    information for all hallmark genes? This report

                    (*.hallmarks.taxonomy.out) is not considered in the

                    final taxonomy call.
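Several options above carry format rules or depend on other options, and a mistake may only surface well into a long run. The minimal Python sketch below checks a few of the documented rules up front: the run title (fewer than 18 characters; letters, numbers, underscores), unique FASTA headers up to the first space, the `DD-Mmm-YYYY` collection date, `--molecule_type`, `--known_strains blast_knowns` needing `--blastn_db`, and `--prune_prophage False` when `--lin_minimum_hallmark_genes` is 0. The function names and the dict-based input are hypothetical; this is not Cenote-Taker 2's own argument parsing, and .fastg input is not handled.

```python
import re
from datetime import datetime

RUN_TITLE_RE = re.compile(r"[A-Za-z0-9_]{1,17}")  # letters, digits, underscores; < 18 chars


def fasta_header_ids(lines):
    """Yield each FASTA record ID (header text up to the first space)."""
    for line in lines:
        if line.startswith(">"):
            yield line[1:].rstrip("\n").split(" ")[0]


def preflight_problems(args, fasta_lines=()):
    """Return a list of problems found in a dict of Cenote-Taker 2 options."""
    problems = []
    title = args.get("run_title", "")
    if not RUN_TITLE_RE.fullmatch(title):
        problems.append(f"run title {title!r}: use < 18 letters, digits or underscores")

    seen, dupes = set(), set()
    for record_id in fasta_header_ids(fasta_lines):
        if record_id in seen:
            dupes.add(record_id)
        seen.add(record_id)
    if dupes:
        problems.append(f"duplicate header IDs: {sorted(dupes)}")

    date = args.get("collection_date", "unknown")
    if date != "unknown":
        try:
            # %b expects English month abbreviations (e.g. Jan) in the default C locale.
            datetime.strptime(date, "%d-%b-%Y")
        except ValueError:
            problems.append(f"collection date {date!r} is not DD-Mmm-YYYY")

    if args.get("molecule_type", "DNA") not in ("DNA", "RNA"):
        problems.append("molecule_type must be DNA or RNA")

    if args.get("known_strains") == "blast_knowns" and not args.get("blastn_db"):
        problems.append("--known_strains blast_knowns requires --blastn_db")

    if (str(args.get("lin_minimum_hallmark_genes", 1)) == "0"
            and str(args.get("prune_prophage")).lower() == "true"):
        problems.append("use --prune_prophage False when --lin_minimum_hallmark_genes is 0")
    return problems
```

For example, `preflight_problems({"run_title": "my_WGS1_ct", "collection_date": "01-Jan-2019"}, open("MY_WGS_ASSEMBLY.fasta"))` returns an empty list when nothing is wrong.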
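`--crispr_file` expects a tab-separated file with three columns: contig name, host name, and number of matches. A minimal sketch of the "reformat for Cenote-Taker 2" step, assuming you have already reduced your CRISPR spacer hits (e.g. from CrisprOpenDB) to one (contig, host) pair per match; that input shape and the function name are hypothetical. The help text does not say whether a header line is expected, so none is written.

```python
import csv
from collections import Counter


def write_crispr_file(hits, path):
    """Write CONTIG_NAME<TAB>HOST_NAME<TAB>NUMBER_OF_MATCHES, one row per pair.

    hits: iterable of (contig_name, host_name) tuples, one per spacer match.
    """
    counts = Counter(hits)
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        # Most matches first; ties keep first-seen order (Counter.most_common).
        for (contig, host), n in counts.most_common():
            writer.writerow([contig, host, n])
    return counts
```

The resulting file is then passed to the pipeline with `--crispr_file`.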

## Directory Tree
![Directory Tree Image](../master/cenote-taker2_directory_tree2.png)

## Citation
Michael J Tisza, Anna K Belford, Guillermo Domínguez-Huerta, Benjamin Bolduc, Christopher B Buck, Cenote-Taker 2 democratizes virus discovery and sequence annotation, Virus Evolution, Volume 7, Issue 1, January 2021, veaa100, https://doi.org/10.1093/ve/veaa100