
HGTector2: Genome-wide prediction of horizontal gene transfer based on distribution of sequence homology patterns.

HGTector2

The development of HGTector is now at qiyunlab. Versions starting from 2.0b3 will be released from this repo. Please access HGTector using the new URL: https://github.com/qiyunlab/HGTector.

HGTector2 is a completely re-engineered software tool. It features a fully automated analytical pipeline that determines parameters on its own and requires minimal human involvement, a redesigned command-line interface that facilitates standardized scientific computing, and a high-quality Python 3 codebase.

HGTector is a computational pipeline for genome-wide detection of putative horizontal gene transfer (HGT) events based on sequence homology search hit distribution statistics.

Documentation

What's New

Installation

Tutorials

References

Quick start

Set up a Conda environment and install dependencies:

conda create -n hgtector -c conda-forge python=3 pyyaml pandas matplotlib scikit-learn bioconda::diamond
conda activate hgtector

Install HGTector2:

pip install git+https://github.com/qiyunlab/HGTector.git

You can then type hgtector to run the program. See the Installation page for more details.
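Before moving on, it can help to confirm that the executables are actually visible in the active environment. The following is a minimal sketch, not part of HGTector itself: the helper name find_missing_tools is hypothetical, and it only assumes that the hgtector and diamond commands from the steps above should be on PATH.

```python
import shutil


def find_missing_tools(tools=("hgtector", "diamond")):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


# Report which of the expected commands are missing, if any.
missing = find_missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All expected commands were found.")
```

If a command is reported missing, the most likely cause is that the hgtector Conda environment has not been activated in the current shell.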

Build a reference database using the default protocol:

hgtector database -o db_dir --default

Alternatively, download a pre-built database (as of 2023-01-02) and compile it.

Prepare input file(s). They should be multi-FASTA files of amino acid sequences (.faa). Each file represents the whole protein set of a complete or partial genome.
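A quick sanity check of an input file can catch common mistakes, such as passing nucleotide sequences instead of proteins. The sketch below is an illustration only and is not how HGTector parses its input: the function name check_protein_fasta, the accepted residue alphabet, and the "looks like nucleotides" heuristic are all assumptions made for this example.

```python
import io

# Assumption: 20 standard residues plus ambiguity codes, U, O and a stop '*'.
PROTEIN_CHARS = set("ACDEFGHIKLMNPQRSTVWYBZXJUO*")
# Assumption: a sequence made only of these letters is probably DNA/RNA.
DNA_CHARS = set("ACGTUN")


def check_protein_fasta(handle):
    """Return (number of unique record IDs, list of problem messages)."""
    problems = []
    records = {}      # record ID -> concatenated sequence
    current = None
    for lineno, raw in enumerate(handle, start=1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            current = fields[0] if fields else f"(empty header, line {lineno})"
            if current in records:
                problems.append(f"{current}: duplicate ID")
            records[current] = ""
        elif current is None:
            problems.append(f"line {lineno}: sequence data before first header")
        else:
            records[current] += line.upper()
    for seq_id, seq in records.items():
        unexpected = set(seq) - PROTEIN_CHARS
        if not seq:
            problems.append(f"{seq_id}: empty sequence")
        elif unexpected:
            problems.append(f"{seq_id}: unexpected characters {sorted(unexpected)}")
        elif set(seq) <= DNA_CHARS:
            problems.append(f"{seq_id}: looks like a nucleotide sequence")
    return len(records), problems


# Usage with an in-memory example; for a real file, pass open("input.faa").
example = io.StringIO(">p1 hypothetical protein\nMKTAYIAK\nQRQISFVK\n>p2\nMSHHWGYGK\n")
count, issues = check_protein_fasta(example)
print(count, "records;", len(issues), "problems")
```

The nucleotide heuristic is deliberately crude: a very short peptide could in principle consist only of A, C, G, T and N, so treat that message as a prompt to look at the file rather than as a definitive error.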

Perform homology search:

hgtector search -i input.faa -o search_dir -m diamond -p 16 -d db_dir/diamond/db -t db_dir/taxdump

Perform HGT prediction:

hgtector analyze -i search_dir -o analyze_dir -t db_dir/taxdump
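When scripting the search and analyze steps, deriving every path from a single database directory helps keep the two commands consistent. The sketch below only assembles and prints the command lines. It uses the same options shown in the commands above and nothing else; the helper name build_commands is hypothetical, and it assumes the taxdump directory sits under the database directory, as in the search command.

```python
import shlex
from pathlib import Path


def build_commands(faa, db_dir, search_dir, analyze_dir, threads=16):
    """Return (search, analyze) argument lists sharing one database directory."""
    db_dir = Path(db_dir)
    taxdump = (db_dir / "taxdump").as_posix()
    diamond_db = (db_dir / "diamond" / "db").as_posix()
    search = [
        "hgtector", "search", "-i", str(faa), "-o", str(search_dir),
        "-m", "diamond", "-p", str(threads), "-d", diamond_db, "-t", taxdump,
    ]
    analyze = [
        "hgtector", "analyze", "-i", str(search_dir), "-o", str(analyze_dir),
        "-t", taxdump,
    ]
    return search, analyze


# Print the commands for review; nothing is executed here.
for cmd in build_commands("input.faa", "db_dir", "search_dir", "analyze_dir"):
    print(shlex.join(cmd))
    # To actually run a step: import subprocess; subprocess.run(cmd, check=True)
```

Passing the commands as argument lists rather than as one shell string avoids quoting problems if a path contains spaces.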

Examine the prediction results under the analyze_dir directory.

It is recommended that you read the first run, second run and real runs pages to get familiar with the pipeline, the underlying methodology, and the customization options.

License

Copyright (c) 2013-2023, Qiyun Zhu and Katharina Dittmar. Licensed under BSD 3-clause. See full license statement.

Citation

Zhu Q, Kosoy M, Dittmar K. HGTector: an automated method facilitating genome-wide discovery of putative horizontal gene transfers. BMC Genomics. 2014. 15:717.