takaram / kofam_scan

CLI tool to annotate genes with KOfam
https://www.genome.jp/tools/kofamkoala/
MIT License
66 stars 11 forks source link

KofamScan

KofamScan is a gene function annotation tool based on KEGG Orthology and hidden Markov model. You need KOfam database to use this tool. Online version is available on https://www.genome.jp/tools/kofamkoala/ .

Requirements

Usage

  1. Download KOfam database from ftp://ftp.genome.jp/pub/db/kofam/ and decompress it. You will get profile HMMs in profiles/ directory and ko_list.
  2. Create config.yml in the same directory as exec_annotation script. See below for details.
  3. Execute exec_annotation.
$ ./exec_annotation -o result.txt query.fasta

Query file

A query file is a FASTA file with one or more amino acid sequences. You cannot use nucleotide sequences. Each sequence must have a unique name. A name of a sequence is a string between the header symbol (">") and the first blank character (whitespace, tab, line break, etc.). Do not put a whitespace right after ">".

Profiles

Specify the path of the profile database you downloaded by giving --profile option to the command or writing it to config.yml. The path can be a directory, .hmm file, or .hal file. If it is a directory, .hmm files in the directory will be used. If a .hmm file, only the file will be used. If a .hal file, files listed in the .hal file will be used. File paths in a .hal file are either absolute or relative to the directory of the file. Lines start with # are ignored.

KOfam has prokaryote.hal and eukaryote.hal in profiles directory. They are lists of profiles excluding eukaryote- and prokaryote-specific KOs respectively. If you are interested in only several KOs, you can make your original .hal file and use it as a database. It will reduce computation time.

Options

config.yml

The following variables can be set by config.yml.

License

This software is released under the MIT License.