KofamScan is a gene function annotation tool based on KEGG Orthology and hidden Markov models. You need the KOfam database to use this tool. An online version is available at https://www.genome.jp/tools/kofamkoala/.
To get started, download the KOfam database (the profiles/ directory and ko_list) and keep config.yml in the same directory as the exec_annotation script. See below for details. Then run exec_annotation:

    $ ./exec_annotation -o result.txt query.fasta

A query file is a FASTA file with one or more amino acid sequences. Nucleotide sequences cannot be used. Each sequence must have a unique name. The name of a sequence is the string between the header symbol (">") and the first blank character (space, tab, line break, etc.). Do not put whitespace right after ">".
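
The naming rules above can be checked before a run. The following is a minimal sketch, not part of KofamScan: the helper names `fasta_names` and `find_duplicate_names` are hypothetical, and the sketch only checks names, not whether the sequences are amino acids.

```python
import io

def fasta_names(handle):
    """Yield each sequence name: the text between '>' and the first whitespace."""
    for line in handle:
        if line.startswith(">"):
            header = line[1:]
            # Reject an empty header or whitespace right after '>'.
            if not header or header[0].isspace():
                raise ValueError(f"no name right after '>': {line!r}")
            yield header.split()[0]

def find_duplicate_names(handle):
    """Return the names that occur more than once, in order of first repeat."""
    seen, duplicates = set(), []
    for name in fasta_names(handle):
        if name in seen and name not in duplicates:
            duplicates.append(name)
        seen.add(name)
    return duplicates

# Made-up example data: "gene1" is used twice.
example = io.StringIO(
    ">gene1 first protein\nMKV\n>gene2\tsecond\nMLL\n>gene1 again\nMAA\n"
)
print(find_duplicate_names(example))
```

For a real query file, pass an open file handle instead of the `io.StringIO` example; an empty result means every name is unique.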
Specify the path of the profile database you downloaded by giving the --profile option to the command or by writing it to config.yml. The path can be a directory, a .hmm file, or a .hal file.
If it is a directory, the .hmm files in the directory will be used.
If it is a .hmm file, only that file will be used.
If it is a .hal file, the files listed in it will be used. File paths in a .hal file are either absolute or relative to the directory containing the .hal file. Lines starting with # are ignored.
KOfam has prokaryote.hal and eukaryote.hal in the profiles directory. They are lists of profiles excluding eukaryote- and prokaryote-specific KOs, respectively.
If you are interested in only a few KOs, you can make your own .hal file and use it as the database. This reduces computation time.
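
To illustrate how the entries of a .hal file map to profile files, here is a minimal sketch of the path rules described above. It is not KofamScan's own code: `resolve_hal` is a hypothetical helper, the K-number file names are made up for the example, and skipping blank lines is an assumption.

```python
import os
import tempfile

def resolve_hal(hal_path):
    """Return the profile paths listed in a .hal file.

    Relative entries are resolved against the directory of the .hal file,
    and lines starting with '#' are ignored. Skipping blank lines is an
    assumption of this sketch.
    """
    base_dir = os.path.dirname(os.path.abspath(hal_path))
    profiles = []
    with open(hal_path) as handle:
        for line in handle:
            entry = line.strip()
            if line.startswith("#") or not entry:
                continue
            if os.path.isabs(entry):
                profiles.append(entry)
            else:
                profiles.append(os.path.join(base_dir, entry))
    return profiles

# Write a small custom list and show how its entries resolve.
with tempfile.TemporaryDirectory() as profiles_dir:
    hal_path = os.path.join(profiles_dir, "my_kos.hal")
    with open(hal_path, "w") as handle:
        handle.write("# KOs of interest (hypothetical file names)\n")
        handle.write("K00001.hmm\nK00002.hmm\n")
    for path in resolve_hal(hal_path):
        print(path)
```

A list written this way would then be given to the tool through the --profile option or config.yml, like any other .hal file.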
-o FILE: Write the result to FILE. It defaults to stdout.
-p, --profile=PROFILE: Use PROFILE as the profile database. See the description of the profile database above.
-k, --ko-list=FILE: Use FILE as the KO list.
--cpu=N: Set the number of hmmsearch processes started simultaneously to N. It defaults to 1 unless it is set in config.yml.
-c FILE: Use FILE as the config file instead of config.yml in the same directory as exec_annotation.
--tmp-dir=DIR: Use DIR as the temporary directory where hmmsearch results are stored. It will be created if it does not exist. It defaults to ./tmp.
-E, --e-value=VALUE: Require hits to satisfy the E-value threshold VALUE. If a hit does not, an asterisk will not be added in detail format, or the hit will not be reported in the other formats.
-T, --threshold-scale=VALUE: Scale the thresholds by VALUE. For example, with the -T2 option, the thresholds become twice as strict.
-f, --format=FORMAT: Write the output in FORMAT. The four formats below are available.
    detail: Detailed output, in which hits that meet the thresholds are marked with an asterisk.
    detail-tsv: Tab-separated version of the detail format.
    mapper: Gene names are shown with their assigned KOs.
    mapper-oneline: Similar to mapper, but when more than one KO is assigned to a gene, all assigned KOs are shown on one line, separated by tabs.
--[no-]report-unannotated: With the --report-unannotated option, gene names are shown even when no KO is assigned (default when --format=mapper(-oneline)). With --no-report-unannotated, such genes are not shown at all (default when --format=detail).
--create-alignment: With this option, hmmsearch's normal outputs per profile are stored in the temporary directory. In addition, domain information and alignments in the outputs are rearranged per query. It cannot be used with --reannotation.
-r, --reannotation: Skip hmmsearch and assume that hmmsearch outputs are already in the temporary directory. This helps you produce output in a different format or redo annotation with changed thresholds. It cannot be used with --create-alignment.
-h, --help: Show the help message.
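
As an illustration of how the options combine, the sketch below assembles two command lines: a first run, and a second run that reuses the hmmsearch results through --reannotation to write a different format. The helper `build_command` and the file names are hypothetical; only the option names come from the list above.

```python
import shlex

def build_command(query, output=None, profile=None, ko_list=None, cpu=None,
                  fmt=None, tmp_dir=None, reannotation=False):
    """Assemble an argument list for exec_annotation from the options above."""
    args = ["./exec_annotation"]
    if output is not None:
        args += ["-o", output]
    if profile is not None:
        args.append(f"--profile={profile}")
    if ko_list is not None:
        args.append(f"--ko-list={ko_list}")
    if cpu is not None:
        args.append(f"--cpu={cpu}")
    if fmt is not None:
        args.append(f"--format={fmt}")
    if tmp_dir is not None:
        args.append(f"--tmp-dir={tmp_dir}")
    if reannotation:
        args.append("--reannotation")
    args.append(query)  # the query file comes last
    return args

# First run: hmmsearch results are written to the temporary directory.
first_run = build_command("query.fasta", output="result.txt",
                          profile="profiles/prokaryote.hal", ko_list="ko_list",
                          cpu=4, tmp_dir="./tmp")

# Second run: skip hmmsearch and reuse the results, changing only the format.
second_run = build_command("query.fasta", output="result_mapper.txt",
                           profile="profiles/prokaryote.hal", ko_list="ko_list",
                           fmt="mapper", tmp_dir="./tmp", reannotation=True)

print(shlex.join(first_run))
print(shlex.join(second_run))
```

The printed lines can be pasted into a shell, or the lists can be passed to `subprocess.run(..., check=True)` to run the tool directly. Both runs must point at the same temporary directory for the second one to find the earlier results.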
The following variables can be set in config.yml:
- The profile database. The --profile option takes precedence.
- The KO list. The --ko-list option takes precedence.
- The number of hmmsearch processes started simultaneously. The --cpu option takes precedence.
- The path to the hmmsearch executable. If not given, it is searched for in PATH.
- The path to the parallel executable. If not given, it is searched for in PATH.

This software is released under the MIT License.