takaram / kofam_scan

CLI tool to annotate genes with KOfam
https://www.genome.jp/tools/kofamkoala/
MIT License

Guidance for KOfam score threshold adjustment (threshold-scale) #30

Open Bernhard-Steindl opened 1 year ago

Bernhard-Steindl commented 1 year ago

Hello,

I have a question regarding the adaptive score threshold of the KOfam database and the possibility to adjust it in KofamScan with the option -T, --threshold-scale.

-T, --threshold-scale=VALUE
The score thresholds are multiplied by VALUE. For example, with -T2 option, the thresholds become twice as strict.
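To check that I read this help text correctly, here is a minimal Python sketch of how I assume the scaling works. It is not taken from the KofamScan source. The function name `is_significant`, the example scores, and the use of `>=` at the boundary are my own assumptions for illustration.

```python
def is_significant(score: float, threshold: float, scale: float = 1.0) -> bool:
    """Return True if a hit's bit score meets the scaled threshold.

    Assumption: the KO's pre-computed threshold is simply multiplied
    by the -T value before the comparison.
    """
    return score >= threshold * scale


# Hypothetical hits against a KO whose pre-computed threshold is 100.
threshold = 100.0
strong_hit = 150.0
weak_hit = 80.0

strong_default = is_significant(strong_hit, threshold)       # no -T given
strong_strict = is_significant(strong_hit, threshold, 2.0)   # like -T2
weak_default = is_significant(weak_hit, threshold)           # no -T given
weak_relaxed = is_significant(weak_hit, threshold, 0.5)      # like -T0.5
```

Under this reading, a VALUE above 1 removes borderline hits from the significant set, and a VALUE below 1 admits hits that score below the pre-computed threshold.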

Do you have any guidance or experience on whether to adjust the score threshold and how to choose it?

In the paper by Takuya Aramaki et al., "KofamKOALA: KEGG Ortholog assignment based on profile HMM and adaptive score threshold" (DOI https://doi.org/10.1093/bioinformatics/btz859), I read that the KOfam database of profile hidden Markov models (pHMMs) contains an adaptive score threshold that is pre-computed for every KO family. If a new sequence has an hmmsearch score above this threshold, it is considered a reliable match, and KofamScan highlights it with an asterisk (*) in the output file. The paper describes the adaptive score threshold as being determined by maximizing the F-measure over the sequence similarity scores (bit scores) of a positive and a negative dataset against the computed pHMMs. Thus, the adaptive score threshold is the criterion for assigning a KO to new sequences.
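To make my understanding of the F-measure criterion concrete, here is a simplified Python sketch. It is only my reading of the idea, not the procedure from the paper, which may differ in details. The function names and the example bit scores are made up for illustration.

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall (F1); 0.0 if there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(positive_scores, negative_scores):
    """Try every observed score as a cutoff and keep the one with the highest F-measure."""
    candidates = sorted(set(positive_scores) | set(negative_scores))
    best_t, best_f = None, -1.0
    for t in candidates:
        tp = sum(s >= t for s in positive_scores)  # members scoring at or above the cutoff
        fp = sum(s >= t for s in negative_scores)  # non-members scoring at or above the cutoff
        fn = len(positive_scores) - tp             # members that fall below the cutoff
        f = f_measure(tp, fp, fn)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f


# Made-up bit scores for one KO family (positives) and for other sequences (negatives).
positives = [120.0, 150.0, 200.0, 90.0]
negatives = [30.0, 60.0, 95.0]
threshold, f = best_threshold(positives, negatives)
```

In this toy example the chosen cutoff still lets one negative sequence through, because excluding it would also lose a true member. That trade-off is what makes me wonder whether scaling the threshold afterwards is sensible.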

I am using KofamScan for my own project, and I wonder whether it is advisable to relax the score threshold or, conversely, to make it stricter with KofamScan's -T, --threshold-scale option.

Can you share some guidance or your experience about this option please? Thanks and BR, Bernhard