2bLCA and top hit (--lca-mode) differ in search sensitivity

soedinglab / MMseqs2

MMseqs2: ultra fast and sensitive search and clustering suite

GNU General Public License v3.0


I'm comparing MMseqs2 taxonomic assignment using approximate 2bLCA against top hit, and noticed that the latter approach classifies more genes than the former. I extracted the alignments using --extract-lines 1, and the top hit search also had more hits to the database. All parameters were the same except for --lca-mode.

Example:

mmseqs taxonomy querydb/querydb gtdb_r202/gtdb_r202 taxonomydb/taxonomydb tmp -s 3.0 --lca-mode 3 --tax-output-mode 2 --threads 64
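
To make the comparison concrete, here is a minimal sketch of how the classified genes from the two runs could be compared. It assumes each run's result has been exported to a tab-separated file with the query ID in the first column and the assigned taxon ID in the second, and that a taxon ID of 0 means unclassified; the function names and the toy rows are hypothetical and only for illustration.

```python
import io


def classified_queries(tsv_handle):
    """Return the set of query IDs whose taxon ID (column 2) is not 0.

    Assumption: column 1 is the query ID, column 2 is the taxon ID,
    and a taxon ID of 0 marks an unclassified query.
    """
    classified = set()
    for line in tsv_handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        query_id, taxid = fields[0], fields[1]
        if taxid != "0":
            classified.add(query_id)
    return classified


def compare_runs(lca_handle, tophit_handle):
    """Split classified queries into 2bLCA-only, top-hit-only, and shared."""
    lca = classified_queries(lca_handle)
    tophit = classified_queries(tophit_handle)
    return {
        "lca_only": lca - tophit,
        "tophit_only": tophit - lca,
        "both": lca & tophit,
    }


# Toy in-memory data (made up), standing in for the two exported files.
lca_tsv = io.StringIO("gene1\t0\tno rank\tunclassified\ngene2\t101\tgenus\ttaxonA\n")
tophit_tsv = io.StringIO("gene1\t102\tspecies\ttaxonB\ngene2\t101\tgenus\ttaxonA\n")

result = compare_runs(lca_tsv, tophit_tsv)
print(sorted(result["tophit_only"]))
```

With real data, the two handles would be opened from the exported files of the 2bLCA run and the top hit run; the genes in "tophit_only" are the ones I am asking about.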

Is this behavior expected? If so, what is causing the difference?

I'm using release 13-45111.

Thanks!

2bLCA and top hit (--lca-mode) differ in search sensitivity #465