apcamargo opened 2 years ago
Yes, this code has changed a lot in preparation for the profile-profile search. Martin has fitted new values, and I'd recommend using the new defaults. There are now also two different pseudocount modes; the new one is similar to the HHblits pseudocounts and much slower.
Thanks, Milot!
So the new mode is --pseudo-cnt-mode 1 (context-specific)? And what are the new default values for --pca and --pcb? They are not showing up in the help output:
--pca Pseudo count admixture strength []
--pcb Pseudo counts: Neff at half of maximum admixture (range 0.0-inf) []
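The help text describes --pcb as the Neff at which admixture reaches half of its maximum. That matches the HHblits-style admixture weight, so here is a minimal sketch, assuming the per-column weight is tau = min(1, pca / (1 + Neff / pcb)). This formula is an assumption inferred from the help text, not taken from the MMseqs2 source, and `admixture_weight` is a hypothetical helper name.

```python
def admixture_weight(neff: float, pca: float, pcb: float) -> float:
    """Per-column pseudocount weight tau (assumed HHblits-style formula).

    neff: effective number of sequences in the MSA column
    pca:  maximum admixture strength (--pca)
    pcb:  Neff at which the uncapped tau drops to pca / 2 (--pcb)
    """
    if neff < 0 or pca < 0 or pcb <= 0:
        raise ValueError("expected neff >= 0, pca >= 0, pcb > 0")
    tau = pca / (1.0 + neff / pcb)
    # Cap at 1 so the mixed distribution stays a convex combination.
    return min(1.0, tau)


if __name__ == "__main__":
    # Mode-0 values given later in this thread (pca = 1.1, pcb = 4.1).
    pca, pcb = 1.1, 4.1
    for neff in (1.0, 4.1, 10.0):
        print(f"Neff={neff:4.1f}  tau={admixture_weight(neff, pca, pcb):.3f}")
```

At Neff = pcb the uncapped weight is exactly pca / 2 (0.55 for these values), and it shrinks as the column becomes more diverse. Under this assumption, low-diversity alignments receive the most pseudocounts.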
My constraint is that this is part of a package that will be distributed through Conda, so I need to stay compatible with the MMseqs2 version available on Conda. Profile databases created with the latest version fail when I try to search them with 13-45111. But I could try passing the new default --pca and --pcb values when creating the profile database with 13-45111.
Do you guys have plans to push a new GitHub/Conda release in the near future?
Ah, that looks like a bug; it should print the default value.
The new values are:
pca = MultiParam<PseudoCounts>(PseudoCounts(1.1, 1.4));
pcb = MultiParam<PseudoCounts>(PseudoCounts(4.1, 5.8));
The first value is for --pseudo-cnt-mode 0 and the second one for --pseudo-cnt-mode 1.
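Each MultiParam<PseudoCounts> line above stores one (pca, pcb) pair per pseudocount mode. The Python sketch below mirrors that lookup so the values can be passed explicitly on the command line. The names `PCA`, `PCB`, `pseudocount_defaults` and `pseudocount_flags` are hypothetical; only the numbers come from the values above.

```python
# New defaults as quoted above:
# index 0 -> --pseudo-cnt-mode 0, index 1 -> --pseudo-cnt-mode 1 (context-specific).
PCA = (1.1, 1.4)
PCB = (4.1, 5.8)


def pseudocount_defaults(mode: int) -> tuple[float, float]:
    """Return the (pca, pcb) pair for a pseudocount mode (hypothetical helper)."""
    if mode not in (0, 1):
        raise ValueError(f"unknown --pseudo-cnt-mode: {mode}")
    return PCA[mode], PCB[mode]


def pseudocount_flags(mode: int) -> list[str]:
    """Build explicit --pca/--pcb arguments for an mmseqs command line."""
    pca, pcb = pseudocount_defaults(mode)
    return ["--pca", str(pca), "--pcb", str(pcb)]


if __name__ == "__main__":
    print(pseudocount_flags(0))
```

For example, pseudocount_flags(0) yields the pair --pca 1.1 --pcb 4.1. When setting values by hand for a release that presumably predates the context-specific mode, only the mode-0 pair applies.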
Profile databases created with the newer commits won't work with release 13 or earlier.
Yes, we are planning to make a new release, but there is a lot going on :/ Hopefully soon.
Thanks!
So, if I create a profile database in 13-45111 with a command like this:
mmseqs msa2profile msa_db/msa_db profile_db_pseudo/profile_db --match-mode 1 --match-ratio 0.5 --threads 64 --pca 1.1 --pcb 4.1
Will it give me a database with the same pseudocounts as the default parameters of the newer versions? I know there were other changes in how profile databases work, but I want to improve sensitivity while staying compatible with the Conda release.
I've been evaluating how adding pseudocounts changes the sensitivity of profile searches.
I noticed, however, that the search results differ depending on the MMseqs2 version. If I use the latest GitHub/Conda release (13-45111), the search on profile_db_pseudo returns more hits (as expected, given that the alignments are not very diverse). If I use a newer commit (92deb92fb46583b4c68932111303d12dfa121364), the search on the database with pseudocounts returns fewer hits. Were there any changes in MMseqs2's behavior regarding pseudocounts? Also, are there recommendations on how to use the --pca parameter?
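One way to see why pseudocounts change sensitivity is the substitution-matrix admixture used by HHblits-style pseudocounts. Each column's observed frequencies f are blended with pseudocounts g(a) = sum_b f(b) P(a|b), giving p = (1 - tau) f + tau g. The sketch below assumes this scheme, which is not taken from the MMseqs2 source. It uses a toy three-letter alphabet and a made-up conditional probability matrix `P`, so the numbers are illustrative only, and `mix_pseudocounts` is a hypothetical name.

```python
def mix_pseudocounts(freqs, cond_prob, tau):
    """Blend observed column frequencies with substitution-matrix pseudocounts.

    freqs:     observed frequencies f(b) for one MSA column (sums to 1)
    cond_prob: cond_prob[b][a] = P(a | b); each row sums to 1 (toy values)
    tau:       admixture weight in [0, 1]
    """
    n = len(freqs)
    # g(a) = sum_b f(b) * P(a | b)
    pseudo = [sum(freqs[b] * cond_prob[b][a] for b in range(n)) for a in range(n)]
    return [(1.0 - tau) * freqs[a] + tau * pseudo[a] for a in range(n)]


# Hypothetical 3-letter alphabet: residues 0 and 1 are "similar", 2 is not.
P = [
    [0.6, 0.3, 0.1],
    [0.3, 0.6, 0.1],
    [0.1, 0.1, 0.8],
]
column = [1.0, 0.0, 0.0]  # fully conserved column, as in a low-diversity MSA

print(mix_pseudocounts(column, P, tau=0.0))  # observed frequencies only
print(mix_pseudocounts(column, P, tau=0.8))  # mass shifted toward residue 1
```

With tau = 0 a conserved column rewards only the observed residue. With a large tau it also rewards similar residues, which is how a low-diversity profile can pick up more distant hits once pseudocounts are added.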