soedinglab / hh-suite

Remote protein homology detection suite.
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3019-7
GNU General Public License v3.0

HHFilter -diff M parameter returns fewer sequences than M #323

Open: ZwormZ opened this issue 2 years ago

ZwormZ commented 2 years ago

Hi! I'm using HHFilter 3.3.0 to sample sequences from an MSA file that contains tens of thousands of homologous sequences.

The command is: /userhome/anaconda3/envs/proteins/bin/hhfilter -i /userhome/data/TS2/msa/7KVT_B.a2m -o /userhome/data/TS2/7KVT_B_filter.a2m -id 90 -diff 512 -cov 0 -qid 0 -qsc -20.0

I set the -diff 512 parameter, which according to the help manual should return 512 or more sequences that maximize diversity (the result is usually close to 512), but in fact I got fewer than 512 sequences.
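To check how many sequences actually ended up in the filtered file, a minimal Python sketch like the one below can count records. It assumes every sequence in the A2M output starts with a `>` header line (so the query is counted too) and that lines starting with `#` or containing residues are not headers. The helper names are hypothetical and not part of HH-suite; `meets_diff_target` simply encodes the reading of `-diff` described above ("at least M sequences").

```python
def count_a2m_records(lines):
    """Count sequences in A2M/A3M/FASTA-style text by counting '>' header lines.

    Comment lines ('#...') and sequence lines are ignored; the query is included.
    """
    return sum(1 for line in lines if line.startswith(">"))


def count_a2m_file(path):
    """Open an A2M file and count its sequence records."""
    with open(path) as handle:
        return count_a2m_records(handle)


def meets_diff_target(path, m):
    """Return True if the file holds at least m sequences (the expected -diff behaviour)."""
    return count_a2m_file(path) >= m
```

For example, `meets_diff_target("/userhome/data/TS2/7KVT_B_filter.a2m", 512)` would return `False` for the output described in this issue.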

I tried different values for the -diff parameter and got different results, which are shown in the following figure. The number of sequences returned seems odd to me. Is that normal?

[Figure: number of sequences returned by hhfilter for different -diff values]
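A sweep like the one in the figure can be reproduced with a small Python sketch that reruns the command from this issue with only `-diff` changed and records how many sequences each run writes. It assumes `hhfilter` is on `PATH` (or passed explicitly, e.g. the Anaconda path above) and that the output is A2M with one `>` header per sequence. The helper names and the output file naming scheme are hypothetical; all hhfilter flags are copied from the command above.

```python
import subprocess


def build_hhfilter_cmd(inp, out, diff, hhfilter="hhfilter"):
    """Rebuild the issue's hhfilter command with a variable -diff value."""
    return [hhfilter, "-i", inp, "-o", out,
            "-id", "90", "-diff", str(diff),
            "-cov", "0", "-qid", "0", "-qsc", "-20.0"]


def count_records(path):
    """Count '>' header lines (sequences, including the query) in an A2M file."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def sweep_diff(inp, diff_values, out_template="filtered_diff{}.a2m", hhfilter="hhfilter"):
    """Run hhfilter once per -diff value; map each value to the number of sequences written."""
    counts = {}
    for diff in diff_values:
        out = out_template.format(diff)
        # check=True raises CalledProcessError if hhfilter exits with an error
        subprocess.run(build_hhfilter_cmd(inp, out, diff, hhfilter), check=True)
        counts[diff] = count_records(out)
    return counts
```

For instance, `sweep_diff("/userhome/data/TS2/msa/7KVT_B.a2m", [128, 256, 512, 1024])` (illustrative values, not the ones in the figure) returns a dict that can be compared directly against each requested `-diff` value.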