sourmash-bio / sourmash

Quickly search, compare, and analyze genomic and metagenomic data sets.
http://sourmash.readthedocs.io/en/latest/

HLL improved estimator #3294

Open jianshu93 opened 3 months ago

jianshu93 commented 3 months ago

Hi @luizirber and @ctb,

For estimating the cardinality of the k-mer set of a genome, the maximum likelihood method is preferred because it is more accurate (it actually reaches the theoretical lower bound), but it is slower. Would it be possible to also have the improved estimator from the Ertl 2017 paper (equation 10), which is also implemented in Dashing 1? It is as accurate as the traditional HLL estimator (the traditional estimator needs bias correction for small cardinalities).
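
For reference, here is a minimal Python sketch of the improved raw estimator as I understand it from the Ertl paper; please check it against equation 10 before relying on it. The function names (`sigma`, `tau`, `improved_estimate`, `add`), the 64-bit BLAKE2b hash, and the register layout (`p` index bits, `q = 64 - p` remaining bits, register values in `0..q+1`) are assumptions for illustration only, not the actual sourmash or Dashing code.

```python
import hashlib
import math


def sigma(x):
    """sigma(x) = x + sum_{k>=1} x^(2^k) * 2^(k-1), for x in [0, 1]."""
    if x == 1.0:
        return math.inf
    y = 1.0
    z = x
    while True:
        x *= x
        z_prev = z
        z += x * y
        y += y
        if z == z_prev:  # series has converged in floating point
            return z


def tau(x):
    """tau(x) = (1 - x - sum_{k>=1} (1 - x^(2^-k))^2 * 2^-k) / 3, for x in [0, 1]."""
    if x == 0.0 or x == 1.0:
        return 0.0
    y = 1.0
    z = 1.0 - x
    while True:
        x = math.sqrt(x)
        z_prev = z
        y *= 0.5
        z -= (1.0 - x) ** 2 * y
        if z == z_prev:  # series has converged in floating point
            return z / 3.0


def improved_estimate(registers, p, q):
    """Estimate cardinality from HLL registers holding values in 0..q+1."""
    m = 1 << p
    assert len(registers) == m
    # Histogram of register values: counts[k] = number of registers equal to k.
    counts = [0] * (q + 2)
    for r in registers:
        counts[r] += 1
    # Horner-style evaluation of
    #   m*sigma(C_0/m) + sum_{k=1..q} C_k * 2^-k + m*tau(1 - C_{q+1}/m) * 2^-q
    z = m * tau(1.0 - counts[q + 1] / m)
    for k in range(q, 0, -1):
        z = 0.5 * (z + counts[k])
    z += m * sigma(counts[0] / m)
    alpha_inf = 1.0 / (2.0 * math.log(2.0))
    return alpha_inf * m * m / z


def add(registers, item, p, q):
    """Hypothetical helper: insert a bytes item using a 64-bit hash (p + q == 64)."""
    h = int.from_bytes(hashlib.blake2b(item, digest_size=8).digest(), "big")
    idx = h >> q                       # top p bits select the register
    rest = h & ((1 << q) - 1)          # low q bits determine the rank
    rank = q - rest.bit_length() + 1   # leading zeros + 1, equals q + 1 if rest == 0
    registers[idx] = max(registers[idx], rank)
```

To try it, fill `registers = [0] * (1 << p)` with `add(registers, kmer_bytes, p, q)` for each k-mer and call `improved_estimate(registers, p, q)`. No separate small-range or large-range correction step is applied in this sketch; the `sigma` and `tau` terms are meant to cover the all-zero and saturated registers.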

Thanks,

Jianshu