add bindash support instead of dashing

Hello Ben,

I have been investigating MinHash algorithms heavily over the past several months. For simple MinHash, that is, estimating the Jaccard index in the traditional manner, b-bit One Permutation MinHash with optimal densification (https://dl.acm.org/doi/abs/10.1145/1772690.1772759, https://proceedings.neurips.cc/paper/2012/file/eaa32c96f620053cf442ad32258076b9-Paper.pdf, http://proceedings.mlr.press/v70/shrivastava17a.html) is the most space- and time-efficient of all the algorithms, including HyperLogLog. It is implemented in the bindash software (https://academic.oup.com/bioinformatics/article/35/4/671/5058094). Since Xiaofei left academia, bindash has not been developed further the way dashing has (dashing 2, for example). However, in several experiments, e.g. all-versus-all distance computation for all NCBI genomes, bindash was the fastest tool I have ever seen, about 2 times faster than dashing (I use k-mer size 16 and sketch size 12000 to get accuracy at the 95% ANI level). One limitation is that bindash supports only nucleotide sequences, not amino acid sequences as dashing and Mash do. I would also suggest not using finch, because it is memory-inefficient for a large number of genomes. What do you think?
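
To make the idea concrete, here is a minimal Python sketch of b-bit one-permutation MinHash with a hash-based densification step. It is an illustration only, not bindash's actual code: the function names (`kmers`, `oph_sketch`, `estimate_jaccard`), the BLAKE2b hash, the bin count, the value of `b`, and the simplified b-bit collision correction are all assumptions chosen for brevity.

```python
import hashlib


def _hash64(data: bytes) -> int:
    """Deterministic 64-bit hash (BLAKE2b is an arbitrary choice here)."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def kmers(seq: str, k: int) -> set:
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def oph_sketch(items, num_bins: int = 1024, b: int = 14) -> list:
    """One-permutation MinHash: hash every item once, keep one minimum per bin."""
    bins = [None] * num_bins
    for item in items:
        h = _hash64(item.encode())
        idx, value = h % num_bins, h // num_bins  # bin index, value inside the bin
        if bins[idx] is None or value < bins[idx]:
            bins[idx] = value
    if all(v is None for v in bins):
        raise ValueError("cannot sketch an empty set")

    # Densification: each empty bin borrows the value of a non-empty bin chosen
    # by a hash of (bin index, attempt), so every sketch probes the same sequence.
    dense = list(bins)
    for i in range(num_bins):
        attempt = 0
        while dense[i] is None:
            attempt += 1
            j = _hash64(f"{i}:{attempt}".encode()) % num_bins
            dense[i] = bins[j]  # stays None if bin j was empty; then retry

    mask = (1 << b) - 1
    return [v & mask for v in dense]  # b-bit step: keep only the lowest b bits


def estimate_jaccard(s1: list, s2: list, b: int = 14) -> float:
    """Estimate Jaccard from the matching fraction, corrected for b-bit collisions."""
    if len(s1) != len(s2):
        raise ValueError("sketches must have the same number of bins")
    p = sum(x == y for x, y in zip(s1, s2)) / len(s1)
    chance = 2.0 ** -b  # approximate probability that two unrelated values collide
    return max(0.0, (p - chance) / (1.0 - chance))
```

For two sequences one would call `estimate_jaccard(oph_sketch(kmers(a, 16)), oph_sketch(kmers(b_seq, 16)))`. Each k-mer is hashed only once regardless of sketch size, which is where the speed advantage over classic k-permutation MinHash comes from, and storing b bits per bin instead of a full 64-bit hash is what keeps the sketch small.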

Thanks,

Jianshu

wwood / galah

add bindash support instead of dashing #27