zorino / kaamer

kaamer - protein identification based on amino acid kmers
Apache License 2.0

K-mer hash question #26

Open jwcodee opened 1 year ago

jwcodee commented 1 year ago

I have a question regarding the following sentences in the manuscript:

The fixed k-mer size of 7 was chosen to fit on 4 bytes and keep a manageable database size while offering good specificity over protein targets.

The first KV store (k-mer store) keeps the association of every k-mer (key) with a hash value (key length: 8 bytes) that is the entry to the combination store. 

Is this saying that 7-mers are first hashed into 4 bytes and then associated with an 8-byte hash value?

zorino commented 1 year ago

Yes, the 8-byte hash is the key for the combination store. A combination holds the list of genomes that include the 7-mer.
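
For readers following along, here is a minimal Python sketch of the two-level lookup described above. It is not kaamer's actual code. It assumes a 20-letter amino acid alphabet packed as a base-20 integer (one way a 7-mer fits in 4 bytes, since 20^7 = 1,280,000,000 is below 2^32) and uses an 8-byte BLAKE2b digest as a stand-in for the combination hash. The names `encode_kmer`, `combination_key`, `kmer_store`, and `combination_store` are hypothetical, and plain dictionaries stand in for the KV stores.

```python
import hashlib
import struct

# Assumption: the 20 standard amino acids; the real alphabet/encoding may differ.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(ALPHABET)}
K = 7


def encode_kmer(kmer: str) -> bytes:
    """Pack a 7-mer into 4 bytes by reading it as a base-20 number."""
    if len(kmer) != K:
        raise ValueError(f"expected a {K}-mer, got {len(kmer)} residues")
    value = 0
    for aa in kmer:
        value = value * len(ALPHABET) + INDEX[aa]
    # 20**7 - 1 is below 2**32, so an unsigned 32-bit integer is enough.
    return struct.pack(">I", value)


def combination_key(members) -> bytes:
    """8-byte key for a combination (stand-in hash, not kaamer's own)."""
    joined = ",".join(sorted(members)).encode()
    return hashlib.blake2b(joined, digest_size=8).digest()


# Dictionaries stand in for the two KV stores.
kmer_store = {}         # 4-byte k-mer key -> 8-byte combination key
combination_store = {}  # 8-byte combination key -> list of members


def add(kmer: str, members) -> None:
    """Register a k-mer together with the combination it belongs to."""
    ckey = combination_key(members)
    combination_store[ckey] = sorted(members)
    kmer_store[encode_kmer(kmer)] = ckey


def lookup(kmer: str):
    """Resolve a k-mer to its combination through both stores."""
    return combination_store[kmer_store[encode_kmer(kmer)]]


add("ACDEFGH", ["genome_2", "genome_1"])
add("CDEFGHI", ["genome_1", "genome_2"])
print(lookup("ACDEFGH"))
```

In this sketch, k-mers that occur in the same set of genomes share one combination entry, so the k-mer store only holds a fixed-size 8-byte reference per k-mer.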