jermp / sshash

A compressed, associative, exact, and weighted dictionary for k-mers.
MIT License

Partitioned phf #37

Closed. rickbeeloo closed this pull request 10 months ago.

rickbeeloo commented 11 months ago

I think we can already add this.

The lookups were tested with https://github.com/rickbeeloo/UniqueMers: lookups of 2^32 unique 17-mers were all correct according to sshash's --check option (run as: UniqueMers kmers.fasta 17 4294967296 0).

I also just added a "tail" option to the linked repo, which appends random sequences after each k-mer so that navigational queries can be tested too. Rather than re-running sshash for this, though, I think it is better to test directly with Fulgor.

rickbeeloo commented 11 months ago

Hey @jermp, what exactly did you mean? The lookup struct, right (the contig size)?

jermp commented 11 months ago

> Hey @jermp, what exactly did you mean? The lookup struct, right (the contig size)?

Yes, I would default everything to uint64_t.

rickbeeloo commented 11 months ago

There we go, everything is now uint64_t.

Let me know :)

P.S. You should probably give it a proper read, haha, because, for example, I removed this pragma push since it's now two uint64s, but maybe in C++ it's still necessary to keep it for 8 bytes here.

jermp commented 10 months ago

Sorry for the confusion, but in my comment I was referring to the uint32_t inside the lookup_result struct. The other places where uint32_t is used are just fine. So I'm gonna close this pull request and make the change myself to avoid conflicts. Thanks!

jermp commented 10 months ago

Done as of https://github.com/jermp/sshash/commit/fbe00d9f1ca4fa875177deba11a7e9b9e5c2b71a.