iqbal-lab-org / cobs

COBS - Compact Bit-Sliced Signature Index (for Genomic k-Mer Data or q-Grams)
https://panthema.net/cobs
MIT License

Format of output #7

Closed: graceblackwell closed this issue 8 months ago

graceblackwell commented 3 years ago

Could the percentage/proportion of query kmers present in each sample be reported rather than the number of kmers present?

Zhicheng-Liu commented 3 years ago

Hi @graceblackwell, as a temporary workaround you can safely derive the proportion of matched kmers by dividing the number of kmers present by the total number of kmers in the query, i.e. (length of query - length of kmer + 1). Multiply by 100 to get a percentage.
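A minimal Python sketch of that workaround, assuming you already have the per-sample match count from the search output. The function name `matched_fraction` and the example numbers are hypothetical, chosen only for illustration.

```python
def matched_fraction(num_matched: int, query_len: int, k: int) -> float:
    """Fraction of query kmers reported as present for one sample.

    The denominator counts every kmer window in the query, duplicates
    included (query_len - k + 1), consistent with how the cobs search
    counts matches without deduplication.
    """
    total_kmers = query_len - k + 1
    if total_kmers <= 0:
        raise ValueError("query is shorter than the kmer length")
    return num_matched / total_kmers


# Hypothetical example: a 100 bp query, k = 31, and 35 kmers reported as present.
# 100 - 31 + 1 = 70 kmer windows, so the fraction is 35 / 70 = 0.5.
frac = matched_fraction(35, 100, 31)
print(f"{frac:.4f} ({frac * 100:.1f}%)")  # prints "0.5000 (50.0%)"
```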

The cobs index search does not account for duplicate kmers in the query, i.e. it does not deduplicate them before searching. This differs from BIGSI, where the query kmers are deduplicated. For example, if the query is AAAA and the kmer length is 3, the query contains the kmer AAA twice, so a cobs index search reports 2 matched kmers, while a BIGSI search reports 1 matched kmer.
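To make the difference concrete, here is a toy Python model of the two counting conventions described above. It is only an illustration: the sample is modelled as an exact Python set of kmers (`sample_kmers`, a hypothetical stand-in), whereas the real cobs and BIGSI indexes are probabilistic Bloom-filter-based structures, and this sketch also ignores any reverse-complement handling they may do.

```python
def query_kmers(query: str, k: int) -> list[str]:
    """All kmer windows of the query, in order, with duplicates kept."""
    return [query[i:i + k] for i in range(len(query) - k + 1)]


def count_matches_no_dedup(query: str, k: int, sample_kmers: set[str]) -> int:
    # cobs-style counting as described above: every window is counted,
    # so a kmer occurring twice in the query can contribute two matches.
    return sum(1 for km in query_kmers(query, k) if km in sample_kmers)


def count_matches_dedup(query: str, k: int, sample_kmers: set[str]) -> int:
    # BIGSI-style counting: query kmers are deduplicated first.
    return sum(1 for km in set(query_kmers(query, k)) if km in sample_kmers)


sample = {"AAA"}
print(query_kmers("AAAA", 3))                     # ['AAA', 'AAA']
print(count_matches_no_dedup("AAAA", 3, sample))  # 2
print(count_matches_dedup("AAAA", 3, sample))     # 1
```

Because the non-deduplicated count and the (length of query - length of kmer + 1) denominator both include repeated kmers, dividing one by the other gives a consistent proportion.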