oxli-bio / oxli

k-mers and the like
BSD 3-Clause "New" or "Revised" License

Question: Migrating from Khmer #88

Open dcdanko opened 17 hours ago

dcdanko commented 17 hours ago

Hi,

My team is currently using Khmer and we want to upgrade to Oxli. We currently use khmer's Nodetable to store binary present/not-present kmer sets. We have a few large precomputed nodetables.

My questions:

Thank you for your help!

Adamtaranto commented 13 hours ago

Hi David,

At the moment there are three ways to manually set the counts in a KmerCountTable object.

1) Populate KmerCountTable from a JSON file.

Oxli supports serialisation of KmerCountTable objects to JSON format. You can modify this file and load it back into a new object. See wiki description.

2) Set individual kmer values using dictionary syntax.

from oxli import KmerCountTable

# Create new count table
# Note: Use "store_kmers=True" only if you need to retrieve a list of
# all kmers in the table. This option slows counting.
kct = KmerCountTable(ksize=4)

# Manually add new kmer and set count
kct['GGGG'] = 1000

# Only the canonical kmer is stored, so a kmer and its
# reverse complement return the same count
kct.get('GGGG')
>>> 1000

kct.get('CCCC')
>>> 1000

3) Add counts with user specified hash.

This might be useful if you only have hashes for canonical kmers stored.

# Add hash to table and increment its count
kct.count_hash(6779379503393060785)

kct.get("AACC")
>>> 1

kct.get("GGTT")
>>> 1

# Increment count
kct.count_hash(6779379503393060785)

kct.get("AACC")
>>> 2

I could add support for bulk upload of kmers and counts from a tab-delimited file if that would be useful. See #77.
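
In the meantime, a bulk load could be approximated with the dictionary syntax from option 2. This is only a minimal sketch: the two-column `kmer<TAB>count` layout and the function name `load_counts_from_tsv` are assumptions for illustration, not part of oxli. It works with any object that supports `table[kmer] = count`, so a plain dict stands in for a KmerCountTable here.

```python
import csv
import io


def load_counts_from_tsv(handle, table):
    """Set table[kmer] = count for each 'kmer<TAB>count' line in handle.

    `table` is anything supporting item assignment (assumed to include
    KmerCountTable, per the dictionary syntax in option 2).
    Returns the number of records loaded.
    """
    loaded = 0
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines starting with '#'
        if not row or row[0].startswith("#"):
            continue
        kmer, count = row[0].strip().upper(), int(row[1])
        table[kmer] = count
        loaded += 1
    return loaded


# Demo: an in-memory file and a plain dict standing in for the table
demo = io.StringIO("# kmer\tcount\nGGGG\t1000\nAACC\t2\n")
counts = {}
n_loaded = load_counts_from_tsv(demo, counts)
```

For presence/absence sets like the ones described in the question, the count column could simply be 1 for every kmer.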

Do you need to store kmers and their reverse complement separately? Oxli currently stores counts under the canonical kmer.
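
To make "canonical" concrete: a common convention is to keep the lexicographically smaller of a kmer and its reverse complement. The sketch below assumes that convention. It is consistent with the GGGG/CCCC and AACC/GGTT pairs above, but the exact rule oxli applies should be checked against its docs. `revcomp` and `canonical` are illustrative helper names, not oxli functions.

```python
# Translation table mapping each base to its complement
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(kmer):
    """Return the reverse complement of a DNA kmer."""
    return kmer.upper().translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Return the lexicographically smaller of kmer and its reverse complement."""
    kmer = kmer.upper()
    return min(kmer, revcomp(kmer))


# Both orientations of a kmer collapse to the same canonical form
pairs = {k: canonical(k) for k in ("GGGG", "CCCC", "AACC", "GGTT")}
```

Under this scheme a kmer and its reverse complement share a single entry, so strand-specific presence/absence would need to be tracked separately.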

Adamtaranto commented 10 hours ago

@ctb might be worth adding a khmer migration tutorial to the wiki.

Let me know what you think is the most efficient way to do this.