dcdanko opened this issue 17 hours ago
Hi David,
Atm there are three ways to manually set the counts in a KmerCountTable object.
1) Load a KmerCountTable from a json file.
Oxli supports serialisation of KmerCountTable objects into json format; you can modify this file and load it back into a new object. See the wiki description.
2) Set individual kmer values using dictionary syntax.
```python
from oxli import KmerCountTable

# Create a new count table.
# Note: use store_kmers=True only if you need to retrieve a list of all
# kmers in the table; this option slows counting.
kct = KmerCountTable(ksize=4)

# Manually add a new kmer and set its count
kct['GGGG'] = 1000

# Only the canonical kmer is stored, so the reverse complement
# reports the same count
kct.get('GGGG')  # returns 1000
kct.get('CCCC')  # returns 1000
```
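The following toy model shows the canonical-storage behaviour in plain Python. It is not oxli's implementation. It assumes the canonical kmer is the lexicographically smaller of a kmer and its reverse complement, which is a common convention. The names `canonical` and `ToyCanonicalCounts` are hypothetical.

```python
# Toy, dict-backed model of canonical kmer storage (NOT oxli's implementation).
# Assumption: canonical form = lexicographically smaller of kmer / reverse complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Return the reverse complement of an uppercase DNA kmer."""
    return kmer.translate(COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Return whichever of the kmer and its reverse complement sorts first."""
    kmer = kmer.upper()
    return min(kmer, reverse_complement(kmer))


class ToyCanonicalCounts:
    """Counter that stores every count under the canonical kmer only."""

    def __init__(self) -> None:
        self._counts = {}

    def __setitem__(self, kmer: str, count: int) -> None:
        # Setting via either strand overwrites the same canonical entry.
        self._counts[canonical(kmer)] = count

    def get(self, kmer: str) -> int:
        # Absent kmers report 0 in this toy model.
        return self._counts.get(canonical(kmer), 0)

    def __len__(self) -> int:
        return len(self._counts)


toy = ToyCanonicalCounts()
toy["GGGG"] = 1000
print(toy.get("GGGG"), toy.get("CCCC"), len(toy))  # 1000 1000 1
```

Because GGGG and CCCC share one entry, there is only one stored item.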
3) Add counts with a user-specified hash.
This might be useful if you only have hashes of canonical kmers stored.
```python
# Add a hash and increment its count
kct.count_hash(6779379503393060785)
kct.get("AACC")  # returns 1
kct.get("GGTT")  # returns 1 (GGTT is the reverse complement of AACC)

# Increment the count again
kct.count_hash(6779379503393060785)
kct.get("AACC")  # returns 2
```
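Counting by hash works because the table is keyed by hash value rather than by kmer string. A kmer lookup hashes the canonical kmer and reads that key. The sketch below models this with a hypothetical blake2b-based `toy_hash`. That is NOT the hash oxli uses, so it will not reproduce 6779379503393060785. With oxli, a precomputed hash only lines up with `get()` if it came from the same hash function and ksize as the table.

```python
import hashlib

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a kmer and its reverse complement (assumed convention)."""
    kmer = kmer.upper()
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def toy_hash(kmer: str) -> int:
    """Hypothetical 64-bit hash of the canonical kmer. NOT the hash oxli uses."""
    digest = hashlib.blake2b(canonical(kmer).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class ToyHashCounts:
    """Counts keyed only by hash value, mimicking count_hash() / get()."""

    def __init__(self) -> None:
        self._counts = {}

    def count_hash(self, h: int) -> None:
        # Increment the count stored under this hash (starting from 0).
        self._counts[h] = self._counts.get(h, 0) + 1

    def get(self, kmer: str) -> int:
        # Look up a kmer by hashing its canonical form.
        return self._counts.get(toy_hash(kmer), 0)


toy = ToyHashCounts()
h = toy_hash("AACC")  # stands in for a precomputed hash you already have
toy.count_hash(h)
print(toy.get("AACC"), toy.get("GGTT"))  # 1 1
toy.count_hash(h)
print(toy.get("AACC"))  # 2
```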
I could add support for bulk kmer + count upload from a tab-delimited file if that would be useful. See #77.
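Until a bulk loader exists, a loop over a tab-delimited file using the dictionary syntax from option 2 should cover this. This is a sketch only. `load_counts_tsv` is a hypothetical helper, and the two-column `kmer<TAB>count` layout is an assumption. The demo uses a plain dict in place of a KmerCountTable so it runs without oxli installed.

```python
import csv
import io
from typing import TextIO


def load_counts_tsv(handle: TextIO, table) -> int:
    """Hypothetical helper: set table[kmer] = count for each 'kmer<TAB>count' row.

    `table` can be anything supporting item assignment, e.g. a dict or
    (assumed) an oxli KmerCountTable. Returns the number of rows loaded.
    """
    loaded = 0
    for lineno, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if len(row) != 2:
            raise ValueError(f"line {lineno}: expected 2 columns, got {len(row)}")
        kmer, count = row[0].strip().upper(), int(row[1])
        table[kmer] = count
        loaded += 1
    return loaded


# Demo with an in-memory file and a plain dict standing in for the table.
demo = io.StringIO("# kmer\tcount\nGGGG\t1000\nAACC\t2\n")
counts = {}
n = load_counts_tsv(demo, counts)
print(n, counts)  # 2 {'GGGG': 1000, 'AACC': 2}
```

Only canonical kmers are stored in oxli. Because of that, a row for a kmer and a later row for its reverse complement should overwrite the same entry rather than add to it.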
Do you need to store kmers and their reverse complements separately? Oxli currently stores counts under the canonical kmer only.
@ctb it might be worth adding a khmer migration tutorial to the wiki.
lmk what you think is the most efficient way to do this.
Hi,
My team is currently using khmer and we want to upgrade to Oxli. We currently use khmer's Nodetable to store binary presence/absence kmer sets. We have a few large precomputed nodetables.
My questions:
Thank you for your help!