It is often useful to exclude low-abundance (likely erroneous) or high-abundance (repeat-associated) kmers from a count table.
As a user, I'd expect a method called .min() to return all the kmers with the minimum observed count, and .max() to return all kmers with the maximum observed count.
For thresholding at some cutoff value, maybe something like .mincut() and .maxcut()?
Suggested use:
table = oxli.KmerCountTable(3)
kmers = ["AAA", "GGG", "GGG"]
for kmer in kmers:
    table.count(kmer)
table.mincut(2)
>> "Dropped 1 hash with fewer than 2 counts."
table.get("AAA")
>> 0
table.get("GGG")
>> 2
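To make the proposed semantics concrete, here is a rough pure-Python sketch of how the four methods could behave. It is only a stand-in: ToyKmerTable is a hypothetical class, it keys on the kmer string rather than a hash, and none of this reflects how oxli implements its table. Two details are assumptions on my part: mincut() and maxcut() return the number of dropped entries instead of printing a message, and maxcut(n) drops kmers with strictly more than n counts.

```python
from collections import Counter


class ToyKmerTable:
    """Hypothetical stand-in for a kmer count table (not the oxli implementation)."""

    def __init__(self, ksize):
        self.ksize = ksize
        self.counts = Counter()  # keyed by kmer string here, not by hash

    def count(self, kmer):
        """Increment the count for one kmer and return its new count."""
        if len(kmer) != self.ksize:
            raise ValueError(f"expected a {self.ksize}-mer, got {kmer!r}")
        self.counts[kmer] += 1
        return self.counts[kmer]

    def get(self, kmer):
        """Return the count for a kmer, or 0 if it is absent."""
        return self.counts.get(kmer, 0)

    def min(self):
        """Return all kmers that share the minimum observed count."""
        if not self.counts:
            return []
        lowest = min(self.counts.values())
        return sorted(k for k, c in self.counts.items() if c == lowest)

    def max(self):
        """Return all kmers that share the maximum observed count."""
        if not self.counts:
            return []
        highest = max(self.counts.values())
        return sorted(k for k, c in self.counts.items() if c == highest)

    def mincut(self, threshold):
        """Drop kmers with fewer than `threshold` counts; return how many were dropped."""
        dropped = [k for k, c in self.counts.items() if c < threshold]
        for k in dropped:
            del self.counts[k]
        return len(dropped)

    def maxcut(self, threshold):
        """Drop kmers with more than `threshold` counts (assumed strict); return how many were dropped."""
        dropped = [k for k, c in self.counts.items() if c > threshold]
        for k in dropped:
            del self.counts[k]
        return len(dropped)


table = ToyKmerTable(3)
for kmer in ["AAA", "GGG", "GGG"]:
    table.count(kmer)

lowest_kmers = table.min()    # kmers tied for the lowest count
n_dropped = table.mincut(2)   # removes everything seen fewer than 2 times
```

Keeping .min() and .max() as read-only queries and putting the destructive filtering in .mincut() and .maxcut() means nobody loses data by calling a method whose name sounds like a lookup.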
@ctb?