Both KMC and Jellyfish have a histo module that calculates frequency counts from a kmer count table, i.e. how many distinct kmers occur once, twice, three times, and so on. These kmer spectra are useful for many tasks, e.g. genome size estimation.
My thoughts on this. Comments welcome.
I guess we would use the unique count values from the KmerCountTable as keys in a HashMap, then increment the tally for that key once for each kmer observed with that count.
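A minimal Python sketch of that idea, assuming the count table can be iterated as a mapping from kmer (or kmer hash) to count. A plain dict stands in for KmerCountTable here, and `count_histogram` is a hypothetical name, not an existing API:

```python
from collections import Counter


def count_histogram(kmer_counts):
    """Return {abundance: number of distinct kmers with that abundance}.

    `kmer_counts` is assumed to map each kmer (or its hash) to its count,
    e.g. a plain dict standing in for a KmerCountTable.
    """
    histo = Counter()
    for count in kmer_counts.values():
        # Each distinct kmer adds one observation to its abundance bin.
        histo[count] += 1
    # Sort by abundance so the spectrum reads left to right.
    return dict(sorted(histo.items()))


# Toy table: hashed kmer -> count (made-up values for illustration).
table = {101: 1, 202: 1, 303: 2, 404: 5, 505: 2, 606: 1}
print(count_histogram(table))  # {1: 3, 2: 2, 5: 1}
```

In Python the loop collapses to `Counter(kmer_counts.values())`; the explicit version just mirrors the "HashMap plus increment" description.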
This seems like a problem that would benefit from multithreading, though I'm not sure how best to split up the work.
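One possible approach is a split-and-merge (map-reduce) pattern: each worker builds a partial histogram over its own chunk of counts, and the partial histograms are summed at the end, which works because histogram bins simply add. The sketch below assumes the counts can be iterated via `.values()`; `parallel_histogram` and its `n_workers`/`chunk_size` parameters are hypothetical. Note that pure-Python threads won't speed up this CPU-bound loop because of the GIL; the point is the shape of the algorithm, which would carry over to a compiled backend.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def _chunks(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


def parallel_histogram(kmer_counts, n_workers=4, chunk_size=10_000):
    """Build partial histograms over chunks of counts, then merge them."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Map step: Counter(chunk) tallies abundances within one chunk.
        partials = pool.map(Counter, _chunks(kmer_counts.values(), chunk_size))
        # Reduce step: Counter.update adds tallies per abundance.
        total = Counter()
        for partial in partials:
            total.update(partial)
    return dict(sorted(total.items()))


# Toy table: counts cycle 1, 2, 3 over ten kmers.
table = {i: (i % 3) + 1 for i in range(10)}
print(parallel_histogram(table, n_workers=2, chunk_size=3))  # {1: 4, 2: 3, 3: 3}
```

Because merging is just addition, the chunking strategy doesn't affect the result; it only needs to give each worker a reasonably sized, independent slice of the table.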
Should this be a separate function that takes a KmerCountTable as input, or a method of KmerCountTable?
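For what it's worth, the two options needn't be exclusive. A sketch of one arrangement: a free function holds the logic and a method is a thin wrapper around it, so both call styles share one implementation. `ToyCountTable`, `kmer_histogram`, and the `histo` method name are all hypothetical stand-ins, not the real KmerCountTable API:

```python
from collections import Counter


def kmer_histogram(table):
    """Free-function form: works on anything exposing .values() of counts."""
    return dict(sorted(Counter(table.values()).items()))


class ToyCountTable(dict):
    """Hypothetical stand-in for KmerCountTable: maps kmer hash -> count."""

    def histo(self):
        # Method form: delegates to the free function.
        return kmer_histogram(self)


t = ToyCountTable({1: 2, 2: 2, 3: 7})
print(t.histo())  # {2: 2, 7: 1}
```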
I think the output should be something that is easy for the user to cast to a pd.DataFrame or np.array, leaving writing to file up to them. Maybe a tuple of lists?
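A sketch of the tuple-of-lists idea, assuming the histogram starts out as an `{abundance: frequency}` mapping; `histogram_as_lists` is a hypothetical helper name:

```python
def histogram_as_lists(histo):
    """Split an {abundance: frequency} mapping into two parallel lists.

    Returns (abundances, frequencies) sorted by abundance, so nothing is
    written to disk and the caller chooses the container.
    """
    abundances = sorted(histo)
    frequencies = [histo[a] for a in abundances]
    return abundances, frequencies


counts, freqs = histogram_as_lists({5: 1, 1: 3, 2: 2})
print(counts)  # [1, 2, 5]
print(freqs)   # [3, 2, 1]
```

From there the user can do `pd.DataFrame({"count": counts, "frequency": freqs})`, or `np.array((counts, freqs))` for a 2 x N array, and write it out however they like.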