Open tseemann opened 8 years ago
Thanks for reporting this. The error rate/genome size model isn't doing too well with high coverage. This is mostly due to the assumption that k-mers at distance 2 from the truth would not be observed twice by chance. Once you have higher coverage, you need to model that explicitly.
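To illustrate why that assumption weakens as coverage grows, here is a minimal sketch. It is not the model KmerStream uses. It assumes independent substitution errors at a uniform per-base rate, a Poisson-distributed count, and hypothetical names (`prob_seen_at_least_twice`, `kmer_coverage`, `error_rate`).

```python
import math


def prob_seen_at_least_twice(kmer_coverage, error_rate, k):
    """Chance that one specific k-mer at distance 2 from a true k-mer
    is observed two or more times.

    Assumptions (illustrative only, not KmerStream's model):
    - errors are independent substitutions, each of the 3 wrong bases
      equally likely;
    - the number of observations is Poisson distributed.
    """
    # Probability that a single read k-mer covering this locus carries
    # exactly these two substitutions and no other error.
    p_variant = (error_rate / 3.0) ** 2 * (1.0 - error_rate) ** (k - 2)
    lam = kmer_coverage * p_variant  # expected number of observations
    # P(X >= 2) = 1 - exp(-lam) - lam * exp(-lam); expm1 keeps precision
    # when lam is tiny.
    return -math.expm1(-lam) - lam * math.exp(-lam)


if __name__ == "__main__":
    for cov in (10, 30, 90, 300):
        print(cov, prob_seen_at_least_twice(cov, error_rate=0.01, k=31))
```

While the expected count is small, this probability grows roughly with the square of the coverage. Multiplied over every possible distance-2 neighbour of every true k-mer, it stops being negligible at high depth.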
I'm working on extending the model so that it can account better for this. Additionally, I'll have to replace the brentp method that is not working in #11.
I sense a Catch-22 however - how do I know I have too much depth until I know the genome size?
Pall,
I have a finished/closed bacterial genome of 2866389 bp (2.87 Mbp) and a set of Illumina PE reads at 90x coverage.
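For reference, depth here is total sequenced bases divided by genome size, which is why it can't be known before the genome size is. A small sketch with the numbers above; the function names and the 150 bp read length are assumptions for illustration, not values from my data:

```python
GENOME_SIZE = 2866389  # bp, from the closed reference


def sequencing_depth(total_bases, genome_size):
    """Average per-base depth: total sequenced bases / genome size."""
    return total_bases / genome_size


def kmer_coverage(depth, read_length, k):
    """Approximate k-mer coverage: each read of length L yields
    L - k + 1 k-mers, so base depth is scaled by (L - k + 1) / L."""
    return depth * (read_length - k + 1) / read_length


if __name__ == "__main__":
    total_bases = 90 * GENOME_SIZE  # bases implied by 90x on this genome
    depth = sequencing_depth(total_bases, GENOME_SIZE)
    print("depth:", depth)
    # Read length of 150 bp is a placeholder assumption.
    print("k-mer coverage at k=31:", kmer_coverage(depth, read_length=150, k=31))
```

If the genome size fed into `sequencing_depth` is itself an estimate, any error in it carries straight through to the depth.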
I ran the following for Q = 20, 10 and 0 and then put it through `KmerStreamEstimate`. I am using git `HEAD` with the recent commits. I was hoping to see some consistency with the `G` estimates, but it seems to be changing wrt `k`?
Q=20
Q=10
Q=0