Closed by nhartwic 6 months ago
Hi,
Yes, this makes a lot of sense to me. We may have to span a broader range to accommodate 'legacy', simplex and duplex data. I would include Q25, so the proposed 9, 13, 17, 21, and 25 seem appropriate.
Thanks! Wouter
I agree it would be nice to update the quality thresholds for the reported stats.
Or perhaps even better: add a new command-line option to specify an arbitrary list of threshold Q values (in which case, the current list of [5, 7, 10, 12, 15] could be left unchanged, as the default).
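For illustration only, such an option could be parsed along these lines. This is a minimal sketch: the flag name `--quality-thresholds`, the helper `parse_thresholds`, and the constant `DEFAULT_THRESHOLDS` are hypothetical and are not part of NanoPlot or nanomath.

```python
import argparse

# Current defaults quoted in this thread; kept as the fallback.
DEFAULT_THRESHOLDS = [5, 7, 10, 12, 15]


def parse_thresholds(argv=None):
    """Parse a hypothetical --quality-thresholds option into a sorted list."""
    parser = argparse.ArgumentParser(description="Threshold option sketch")
    parser.add_argument(
        "--quality-thresholds",  # hypothetical flag, not an existing option
        type=int,
        nargs="+",
        default=DEFAULT_THRESHOLDS,
        metavar="Q",
        help="Q values used as cutoffs for the reported read percentages",
    )
    args = parser.parse_args(argv)
    # Deduplicate and sort so the report lists cutoffs in ascending order.
    return sorted(set(args.quality_thresholds))
```

Calling `parse_thresholds([])` falls back to the current defaults, while `parse_thresholds(["--quality-thresholds", "9", "13", "17", "21", "25"])` returns the custom list.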
Quality thresholds have been updated to [10, 15, 20, 25, 30] in nanomath v1.4.0. I understand that adding new command-line options would increase flexibility, but if every setting became a command-line option, the interface would be a mess.
NanoPlot (actually nanomath) computes the percentage of reads with average quality above a threshold. At the moment, those thresholds are 5, 7, 10, 12, and 15. When these packages were written, these quality thresholds made a lot of sense, but ONT basecalling has improved dramatically since then. I think the tool would benefit from having these values revised. I'm thinking something like 10, 12, 15, 17, 20 as new values, or maybe 9, 13, 17, 21, 25. This should be as simple as updating the array defined at...
https://github.com/wdecoster/nanomath/blob/40aa42a11bd056c268ed10a5bc25a3f99a538317/nanomath/nanomath.py#LL51C18-L51C30
...I can put together the relevant pull requests if you would like. I wanted to post this issue so that we could potentially discuss what values make the most sense, or if other changes make more sense, before I make the PR.
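To make the statistic under discussion concrete, here is a rough sketch of counting reads whose mean quality exceeds each cutoff. It is not nanomath's actual implementation: the function name `reads_above_thresholds` and its arguments are assumptions made for this example, and it assumes a strict "greater than" comparison.

```python
def reads_above_thresholds(mean_quals, thresholds=(5, 7, 10, 12, 15)):
    """Return {Q: (count, percentage)} of reads with mean quality above Q.

    mean_quals: iterable of per-read mean quality scores (hypothetical input).
    thresholds: Q cutoffs; the default mirrors the current list quoted above.
    """
    quals = list(mean_quals)
    total = len(quals)
    result = {}
    for q in thresholds:
        count = sum(1 for mq in quals if mq > q)
        # Guard against an empty input to avoid division by zero.
        pct = 100.0 * count / total if total else 0.0
        result[q] = (count, pct)
    return result
```

Passing a different `thresholds` tuple, for example `(9, 13, 17, 21, 25)`, is all it takes to try one of the proposed lists on the same data.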