Open LeahRoberts opened 4 years ago
Thanks for the description! I think the best approach for this case is to go into debug mode and understand why we have this drop in coverage.
Cheers.
For this issue and https://github.com/rmcolq/pandora/issues/209 , before diving into debugging, I was wondering if we could get the expected results by changing some parameters. When using the --illumina parameter, the error rate defaults to 0.001, so it could be too low. I increased it to 0.01, but there was no effect on these two genes.
Any other parameterization ideas before diving into debugging? Note that this is strictly a mapping issue. Also worth noting that Leah noticed this issue in many genes; it is the main issue she has right now. I am very interested in this issue because it seems we are undermapping reads (not sure if this is also true for ONT reads), and thus we are making fewer calls than we could. It seems to me that fixing this could push our recall in the 4-way analysis way up.
What do you think?
This doesn't look like a simple bug to me, and as you say it is likely some combination of parameter/algorithm effects. Worth noting that I have some built-in overrides, which mean that your command-line error rate is not allowed to be higher than 0.1 with the --illumina flag. The --min_cluster_size parameter is likely to make more of a difference for increasing detection of genes, but will also increase FPs.
I also think this could take weeks to debug and lead to even more code changes, so I'm keen to first get your existing stuff merged in and get the results we need. I think improving our overall recall in the 4-way is an optimization for later.
I have been looking at differences between two almost identical Klebsiella isolates (KN0056A-F and KN0056A-L) in the Pandora VCF output. For several regions in the reference, Pandora suggests that one isolate has zero coverage (so the region is absent) while the other has coverage (so it is present). However, when I check this gene in the de novo assemblies, I find it in both isolates (and with zero differences between them).
This is one example:
Thoughts?
De novo assemblies here: /hps/nobackup2/iqbal/projects/pandora/klebs/neonate/data/KpST17_Norway_20190617/contigs/patient-pairs
Pandora output here: /hps/nobackup/iqbal/leandro/klebs_neonate_leah/pandora_compare_results
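To list every region showing the pattern described above (zero coverage in one isolate, non-zero in the other), a rough Python sketch follows. It assumes a multi-sample VCF in which each sample column carries per-allele coverage under FORMAT keys named MEAN_FWD_COVG and MEAN_REV_COVG. Those key names are an assumption and may need changing to match the actual file. The function name and the toy records are made up for illustration and are not real Pandora output.

```python
def parse_covg(value):
    """Sum a comma-separated per-allele coverage string; '.' or empty counts as 0."""
    return sum(float(x) for x in value.split(",") if x not in (".", ""))


def discordant_zero_coverage(vcf_lines, sample_a, sample_b,
                             covg_keys=("MEAN_FWD_COVG", "MEAN_REV_COVG")):
    """Return (chrom, pos, covg_a, covg_b) for records where exactly one of the
    two samples has zero total coverage.

    covg_keys are ASSUMED FORMAT field names; adjust them to the real VCF.
    """
    samples = None
    hits = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Sample columns start at index 9 in a VCF header line.
            samples = {name: i for i, name in enumerate(fields[9:], start=9)}
            continue
        if samples is None:
            raise ValueError("VCF header line (#CHROM ...) not found before records")
        fmt = fields[8].split(":")
        totals = {}
        for name in (sample_a, sample_b):
            values = dict(zip(fmt, fields[samples[name]].split(":")))
            totals[name] = sum(parse_covg(values.get(k, ".")) for k in covg_keys)
        # Keep the record only if one sample is at zero and the other is not.
        if (totals[sample_a] == 0) != (totals[sample_b] == 0):
            hits.append((fields[0], int(fields[1]), totals[sample_a], totals[sample_b]))
    return hits


# Toy records for illustration only (not real Pandora output).
toy_vcf = [
    "##fileformat=VCFv4.3",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tKN0056A-F\tKN0056A-L",
    "geneA\t10\t.\tA\tT\t.\t.\t.\tGT:MEAN_FWD_COVG:MEAN_REV_COVG\t0:12,0:10,0\t.:0,0:0,0",
    "geneB\t5\t.\tC\tG\t.\t.\t.\tGT:MEAN_FWD_COVG:MEAN_REV_COVG\t0:8,0:9,0\t0:7,0:6,0",
]

for chrom, pos, covg_f, covg_l in discordant_zero_coverage(toy_vcf, "KN0056A-F", "KN0056A-L"):
    print(chrom, pos, covg_f, covg_l)
```

Running this over the real compare VCF (for example by passing an open file handle as vcf_lines) would give a list of loci to cross-check against the de novo assemblies.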