leoisl closed this issue 9 months ago
Gene: cpe003_contig_3_00225

Reads minimap2 maps to cpe003_contig_3_00225: 1605 (it would likely map most, if not all, of them to cpe003_contig_3_00225 even if we mapped to the whole plasmid instead, given that it can multimap reads)

Reads pandora maps to cpe003_contig_3_00225: 718

nb_of_reads  gene
1            NZ_AP023221.1_00079.fa.msa
265          NZ_CP020848.1_00225.fa.msa
556          NZ_CP020848.1_00226.fa.msa
Reads pandora maps to cpe003_contig_3_00225 if the PanRG is composed only of this gene: 1539 (718 that we can map with both PanRGs + 821 that mapped to other genes)

This data shows that pandora is able to map almost the same number of reads that minimap2 maps to cpe003_contig_3_00225 (minimap2 can map 1605 reads, pandora can map 1539 reads, a loss of 4% of reads, which is totally acceptable IMO and likely due to the lower density of minimising kmers at the gene edges). So there is no significant problem with the pandora mapping algorithm or its parameters. The main issue is that we don't handle multimapping in pandora: 822 of the 1539 reads pandora can map to cpe003_contig_3_00225 end up being mapped to 3 other genes because they map slightly better there. The solution is to implement multimapping in pandora.
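To make the difference concrete, here is a minimal sketch of best-hit assignment versus multimapping on toy data. It is not pandora's or minimap2's actual code: the read names, gene names, scores, the `tolerance` parameter and both function names are made up for illustration. The last lines just redo the read-loss arithmetic with the counts reported above.

```python
from collections import Counter

# Toy alignment scores (read -> {gene: score}); all values are hypothetical.
scores = {
    "read1": {"geneA": 50, "geneB": 52},
    "read2": {"geneA": 60},
    "read3": {"geneA": 40, "geneB": 41, "geneC": 10},
}


def best_hit_counts(scores):
    """Assign each read only to its single best-scoring gene."""
    counts = Counter()
    for hits in scores.values():
        counts[max(hits, key=hits.get)] += 1
    return counts


def multimap_counts(scores, tolerance=5):
    """Assign each read to every gene scoring within `tolerance` of its best hit."""
    counts = Counter()
    for hits in scores.values():
        best = max(hits.values())
        for gene, score in hits.items():
            if best - score <= tolerance:
                counts[gene] += 1
    return counts


single = best_hit_counts(scores)  # geneA loses read1 and read3 to geneB
multi = multimap_counts(scores)   # geneA keeps all three reads

# Read-loss arithmetic using the counts reported in this thread.
minimap2_reads = 1605
pandora_single_gene_reads = 1539
loss = (minimap2_reads - pandora_single_gene_reads) / minimap2_reads
print(f"geneA best-hit: {single['geneA']}, multimap: {multi['geneA']}, loss: {loss:.1%}")
```

With best-hit assignment, a gene can look undermapped even though most of its reads align to it nearly as well as to the gene that wins them, which is the pattern described above.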
Is this open still?
No, undermapping is not an issue IMO: pandora does map at a similar rate as minimap2, but we might see undermapping as an effect of pandora not multimapping to several different genes. Release 0.12.0-alpha.0 improves this slightly, but we can only compare it fairly to tools like minimap2 if we implement multimapping in pandora. Closing this as the real issue is that we need to implement multimapping, but I am not even sure if this feature is required...
This is happening with one of our roundhound runs, but in general we know pandora undermaps reads to the PRGs. In some cases, heavy undermapping happens and this impacts genotyping. This is a breaking issue for roundhound and in this issue we will discuss some things about it and try to solve it.
In one of our plasmid genes, minimap2 maps 1605 reads while pandora maps only 718. All reads that pandora maps, minimap2 also maps. I will thus extract the unmapped reads and debug why.

This relates to https://github.com/rmcolq/pandora/issues/316 . Although reads undermap more at the edges of genes, we can find undermapping throughout the whole gene.
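The comparison described above can be sketched as a set operation on read IDs. This is only an illustration: the function name and the toy read IDs are hypothetical, and getting the read IDs out of each tool's output is assumed to have been done already.

```python
def reads_missed_by_pandora(minimap2_ids, pandora_ids):
    """Return (reads only minimap2 maps, whether pandora's reads are a subset of minimap2's)."""
    minimap2_ids = set(minimap2_ids)
    pandora_ids = set(pandora_ids)
    missed = minimap2_ids - pandora_ids
    is_subset = pandora_ids <= minimap2_ids
    return missed, is_subset


# Toy read IDs, made up for illustration only.
minimap2_ids = ["r1", "r2", "r3", "r4", "r5"]
pandora_ids = ["r2", "r4"]

missed, is_subset = reads_missed_by_pandora(minimap2_ids, pandora_ids)
print(sorted(missed), is_subset)
```

The `missed` set is what would then be extracted from the read file for debugging.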