Open lkerr34 opened 1 year ago
Yes, there is internal trimming. The signal-to-read alignment step can discard the ends of the read, and also the methylation calling algorithm avoids calling at the ends of the signal boundaries. There may be other steps as well but those are the two off the top of my head.
That's great - thank you so much for the quick reply!
I guess that the signal boundaries correspond to the read ends? I have also found a few examples where CpGs further into a read are missing from the tsv file (even when the surrounding CpGs are included). Can you think of anything that might cause this?
Nanopolish skips long regions of high CpG density because they are too computationally expensive to call. Could that be the case here?
I don't think that can be the case here: when I run the calculate methylation frequency script, some of the CpGs that seem to be missed on individual reads do appear. I assume Nanopolish would exclude these high-density regions from all reads, so they wouldn't appear in the bulk data either.
I've attached a specific example of a short read where the discrepancy occurs in case this is useful.
The first screenshot is the entries in the Nanopolish tsv files associated with a particular read. 6 CpGs are included.
The next screenshot is the corresponding Minimap2 alignment for the read. Here, 7 CpGs are detected (highlighted), with the CpG highlighted in green being the one that is missing in the Nanopolish tsv file.
Finally, this is a snippet of the output from the calculate methylation frequency script, where all 7 of the CpGs in the region are included.
So this CpG is being omitted from this read but not from other reads.
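For anyone wanting to reproduce this kind of per-read comparison without screenshots, here is a minimal sketch in Python. It assumes the per-read tsv has `chromosome`, `start`, `end`, `read_name` and `num_motifs` columns, and that a row with `num_motifs` greater than 1 covers a group of nearby CpGs spanning `start` to `end`; please check both assumptions against your own file header. The helper names and the example coordinates are made up for illustration and are not taken from the read above.

```python
import csv
import io


def called_intervals(tsv_text, read_name):
    """Return sorted (chromosome, start, end, num_motifs) rows for one read.

    Column names are an assumption about the tsv header; adjust if yours differ.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = []
    for row in reader:
        if row["read_name"] == read_name:
            rows.append((row["chromosome"], int(row["start"]),
                         int(row["end"]), int(row["num_motifs"])))
    return sorted(rows)


def missing_cpgs(expected_positions, intervals):
    """Expected CpG positions that no called interval covers (inclusive ends)."""
    return [p for p in expected_positions
            if not any(start <= p <= end for _, start, end, _ in intervals)]


# Hypothetical data: three rows for read1 (one of them a two-CpG group)
# and one row for a different read that does cover position 307.
HEADER = ["chromosome", "strand", "start", "end", "read_name", "num_motifs"]
ROWS = [
    ["chr1", "+", "280", "280", "read1", "1"],
    ["chr1", "+", "290", "295", "read1", "2"],
    ["chr1", "+", "320", "320", "read1", "1"],
    ["chr1", "+", "307", "307", "read2", "1"],
]
EXAMPLE_TSV = "\n".join("\t".join(fields) for fields in [HEADER] + ROWS)

# CpG positions expected from the alignment of read1 (made up for this example).
expected = [280, 290, 295, 307, 320]
missing = missing_cpgs(expected, called_intervals(EXAMPLE_TSV, "read1"))
print(missing)
```

Checking against intervals rather than exact start positions avoids flagging a CpG as missing when it has simply been folded into a grouped row. It is also worth confirming that the expected positions and the tsv use the same coordinate convention (0-based or 1-based) before comparing.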
Hm, it might be this check then:
You could try commenting it out to see if the call at 307 appears.
Apologies if this is a silly question but I installed Nanopolish via conda and can't find the nanopolish_basemods.cpp file you mention. Would you expect me to have access to that file if I installed via conda?
Also, can I ask what it is that this part of the script checks? What are the boundaries?
Hi,
I have been using Nanopolish to call methylation and wondered if reads are automatically trimmed by Nanopolish when making these methylation calls? My reason for asking is that when I compare read alignments from Minimap2 to the methylation tsv calls I see that some CpGs towards the start and end of a read appear to be missing from the tsv file.
If the reads aren't trimmed then are there any other reasons that these CpGs (or any other CpGs) could be missing?
I'd really appreciate any info you could give me!
Thanks, Lyndsay