twolinin / longphase

GNU General Public License v3.0
obtaining status of all 28M CpGs #33

Closed ilivyatan closed 6 months ago

ilivyatan commented 7 months ago

Hi, I'm comparing modkit and longphase modcall on Nanopore modified BAMs. modcall works well and provides a nice output format, but I noticed that its output contains only a small subset of the CpGs, and I think this is due to some thresholding options. Which options should I set if I want to see all CpGs in the genome (or at least all that are covered by reads), together with their methylation status or the percentage of methylated reads?

Thanks!

ythuang0522 commented 7 months ago

modcall aims to identify "heterozygous" (haplotype-specific/allele-specific) methylation sites in order to extend the phasing range (i.e., co-phasing SNPs/indels/SVs/modifications). As such, all "homozygous" methylation sites are ignored, since they do not help distinguish the paternal and maternal haplotypes. We also discard singleton heterozygous methylation sites that lack support from neighboring methylated loci, as these are likely false positives. You will therefore see far fewer methylation sites than with modkit. If you simply want all possible methylated sites, use modkit instead.

PS: we can easily output those discarded homozygous loci and will provide this as an option in the next release.
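
To make the filtering described above concrete, here is a minimal Python sketch. It is not longphase's actual code: the `Site` record, the 0.3/0.7 fraction thresholds, the 100 bp window, and the rule that "support" means another heterozygous candidate nearby are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Site:
    pos: int          # reference position of the CpG
    meth_reads: int   # reads calling the site methylated
    total_reads: int  # reads covering the site


def classify(site, low=0.3, high=0.7):
    """Label a site by its fraction of methylated reads (thresholds are hypothetical)."""
    if site.total_reads == 0:
        return "uncovered"
    frac = site.meth_reads / site.total_reads
    if frac >= high:
        return "hom_methylated"
    if frac <= low:
        return "hom_unmethylated"
    return "heterozygous"


def keep_for_phasing(sites, window=100):
    """Keep heterozygous sites that have another heterozygous site within `window` bp.

    Homozygous sites are dropped because they cannot separate haplotypes;
    isolated heterozygous sites are dropped as likely false positives.
    """
    het = sorted((s for s in sites if classify(s) == "heterozygous"),
                 key=lambda s: s.pos)
    kept = []
    for i, s in enumerate(het):
        left = i > 0 and s.pos - het[i - 1].pos <= window
        right = i + 1 < len(het) and het[i + 1].pos - s.pos <= window
        if left or right:
            kept.append(s)
    return kept


sites = [
    Site(100, 10, 10),   # homozygous methylated: ignored
    Site(150, 5, 10),    # heterozygous, has a neighbor at 180
    Site(180, 4, 10),    # heterozygous, has a neighbor at 150
    Site(5000, 5, 10),   # heterozygous singleton: discarded
    Site(6000, 0, 10),   # homozygous unmethylated: ignored
]
print([s.pos for s in keep_for_phasing(sites)])  # [150, 180]
```

In this toy input only two of the five covered CpGs survive, which is the same effect as the much smaller modcall output compared with a per-site pileup.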

ythuang0522 commented 6 months ago

Please upgrade to v1.6 #46