nanoporetech / modkit

A bioinformatics tool for working with modified bases
https://nanoporetech.com/

Question about CpGs without modification #164

Open VasLem opened 5 months ago

VasLem commented 5 months ago

I am currently comparing different methylation calling protocols (including enzymatic technologies), and for these experiments I also need to report the coverage of each detected CpG. Could you please confirm that counts for unmodified CpGs, across all occurrences, are not included in the bed file produced by `modkit pileup --cpg`? If so, is there a way to include them in the output? Thank you in advance.

ArtRand commented 5 months ago

Hello @VasLem,

When you run `modkit pileup --cpg`, every CpG with at least 1 read of valid coverage is emitted in the output, so CpGs whose calls are all unmodified are still reported. If you want every CpG with any coverage at all, you'll have to specify `--no-filtering`; in that case all base modification calls "pass" and any CpG covered by at least a single read is emitted.

There is currently no flag to emit bedMethyl records with 0 valid coverage, but that's not a bad idea. CpGs without any coverage at all are also omitted. If you need these, the easiest approach is probably to run `modkit motif-bed ${fasta} CG 0` to list every CpG in the reference, then left-join the pileup output onto that list with `bedtools intersect -loj`. Does that answer your question, or did I miss something?
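For readers who want to see what the `bedtools intersect -loj` step produces, here is a minimal Python sketch of the same left join for single-base intervals: every CpG from the motif BED is kept, and CpGs with no pileup record get a coverage of 0. This is not part of modkit or bedtools. The function name `left_join_coverage` and the toy records are hypothetical. The sketch assumes that the 10th field of each bedMethyl record is the valid coverage (N_valid_cov) and that the motif BED carries a strand in its 6th column when one is present. It matches intervals by exact coordinates, which is equivalent to an overlap test only because both files describe single bases.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

# (chrom, start, end, strand, mod_code, valid_cov)
Row = Tuple[str, int, int, str, str, int]


def left_join_coverage(motif_lines: Iterable[str], pileup_lines: Iterable[str]) -> List[Row]:
    """Hypothetical helper: report valid coverage for every motif position,
    using 0 where the pileup has no record (like `bedtools intersect -loj`)."""
    # Index pileup records by (chrom, start, end). One position may have
    # several records, e.g. one per modification code.
    pileup = defaultdict(list)
    for line in pileup_lines:
        fields = line.split()  # tolerate tab- or space-delimited columns
        if not fields:
            continue
        key = (fields[0], int(fields[1]), int(fields[2]))
        mod_code = fields[3]
        valid_cov = int(fields[9])  # assumption: 10th column is N_valid_cov
        pileup[key].append((mod_code, valid_cov))

    rows: List[Row] = []
    for line in motif_lines:
        fields = line.split()
        if not fields:
            continue
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        strand = fields[5] if len(fields) > 5 else "."  # assumption: strand in column 6
        hits = pileup.get((chrom, start, end))
        if hits:
            # One output row per matching pileup record, as -loj would emit
            for mod_code, valid_cov in hits:
                rows.append((chrom, start, end, strand, mod_code, valid_cov))
        else:
            # No pileup record: the CpG had no valid coverage (or no reads at all)
            rows.append((chrom, start, end, strand, ".", 0))
    return rows


if __name__ == "__main__":
    # Toy records for illustration only
    motifs = ["chr1\t10\t11\t.\t.\t+", "chr1\t20\t21\t.\t.\t+"]
    pileup = ["chr1\t10\t11\tm\t5\t+\t10\t11\t255,0,0\t5\t40.00\t2\t3\t0\t0\t0\t0\t0"]
    for row in left_join_coverage(motifs, pileup):
        print(row)
```

With the toy input, the covered CpG at position 10 is reported with its valid coverage of 5, and the CpG at position 20, which has no pileup record, is reported with coverage 0.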