nanoporetech / modkit

A bioinformatics tool for working with modified bases
https://nanoporetech.com/
Other
144 stars 8 forks source link

bedmethyl Nvalid coverage vs samtools depth coverage #137

Closed osvatic closed 8 months ago

osvatic commented 8 months ago

I am trying to visualize some aspects of the bedmethyl file for specific regions so a reference genome. I used the default setting for modkit pileup to generate the bedmethyl from a sorted bam file.

I also used samtools depth to quickly generate the coverages for the genome in the region.

The Nvalid coverage is substantially lower than that calculated from samtools (see plot, blue = samtools, red = Nvalid).

image

Is there a reason (lots of filtering?) for these to be so different? Would the Nvalid be the only one that makes sense visualizing?

ArtRand commented 8 months ago

Hello @osvatic,

You should see the depth accounted for in the N_fail, N_del, N_nocall, and N_diff columns of the bedmethyl. Also, base modifications only occur on one strand, so the N_valid_cov and the counts I just mentioned are only for reads that are aligned to either the (+) strand or the (-) strand. If you ran modkit with a motif like --cpg or --motif CGCG 2 the maximum N_valid_cov you could achieve (without filtering, DELs, etc.) is the coverage of the reads aligned to the (+) or (-) strand. If you narrow the input to samtools depth to only positive, primary alignments, (e.g. samtools view -bh -F 2320 | samtools depth -) the N_valid_cov and depth values should be more similar.

osvatic commented 8 months ago

Thank you for the explanation.

The positive primarmy alignments are much closer to the sum of all the N_* values mentioned.