Closed osvatic closed 8 months ago
Hello @osvatic,
You should see the depth accounted for in the N_fail
, N_del
, N_nocall
, and N_diff
columns of the bedmethyl. Also, base modifications only occur on one strand, so the N_valid_cov
and the counts I just mentioned are only for reads that are aligned to either the (+) strand or the (-) strand. If you ran modkit
with a motif like --cpg
or --motif CGCG 2
the maximum N_valid_cov
you could achieve (without filtering, DELs, etc.) is the coverage of the reads aligned to the (+) or (-) strand. If you narrow the input to samtools depth
to only positive, primary alignments, (e.g. samtools view -bh -F 2320 | samtools depth -
) the N_valid_cov
and depth values should be more similar.
Thank you for the explanation.
The positive primarmy alignments are much closer to the sum of all the N_* values mentioned.
I am trying to visualize some aspects of the bedmethyl file for specific regions so a reference genome. I used the default setting for modkit pileup to generate the bedmethyl from a sorted bam file.
I also used samtools depth to quickly generate the coverages for the genome in the region.
The Nvalid coverage is substantially lower than that calculated from samtools (see plot, blue = samtools, red = Nvalid).
Is there a reason (lots of filtering?) for these to be so different? Would the Nvalid be the only one that makes sense visualizing?