smithlabcode / methpipe

A pipeline for analyzing DNA methylation data from bisulfite sequencing.
http://smithlabresearch.org/methpipe

No. of reads overlapping is different in the methcounts input and output file #89

Closed ycl6 closed 8 years ago

ycl6 commented 8 years ago

Hi,

I used intersectBed to check the number of reads overlapping a particular site. The number of matched reads in the .mr.sorted file generated by duplicate-remover (used as input for methcounts) differs from the number of reads overlapping the site recorded in the .meth file generated by methcounts.

I thought the 2 numbers should match. Did I misinterpret the meaning of these 2 output files?

mengzhou commented 8 years ago

The count in the methcounts output is not necessarily the true number of reads overlapping that site. If a read has a mismatch right at the cytosine site, that read is not counted. For example, suppose the genome has a C site in the context TTCAA, but the read shows TTAAA. The A at that position is neither the original C nor a bisulfite-converted T, so it is treated as a mismatch.
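To make the counting rule concrete, here is a minimal Python sketch. It is not the methcounts implementation: the function name `tally_site` and its output keys are hypothetical, and it assumes every base comes from a read on the strand where an unmethylated C reads out as T.

```python
from collections import Counter


def tally_site(read_bases):
    """Tally the read bases observed at one reference cytosine.

    Assumption (illustration only): only C (unconverted) and T
    (bisulfite converted) are informative; any other base is a mismatch.
    """
    counts = Counter(base.upper() for base in read_bases)
    methylated = counts["C"]      # reads still showing C
    unmethylated = counts["T"]    # reads showing a converted T
    covered = methylated + unmethylated
    overlapping = sum(counts.values())
    return {
        "overlapping": overlapping,            # what an overlap count would see
        "covered": covered,                    # reads that are C or T at the site
        "mismatched": overlapping - covered,   # e.g. the A in TTAAA
        "methylated": methylated,
        "unmethylated": unmethylated,
    }


# Six reads overlap the site; two of them show A or G at the cytosine.
site = tally_site(["C", "T", "A", "C", "T", "G"])
print(site["overlapping"], site["covered"], site["mismatched"])
```

In this sketch, `covered` can never exceed `overlapping`, and the gap between the two is exactly the number of reads with a mismatch at the site.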

So I think the number reported by methcounts should always be lower than or equal to the number you obtain from intersectBed. If that's not the case for you, please let me know.

Best regards, Meng


ycl6 commented 8 years ago

Thanks @mengzhou, the numbers do indeed add up after taking mismatches into account.