xie186 / ViewBS

ViewBS - a powerful toolkit for visualization of high-throughput bisulfite sequencing data
GNU General Public License v3.0
81 stars 27 forks source link

How is coverage calculated in MethCoverage #26

Closed davidries84 closed 5 years ago

davidries84 commented 5 years ago

Hi, I ran MethCoverage on multiple samples, and for every sample, the values for depth are 0, 1 and 2. This means, that at each and every C, the maximum Number of reads is 2, yes? But in the input file, I clearly find Positions with a higher coverage than 2:

chr7H 295 - 0 0 CHH CCT chr7H 300 + 0 0 CHH CAT chr7H 303 + 7 0 CG CGA chr7H 304 - 4 1 CG CGA chr7H 306 - 0 0 CHH CTC chr7H 307 - 0 0 CHH CCT chr7H 309 + 0 0 CHH CAC chr7H 311 + 0 0 CHH CCA chr7H 312 + 0 0 CHG CAG chr7H 314 - 0 0 CHG CTG chr7H 316 - 0 0 CHH CTC chr7H 318 + 0 0 CHH CCT chr7H 319 + 0 0 CHH CTC chr7H 321 + 0 0 CHH CTT chr7H 324 + 0 0 CHH CAC chr7H 326 + 5 1 CG CGC chr7H 327 - 5 0 CG CGT

What did I get wrong?

Best,

David

xie186 commented 5 years ago

Could you please provide the version of the ViewBS you are using and the command line? Thanks.

davidries84 commented 5 years ago

Sure,

VERSION is 0.1.0

and the command I used: ViewBS MethCoverage --sample A.bis_rep.cov.CX_report.txt.gz,A --prefix test3 --reference /masked_ref/pseudomolecules_masked.fasta

Best,

David

xie186 commented 5 years ago

I'll take a look tomorrow. Could you please try the latest version? Thanks.

davidries84 commented 5 years ago

Hi, I downloaded and used Version 0.1.6 (which by the way still tells me it's Version 0.1.0), but the result is the same: max coverage is 2.0.

EDIT: When I extract the coverage for C's in CG context, I can clearly see, that I have coverage > 2. So maybe I completely misunderstand what MethCoverage is about?

grep '\sCG\s' A.bis_rep.cov.CX_report.txt | head -n 100000 | awk '{print $4 + $5}' | sort -n | uniq -c

56566 0 11765 1 7575 2 5408 3 3871 4 3062 5 2242 6 1866 7 1407 8 1136 9 909 10 614 11 594 12 473 13 409 14

Best, David

xie186 commented 5 years ago

I'll check this and get back to you.

xie186 commented 5 years ago

Sorry for the late response. The result generated by MethCoverage is the reverse cumulative plot with x-axis representing the coverage and y-axis representing the percentage of sites across genome.

davidries84 commented 5 years ago

Hello again. I wanted to report that everything works fine now. I just seems that MethCoverage has some cutoff where it stops to plot the data. This makes perfectly sense of course, but I got confused, because I had most bases only covered with one or two reads, so this is all I saw, although there were bases with more coverage. But with a reasonable amount of data, one also gets reasonable pictures.

xie186 commented 5 years ago

Thanks for letting us know. We'll update the manual and help message accordingly.