zhou-lab / biscuit

BISulfite-seq CUI Toolkit

Collapse strands at CpGs in vcf2bed #29

Closed chrisamiller closed 5 years ago

chrisamiller commented 5 years ago

While it's nice to have per-strand readout in some contexts, most CpGs are methylated symmetrically. As a result, having separate lines for the forward and reverse contexts makes it a pain to extract overall methylation status for each CpG site.

Would it be possible to modify vcf2bed to support collapsing the strands into a single position, depth, and beta value for each CpG?

chrisamiller commented 5 years ago

To be clear, looking at these two lines:

chr1  770501   .  C  .  24 PASS  NS=1;CX=CG;N5=CACGT  GT:GL1:GQ:DP:SP:CV:BT   0/0:-1,-7,-39:24:8:C7Y1:4:0.75
chr1  770502   .  G  .  24 PASS  NS=1;CX=CG;N5=AACGT  GT:GL1:GQ:DP:SP:CV:BT   0/0:-1,-7,-39:24:8:G6R2:4:0.50

I get a bed output of:

chr1    770500  770501  0.750   4
chr1    770501  770502  0.500   4

When what would often be more useful is just one entry for the whole CpG:

chr1    770500  770501  0.625   8
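
To make the arithmetic behind that collapsed line explicit, here is a minimal Python sketch. It assumes the beta values should be combined weighted by coverage (0.75 × 4 + 0.50 × 4 = 5 methylated reads out of 8), and that the merged row keeps the coordinates of the forward-strand C, as in the desired output above. `BedRow`, `parse_row`, and `collapse_cpg` are hypothetical names used only for illustration; this is not BISCUIT's implementation.

```python
from typing import NamedTuple


class BedRow(NamedTuple):
    chrom: str
    start: int   # 0-based start
    end: int     # exclusive end
    beta: float  # methylation fraction
    cov: int     # read coverage


def parse_row(line: str) -> BedRow:
    """Parse one whitespace-separated row: chrom, start, end, beta, coverage."""
    chrom, start, end, beta, cov = line.split()
    return BedRow(chrom, int(start), int(end), float(beta), int(cov))


def collapse_cpg(fwd: BedRow, rev: BedRow) -> BedRow:
    """Collapse the forward-strand C row and the adjacent reverse-strand G row.

    Assumption: beta values are combined weighted by coverage, i.e. the
    methylated read counts implied by each row are summed.
    """
    if fwd.chrom != rev.chrom or rev.start != fwd.end:
        raise ValueError("rows are not the two adjacent positions of one CpG")
    cov = fwd.cov + rev.cov
    methylated = fwd.beta * fwd.cov + rev.beta * rev.cov
    beta = methylated / cov if cov else float("nan")
    # Keep the coordinates of the forward-strand C, matching the desired output.
    return BedRow(fwd.chrom, fwd.start, fwd.end, beta, cov)


fwd = parse_row("chr1    770500  770501  0.750   4")
rev = parse_row("chr1    770501  770502  0.500   4")
merged = collapse_cpg(fwd, rev)
print(f"{merged.chrom}\t{merged.start}\t{merged.end}\t{merged.beta:.3f}\t{merged.cov}")
```

With equal coverage on both strands the weighted result equals the simple mean of the two beta values; the two differ only when the strands have different depth.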

zwdzwd commented 5 years ago

Hi @chrisamiller, sorry for the tardy response. In the original version, vcf2bed did indeed merge the complementary cytosines at each CpG. In the later version, this functionality was split out into the mergecg subcommand to allow more flexibility. I think this is better because I sometimes get cytosine counts from other sources and want to use mergecg standalone. Now you just need to pipe the output of vcf2bed to mergecg. Let me know if this sounds reasonable to people. Thanks!
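
For anyone scripting this, the pipe can be sketched in Python with `subprocess`. Only the subcommand names `vcf2bed` and `mergecg` come from this thread. The arguments to each are deliberately left to the caller, because the exact options and input conventions (including how `mergecg` is told to read from standard input) are not stated here and should be taken from each subcommand's usage message. `build_pipeline` and `run_pipeline` are hypothetical helper names.

```python
import subprocess
from typing import List, Sequence


def build_pipeline(vcf2bed_args: Sequence[str],
                   mergecg_args: Sequence[str]) -> List[List[str]]:
    """Return the two command lines; arguments are supplied by the caller."""
    return [
        ["biscuit", "vcf2bed", *vcf2bed_args],
        ["biscuit", "mergecg", *mergecg_args],
    ]


def run_pipeline(vcf2bed_args: Sequence[str],
                 mergecg_args: Sequence[str]) -> str:
    """Run `biscuit vcf2bed ... | biscuit mergecg ...` and return its stdout."""
    first_cmd, second_cmd = build_pipeline(vcf2bed_args, mergecg_args)
    first = subprocess.Popen(first_cmd, stdout=subprocess.PIPE)
    second = subprocess.Popen(second_cmd, stdin=first.stdout,
                              stdout=subprocess.PIPE, text=True)
    # Close our copy of the pipe so the first process sees a broken pipe
    # if the second one exits early.
    first.stdout.close()
    out, _ = second.communicate()
    first.wait()
    if first.returncode != 0 or second.returncode != 0:
        raise RuntimeError("biscuit pipeline failed")
    return out
```

Nothing is executed when the block is loaded; `run_pipeline` only starts the two processes when called with arguments appropriate for the installed BISCUIT version.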

chrisamiller commented 5 years ago

oh man - I missed that new command completely. Yeah, that solves the problem - thanks!