zhou-lab / biscuit

BISulfite-seq CUI Toolkit
Other
63 stars 24 forks source link

Dose biscuit can used to calculate the ratio of HCG and GCH in each genome site for human? Can I use "biscuit vcf2bed -k 10 -t hcg/gch"? And what's the meaning of colnom 4? #25

Closed liuzhe93 closed 5 years ago

liuzhe93 commented 5 years ago

1.Script: biscuit align -t 10 Homo_sapiens_assembly18.fasta SRR567398.fastq | samtools sort -T . -O bam -o SRR567398.bam samtools index SRR567398.bam samtools flagstat SRR567398.bam > SRR567398.bam.flagstat biscuit tview -g chr19:7525080 SRR567398.bam Homo_sapiens_assembly18.fasta biscuit markdup SRR567398.bam SRR567398_rmpcrdup.bam samtools index SRR567398_rmpcrdup.bam biscuit pileup Homo_sapiens_assembly18.fasta SRR567398_rmpcrdup.bam -o SRR567398.vcf -q 20 -N -m 30 gzip SRR567398.vcf biscuit vcf2bed -k 10 -t hcg SRR567398.vcf.gz > IMR90.hcg.bed biscuit vcf2bed -k 10 -t gch SRR567398.vcf.gz > IMR90.gch.bed 2. header of IMR90.hcg.bed chr1 2574795 2574796 0.000 10 chr1 2803654 2803655 0.700 10 chr1 4899545 4899546 1.000 10 chr1 5652519 5652520 0.460 13 chr1 5652541 5652542 0.000 12 chr1 5652551 5652552 0.540 13 chr1 5653021 5653022 0.300 10 chr1 16814548 16814549 0.800 10 chr1 26328594 26328595 0.400 10

zwdzwd commented 5 years ago

Hi, Column 4 is read depth. Have you tried biscuit vcf2bed -k 0? I think it will give you all the HCG and GCH sites (then you can just count the number of lines). I have used the method to calculate the number of CpGs. I am not sure that's the most straightforward way to calculate this ratio though.

Thanks,

liuzhe93 commented 5 years ago

Dear,

Thanks for your reply. I want to know, could biscuit can calculate the No. of CpGs and GpCs in separately? Looking forward to your answer.

Best wishes, Zhe

https://github.com/zwdzwd/biscuit/issues/25#issuecomment-445646484

liuzhe93 commented 5 years ago

I'm analyzing NOMe-seq data now. And also, I'm the newcomer of it. I see your description of biscuit, which can extract HCG and GCH separately. But I still don't know which column represents for HCG, and which column represents for GCH in NOMe-seq.

$ biscuit vcf2bed -k 10 -t cg input.vcf.gz -t can also take snp - SNP information c - all cytosines hcg - HCG for NOMe-seq gch - GCH for NOMe-seq

liuzhe93 commented 5 years ago

In fact, I tried biscuit vcf2bed -k 0? the result is as follows: chr1 468 469 1.000 2 chr1 471 472 0.000 1 chr1 483 484 1.000 3 chr1 484 485 1.000 2 chr1 488 489 0.670 3 chr1 492 493 0.670 3 chr1 496 497 0.670 3 chr1 497 498 1.000 4 chr1 524 525 0.750 4 chr1 541 542 1.000 3 chr1 542 543 1.000 5 chr1 562 563 1.000 2 chr1 570 571 1.000 2 chr1 576 577 1.000 1 chr1 579 580 1.000 2 chr1 588 589 1.000 1

It gave me all the HCG or GCH sites. However, I have to obtain the No. of methylation HCG in each site. Could you help me figure it out?

liuzhe93 commented 5 years ago

for example, a.bed chr1 1 2 0.2 3 chr1 3 4 0.4 5 chr1 5 6 0.1 0 chr1 9 10 0.2 0 The ratio of methylation: 3/(3+5+2) it's right?

zwdzwd commented 5 years ago

Sorry for not being able to get back to you quickly. To calculate methylation level as a whole, I would do (0.2x3+0.4x5)/(3+5) following your example (though I don't think numbers in your example are possible due to non-integer methylated read counts, but you get the idea, the fourth column is the ratio of methylation for a locus while the fifth is the number of reads covering the locus).

To calculate the context number, you can simply do "-e" (in the latest version) to add context and count lines for specific context.

Hope this helps.

liuzhe93 commented 5 years ago
      Sorry for not being able to get back to you quickly. To calculate methylation level as a whole, I would do

(0.2x3+0.4x5)/(3+5) following your example (though I don't think numbers in your example are possible due to non-integer methylated read counts, but you get the idea, the fourth column is the ratio of methylation for a locus while the fifth is the number of reads covering the locus). To calculate the context number, you can simply do "-e" (in the latest version) to add context and count lines for specific context. Hope this helps.

Thanks for your reply. When I was using "biscuit vcf2bed -k 0 -t gch -e merged.vcf.gz > IMR90.gch.zwd.e.bed". The 8th column is non-integer methylated read counts. May I use (0.388+0.147+0.333+0.147)/(8+7+3+7) for calculating the GCH methylation ratio in genome--chr1[481,490]

chr1 481 482 G GCH CT GGCTG 0.380 8 chr1 482 483 C GCH CC AGCCG 0.140 7 chr1 485 486 G GCH CC GGCCG 0.330 3 chr1 486 487 C GCH CC GGCCC 0.140 7 chr1 490 491 C GCH CC CGCCC 0.140 7 chr1 494 495 C GCH CC CGCCC 0.000 7 chr1 520 521 G GCH CA AGCAC 0.330 12 chr1 521 522 C GCH CT TGCTC 0.000 6 chr1 526 527 C GCH CC CGCCT 0.000 4 chr1 551 552 G GCH CA TGCAC 0.000 6