guoweilong / cgmaptools

Toolbox for analysing BS-seq data, with advanced features for SNV, ASM and DMR analysis
https://cgmaptools.github.io

How can I get a site's methylation level? #2

Closed: jmsong2 closed this issue 6 years ago

jmsong2 commented 6 years ago

Dear Weilong,

I'm very happy to have found your nice software and paper.

In my own work, I have a question about the methylation level values.

I used this command to find allele-specific DNA methylation (ASM) sites:

cgmaptools asm -m ass -r Chr12.fa -b Chr12.bam -l Chr12q.vcf -o Chr12.asm -t C

This is my output:

Chr    SNP_Pos  Ref  Allele1  Allele2  C_Pos  Allele1_linked_C  Allele2_linked_C  Allele1_linked_C_met  Allele2_linked_C_met  pvalue    fdr       ASM
Chr12  9820     G    G        A        9814   3-3               5-0               0.50                  1.00                  1.82e-01  4.03e-01  FALSE
Chr12  9820     G    G        A        9869   7-0               5-0               1.00                  1.00                  1.00e+00  1.00e+00  FALSE
Chr12  9820     G    G        A        9897   7-0               5-0               1.00                  1.00                  1.00e+00  1.00e+00  FALSE
Chr12  9820     G    G        A        9901   7-0               5-0               1.00                  1.00                  1.00e+00  1.00e+00  FALSE
Chr12  9820     G    G        A        9908   7-0               4-0               1.00                  1.00                  1.00e+00  1.00e+00  FALSE
Chr12  9820     G    G        A        9922   7-0               4-0               1.00                  1.00                  1.00e+00  1.00e+00  FALSE
Chr12  9826     G    G        A        9814   3-3               5-0               0.50                  1.00                  1.82e-01  4.03e-01  FALSE
Chr12  9826     G    G        A        9869   8-0               5-0               1.00                  1.00                  1.00e+00  1.00e+00  FALSE
Chr12  9826     G    G        A        9897   8-0               5-0               1.00                  1.00                  1.00e+00  1.00e+00  FALSE
Chr12  9826     G    G        A        9901   8-0               5-0               1.00                  1.00                  1.00e+00  1.00e+00  FALSE
Chr12  9826     G    G        A        9908   8-0               4-0               1.00                  1.00                  1.00e+00  1.00e+00  FALSE
Chr12  9826     G    G        A        9922   8-0               4-0               1.00                  1.00                  1.00e+00  1.00e+00  FALSE
Chr12  9830     G    G        A        9814   3-3               5-0               0.50                  1.00                  1.82e-01  4.03e-01  FALSE
Chr12  9830     G    G        A        9869   9-0               7-0               1.00                  1.00                  1.00e+00  1.00e+00  FALSE
Chr12  9830     G    G        A        9897   9-0               7-0               1.00                  1.00                  1.00e+00  1.00e+00  FALSE
Chr12  9830     G    G        A        9901   9-0               7-0               1.00                  1.00                  1.00e+00  1.00e+00  FALSE
Chr12  9830     G    G        A        9908   9-0               6-0               1.00                  1.00                  1.00e+00  1.00e+00  FALSE
Chr12  9830     G    G        A        9922   9-0               6-0               1.00                  1.00                  1.00e+00  1.00e+00  FALSE

However, some cytosine sites appear several times with different allele-linked values, because each record is based on a different SNV. If I want to know the absolute methylation level of each of the two alleles at one site (e.g. 9901), how can I get it?

Best, Jiaming

ghost commented 6 years ago

@jmsong2 Thanks for your comment.

The asm function handles one heterozygous SNP site at a time and reports methylation information derived only from the mappable reads linked to that SNP site.
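To illustrate what "reads linked to a given SNP site" means, here is a minimal Python sketch under simplifying assumptions: each read is a dict mapping reference position to observed base, only forward-strand reads are considered (a C at the cytosine is read as methylated, a T as unmethylated), and the bisulfite ambiguity that affects C/T and G/A SNPs on one strand is ignored. The function name and read representation are hypothetical, not part of CGmapTools.

```python
from typing import Dict, List, Tuple

# Hypothetical read representation: reference position -> observed base.
# Assumes forward-strand reads only; real BS-seq allele assignment must also
# handle bisulfite C->T conversion, which this sketch ignores.
Read = Dict[int, str]


def assign_and_count(reads: List[Read], snp_pos: int, allele1: str,
                     allele2: str, c_pos: int) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Return (methylated, unmethylated) counts at c_pos for each allele.

    Only reads covering BOTH the SNP and the cytosine contribute, so the
    counts for one cytosine depend on which SNP is used to link the reads.
    """
    counts = {allele1: [0, 0], allele2: [0, 0]}
    for read in reads:
        snp_base = read.get(snp_pos)
        c_base = read.get(c_pos)
        if snp_base not in counts or c_base is None:
            continue  # read does not link this SNP allele to this cytosine
        if c_base == "C":
            counts[snp_base][0] += 1  # unconverted C -> methylated
        elif c_base == "T":
            counts[snp_base][1] += 1  # converted to T -> unmethylated
    return tuple(counts[allele1]), tuple(counts[allele2])


reads = [
    {9820: "G", 9901: "C"},
    {9820: "G", 9901: "C"},
    {9820: "A", 9901: "C"},
    {9830: "G", 9901: "T"},
    {9820: "A", 9901: "T"},
]
print(assign_and_count(reads, 9820, "G", "A", 9901))
print(assign_and_count(reads, 9830, "G", "A", 9901))
```

Calling it with `snp_pos=9820` and with `snp_pos=9830` on the same reads gives different counts for the same cytosine, because a different subset of reads covers each SNP.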

In your case, there are three heterozygous SNP sites in this small region, each linked to a different number of mappable reads, which may lead to subtle differences in the methylation estimates. This is common and expected, because the methylation level estimate depends on the number of reads linked to each SNP site.
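One way to see why the number of linked reads matters is to attach a confidence interval to each methylation level. The sketch below uses a Wilson score interval, a common choice that is not necessarily what CGmapTools reports. It assumes the `a-b` fields in the output are methylated-unmethylated read counts, which is consistent with `3-3` giving 0.50 and `7-0` giving 1.00 in the table above.

```python
import math


def methylation_level(meth: int, unmeth: int) -> float:
    """Point estimate M / (M + U)."""
    return meth / (meth + unmeth)


def wilson_interval(meth: int, unmeth: int, z: float = 1.96) -> tuple:
    """Approximate 95% Wilson score interval for the methylation level."""
    n = meth + unmeth
    p = meth / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Allele1 counts for C_Pos 9901 linked via SNPs 9820, 9826 and 9830:
for counts in [(7, 0), (8, 0), (9, 0)]:
    low, high = wilson_interval(*counts)
    print(counts, methylation_level(*counts), round(low, 3), round(high, 3))
```

All three records give the same point estimate (1.0), but the lower bound rises as more reads are linked, which is why the record with more supporting reads is the more reliable one.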

If you want to estimate the methylation level of each allele at a site (e.g. 9901), I recommend using the record with the most supporting reads, which is the most reliable; here, that is the record linked to SNP 9830.
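This advice can be applied to a whole output file with a short script. The sketch assumes the output is whitespace-separated with the header shown above, and it takes "supporting reads" to mean the sum of both `a-b` count fields. `best_record_per_site` is a hypothetical helper, not a CGmapTools command.

```python
from typing import Dict, List, Tuple


def total_reads(field: str) -> int:
    """'7-0' -> 7 (assumed methylated-unmethylated read counts)."""
    meth, unmeth = field.split("-")
    return int(meth) + int(unmeth)


def best_record_per_site(
        lines: List[str]) -> Dict[Tuple[str, int], Tuple[int, Dict[str, str]]]:
    """For each (Chr, C_Pos), keep the record with the most linked reads."""
    header = lines[0].split()
    best: Dict[Tuple[str, int], Tuple[int, Dict[str, str]]] = {}
    for line in lines[1:]:
        if not line.strip():
            continue
        rec = dict(zip(header, line.split()))
        depth = total_reads(rec["Allele1_linked_C"]) + total_reads(rec["Allele2_linked_C"])
        key = (rec["Chr"], int(rec["C_Pos"]))
        if key not in best or depth > best[key][0]:
            best[key] = (depth, rec)
    return best


example = """Chr SNP_Pos Ref Allele1 Allele2 C_Pos Allele1_linked_C Allele2_linked_C Allele1_linked_C_met Allele2_linked_C_met pvalue fdr ASM
Chr12 9820 G G A 9901 7-0 5-0 1.00 1.00 1.00e+00 1.00e+00 FALSE
Chr12 9826 G G A 9901 8-0 5-0 1.00 1.00 1.00e+00 1.00e+00 FALSE
Chr12 9830 G G A 9901 9-0 7-0 1.00 1.00 1.00e+00 1.00e+00 FALSE"""

best = best_record_per_site(example.splitlines())
depth, rec = best[("Chr12", 9901)]
print(rec["SNP_Pos"], depth)  # 9830 16
```

Ties keep the first record seen; a stricter version could also require a minimum depth per allele rather than in total.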

Best, Ping

jmsong2 commented 6 years ago

Dear Ping,

Thank you very much for your kind advice. I will try it.

Best, Jiaming

hmyh1202 commented 6 years ago

Hi, can you explain the theory behind the ASM analysis, how to get more reliable ASM sites or regions, and how to do further analysis with ASM results? Also, how can I analyze methylation LD? Thanks.

ghost commented 6 years ago

Hi, @hmyh1202

The implementation of ASM in CGmapTools comprises three steps:

  1. Assign mapped reads to the two alleles of a heterozygous SNP site.
  2. Calculate the methylation level of each cytosine (e.g. CpG sites) separately for each allele.
  3. Perform Student's t-test to measure the significance of the methylation differences between the cytosines linked to each allele. Adjusted p-values should be calculated to correct for multiple testing.
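Step 3 can be sketched on the per-allele counts. The thread describes a Student's t-test. As an observation only, the p-value of 1.82e-01 in the output above for counts `3-3` versus `5-0` coincides with a two-sided Fisher's exact test on that 2x2 table (30/165), so the sketch uses Fisher's exact test instead; whether cgmaptools uses exactly this test internally is not confirmed here. Benjamini-Hochberg is shown as one common multiple-testing adjustment. Both function names are hypothetical.

```python
from math import comb
from typing import List


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Rows: allele1, allele2; columns: methylated, unmethylated reads.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x methylated reads on allele1.
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    observed = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Sum all tables at least as extreme (no more probable) than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-7))
    return min(1.0, p)


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p = fisher_exact_two_sided(3, 3, 5, 0)  # counts for C_Pos 9814 in the output above
print(round(p, 4))  # 0.1818
```

The reported fdr values depend on all tests in the file, so they cannot be reproduced from the excerpt alone.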

The parameters of the asm function include thresholds for the minimum sequencing depth, the number of cytosines linked to an allele, and the adjusted p-value used to define ASM. You can make these thresholds more stringent to get more reliable results.
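Similar stringency can also be applied afterwards to an existing output file. A minimal sketch, assuming records are parsed into dicts keyed by the column names shown in the output above; `min_depth` and `max_fdr` are hypothetical names and defaults, not the actual cgmaptools asm options (check the asm help text for those).

```python
from typing import Dict, List


def read_count(field: str) -> int:
    """'3-3' -> 6, assuming methylated-unmethylated read counts."""
    meth, unmeth = field.split("-")
    return int(meth) + int(unmeth)


def filter_asm(records: List[Dict[str, str]], min_depth: int = 5,
               max_fdr: float = 0.05) -> List[Dict[str, str]]:
    """Keep records where both alleles reach min_depth and fdr <= max_fdr."""
    kept = []
    for rec in records:
        depth1 = read_count(rec["Allele1_linked_C"])
        depth2 = read_count(rec["Allele2_linked_C"])
        # Require enough reads on BOTH alleles, not just in total.
        if min(depth1, depth2) >= min_depth and float(rec["fdr"]) <= max_fdr:
            kept.append(rec)
    return kept
```

Requiring the minimum on each allele separately avoids calling ASM when one allele is supported by only a handful of reads.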

After identifying ASM, in my understanding, you should map the ASM sites or regions onto functional annotations and perform integrative analysis with other signals, such as gene expression profiles, histone modifications, and chromatin status.
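A first step of such integration is assigning each ASM site to the annotated feature that contains it. This minimal sketch assumes a simple in-memory list of non-overlapping features per chromosome with 1-based inclusive coordinates; the feature names and coordinates in the usage are made up for illustration. In practice, established interval tools such as `bedtools intersect` on BED/GTF files are the usual choice.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Feature = Tuple[str, int, int, str]  # (chrom, start, end, name), 1-based inclusive


class Annotation:
    """Lookup of non-overlapping features per chromosome (hypothetical input)."""

    def __init__(self, features: List[Feature]) -> None:
        self._by_chrom: Dict[str, List[Tuple[int, int, str]]] = defaultdict(list)
        for chrom, start, end, name in features:
            self._by_chrom[chrom].append((start, end, name))
        for intervals in self._by_chrom.values():
            intervals.sort()
        self._starts = {c: [s for s, _, _ in iv] for c, iv in self._by_chrom.items()}

    def feature_at(self, chrom: str, pos: int) -> Optional[str]:
        """Name of the feature containing pos, or None."""
        intervals = self._by_chrom.get(chrom, [])
        # Last feature starting at or before pos is the only candidate.
        i = bisect.bisect_right(self._starts.get(chrom, []), pos) - 1
        if i >= 0 and intervals[i][0] <= pos <= intervals[i][1]:
            return intervals[i][2]
        return None


# Made-up features, for illustration only.
ann = Annotation([("Chr12", 9000, 9850, "featureA"), ("Chr12", 9851, 12000, "featureB")])
for pos in (9814, 9901):
    print(pos, ann.feature_at("Chr12", pos))
```

Counting ASM sites per feature type is then a simple aggregation that can be compared with expression or chromatin data for the same features.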

CGmapTools does not currently include any methylation LD analysis, so you will need to refer to other available tools. We may implement this function in CGmapTools in the future.

We have not yet compared the ASM module of CGmapTools with methpipe. I hope the description of the CGmapTools ASM implementation above helps you understand their differences.

Best,

Ping

hmyh1202 commented 6 years ago

Thanks a lot

The author of methpipe said: Briefly, the idea is to scan the genome with a sliding window and for each region, fit two models: a one-allele model and a two-allele model. In the one-allele model, the reads and the methylation state of the CpGs on them are not organized into any particular pattern. In the two-allele model, it is possible to partition reads overlapping several CpG sites into an unmethylated group and a methylated group in roughly 50/50 proportion. After fitting, you have likelihoods for each model and can do a likelihood ratio test to determine whether the two-allele model fits significantly better than the one-allele model. At the end of the program, adjacent two-allele model bins are combined into "allelicly methylated regions."
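For comparison, a rough Python sketch of the idea in that quote for a single window (sliding windows and bin merging omitted). This is not methpipe's implementation: the clamping constant, the EM initialisation, and the fixed 50/50 mixing weight are assumptions. The one-allele model gives each CpG a single methylation probability; the two-allele model is a 50/50 mixture of two per-CpG profiles fitted by EM.

```python
import math
from typing import List, Optional, Sequence

Read = Sequence[Optional[int]]  # per-CpG state: 1 methylated, 0 unmethylated, None uncovered
EPS = 1e-6  # assumed clamp keeping probabilities away from 0 and 1


def _clamp(p: float) -> float:
    return min(1 - EPS, max(EPS, p))


def _sigmoid(z: float) -> float:
    """Numerically stable 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)


def _read_loglik(read: Read, probs: Sequence[float]) -> float:
    """Log-likelihood of one read given per-CpG methylation probabilities."""
    return sum(math.log(p if x == 1 else 1 - p)
               for x, p in zip(read, probs) if x is not None)


def _weighted_probs(reads: List[Read], weights: List[float], k: int) -> List[float]:
    """Weighted fraction of methylated calls at each of k CpGs."""
    probs = []
    for j in range(k):
        num = sum(w * r[j] for r, w in zip(reads, weights) if r[j] is not None)
        den = sum(w for r, w in zip(reads, weights) if r[j] is not None)
        probs.append(_clamp(num / den) if den > 0 else 0.5)
    return probs


def one_allele_loglik(reads: List[Read], k: int) -> float:
    probs = _weighted_probs(reads, [1.0] * len(reads), k)
    return sum(_read_loglik(r, probs) for r in reads)


def two_allele_loglik(reads: List[Read], k: int, iters: int = 100) -> float:
    """Fit a 50/50 mixture of two methylation profiles by EM."""
    a, b = [0.8] * k, [0.2] * k  # asymmetric start so the profiles can separate
    for _ in range(iters):
        # E-step: posterior probability that each read comes from profile a.
        resp = [_sigmoid(_read_loglik(r, a) - _read_loglik(r, b)) for r in reads]
        # M-step: re-estimate each profile from its weighted reads.
        a = _weighted_probs(reads, resp, k)
        b = _weighted_probs(reads, [1 - w for w in resp], k)
    total = 0.0
    for r in reads:
        la, lb = _read_loglik(r, a), _read_loglik(r, b)
        m = max(la, lb)  # log-sum-exp for the mixture likelihood
        total += m + math.log(0.5 * math.exp(la - m) + 0.5 * math.exp(lb - m))
    return total


def lrt_statistic(reads: List[Read], k: int) -> float:
    """2 * (LL_two - LL_one); clipped at 0 in case EM stops at a poor optimum."""
    return max(0.0, 2 * (two_allele_loglik(reads, k) - one_allele_loglik(reads, k)))


bimodal = [[1, 1, 1]] * 5 + [[0, 0, 0]] * 5
uniform = [[1, 0, 1]] * 10
print(lrt_statistic(bimodal, 3), lrt_statistic(uniform, 3))
```

Reads split into a fully methylated and a fully unmethylated group give a large statistic, while identical reads give none. As Weilong notes in this thread, a good two-group fit alone cannot tell alleles apart from cell subpopulations.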

guoweilong commented 6 years ago

Hi Jiaming,

You can find our discussion of this issue in our paper.

In the field of ASM analysis, some published tools fit a statistical model to determine whether the methylation states on reads form two distinct groups or only one. However, because these reads come from many cells, we cannot tell whether two groups indicate two alleles or two subpopulations of cells.

In CGmapTools, we use direct genomic information, namely linkage to heterozygous SNV evidence, to find ASM regions. Thus the ASM regions reported by CGmapTools should be truly allele-specific methylated regions rather than cell-type-specific regions.

Best, Weilong