lijingya / ELMER

Enhancer Linking by Methylation/Expression Relationship (ELMER) is a package to identify tumor-specific changes in DNA methylation within distal enhancers and link these enhancers to downstream target genes.

Calculation of distance from probe to TSS of gene #22

Open davidjones1993 opened 4 years ago

davidjones1993 commented 4 years ago

Hi, I have a couple of questions regarding ELMER v2. In the "find significant gene-probe pairs" step, the output file contains a distance column. This is supposed to be the distance from the probe to the gene (among the 10 genes upstream and downstream of the probe) with which the probe is significantly anti-correlated. Since a gene has multiple transcripts, each with its own TSS, how is this distance calculated? Which transcript does a given distance correspond to?

For several genes, I found multiple significant probes. However, I noticed that the distance between, say, two of these probes and the gene is given as 0. Does this mean that the probes lie exactly on the TSS? How can two different probes both have distance 0 from the same gene? (I checked the probe coordinates against the TSSs of all transcripts of these genes from Biomart, and they do not match: the probes are near the TSS, but not directly on it.)

Please clarify this.

tiagochst commented 3 years ago

Hi,

By default, the function get.pair reports the distance to the gene. There is an option, addDistNearestTSS, which should calculate the minimum distance to the nearest TSS.

Did you set addDistNearestTSS to TRUE? If you did, can you give an example of a probe and gene so I can take a closer look?

Best regards, Tiago
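A minimal Python sketch of the two metrics discussed above may help explain how several probes can all show a distance of 0. It is not ELMER's code (ELMER is an R package, and its exact implementation is not shown in this thread). It assumes that the default gene-level distance is the gap between the probe and the whole gene interval (leftmost transcript start to rightmost transcript end), so any probe inside the gene body gets 0, and that the nearest-TSS distance is the minimum over the TSSs of all of the gene's transcripts. The `Transcript` class and all function names are hypothetical, and all coordinates are assumed to be on the same chromosome.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Transcript:
    """Hypothetical transcript record; 1-based inclusive coordinates."""
    start: int
    end: int
    strand: str  # "+" or "-"

    @property
    def tss(self) -> int:
        # The TSS is the 5' end: start on the + strand, end on the - strand.
        return self.start if self.strand == "+" else self.end


def distance_to_interval(pos: int, start: int, end: int) -> int:
    """Gap between a single-base probe and an interval; 0 if the probe lies inside it."""
    if start <= pos <= end:
        return 0
    return start - pos if pos < start else pos - end


def distance_to_gene(pos: int, transcripts: List[Transcript]) -> int:
    """Assumed gene-level distance: gap to the span covering all transcripts."""
    gene_start = min(t.start for t in transcripts)
    gene_end = max(t.end for t in transcripts)
    return distance_to_interval(pos, gene_start, gene_end)


def distance_to_nearest_tss(pos: int, transcripts: List[Transcript]) -> int:
    """Assumed nearest-TSS distance: minimum over every transcript's TSS."""
    return min(abs(pos - t.tss) for t in transcripts)
```

For example, with two made-up plus-strand transcripts spanning 1000-5000 and 1500-5000 (toy coordinates, not the real S100A8 annotation), probes at positions 1200 and 3000 both get `distance_to_gene(...) == 0`, while `distance_to_nearest_tss` returns 200 and 1500 respectively.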

davidjones1993 commented 3 years ago

Thank you for your reply.

Yes I had set addDistNearestTSS as TRUE.

For instance, take the gene S100A8 (ENSG00000143546). The probes cg01431057, cg20256009 and cg20335425, all L1 with respect to the gene, have distance 0 in the distance column.

One point I would like to mention: when I first ran the analysis, I used all differentially methylated probes, not only the distal ones. I had used the ChAMP package to find the differentially methylated probes and used them as the input to the significant gene-probe pairs step.

I then did a fresh analysis omitting probes within 2000 bp of any TSS: I used bedtools to find all probes within the (TSS-2000, TSS+2000) regions and excluded them from the previous set.

In both cases, I obtained nearly identical results for enriched motifs and top transcription factors. I am confused by these results.

Could you please elaborate on why only distal probes are considered in ELMER?

Hope to hear from you soon. Thanks again.
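The distal-probe filter described in the comment above (excluding probes within 2000 bp of any TSS) can be sketched in plain Python as a cross-check of the bedtools step. This is a hypothetical helper, not part of ELMER or bedtools. It assumes probe and TSS positions (for example, the TSSs of all transcripts exported from Biomart) are given as (chromosome, position) pairs in the same coordinate convention, and it treats a probe exactly `window` bp from a TSS as distal, matching the open interval (TSS-2000, TSS+2000).

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def distal_probes(
    probes: Dict[str, Tuple[str, int]],
    tss_sites: Iterable[Tuple[str, int]],
    window: int = 2000,
) -> List[str]:
    """Return IDs of probes that lie at least `window` bp from every TSS.

    A probe strictly inside (TSS - window, TSS + window) for any TSS on the
    same chromosome is treated as proximal and dropped.
    """
    # Group TSS positions by chromosome and sort them for binary search.
    tss_by_chrom: Dict[str, List[int]] = defaultdict(list)
    for chrom, pos in tss_sites:
        tss_by_chrom[chrom].append(pos)
    for positions in tss_by_chrom.values():
        positions.sort()

    kept = []
    for probe_id, (chrom, pos) in probes.items():
        positions = tss_by_chrom.get(chrom, [])
        i = bisect.bisect_left(positions, pos)
        # The closest TSS is either the last one before pos or the first one at/after it.
        candidates = positions[max(i - 1, 0):i + 1]
        if all(abs(pos - t) >= window for t in candidates):
            kept.append(probe_id)
    return kept
```

Because the window is symmetric, strand does not matter here; only the nearest TSS on each side of the probe needs to be checked, which is why sorted positions and `bisect` suffice.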

tiagochst commented 3 years ago

I still need to take a look at the problem.

For the question "Could you please elaborate why we are considering only distal probes in ELMER?"

That decision was made in ELMER v1 by Benjamin P. Berman and Lijing Yao. I remember talking with Ben about it. There was an intention to add a promoter analysis to ELMER, but the promoter analysis in the pan-cancer study (https://doi.org/10.1186/s13059-015-0668-3) did not show results as significant as the enhancer/distal analysis, probably because most differentially methylated regions are at distal cis-regulatory regions (source: https://doi.org/10.1038/nrg.2016.83). I think they decided to focus on the distal analysis in the article and the software.