lijingya / ELMER

Enhancer Linking by Methylation/Expression Relationship (ELMER) is a package to identify tumor-specific changes in DNA methylation within distal enhancers, and to link these enhancers to downstream target genes.

Function needed for tcga.pipe #24

Open wp1g19 opened 3 years ago

wp1g19 commented 3 years ago

I work with the ESCA project, looking at esophageal adenocarcinoma. However, TCGA has grouped squamous cell carcinoma together with adenocarcinoma in that project. The two diseases are significantly different, so they have to be separated before analysis, and doing that in a way that still lets these analysis functions work is immensely painful.

If the tcga.pipe function were adapted to allow filtering by histological type, the separate diseases could be analysed much more quickly.

KR, Will

tiagochst commented 3 years ago

Hi Will,

tcga.pipe is a wrapper around TCGAbiolinks. You can do this type of filtering with TCGAbiolinks directly, as suggested below:

library(TCGAbiolinks)
library(dplyr)  # provides %>%, filter() and pull() used below

# Find all sample barcodes with DNA methylation data
query.DNAmethy <- GDCquery(
    project = "TCGA-ESCA",
    legacy = TRUE,
    data.category = "DNA methylation",
    platform = "Illumina Human Methylation 450"
)

# Get sample information
samples.info <- TCGAbiolinks::colDataPrepare(query.DNAmethy$results[[1]]$sample.submitter_id)

# Count samples per primary diagnosis to see which labels are present
plyr::count(samples.info$primary_diagnosis)

# Get samples by primary diagnosis
barcode.eac <- samples.info %>% filter(primary_diagnosis == "Adenocarcinoma, NOS") %>% pull(sample)
barcode.escc <- samples.info %>% filter(primary_diagnosis == "Squamous cell carcinoma, NOS") %>% pull(sample)

# Download only the adenocarcinoma samples
query.DNAmeth.eac <- GDCquery(
    project = "TCGA-ESCA",
    legacy = TRUE,
    data.category = "DNA methylation",
    platform = "Illumina Human Methylation 450",
    barcode = barcode.eac
)
GDCdownload(query.DNAmeth.eac, files.per.chunk = 10)
dnam.eac <- GDCprepare(query.DNAmeth.eac)
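
The filtering step above can also be tried without contacting the GDC. The following is a minimal sketch in base R, assuming only that the sample table has `sample` and `primary_diagnosis` columns as in the snippet above. The helper `barcodes_by_diagnosis` and the toy barcodes are hypothetical and are not part of ELMER or TCGAbiolinks.

```r
# Hypothetical helper (not part of ELMER or TCGAbiolinks): return the sample
# barcodes whose primary_diagnosis matches one of the requested labels.
barcodes_by_diagnosis <- function(samples, diagnosis) {
  stopifnot(all(c("sample", "primary_diagnosis") %in% colnames(samples)))
  # Rows with a missing diagnosis are never selected
  keep <- !is.na(samples$primary_diagnosis) &
    samples$primary_diagnosis %in% diagnosis
  unique(as.character(samples$sample[keep]))
}

# Toy stand-in for the table returned by colDataPrepare(); barcodes are made up.
toy.info <- data.frame(
  sample = c("TCGA-XX-0001-01A", "TCGA-XX-0002-01A",
             "TCGA-XX-0003-01A", "TCGA-XX-0004-11A"),
  primary_diagnosis = c("Adenocarcinoma, NOS", "Squamous cell carcinoma, NOS",
                        "Adenocarcinoma, NOS", NA),
  stringsAsFactors = FALSE
)

toy.eac <- barcodes_by_diagnosis(toy.info, "Adenocarcinoma, NOS")
toy.escc <- barcodes_by_diagnosis(toy.info, "Squamous cell carcinoma, NOS")
```

With the real `samples.info` table, the vector returned for "Squamous cell carcinoma, NOS" could be passed as the `barcode` argument of a second `GDCquery()` call that otherwise mirrors the adenocarcinoma query, so each disease gets its own download.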