Open issue, opened by wp1g19 3 years ago
Hi Will,
`tcga.pipe` is a wrapper around TCGAbiolinks. You can already do this type of filtering directly with TCGAbiolinks, as shown below:
```r
library(TCGAbiolinks)
library(dplyr) # provides %>%, filter() and pull()

# Find all sample barcodes with DNA methylation data
query.DNAmethy <- GDCquery(
  project = "TCGA-ESCA",
  legacy = TRUE,
  data.category = "DNA methylation",
  platform = "Illumina Human Methylation 450"
)

# Get clinical information for those samples
samples.info <- TCGAbiolinks::colDataPrepare(query.DNAmethy$results[[1]]$sample.submitter_id)

# Count the samples per primary diagnosis
plyr::count(samples.info$primary_diagnosis)

# Get sample barcodes by primary diagnosis
barcode.eac <- samples.info %>% filter(primary_diagnosis == "Adenocarcinoma, NOS") %>% pull(sample)
barcode.escc <- samples.info %>% filter(primary_diagnosis == "Squamous cell carcinoma, NOS") %>% pull(sample)

# Download only the selected (here: adenocarcinoma) samples
query.DNAmeth.eac <- GDCquery(
  project = "TCGA-ESCA",
  legacy = TRUE,
  data.category = "DNA methylation",
  platform = "Illumina Human Methylation 450",
  barcode = barcode.eac
)
GDCdownload(query.DNAmeth.eac, files.per.chunk = 10)
dnam.eac <- GDCprepare(query.DNAmeth.eac)
```
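If you want every histological type rather than only the two picked by hand, the filtering step above could be generalised along these lines. This is a minimal base-R sketch, not part of TCGAbiolinks or `tcga.pipe`: the helper name `split_barcodes_by_diagnosis` is hypothetical, and it assumes `samples.info` has the `sample` and `primary_diagnosis` columns used above. The toy barcodes are made up.

```r
# Hypothetical helper (not part of TCGAbiolinks or tcga.pipe): split sample
# barcodes into one character vector per primary diagnosis.
# Assumes a data frame with `sample` and `primary_diagnosis` columns,
# like samples.info above.
split_barcodes_by_diagnosis <- function(samples.info, diagnoses = NULL) {
  # Drop samples with no recorded diagnosis
  info <- samples.info[!is.na(samples.info$primary_diagnosis), , drop = FALSE]
  # Optionally keep only the diagnoses of interest
  if (!is.null(diagnoses)) {
    info <- info[info$primary_diagnosis %in% diagnoses, , drop = FALSE]
  }
  # split() returns a named list: one vector of barcodes per diagnosis.
  # Converting to character avoids empty groups from unused factor levels.
  split(as.character(info$sample), as.character(info$primary_diagnosis))
}

# Toy input standing in for samples.info (made-up barcodes)
toy <- data.frame(
  sample = c("TCGA-AA-0001-01A", "TCGA-AA-0002-01A", "TCGA-AA-0003-01A"),
  primary_diagnosis = c("Adenocarcinoma, NOS",
                        "Squamous cell carcinoma, NOS",
                        "Adenocarcinoma, NOS"),
  stringsAsFactors = FALSE
)
groups <- split_barcodes_by_diagnosis(toy)
groups[["Adenocarcinoma, NOS"]]
# returns c("TCGA-AA-0001-01A", "TCGA-AA-0003-01A")
```

Each element of the returned list could then be passed as the `barcode` argument of `GDCquery()`, the same way `barcode.eac` is used above, giving one query per histological type.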
I work with the TCGA-ESCA project, studying esophageal adenocarcinoma. However, TCGA groups esophageal squamous cell carcinoma together with adenocarcinoma in this project. The two diseases are significantly different, which leaves me with the very painful task of separating them in a way that still lets these analysis functions work.
If the `tcga.pipe` function were adapted to allow filtering by histological type, analysing the two diseases separately would be much quicker.
KR, Will