waldronlab / curatedTCGAData

Curated Data From The Cancer Genome Atlas (TCGA) as MultiAssayExperiment Objects
https://bioconductor.org/packages/curatedTCGAData

Representation of GDC data portal #48

Closed SerifatAdebola closed 2 years ago

SerifatAdebola commented 3 years ago

Hi, I am quite new to navigating genomic databases and would like to know whether the information in curatedTCGAData is representative of all the information in the TCGA GDC portal. I was trying to compare case IDs from a previous dataset to the case IDs in the curated data, but only 4 out of about 18 cancer types were returned. Code below for reference; st1 is where my case files are stored.

all = curatedTCGAData(diseaseCode = "*", assays = "RNASeqGene", dry.run = FALSE)

all2 = all$patient.bcr_patient_uuid

all[, all2 %in% st1]

LiNk-NY commented 2 years ago

Hi @SerifatAdebola, sorry, I did not see the notification for your issue. To ensure a response, please tag @LiNk-NY in comments. The data in curatedTCGAData mostly correspond to data from the GDAC Firehose pipeline, which is aligned to hg19, IIRC, whereas the GDC data are harmonized to GRCh38. See the GDAC FAQ on discrepancies between GDC and FireBrowse: https://broadinstitute.atlassian.net/wiki/spaces/GDAC/pages/844334036/FAQ#FAQ-EndOfTCGAQ%3AIunderstandthatTCGAdatahasmigratedtotheGDC%2CbutwhydoIseediscrepanciesbetweenGDCandFireBrowse%3F
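Since the two sources come from different pipelines, it may help to check how many case IDs actually match per cancer type before subsetting. The following is only a rough sketch in base R with made-up data: the `curated` data frame, its `uuid` and `cancer` columns, and the `norm_id` helper are all hypothetical stand-ins (in practice the IDs would come from the downloaded object and from `st1`). Normalizing letter case and whitespace is an assumption about one possible source of mismatches, not something confirmed in this thread.

```r
# Hypothetical stand-ins for the real data: `curated` mimics patient-level
# metadata (one row per case), `st1` mimics the case IDs from the GDC files.
curated <- data.frame(
  uuid   = c("A1B2", "c3d4", "e5f6", "0a1b"),
  cancer = c("BRCA", "BRCA", "LUAD", "OV"),
  stringsAsFactors = FALSE
)
st1 <- c("a1b2", "E5F6", "ffff")

# Assumption: IDs may differ only in letter case or stray whitespace,
# so normalize both sides before comparing.
norm_id <- function(x) tolower(trimws(x))

# Logical vector: TRUE where a curated case ID is also present in st1
matched <- norm_id(curated$uuid) %in% norm_id(st1)

# Number of matched cases per cancer type
matches_by_type <- tapply(matched, curated$cancer, sum)
print(matches_by_type)

# IDs from st1 that have no counterpart in the curated data
unmatched <- st1[!(norm_id(st1) %in% norm_id(curated$uuid))]
print(unmatched)
```

Cancer types with zero matches, together with the list of unmatched IDs, would show whether the gap is spread across all cohorts or concentrated in a few.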