SerifatAdebola closed this issue 2 years ago
Hi @SerifatAdebola
Sorry, I did not see the notification of your issue. In order to ensure a response, please tag @LiNk-NY in comments.
The data in curatedTCGAData mostly correspond to data from the GDAC Firehose pipeline, which is aligned to hg19, IIRC, whereas the GDC data are harmonized to GRCh38.
https://broadinstitute.atlassian.net/wiki/spaces/GDAC/pages/844334036/FAQ#FAQ-EndOfTCGAQ%3AIunderstandthatTCGAdatahasmigratedtotheGDC%2CbutwhydoIseediscrepanciesbetweenGDCandFireBrowse%3F
Hi, I am quite new to navigating genomic databases and would like to know whether the information in curatedTCGAData is representative of all the information in the TCGA GDC portal. I was trying to compare the case IDs from a previous dataset to the case IDs in the curated data, but it only returned matches for 4 out of about 18 cancer types. Code below for reference; st1 is where my case IDs are stored.
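library(curatedTCGAData)
# Download the RNASeqGene assay for every available cancer type
all <- curatedTCGAData(diseaseCode = "*", assays = "RNASeqGene", dry.run = FALSE)
all2 <- all$patient.bcr_patient_uuid
# Keep only patients whose UUID is in st1 (columns are the second index)
all[, all2 %in% st1]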