waldronlab / cBioPortalData

Integrate the cancer genomics portal, cBioPortal, using MultiAssayExperiment
https://waldronlab.io/cBioPortalData/

how to use cBioPortalData molecularProfilesIds for all human genes #67

Closed GeorgiaTsagkogeorga closed 1 year ago

GeorgiaTsagkogeorga commented 1 year ago

Hi,

Thanks for a great tool. I would like to ask if there is a way to use the cBioPortalData functionality to download selected molecular profiles for a study, e.g. expression or mutation, but for all human genes rather than a subset or a gene panel. This is to avoid downloading the whole zipped tarball data pack.

Many thanks, Georgia

LiNk-NY commented 1 year ago

Hi Georgia, @GeorgiaTsagkogeorga

The cBioPortal API was designed to handle specific genes of interest or gene panels. Unfortunately, it is currently not possible to request a molecular profile for all human genes through the API. I recommend downloading the zipped data pack. You may also ask the data team directly at https://github.com/cbioportal/cbioportal
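For reference, a gene-panel request through the API might look like the sketch below. The study ID `acc_tcga`, the panel `IMPACT341`, and the two profile IDs are illustrative assumptions, not values from this thread; substitute your own. It needs network access to the public cBioPortal instance.

```r
library(cBioPortalData)

# Connect to the public cBioPortal API
cbio <- cBioPortal()

# List the molecular profiles available for the (assumed) example study
profiles <- molecularProfiles(api = cbio, studyId = "acc_tcga")
head(profiles$molecularProfileId)

# Request two profiles, restricted to the genes in one gene panel
acc <- cBioPortalData(
    api = cbio,
    by = "hugoGeneSymbol",
    studyId = "acc_tcga",
    genePanelId = "IMPACT341",
    molecularProfileIds = c("acc_tcga_rppa", "acc_tcga_linear_CNA")
)
acc
```

The result is a MultiAssayExperiment that covers only the genes in the panel, which is why the data pack remains the route for whole-genome profiles.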

Best, Marcel

GeorgiaTsagkogeorga commented 1 year ago

Hi Marcel,

Thanks for your swift reply. I will then use the zipped data pack, thanks.

Best wishes, Georgia

GeorgiaTsagkogeorga commented 1 year ago

Hi again Marcel, @LiNk-NY

I tried to download the PanCancer TCGA data using cBioDataPack. It worked for most of the studies but failed for four of them, either with the error "Error in read.table(file = file, header = header, sep = sep, quote = quote, : more columns than column names" or with an error similar to the one in issue #60 (I also tried modifying names.field).

I ran something like this:

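library(cBioPortalData)
cbio <- cBioPortal()  # API handle used by getStudies()
studies <- getStudies(cbio, buildReport = TRUE)
TCGA_PanCancer_studies <- studies[grep("PanCancer", studies$name), ]
# Study 19 is one of the four that fail
cancer_data <- cBioDataPack(TCGA_PanCancer_studies$studyId[19], ask = FALSE, cleanup = TRUE)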

The studies that I am unable to download are TCGA_PanCancer_studies$studyId[c(5, 19, 23, 32)].
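One way to find which studies fail, and with what message, is to wrap each download in `tryCatch`. This is only a sketch: `try_datapacks` and its `fetch` argument are hypothetical names introduced here, and the default assumes cBioPortalData is installed.

```r
# Hypothetical helper: attempt each download and keep the error message on failure.
# `fetch` defaults to cBioDataPack() but can be swapped out, e.g. for testing.
try_datapacks <- function(study_ids,
                          fetch = function(id) cBioPortalData::cBioDataPack(id, ask = FALSE)) {
    results <- lapply(study_ids, function(id) {
        tryCatch(fetch(id), error = function(e) conditionMessage(e))
    })
    names(results) <- study_ids
    results
}

# Usage (downloads every study, so it can take a while):
# results <- try_datapacks(TCGA_PanCancer_studies$studyId)
# failed  <- Filter(is.character, results)  # entries holding an error message
# names(failed)
```

Successful entries hold the MultiAssayExperiment, and failed entries hold the error text, so the failures can be reported by study ID.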

Thanks in advance for your help, Georgia

LiNk-NY commented 1 year ago

Hi @GeorgiaTsagkogeorga, sorry for the late response. This has been fixed in the latest version of cBioPortalData:

> cancer_data
A MultiAssayExperiment object of 9 listed
 experiments with user-defined names and respective classes.
 Containing an ExperimentList class object of length 9:
 [1] cna_hg19.seg: RaggedExperiment with 12247 rows and 87 columns
 [2] cna: SummarizedExperiment with 25128 rows and 87 columns
 [3] log2_cna: SummarizedExperiment with 25128 rows and 87 columns
 [4] mrna_seq_v2_rsem_zscores_ref_all_samples: SummarizedExperiment with 20531 rows and 87 columns
 [5] mrna_seq_v2_rsem_zscores_ref_diploid_samples: SummarizedExperiment with 20471 rows and 87 columns
 [6] mrna_seq_v2_rsem: SummarizedExperiment with 20531 rows and 87 columns
 [7] mutations: RaggedExperiment with 3980 rows and 82 columns
 [8] rppa_zscores: SummarizedExperiment with 198 rows and 63 columns
 [9] rppa: SummarizedExperiment with 198 rows and 63 columns
Functionality:
 experiments() - obtain the ExperimentList instance
 colData() - the primary/phenotype DataFrame
 sampleMap() - the sample coordination DataFrame
 `$`, `[`, `[[` - extract colData columns, subset, or experiment
 *Format() - convert into a long or wide DataFrame
 assays() - convert ExperimentList to a SimpleList of matrices
 exportClass() - save data to flat files
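As a rough sketch of how a full-gene matrix could be pulled out of an object like the one printed above: `get_assay_matrix` is a hypothetical helper, and the default assay name is taken from the listing above.

```r
library(MultiAssayExperiment)
library(SummarizedExperiment)

# Hypothetical helper: extract one experiment from a MultiAssayExperiment
# and return its first assay (genes in rows, samples in columns).
get_assay_matrix <- function(mae, assay_name = "mrna_seq_v2_rsem") {
    if (!assay_name %in% names(mae)) {
        stop("No experiment named '", assay_name, "' in this object")
    }
    assay(mae[[assay_name]])
}

# Usage, given the cancer_data object shown above:
# expr <- get_assay_matrix(cancer_data)
# dim(expr)
```

This applies to the SummarizedExperiment entries; the RaggedExperiment entries (`cna_hg19.seg`, `mutations`) are range-based and need their own accessors.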

Note: you may have to use Bioc-devel.
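A minimal sketch of switching to Bioc-devel with BiocManager follows. It assumes an R version compatible with the current devel branch, which at some points in the release cycle means R-devel.

```r
# Install BiocManager if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

# Switch this R library to the Bioconductor devel branch
BiocManager::install(version = "devel")

# Install (or update) cBioPortalData from that branch
BiocManager::install("cBioPortalData")

# Confirm which version is now installed
packageVersion("cBioPortalData")
```

Switching to devel updates the other Bioconductor packages in the library as well, so a separate R library may be preferable for this.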

Best, Marcel