waldronlab / cBioPortalData

Integrate the cancer genomics portal, cBioPortal, using MultiAssayExperiment
https://waldronlab.io/cBioPortalData/

cBioDataPack() fails despite valid studyId #69

Closed (ghost closed 1 year ago)

ghost commented 1 year ago

Hello, I followed the docs. As presented there, mae <- cBioDataPack("acc") downloads a MultiAssayExperiment, but the same command with a different study ID,

mae <- cBioDataPack("brain_cptac_2020")

fails with

Error in .check_study_id_building(cancer_study_id, "pack_build", ask = ask) : 
  'studyId', brain_cptac_2020, not found. See 'getStudies()'.

However,

getStudies(cbio)[30,]$studyId

returns "brain_cptac_2020", which is exactly what I passed in. What is going on here?
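The error above comes from a check that mentions "pack_build", which suggests that a study being listed by getStudies() does not by itself mean it can be built by cBioDataPack(). A minimal sketch for inspecting this follows. It assumes that getStudies(cbio, buildReport = TRUE) appends api_build and pack_build columns to the studies table; study_build_status is a hypothetical helper written for this note, not part of the package.

```r
# Hypothetical helper: look up the build flags for one study.
# Assumes 'studies' has the columns 'studyId', 'api_build', and 'pack_build',
# as expected from getStudies(api, buildReport = TRUE).
study_build_status <- function(studies, study_id) {
    hit <- studies[studies$studyId == study_id, , drop = FALSE]
    if (nrow(hit) == 0L)
        stop("'", study_id, "' is not listed in the studies table")
    as.data.frame(hit[, c("studyId", "api_build", "pack_build")])
}

# Usage against the live portal (needs cBioPortalData and network access):
# library(cBioPortalData)
# cbio <- cBioPortal()
# studies <- getStudies(cbio, buildReport = TRUE)
# study_build_status(studies, "brain_cptac_2020")
```

If pack_build is FALSE for a study while api_build is TRUE, cBioPortalAPI() may be worth trying for that study instead of cBioDataPack().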

LiNk-NY commented 1 year ago

Hi @vlaufer, thanks for reporting. This is resolved in commit a42b2a090a4f86a25ccf1b82e1606eba29d2bfc7. There is still an issue with mapping SAMPLE_ID to PATIENT_ID in the assays: currently, when there is no match with PATIENT_ID, the assays are placed in metadata(mae). I will work on a fix. Best, Marcel
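To illustrate the kind of mapping problem described here, the following base R sketch links assay column names (sample IDs) to patients through a clinical sample table and flags the columns that find no match. The IDs are made up for illustration, and this is not the package's implementation, only the general idea behind a sample map with primary and colname columns.

```r
# Toy clinical sample table; the IDs are illustrative, not from the study.
clin <- data.frame(
    PATIENT_ID = c("P1", "P2"),
    SAMPLE_ID  = c("P1-T", "P2-T"),
    stringsAsFactors = FALSE
)

# Column names of a hypothetical assay; "P3-T" has no clinical record.
assay_cols <- c("P1-T", "P2-T", "P3-T")

# Position of each assay column in the clinical table (NA when absent).
idx <- match(assay_cols, clin$SAMPLE_ID)

# Rows of a sample map: 'primary' is the patient, 'colname' the assay column.
sample_map <- data.frame(
    primary = clin$PATIENT_ID[idx[!is.na(idx)]],
    colname = assay_cols[!is.na(idx)],
    stringsAsFactors = FALSE
)

# Assay columns that could not be linked to any patient.
unmatched <- assay_cols[is.na(idx)]
```

On a real object, names(metadata(mae)) should show whether any assays ended up in the metadata rather than among the experiments.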

LiNk-NY commented 1 year ago

I've incorporated the SAMPLE_ID information from the datasets to map samples and build the SummarizedExperiment objects. You should now get an object that looks like the following:

> (mae <- cBioDataPack("brain_cptac_2020"))
A MultiAssayExperiment object of 7 listed
 experiments with user-defined names and respective classes.
 Containing an ExperimentList class object of length 7:
 [1] cna: SummarizedExperiment with 19380 rows and 190 columns
 [2] linear_cna: SummarizedExperiment with 19380 rows and 190 columns
 [3] mrna_seq_v2_rsem_zscores_ref_all_samples: SummarizedExperiment with 18209 rows and 188 columns
 [4] mrna_seq_v2_rsem: SummarizedExperiment with 18209 rows and 188 columns
 [5] mutations: RaggedExperiment with 9951 rows and 200 columns
 [6] protein_quantification_zscores: SummarizedExperiment with 6429 rows and 218 columns
 [7] protein_quantification: SummarizedExperiment with 6429 rows and 218 columns
Functionality:
 experiments() - obtain the ExperimentList instance
 colData() - the primary/phenotype DataFrame
 sampleMap() - the sample coordination DataFrame
 `$`, `[`, `[[` - extract colData columns, subset, or experiment
 *Format() - convert into a long or wide DataFrame
 assays() - convert ExperimentList to a SimpleList of matrices
 exportClass() - save data to flat files

These changes are in the latest version of cBioPortalData in Bioc-devel (package version 2.13.4).
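For anyone wanting to confirm they have the fix, a small sketch follows. It assumes BiocManager is installed and that the R version in use is compatible with Bioc-devel; version_has_fix is a hypothetical helper, and the install calls are left commented out because they change the local library.

```r
# Install from Bioc-devel (assumption: R version matches Bioc-devel):
# BiocManager::install(version = "devel")
# BiocManager::install("cBioPortalData")

# Minimum version stated in this thread.
min_version <- package_version("2.13.4")

# Hypothetical helper: TRUE when version 'v' is at least 2.13.4.
# package_version() compares component by component, so "2.13.4" is
# correctly treated as newer than "2.9.0".
version_has_fix <- function(v) {
    package_version(as.character(v)) >= min_version
}

# Usage once the package is installed:
# version_has_fix(packageVersion("cBioPortalData"))
```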