waldronlab / cBioPortalData

Integrate the cancer genomics portal, cBioPortal, using MultiAssayExperiment
https://waldronlab.io/cBioPortalData/

study Id not recognized in cBioPortalData function #72

Closed ZWael closed 1 year ago

ZWael commented 1 year ago

Hello @LiNk-NY and cBioPortalData team, I get an error when retrieving gene expression data for some studies (the clinical data was retrieved with no problem using the same studyId). The error message was: Error in .check_study_id_building(exargs[["studyId"]], "api_build"): 'studyId', pog570_bcgsc_2020, not found. See 'getStudies()'.

You can find a reprex below.

library(cBioPortalData)
library(AnVIL)
cbio <- cBioPortal()
c_data=clinicalData(cbio, studyId = "pog570_bcgsc_2020")
head(c_data)
#> # A tibble: 6 × 34
#>   patientId AGE   GENDER OS_MONTHS OS_STATUS SAMPLE_COUNT TREATED_WITH_ICI ICI  
#>   <chr>     <chr> <chr>  <chr>     <chr>     <chr>        <chr>            <chr>
#> 1 11004     30    Female 8.11      1:DECEAS… 1            No               <NA> 
#> 2 11307     48    Female 27.46     1:DECEAS… 1            No               <NA> 
#> 3 11698     64    Male   83.9      1:DECEAS… 1            No               <NA> 
#> 4 12255     34    Male   9.66      1:DECEAS… 1            No               <NA> 
#> 5 13009     <NA>  Female 72.14     0:LIVING  1            No               <NA> 
#> 6 13261     44    Male   53.68     1:DECEAS… 1            No               <NA> 
#> # ℹ 26 more variables: ICI_BEST_RESPONSE <chr>,
#> #   ICI_DURABLE_CLINICAL_BENEFIT <chr>, ICI_MUTATION_CATEGORY <chr>,
#> #   TREATMENT_CATEGORY <chr>, T_CELLS_CD4_MEMORY_ACTIVATED <chr>,
#> #   T_CELLS_CD4_MEMORY_RESTING <chr>, T_CELLS_CD4_NAIVE <chr>,
#> #   T_CELLS_CD8 <chr>, T_CELLS_FOLLICULAR_HELPER <chr>,
#> #   T_CELLS_GAMMA_DELTA <chr>, T_CELLS_REGULATORY_TREGS <chr>, sampleId <chr>,
#> #   ANALYSIS_COHORT <chr>, BIOPSY_COHORT <chr>, BIOPSY_SITE <chr>, …

exp_datadata=cBioPortalData(api = cbio,
                    by = "hugoGeneSymbol",
                    genes="ALB",
                    studyId = "pog570_bcgsc_2020",
                    molecularProfileIds = "pog570_bcgsc_2020_rna_seq_mrna")
#> Error in .check_study_id_building(exargs[["studyId"]], "api_build"): 'studyId', pog570_bcgsc_2020, not found. See 'getStudies()'.

Created on 2023-08-21 with reprex v2.0.2

package.version("cBioPortalData")
[1] "2.6.1"

LiNk-NY commented 1 year ago

Hi @ZWael, please use version >= 2.13.3 of cBioPortalData. Best regards, Marcel

ZWael commented 1 year ago

Hi @LiNk-NY, I tested this with a more recent version of R, and I now get the message below advising me to download the data manually. Any idea why?

Our testing shows that 'pog570_bcgsc_2020' is not currently building. Use 'downloadStudy()' to manually obtain the data. Proceed anyway? [y/n]:

package.version("cBioPortalData")
[1] "2.13.6"
data=cBioPortalData(api = cbio,
               by = "hugoGeneSymbol",
               genes=hugo_id,
               studyId = "pog570_bcgsc_2020",
               molecularProfileIds ="pog570_bcgsc_2020_rna_seq_mrna" )

 Our testing shows that 'pog570_bcgsc_2020' is not currently building.
 Use 'downloadStudy()' to manually obtain the data.
 Proceed anyway? [y/n]:   
 y
Error in split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...) : 
group length is 0 but data length > 0

LiNk-NY commented 1 year ago

Hi @ZWael, can you provide a reproducible example, particularly the value of hugo_id? I have tried this with one of the gene panels and it works OK.

suppressPackageStartupMessages(library(cBioPortalData))
cbio <- cBioPortal()
cBioPortalData(
    api = cbio,
    genePanelId = "IMPACT341",
    studyId = "pog570_bcgsc_2020",
    molecularProfileIds ="pog570_bcgsc_2020_rna_seq_mrna"
)
#> A MultiAssayExperiment object of 1 listed
#>  experiment with a user-defined name and respective class.
#>  Containing an ExperimentList class object of length 1:
#>  [1] pog570_bcgsc_2020_rna_seq_mrna: SummarizedExperiment with 341 rows and 570 columns
#> Functionality:
#>  experiments() - obtain the ExperimentList instance
#>  colData() - the primary/phenotype DataFrame
#>  sampleMap() - the sample coordination DataFrame
#>  `$`, `[`, `[[` - extract colData columns, subset, or experiment
#>  *Format() - convert into a long or wide DataFrame
#>  assays() - convert ExperimentList to a SimpleList of matrices
#>  exportClass() - save data to flat files

Created on 2023-08-25 with reprex v2.0.2

Note: some data may not have been imported; check metadata() for details.

ZWael commented 1 year ago

@LiNk-NY Thank you. Interesting: I tested with the same arguments as you, and I verified the version (2.13.6, installed with BiocManager). hugo_id is a vector of HUGO gene names from gencode.v27 (40563 names, taken from https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_27/gencode.v27.annotation.gtf.gz), but I also tested with "ALB".

data=cBioPortalData(api = cbio,
               #by = "hugoGeneSymbol",
               genePanelId = "IMPACT341",
               studyId = "pog570_bcgsc_2020",
               molecularProfileIds ="pog570_bcgsc_2020_rna_seq_mrna" )

  Our testing shows that 'pog570_bcgsc_2020' is not currently building.
  Use 'downloadStudy()' to manually obtain the data.
  Proceed anyway? [y/n]: 

LiNk-NY commented 1 year ago

The message you see is normal for datasets that have not been fully built. We run tests and update a small dataset that is queried at every cBioPortalData() call. You can either download the study with downloadStudy() or proceed, but in that case you may not get all the data in the MultiAssayExperiment; see the metadata(). I have added a Considerations section to the vignette. Best, Marcel
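For readers landing here with the same prompt, the manual route mentioned above might look roughly like the sketch below. It is not taken from the vignette. It assumes network access and a recent cBioPortalData (>= 2.13.3, as suggested earlier in the thread), and it reuses the study ID from this issue. The build-status column names (api_build, pack_build) are an assumption based on the error message above, so the code only selects whichever of them are actually present.

```r
# A minimal sketch, not the maintainers' reference workflow.
# Requires network access; the study ID is the one discussed in this issue.
library(cBioPortalData)

cbio <- cBioPortal()
study_id <- "pog570_bcgsc_2020"

# 1. Check how the study is reported in the build table.
#    The build column names are assumed; keep only the ones that exist.
studies <- getStudies(cbio, buildReport = TRUE)
build_cols <- intersect(c("api_build", "pack_build"), names(studies))
print(studies[studies$studyId == study_id, c("studyId", build_cols)])

# 2. Manual route: download the study tarball, unpack it, and load it
#    as a MultiAssayExperiment.
study_file <- downloadStudy(study_id)
study_dir <- untarStudy(study_file)
mae <- loadStudy(study_dir)

# 3. Inspect what was loaded, and what could not be imported.
print(mae)
print(names(S4Vectors::metadata(mae)))
```

If you answer "y" at the prompt and use cBioPortalData() instead, the same metadata() check applies to the object it returns.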