waldronlab / cBioPortalData

Integrate the cancer genomics portal, cBioPortal, using MultiAssayExperiment
https://waldronlab.io/cBioPortalData/

clinicalData does not retrieve all patient data #43

Closed afaissa closed 3 years ago

afaissa commented 3 years ago

Hi Marcel,

Thank you for your feedback and sorry for the missing information.

Here the whole thing:

library(cBioPortalData)
cbio <- cBioPortal()
PatientData <- clinicalData(api = cbio, studyId = "luad_tcga")

There is a file for the environment here if you would like to check.

https://drive.google.com/file/d/15OWRvojEGOpYscKmQc6liM2CzAiPnhN9/view?usp=sharing

I understand the issue is on my R, but I would be happy to get some help.

sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods
[9] base

other attached packages:
[1] Rsamtools_2.8.0 Biostrings_2.61.0
[3] XVector_0.33.0 cBioPortalData_2.5.1
[5] MultiAssayExperiment_1.19.1 SummarizedExperiment_1.23.0
[7] Biobase_2.53.0 GenomicRanges_1.45.0
[9] GenomeInfoDb_1.29.0 IRanges_2.27.0
[11] S4Vectors_0.31.0 BiocGenerics_0.39.0
[13] MatrixGenerics_1.5.0 matrixStats_0.58.0
[15] AnVIL_1.5.0 dplyr_1.0.6
[17] curl_4.3.1

loaded via a namespace (and not attached):
[1] bitops_1.0-7 bit64_4.0.5 filelock_1.0.2
[4] progress_1.2.2 httr_1.4.2 GenomicDataCommons_1.17.0
[7] tools_4.1.0 utf8_1.2.1 R6_2.5.0
[10] DBI_1.1.1 withr_2.4.2 tidyselect_1.1.1
[13] prettyunits_1.1.1 TCGAutils_1.13.0 bit_4.0.4
[16] compiler_4.1.0 cli_2.5.0 rvest_1.0.0
[19] formatR_1.10 xml2_1.3.2 DelayedArray_0.19.0
[22] rtracklayer_1.53.0 readr_1.4.0 rappdirs_0.3.3
[25] rapiclient_0.1.3 RCircos_1.2.1 stringr_1.4.0
[28] digest_0.6.27 pkgconfig_2.0.3 dbplyr_2.1.1
[31] fastmap_1.1.0 limma_3.49.0 rlang_0.4.11
[34] RSQLite_2.2.7 BiocIO_1.3.0 generics_0.1.0
[37] jsonlite_1.7.2 BiocParallel_1.27.0 RCurl_1.98-1.3
[40] magrittr_2.0.1 GenomeInfoDbData_1.2.6 futile.logger_1.4.3
[43] Matrix_1.3-3 Rcpp_1.0.6 fansi_0.5.0
[46] lifecycle_1.0.0 stringi_1.6.2 yaml_2.2.1
[49] RaggedExperiment_1.17.0 RJSONIO_1.3-1.4 zlibbioc_1.39.0
[52] BiocFileCache_2.1.0 grid_4.1.0 blob_1.2.1
[55] crayon_1.4.1 lattice_0.20-44 splines_4.1.0
[58] GenomicFeatures_1.45.0 hms_1.1.0 KEGGREST_1.33.0
[61] pillar_1.6.1 rjson_0.2.20 biomaRt_2.49.0
[64] futile.options_1.0.1 XML_3.99-0.6 glue_1.4.2
[67] lambda.r_1.2.4 data.table_1.14.0 BiocManager_1.30.15
[70] vctrs_0.3.8 png_0.1-7 purrr_0.3.4
[73] tidyr_1.1.3 assertthat_0.2.1 cachem_1.0.5
[76] restfulr_0.0.13 survival_3.2-11 tibble_3.1.2
[79] RTCGAToolbox_2.23.1 GenomicAlignments_1.29.0 AnnotationDbi_1.55.0
[82] memoise_2.0.0 ellipsis_0.3.2

Originally posted by @afaissa in https://github.com/waldronlab/cBioPortalData/issues/42#issuecomment-851682010

LiNk-NY commented 3 years ago

Hi @afaissa,

Sorry, but you would have to be more specific here. What do you mean by "all" patient data? Please make sure that BiocManager::valid() is returning TRUE for you. If the data that you are looking for is not present, you would have to provide a more detailed report that includes the API endpoint for the data that you were expecting.

This is the clinical data table that I get back including a large number of variables.

> cbio <- cBioPortal()
> PatientData <- clinicalData(api = cbio, studyId = "luad_tcga")
> PatientData
# A tibble: 586 x 81
   patientId  AGE   AJCC_METASTASIS_PATH… AJCC_NODES_PATHOL… AJCC_PATHOLOGIC_TU…
   <chr>      <chr> <chr>                 <chr>              <chr>              
 1 TCGA-05-4… 70    M1                    N2                 Stage IV           
 2 TCGA-05-4… 81    M0                    N2                 Stage IIIA         
 3 TCGA-05-4… 67    M0                    N0                 Stage IB           
 4 TCGA-05-4… 79    M0                    N1                 Stage IIIA         
 5 TCGA-05-4… 68    M0                    N0                 Stage IB           
 6 TCGA-05-4… 66    M0                    N2                 Stage IIIA         
 7 TCGA-05-4… 70    M0                    N0                 Stage IA           
 8 TCGA-05-4… 58    M0                    N0                 Stage IB           
 9 TCGA-05-4… 76    M0                    N2                 Stage IIIB         
10 TCGA-05-4… 76    M0                    N1                 Stage IIIB   
[truncated...]      

You can also try clearing the cache by deleting the file listed from this command:

cBioPortalData:::.getHashCache(digest::digest(list("clinicalData", cbio, "luad_tcga")))
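
For reference, that cleanup could be scripted roughly as follows. This is only a sketch: remove_cached_file() is a hypothetical helper (not part of cBioPortalData), and it assumes the internal .getHashCache() call above returns the path of the cached file, as the instruction to delete "the file listed" implies. Internal functions can change between releases, so treat the commented usage as illustrative.

```r
# Hypothetical helper (not part of cBioPortalData): delete a cached file
# if it exists and report whether anything was removed.
remove_cached_file <- function(path) {
    stopifnot(is.character(path), length(path) == 1L)
    if (!file.exists(path)) {
        message("No cached file at: ", path)
        return(invisible(FALSE))
    }
    # file.remove() returns TRUE on success
    invisible(file.remove(path))
}

# Usage (requires cBioPortalData and network access, so left commented out).
# Assumption: .getHashCache() returns the path to the cached result.
# library(cBioPortalData)
# cbio <- cBioPortal()
# cache_file <- cBioPortalData:::.getHashCache(
#     digest::digest(list("clinicalData", cbio, "luad_tcga"))
# )
# remove_cached_file(cache_file)
# PatientData <- clinicalData(api = cbio, studyId = "luad_tcga")
```

After the file is removed, calling clinicalData() again should fetch the table from the API instead of reading the stale cached copy.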

Best, Marcel

afaissa commented 3 years ago

Hi Marcel,

Thank you for your reply and your time on that. "BiocManager::valid()" was indeed returning "TRUE".

The problem was indeed the cache. After following your suggestion and clearing the cache, it worked.

PatientData
# A tibble: 586 x 81
  patientId   AGE   AJCC_METASTASIS_PATH~ AJCC_NODES_PATHOL~ AJCC_PATHOLOGIC_TUM~
1 TCGA-05-42~ 70    M1                    N2                 Stage IV
2 TCGA-05-42~ 81    M0                    N2                 Stage IIIA

Thank you very much! Alex