cbio <- cBioPortal() does not work #35

ryao-mdanderson closed this issue 3 years ago

ryao-mdanderson commented 3 years ago


I installed cBioPortalData on R 4.0.0 on our institution's HPC cluster. However, calling cBioPortal() fails:

$ module load R/4.0.0
$ R
> library(cBioPortalData)
> cbio <- cBioPortal()
Error in Service(service = "cBioPortal", host = hostname, config = httr::config(ssl_verifypeer = 0L, : unused arguments (api_reference_url = apiUrl, api_reference_md5sum = "b39b387c6fdd8b04badf38cb0777998f")

Could you please advise what the cause is and how to fix it?

Thanks, Rong Yao

LiNk-NY commented 3 years ago

Hi Rong Yao, @ryao-mdanderson Please make sure you have a valid Bioconductor installation by checking BiocManager::valid(). With R >= 4.0.0, you should have either Bioc 3.12 or 3.13 installed. Can you provide more information on what sessionInfo() returns?
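A minimal sketch of the diagnostics requested above, assuming BiocManager may or may not be installed. `bioc_diagnostics` is a hypothetical helper name, not part of any package. It collects the R version, the Bioconductor version reported by BiocManager::version(), and the installed version of a package, returning NA where something is unavailable.

```r
# Hypothetical helper: gather the version facts needed to diagnose a
# Bioconductor / package mismatch. Returns NA for anything not installed.
bioc_diagnostics <- function(pkg = "cBioPortalData") {
  bioc <- NA_character_
  if (requireNamespace("BiocManager", quietly = TRUE)) {
    # BiocManager::version() reports the Bioconductor version in use
    bioc <- tryCatch(as.character(BiocManager::version()),
                     error = function(e) NA_character_)
  }
  pkg_version <- tryCatch(as.character(utils::packageVersion(pkg)),
                          error = function(e) NA_character_)
  list(r_version = as.character(getRversion()),
       bioc_version = bioc,
       pkg_version = pkg_version)
}

str(bioc_diagnostics())
```

BiocManager::valid() goes further and reports out-of-date or too-new packages, but it needs access to the package repositories, which may matter on a firewalled cluster.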

ryao-mdanderson commented 3 years ago


Thanks for your quick response.

Our R 4.0.0 (2020-04-24) installation has Bioconductor version 3.11 (BiocManager 1.30.10).

Here is what sessionInfo() returns:


R version 4.0.0 (2020-04-24)

Platform: x86_64-pc-linux-gnu (64-bit)

Running under: Red Hat Enterprise Linux

Matrix products: default

BLAS/LAPACK: /usr/lib64/libopenblas-r0.3.3.so








attached base packages:

[1] parallel stats4 stats graphics grDevices utils datasets

[8] methods base

other attached packages:

[1] cBioPortalData_2.3.3 MultiAssayExperiment_1.14.0

[3] SummarizedExperiment_1.18.1 DelayedArray_0.14.0

[5] matrixStats_0.56.0 Biobase_2.48.0

[7] GenomicRanges_1.40.0 GenomeInfoDb_1.24.0

[9] IRanges_2.22.2 S4Vectors_0.26.1

[11] BiocGenerics_0.34.0 AnVIL_1.0.3

[13] dplyr_1.0.0

loaded via a namespace (and not attached):

[1] httr_1.4.1 bit64_0.9-7

[3] jsonlite_1.6.1 splines_4.0.0

[5] assertthat_0.2.1 askpass_1.1

[7] TCGAutils_1.10.0 BiocFileCache_1.12.0

[9] blob_1.2.1 Rsamtools_2.4.0

[11] GenomeInfoDbData_1.2.3 RTCGAToolbox_2.20.0

[13] yaml_2.2.1 progress_1.2.2

[15] pillar_1.4.4 RSQLite_2.2.0

[17] lattice_0.20-41 glue_1.4.1

[19] limma_3.44.1 digest_0.6.25

[21] XVector_0.28.0 rvest_0.3.6

[23] Matrix_1.2-18 XML_3.99-0.3

[25] pkgconfig_2.0.3 biomaRt_2.44.1

[27] zlibbioc_1.34.0 purrr_0.3.4

[29] RCircos_1.2.1 rapiclient_0.1.3

[31] BiocParallel_1.22.0 tibble_3.0.1

[33] openssl_1.4.1 generics_0.0.2

[35] ellipsis_0.3.1 GenomicFeatures_1.40.1

[37] survival_3.1-12 RJSONIO_1.3-1.4

[39] magrittr_1.5 crayon_1.3.4

[41] memoise_1.1.0 xml2_1.3.2

[43] tools_4.0.0 data.table_1.12.8

[45] prettyunits_1.1.1 hms_0.5.3

[47] formatR_1.7 lifecycle_0.2.0

[49] stringr_1.4.0 AnnotationDbi_1.50.0

[51] lambda.r_1.2.4 Biostrings_2.56.0

[53] compiler_4.0.0 rlang_0.4.6

[55] futile.logger_1.4.3 grid_4.0.0

[57] GenomicDataCommons_1.12.0 RCurl_1.98-1.2

[59] rappdirs_0.3.1 bitops_1.0-6

[61] DBI_1.1.0 curl_4.3

[63] R6_2.4.1 GenomicAlignments_1.24.0

[65] rtracklayer_1.48.0 bit_4.0.4

[67] futile.options_1.0.1 readr_1.3.1

[69] stringi_1.4.6 RaggedExperiment_1.12.0

[71] Rcpp_1.0.4.6 vctrs_0.3.0

[73] dbplyr_1.4.4 tidyselect_1.1.0

Thanks, Rong

LiNk-NY commented 3 years ago

Hi Rong Yao, @ryao-mdanderson

You're using a version of cBioPortalData that is too new for the Bioconductor version you have installed.

I would strongly recommend that you update your version of Bioconductor to 3.12 by using BiocManager::install(version = '3.12'), but make sure that you have an appropriate location for the new package installations. See the BiocManager vignette for managing multiple Bioconductor versions.

Alternatively, the appropriate version of cBioPortalData for Bioconductor 3.11 is cBioPortalData_2.0.10. You can re-install it using BiocManager::install("cBioPortalData").
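One quick way to spot this kind of mismatch: Bioconductor numbers packages x.y.z, with an even y on release branches and an odd y on devel. The sketch below applies that convention to the two versions mentioned in this thread. `is_devel_version` is a hypothetical helper, not a BiocManager function.

```r
# Hypothetical helper: TRUE when the minor (y) component of an x.y.z
# Bioconductor package version is odd, i.e. the version is from a devel branch.
is_devel_version <- function(version) {
  # Split on "." or "-" so that forms like "1.2-3" are handled as well
  parts <- as.integer(strsplit(as.character(version), "[.-]")[[1]])
  parts[2] %% 2L == 1L
}

is_devel_version("2.3.3")   # TRUE: the version in the sessionInfo() output
is_devel_version("2.0.10")  # FALSE: the version suggested for Bioconductor 3.11
```

A devel-numbered package sitting in a release installation is a hint that it did not come from the matching Bioconductor release repository.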

Best, Marcel

ryao-mdanderson commented 3 years ago

Thank you, Marcel. I will try.

Happy Thanksgiving! Rong

LiNk-NY commented 3 years ago

Feel free to re-open if you are having issues. AFAIK they should be resolved. Best, Marcel

ryao-mdanderson commented 3 years ago

Hello Marcel,

Thank you for your suggestion. Today I followed your alternative suggestion: "Alternatively, the appropriate version of cBioPortalData for Bioconductor 3.11 is cBioPortalData_2.0.10. You can re-install it using BiocManager::install("cBioPortalData")."

I reinstalled cBioPortalData and the installation was successful. I was then able to test:

$ module load R/4.0.0
$ R
> packageVersion("cBioPortalData")
[1] ‘2.0.10’
> cbio <- cBioPortal()

This succeeds. However, the next call fails:

> laml <- cBioDataPack("laml_tcga")

The output is as follows:

Study file in cache: laml_tcga

Working on: /tmp/RtmpCBSQek/136c70be6467_laml_tcga/data_cna_hg19.seg

Parsed with column specification:


ID = col_character(),

chrom = col_double(),

loc.start = col_double(),

loc.end = col_double(),

num.mark = col_double(),

seg.mean = col_double()


Working on: /tmp/RtmpCBSQek/136c70be6467_laml_tcga/data_CNA.txt

Parsed with column specification:


.default = col_double(),

Hugo_Symbol = col_character()


See spec(...) for full column specifications.

|=================================================================| 100% 9 MB

Parsed with column specification:


Hugo_Symbol = col_character()


Working on: /tmp/RtmpCBSQek/136c70be6467_laml_tcga/data_linear_CNA.txt

Parsed with column specification:


.default = col_double(),

Hugo_Symbol = col_character()


See spec(...) for full column specifications.

|=================================================================| 100% 29 MB

Parsed with column specification:


Hugo_Symbol = col_character()


Working on: /tmp/RtmpCBSQek/136c70be6467_laml_tcga/data_methylation_hm27.txt

Parsed with column specification:


.default = col_double(),

Hugo_Symbol = col_character()


See spec(...) for full column specifications.

Parsed with column specification:


Hugo_Symbol = col_character()


Working on: /tmp/RtmpCBSQek/136c70be6467_laml_tcga/data_methylation_hm450.txt

Parsed with column specification:


.default = col_double(),

Hugo_Symbol = col_character()


See spec(...) for full column specifications.

Parsed with column specification:


Hugo_Symbol = col_character()


Working on: /tmp/RtmpCBSQek/136c70be6467_laml_tcga/data_mutations_extended.txt

Parsed with column specification:


.default = col_character(),

Entrez_Gene_Id = col_double(),

Start_Position = col_double(),

End_Position = col_double(),

dbSNP_Val_Status = col_logical(),

Score = col_double(),

t_ref_count = col_double(),

t_alt_count = col_double(),

n_ref_count = col_double(),

n_alt_count = col_double(),

Protein_position = col_double(),

Hotspot = col_double(),

RNAVAF_WU = col_double(),

RNAVarReads_WU = col_double(),

stop = col_double(),

NormalVAF_WU = col_double(),

start = col_double(),

TumorVAF_WU = col_double(),

RNARefReads_WU = col_double()


See spec(...) for full column specifications.

Parsed with column specification:


.default = col_character()


See spec(...) for full column specifications.

Error in .local(x, ...) : strand values must be in '+' '-' '*'

In addition: Warning messages:

1: In .find_seqnames_col(df_colnames0, seqnames.field0, xfix) :

cannnot determine seqnames column unambiguously

2: In .find_seqnames_col(df_colnames0, seqnames.field0, xfix) :

cannnot determine seqnames column unambiguously

3: In .find_seqnames_col(df_colnames0, seqnames.field0, xfix) :

cannnot determine seqnames column unambiguously

Could you please advise what could be causing the error and how to fix it?

Thank you very much for your help! Rong

LiNk-NY commented 3 years ago

Hi Rong Yao, @ryao-mdanderson

From the error, it looks like there is an issue with the dataset. Perhaps you may want to report it here: https://github.com/cBioPortal/datahub.

Not all datasets are guaranteed to build as MultiAssayExperiment objects with older versions of cBioPortalData. We recommend using the latest version of R and cBioPortalData.

Studies are more likely to build with the latest version, where our success rate is around 83% for packaged studies. We have also added functionality to download only the data when the MultiAssayExperiment build does not work. These are some of the features in the newest version of the package.

Best regards, Marcel

ryao-mdanderson commented 3 years ago

Hello Marcel,

Thank you for your suggestion.

Just curious: with cBioPortalData 2.0.10, laml <- cBioDataPack("laml_tcga") accesses https://cbioportal-datahub.s3.amazonaws.com to download data.

I also found the dataset at https://github.com/cBioPortal/datahub, e.g. https://github.com/cBioPortal/datahub/tree/master/public/laml_tcga

In the new version of cBioPortalData, which of the above sites does cBioDataPack() actually access to download data? I am asking because our HPC cluster has firewall restrictions, so we need to know which URL (or both) to open in the firewall.

Thank you! Rong

LiNk-NY commented 3 years ago

Hi Rong, @ryao-mdanderson

cBioPortalData version 2.0.10 (and newer) uses the AWS URL https://cbioportal-datahub.s3.amazonaws.com. You can see the code here: https://github.com/waldronlab/cBioPortalData/blob/RELEASE_3_11/R/cBioDataPack.R#L2
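To confirm from a cluster node that the firewall lets this host through, a minimal sketch using base R's curlGetHeaders() follows. `can_reach` is a hypothetical helper. It only tests whether the host answers an HTTP request at all, and says nothing about the response status or whether a particular study file exists.

```r
# Hypothetical helper: TRUE if a request to `url` gets any HTTP response,
# FALSE if the connection itself fails (e.g. blocked by a firewall, DNS error).
can_reach <- function(url) {
  tryCatch({
    # curlGetHeaders() errors when no connection can be made; any HTTP
    # response, whatever its status code, counts as "reachable" here.
    curlGetHeaders(url)
    TRUE
  }, error = function(e) FALSE)
}
```

For example, calling can_reach("https://cbioportal-datahub.s3.amazonaws.com") on the cluster before and after the firewall change shows whether the rule took effect.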

Best, Marcel

ryao-mdanderson commented 3 years ago

Hi Marcel,

Thank you for the information.

Have a good night. Rong

