waldronlab / cBioPortalData

Integrate the cancer genomics portal, cBioPortal, using MultiAssayExperiment
https://waldronlab.io/cBioPortalData/

Error in .checkBarcodes(barcodes) : Barcodes must start with 'TCGA' #36

Closed sahilseth closed 3 years ago

sahilseth commented 3 years ago

Is it possible that the pancancer data is organized a bit differently?

mae = cBioDataPack("brca_tcga_pan_can_atlas_2018", ask = FALSE)
# works through quite a few modalities, then fails at fusions (DNA/RNA)

See spec(...) for full column specifications.
Parsed with column specification:
cols(
  Hugo_Symbol = col_character(),
  Entrez_Gene_Id = col_logical(),
  Center = col_character(),
  Tumor_Sample_Barcode = col_character(),
  Fusion = col_character(),
  DNA_support = col_logical(),
  RNA_support = col_logical(),
  Method = col_logical(),
  Frame = col_character()
)
Error in .checkBarcodes(barcodes) : Barcodes must start with 'TCGA'

sahilseth commented 3 years ago

I looked at the fusions file, data_fusions.txt, and it seems OK:

Hugo_Symbol     Entrez_Gene_Id  Center  Tumor_Sample_Barcode    Fusion  DNA_support     RNA_support     Method  Frame
GAB2            WashU   TCGA-3C-AAAU-01 GAB2-CHKA
PPFIA1          WashU   TCGA-3C-AAAU-01 PPFIA1-MS4A5                            in-frame
RAB3IP          WashU   TCGA-3C-AAAU-01 RAB3IP-MSRB3                            in-frame
SHROOM3         WashU   TCGA-3C-AAAU-01 SHROOM3-DCST2                           frameshift
CNOT2           WashU   TCGA-3C-AAAU-01 CNOT2-SRGAP1                            in-frame
RASSF6          WashU   TCGA-3C-AAAU-01 RASSF6-LEMD3                            in-frame
BTC             WashU   TCGA-3C-AAAU-01 BTC-ZBTB7B
DPY19L2         WashU   TCGA-3C-AAAU-01 DPY19L2-MRPL1                           frameshift
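
One way to double-check the barcodes is to apply the prefix test that the error message implies. The following is only a minimal sketch in base R: it does not call the package's internal `.checkBarcodes()`, whose implementation is not shown in this thread, and the tab positions of the empty columns are an assumption reconstructed from the excerpt above.

```r
# A few rows copied from the data_fusions.txt excerpt (tab-separated).
# Assumption: empty fields sit in Entrez_Gene_Id, DNA_support, RNA_support, Method.
fusion_lines <- c(
  paste("Hugo_Symbol", "Entrez_Gene_Id", "Center", "Tumor_Sample_Barcode",
        "Fusion", "DNA_support", "RNA_support", "Method", "Frame", sep = "\t"),
  "GAB2\t\tWashU\tTCGA-3C-AAAU-01\tGAB2-CHKA\t\t\t\t",
  "PPFIA1\t\tWashU\tTCGA-3C-AAAU-01\tPPFIA1-MS4A5\t\t\t\tin-frame",
  "SHROOM3\t\tWashU\tTCGA-3C-AAAU-01\tSHROOM3-DCST2\t\t\t\tframeshift"
)

# Parse the text as a tab-delimited table with a header row.
fusions <- read.delim(
  text = paste(fusion_lines, collapse = "\n"),
  stringsAsFactors = FALSE
)

# Flag every barcode that does not begin with the literal prefix "TCGA".
barcodes <- fusions$Tumor_Sample_Barcode
has_tcga_prefix <- startsWith(barcodes, "TCGA")
bad_barcodes <- unique(barcodes[!has_tcga_prefix])

# An empty result means all sample barcodes in these rows pass the prefix test.
print(bad_barcodes)
```

For the rows shown here every barcode passes, which suggests the failing values may come from somewhere other than the Tumor_Sample_Barcode column.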

LiNk-NY commented 3 years ago

Hi Sahil (@sahilseth), can you provide the output of sessionInfo()? Are you using the Bioconductor release or devel version? This works for me in devel:
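
In addition to the full sessionInfo(), a shorter report of the relevant package versions can help when comparing release and devel. The snippet below is a rough sketch: `installed_version()` is a hypothetical helper written for this example, and the package list is just the set of packages mentioned in this thread. If the BiocManager package is installed, `BiocManager::version()` additionally reports the Bioconductor version in use.

```r
# Packages named in this thread; any that are not installed are reported as NA.
pkgs <- c("cBioPortalData", "TCGAutils", "MultiAssayExperiment")

# Hypothetical helper: version string of an installed package, otherwise NA.
installed_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

# Named character vector: package name -> version (or NA if missing).
versions <- vapply(pkgs, installed_version, character(1))
print(versions)
```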

> cBioDataPack("brca_tcga_pan_can_atlas_2018", ask = FALSE)
Downloading study file: brca_tcga_pan_can_atlas_2018.tar.gz
  |======================================================================| 100%

Working on: /tmp/Rtmprr3sMp/1042b77ee016f_brca_tcga_pan_can_atlas_2018/brca_tcga_pan_can_atlas_2018/data_cna_hg19.seg
Working on: /tmp/Rtmprr3sMp/1042b77ee016f_brca_tcga_pan_can_atlas_2018/brca_tcga_pan_can_atlas_2018/data_CNA.txt
Working on: /tmp/Rtmprr3sMp/1042b77ee016f_brca_tcga_pan_can_atlas_2018/brca_tcga_pan_can_atlas_2018/data_log2CNA.txt
Working on: /tmp/Rtmprr3sMp/1042b77ee016f_brca_tcga_pan_can_atlas_2018/brca_tcga_pan_can_atlas_2018/data_microbiome.txt
Working on: /tmp/Rtmprr3sMp/1042b77ee016f_brca_tcga_pan_can_atlas_2018/brca_tcga_pan_can_atlas_2018/data_mutations_extended.txt
Working on: /tmp/Rtmprr3sMp/1042b77ee016f_brca_tcga_pan_can_atlas_2018/brca_tcga_pan_can_atlas_2018/data_mutations_mskcc.txt
Working on: /tmp/Rtmprr3sMp/1042b77ee016f_brca_tcga_pan_can_atlas_2018/brca_tcga_pan_can_atlas_2018/data_RNA_Seq_v2_expression_median.txt
Working on: /tmp/Rtmprr3sMp/1042b77ee016f_brca_tcga_pan_can_atlas_2018/brca_tcga_pan_can_atlas_2018/data_RNA_Seq_v2_mRNA_median_all_sample_ref_normal_Zscores.txt
Working on: /tmp/Rtmprr3sMp/1042b77ee016f_brca_tcga_pan_can_atlas_2018/brca_tcga_pan_can_atlas_2018/data_RNA_Seq_v2_mRNA_median_all_sample_Zscores.txt
Working on: /tmp/Rtmprr3sMp/1042b77ee016f_brca_tcga_pan_can_atlas_2018/brca_tcga_pan_can_atlas_2018/data_RNA_Seq_v2_mRNA_median_normals.txt
Working on: /tmp/Rtmprr3sMp/1042b77ee016f_brca_tcga_pan_can_atlas_2018/brca_tcga_pan_can_atlas_2018/data_RNA_Seq_v2_mRNA_median_Zscores_normals.txt
Working on: /tmp/Rtmprr3sMp/1042b77ee016f_brca_tcga_pan_can_atlas_2018/brca_tcga_pan_can_atlas_2018/data_RNA_Seq_v2_mRNA_median_Zscores.txt
Working on: /tmp/Rtmprr3sMp/1042b77ee016f_brca_tcga_pan_can_atlas_2018/brca_tcga_pan_can_atlas_2018/data_rppa_Zscores.txt
Working on: /tmp/Rtmprr3sMp/1042b77ee016f_brca_tcga_pan_can_atlas_2018/brca_tcga_pan_can_atlas_2018/data_rppa.txt
A MultiAssayExperiment object of 14 listed
 experiments with user-defined names and respective classes.
 Containing an ExperimentList class object of length 14:
 [1] cna_hg19.seg: RaggedExperiment with 210376 rows and 1068 columns
 [2] CNA: SummarizedExperiment with 25128 rows and 1070 columns
 [3] log2CNA: SummarizedExperiment with 25128 rows and 1070 columns
 [4] microbiome: SummarizedExperiment with 1406 rows and 1070 columns
 [5] mutations_extended: RaggedExperiment with 130495 rows and 1009 columns
 [6] mutations_mskcc: RaggedExperiment with 130495 rows and 1009 columns
 [7] RNA_Seq_v2_expression_median: SummarizedExperiment with 20531 rows and 1082 columns
 [8] RNA_Seq_v2_mRNA_median_all_sample_ref_normal_Zscores: SummarizedExperiment with 20531 rows and 1082 columns
 [9] RNA_Seq_v2_mRNA_median_all_sample_Zscores: SummarizedExperiment with 20531 rows and 1082 columns
 [10] RNA_Seq_v2_mRNA_median_normals: SummarizedExperiment with 20531 rows and 114 columns
 [11] RNA_Seq_v2_mRNA_median_Zscores_normals: SummarizedExperiment with 20531 rows and 114 columns
 [12] RNA_Seq_v2_mRNA_median_Zscores: SummarizedExperiment with 20471 rows and 1082 columns
 [13] rppa_Zscores: SummarizedExperiment with 198 rows and 876 columns
 [14] rppa: SummarizedExperiment with 198 rows and 876 columns
Functionality:
 experiments() - obtain the ExperimentList instance
 colData() - the primary/phenotype DataFrame
 sampleMap() - the sample coordination DataFrame
 `$`, `[`, `[[` - extract colData columns, subset, or experiment
 *Format() - convert into a long or wide DataFrame
 assays() - convert ExperimentList to a SimpleList of matrices
 exportClass() - save all data to files
sessionInfo

```r
> sessionInfo()
R Under development (unstable) (2020-12-12 r79621)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.10

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] cBioPortalData_2.3.13       MultiAssayExperiment_1.17.7
 [3] SummarizedExperiment_1.21.1 Biobase_2.51.0
 [5] GenomicRanges_1.43.3        GenomeInfoDb_1.27.5
 [7] IRanges_2.25.6              S4Vectors_0.29.6
 [9] BiocGenerics_0.37.0         MatrixGenerics_1.3.1
[11] matrixStats_0.58.0          AnVIL_1.3.15
[13] dplyr_1.0.4                 colorout_1.2-2

loaded via a namespace (and not attached):
 [1] httr_1.4.2               bit64_4.0.5
 [3] jsonlite_1.7.2           splines_4.1.0
 [5] assertthat_0.2.1         askpass_1.1
 [7] TCGAutils_1.11.7         BiocFileCache_1.15.1
 [9] blob_1.2.1               Rsamtools_2.7.1
[11] GenomeInfoDbData_1.2.4   RTCGAToolbox_2.21.5
[13] progress_1.2.2           yaml_2.2.1
[15] pillar_1.4.7             RSQLite_2.2.3
[17] lattice_0.20-41          glue_1.4.2
[19] limma_3.47.6             XVector_0.31.1
[21] rvest_0.3.6              Matrix_1.3-2
[23] XML_3.99-0.5             pkgconfig_2.0.3
[25] biomaRt_2.47.4           zlibbioc_1.37.0
[27] purrr_0.3.4              RCircos_1.2.1
[29] rapiclient_0.1.3         BiocParallel_1.25.3
[31] openssl_1.4.3            tibble_3.0.6
[33] generics_0.1.0           ellipsis_0.3.1
[35] withr_2.4.1              cachem_1.0.1
[37] GenomicFeatures_1.43.3   cli_2.3.0
[39] survival_3.2-7           RJSONIO_1.3-1.4
[41] magrittr_2.0.1           crayon_1.4.0
[43] ps_1.5.0                 memoise_2.0.0
[45] xml2_1.3.2               prettyunits_1.1.1
[47] tools_4.1.0              data.table_1.13.6
[49] hms_1.0.0                BiocIO_1.1.2
[51] formatR_1.7              lifecycle_0.2.0
[53] stringr_1.4.0            DelayedArray_0.17.7
[55] AnnotationDbi_1.53.0     lambda.r_1.2.4
[57] Biostrings_2.59.2        compiler_4.1.0
[59] rlang_0.4.10             futile.logger_1.4.3
[61] grid_4.1.0               GenomicDataCommons_1.15.0
[63] RCurl_1.98-1.2           rstudioapi_0.13
[65] rjson_0.2.20             rappdirs_0.3.3
[67] bitops_1.0-6             restfulr_0.0.13
[69] DBI_1.1.1                curl_4.3
[71] R6_2.5.0                 GenomicAlignments_1.27.2
[73] rtracklayer_1.51.4       fastmap_1.1.0
[75] bit_4.0.4                filelock_1.0.2
[77] futile.options_1.0.1     readr_1.4.0
[79] stringi_1.5.3            RaggedExperiment_1.15.1
[81] Rcpp_1.0.6               vctrs_0.3.6
[83] dbplyr_2.0.0             tidyselect_1.1.0
```

The fusion data is stored in the metadata of the object, though it should remain a data.frame rather than being unlisted, as it currently is. I will work on that fix.
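
To illustrate why the data.frame should be kept intact, here is a small base R sketch of what flattening does to a table. It uses a toy data.frame built from two rows of the fusion excerpt; the list element name `fusions` is an assumption for this example, and the package's actual metadata layout and unlisting step may differ.

```r
# Toy fusion table (two rows from the excerpt in this thread).
fusions <- data.frame(
  Hugo_Symbol = c("GAB2", "PPFIA1"),
  Tumor_Sample_Barcode = c("TCGA-3C-AAAU-01", "TCGA-3C-AAAU-01"),
  Fusion = c("GAB2-CHKA", "PPFIA1-MS4A5"),
  stringsAsFactors = FALSE
)

# Kept as a single list element, the table retains its rows and columns.
meta_kept <- list(fusions = fusions)
is.data.frame(meta_kept$fusions)  # TRUE

# Unlisting flattens all cells into one named character vector,
# so the row/column structure is lost.
meta_flat <- unlist(fusions)
length(meta_flat)     # 6 (2 rows x 3 columns)
is.null(dim(meta_flat))  # TRUE: no table dimensions remain
```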

LiNk-NY commented 3 years ago

Hi Sahil, @sahilseth

This is also confirmed to work on the RELEASE_3_12 branch (version 2.2.6).

Best, Marcel