Closed vjcitn closed 3 years ago
It looks like the colData field for BRCA that matches the RNASeq2GeneNorm colnames is
"patient.samples.sample.portions.portion.analytes.analyte.2.aliquots.aliquot.2.bcr_aliquot_barcode"
Thanks Vince @vjcitn, I've added this in 8e474ad.
I am not sure what you mean when you talk about the colnames. Did you want them to be matched in the colData?
This can be taken care of by the user. The current operation takes the entirety of the colData in the MAE and appends it to the colData of the extracted object.
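To make the idea of carrying patient-level colData over to a sample-level object concrete, here is a toy base R sketch. It is not the package's implementation; the tables, IDs, and values (`patient_data`, `sample_map`, `P1`, `P2`) are made up for illustration only.

```r
# Toy patient-level table standing in for the MAE colData (values are invented)
patient_data <- data.frame(
    years_to_birth = c(55, 61),
    row.names = c("P1", "P2")
)

# Toy sample map: patient P1 contributes two samples, P2 contributes one
sample_map <- data.frame(
    primary = c("P1", "P1", "P2"),
    colname = c("P1-01A", "P1-11A", "P2-01A")
)

# Look up one patient row per sample; P1's row is repeated for its two samples
sample_data <- patient_data[sample_map$primary, , drop = FALSE]
rownames(sample_data) <- sample_map$colname

sample_data
```

The point of the sketch is that a patient with more than one sample ends up with duplicated patient-level rows once the table is expanded to one row per sample.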
I guess what is surprising to me may be shown in the following. I add some comments on the right -- maybe there are methods I don't know about?
> suppressMessages({x = curatedTCGAData("BRCA", "RNASeq2GeneNorm", dry=FALSE)})
> rnaseq = experiments(x)[[1]]
> dim(colData(rnaseq)) ### so the colData need to be assigned somehow
[1] 1212 0
> dim(colData(x)) ### the MAE only has 1093 participants ... OK, some RNA-seq samples are normal
[1] 1093 2684
> colnames(rnaseq)[1:3]
[1] "TCGA-3C-AAAU-01A-11R-A41B-07" "TCGA-3C-AALI-01A-11R-A41B-07"
[3] "TCGA-3C-AALJ-01A-31R-A41B-07"
> rownames(colData(x))[1:3] ### the user has to substring and check sample type?
[1] "TCGA-A1-A0SB" "TCGA-A1-A0SD" "TCGA-A1-A0SE"
> sessionInfo()
R version 4.0.2 Patched (2020-07-19 r78892)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04 LTS (fossa-melisa X20)
Matrix products: default
BLAS: /home/stvjc/R-4-0-dist/lib/R/lib/libRblas.so
LAPACK: /home/stvjc/R-4-0-dist/lib/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] TCGAutils_1.10.0 curatedTCGAData_1.12.0
[3] MultiAssayExperiment_1.16.0 SummarizedExperiment_1.20.0
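The substring-and-check step raised in the comments above can be sketched in base R. This is a minimal illustration, assuming the standard TCGA barcode layout in which characters 1 to 12 identify the participant and characters 14 to 15 hold the sample type code ("01" for primary solid tumor, "11" for solid tissue normal). The variable names are illustrative.

```r
# Barcodes copied from the session output above
barcodes <- c(
    "TCGA-3C-AAAU-01A-11R-A41B-07",
    "TCGA-3C-AALI-01A-11R-A41B-07",
    "TCGA-3C-AALJ-01A-31R-A41B-07"
)

# Characters 1-12: participant ID, comparable to rownames(colData(x))
patient_id <- substr(barcodes, 1, 12)

# Characters 14-15: sample type code
sample_code <- substr(barcodes, 14, 15)

# Flag primary solid tumor samples
is_primary <- sample_code == "01"

data.frame(barcode = barcodes, patient_id, sample_code, is_primary)
```

TCGAutils also provides helpers such as TCGAbarcode and TCGAsampleSelect for this kind of barcode handling, which avoids hand-written substring positions.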
Hi Vince, @vjcitn
Please use MultiAssayExperiment::getWithColData. There may be some repeated columns in both the MAE-level and assay-level colData objects. Conflicts will produce a warning (as seen below).
suppressPackageStartupMessages({
library(curatedTCGAData)
})
brca <- curatedTCGAData(
"BRCA", "RNASeq2GeneNorm", dry=FALSE, version = "2.0.0"
)
#> snapshotDate(): 2020-11-25
#> Working on: BRCA_RNASeq2GeneNorm-20160128
#> see ?curatedTCGAData and browseVignettes('curatedTCGAData') for documentation
#> loading from cache
#> Working on: BRCA_colData-20160128
#> see ?curatedTCGAData and browseVignettes('curatedTCGAData') for documentation
#> loading from cache
#> Working on: BRCA_metadata-20160128
#> see ?curatedTCGAData and browseVignettes('curatedTCGAData') for documentation
#> loading from cache
#> Working on: BRCA_sampleMap-20160128
#> see ?curatedTCGAData and browseVignettes('curatedTCGAData') for documentation
#> loading from cache
#> harmonizing input:
#> removing 14373 sampleMap rows not in names(experiments)
#> removing 5 colData rownames not in sampleMap 'primary'
getWithColData(brca, "BRCA_RNASeq2GeneNorm-20160128")
#> Warning: Duplicating colData rows due to replicates in 'replicated(x)'
#> class: SummarizedExperiment
#> dim: 20501 1212
#> metadata(3): filename build platform
#> assays(1): ''
#> rownames(20501): A1BG A1CF ... psiTPTE22 tAKR
#> rowData names(0):
#> colnames(1212): TCGA-3C-AAAU TCGA-3C-AALI ... TCGA-Z7-A8R5 TCGA-Z7-A8R6
#> colData names(2684): patientID years_to_birth ...
#> Integrated.Clusters..unsup.exp. X60.Gene.classifier.Class.Assignment
Created on 2020-11-30 by the reprex package (v0.3.0)
I also want to note that version 2.0.0 includes various improvements to the data provided. See the NEWS.md file for details.
Thanks
Super, thank you!
I was surprised that when I subset an MAE to the RNASeq2GeneNorm assay, the colData of the extracted experiment is empty. The package vignette should cover how to properly bind the colData and filter to primary tumor samples. I could attempt a PR to address this if it sounds appropriate.
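One possible shape for such a vignette section is sketched below. It is a rough outline, not an agreed design: it assumes that TCGAutils::TCGAprimaryTumors is available in the installed TCGAutils version, it reuses the assay name from the reprex earlier in this thread, and the object names (`brca`, `tumors`, `rnaseq`) are arbitrary. Running it downloads data from ExperimentHub.

```r
suppressPackageStartupMessages({
    library(curatedTCGAData)
    library(MultiAssayExperiment)
    library(TCGAutils)
})

# Same download as in the reprex above
brca <- curatedTCGAData(
    "BRCA", "RNASeq2GeneNorm", dry.run = FALSE, version = "2.0.0"
)

# Assumption: TCGAprimaryTumors() keeps only primary tumor samples in each assay
tumors <- TCGAprimaryTumors(brca)

# Extract the assay together with the patient-level colData
rnaseq <- getWithColData(tumors, "BRCA_RNASeq2GeneNorm-20160128")

dim(colData(rnaseq))
```

Filtering to primary tumors before extraction should leave fewer replicated samples per patient, although a duplication warning may still appear if any remain.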