Hi Vince (@vjcitn), that is correct: there are replicates in the data, likely because it contains a diverse set of samples, including normals:
> library(TCGAutils)
> sampleTables(c1)
$`BRCA_RNASeq2GeneNorm-20160128`
01 06 11
1093 7 112
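For reference, the two-digit headers in that table are TCGA sample type codes (01 = primary solid tumor, 06 = metastatic, 11 = solid tissue normal), which sit at characters 14-15 of a full TCGA barcode. A minimal base R sketch of the same kind of tabulation, assuming standard barcode layout; the barcodes below are made up for illustration and are not taken from c1:

```r
# Hypothetical barcodes for illustration only (not from the BRCA object c1)
barcodes <- c(
  "TCGA-A7-A0CE-01A-11R-A00Z-07",
  "TCGA-A7-A0CE-11A-21R-A089-07",
  "TCGA-3C-AAAU-01A-11R-A41B-07",
  "TCGA-AC-A2FB-06A-11R-A034-07"
)

# The sample type code occupies characters 14-15 of the barcode
sample_code <- substr(barcodes, 14, 15)

# Count how many assay columns carry each sample type code
code_table <- table(sample_code)
print(code_table)
```

This only mimics the idea behind the table above on a toy vector; it is not how sampleTables is implemented.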
The best way to check for replicates is to use the replicated() function, which gives you a logical vector for each entry in the colData, marking the assay columns that correspond to that participant:
> replicated(c1)
$`BRCA_RNASeq2GeneNorm-20160128`
LogicalList of length 1093
[["TCGA-3C-AAAU"]] FALSE FALSE FALSE FALSE FALSE ... FALSE FALSE FALSE FALSE
[["TCGA-3C-AALI"]] FALSE FALSE FALSE FALSE FALSE ... FALSE FALSE FALSE FALSE
[["TCGA-3C-AALJ"]] FALSE FALSE FALSE FALSE FALSE ... FALSE FALSE FALSE FALSE
[["TCGA-3C-AALK"]] FALSE FALSE FALSE FALSE FALSE ... FALSE FALSE FALSE FALSE
[["TCGA-4H-AAAK"]] FALSE FALSE FALSE FALSE FALSE ... FALSE FALSE FALSE FALSE
[["TCGA-5L-AAT0"]] FALSE FALSE FALSE FALSE FALSE ... FALSE FALSE FALSE FALSE
[["TCGA-5L-AAT1"]] FALSE FALSE FALSE FALSE FALSE ... FALSE FALSE FALSE FALSE
[["TCGA-5T-A9QA"]] FALSE FALSE FALSE FALSE FALSE ... FALSE FALSE FALSE FALSE
[["TCGA-A1-A0SB"]] FALSE FALSE FALSE FALSE FALSE ... FALSE FALSE FALSE FALSE
[["TCGA-A1-A0SD"]] FALSE FALSE FALSE FALSE FALSE ... FALSE FALSE FALSE FALSE
You can then use this information to pinpoint the columns that come from the same participant:
> Filter(length, which(replicated(c1)[[1]]))
IntegerList of length 116
[["TCGA-A7-A0CE"]] 126 127
[["TCGA-A7-A0CH"]] 129 130
[["TCGA-A7-A0D9"]] 132 133
[["TCGA-A7-A0DB"]] 135 136
[["TCGA-A7-A13E"]] 138 139
[["TCGA-A7-A13F"]] 140 141
[["TCGA-A7-A13G"]] 142 143
[["TCGA-AC-A23H"]] 260 261
[["TCGA-AC-A2FB"]] 265 266
[["TCGA-AC-A2FF"]] 268 269
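To make the idea concrete, here is a rough base R sketch of the same grouping: columns are keyed by the participant portion of the barcode (the first 12 characters), and only participants with more than one column are kept. The barcodes are hypothetical and the approach is only conceptually similar to the call above, not the TCGAutils implementation:

```r
# Hypothetical barcodes for illustration only (not from the BRCA object c1)
barcodes <- c(
  "TCGA-A7-A0CE-01A-11R-A00Z-07",
  "TCGA-A7-A0CE-11A-21R-A089-07",
  "TCGA-3C-AAAU-01A-11R-A41B-07",
  "TCGA-AC-A2FB-01A-11R-A034-07",
  "TCGA-AC-A2FB-06A-11R-A034-07"
)

# The participant identifier is the first 12 characters of the barcode
participant <- substr(barcodes, 1, 12)

# Collect the column positions belonging to each participant
cols_by_participant <- split(seq_along(barcodes), participant)

# Keep only participants that contribute more than one column
replicated_cols <- Filter(function(idx) length(idx) > 1, cols_by_participant)
print(replicated_cols)
```

In this toy example, two of the three participants contribute more than one column, much like the tumor/normal pairs listed above.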
If you're only interested in primary tumors, you can use TCGAprimaryTumors() from TCGAutils:
> TCGAprimaryTumors(c1)
harmonizing input:
removing 119 sampleMap rows with 'colname' not in colnames of experiments
A MultiAssayExperiment object of 1 listed
experiment with a user-defined name and respective class.
Containing an ExperimentList class object of length 1:
[1] BRCA_RNASeq2GeneNorm-20160128: SummarizedExperiment with 20501 rows and 1093 columns
Features:
experiments() - obtain the ExperimentList instance
colData() - the primary/phenotype DataFrame
sampleMap() - the sample availability DFrame
`$`, `[`, `[[` - extract colData columns, subset, or experiment
*Format() - convert into a long or wide DataFrame
assays() - convert ExperimentList to a SimpleList of matrices
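The numbers in this output appear to line up with the sample table from earlier: the 119 removed sampleMap rows match the 06 and 11 columns combined, leaving the 1093 columns coded 01. A quick arithmetic check in base R, using only the counts printed in this thread:

```r
# Counts per sample type code, copied from the sampleTables(c1) output
counts <- c("01" = 1093L, "06" = 7L, "11" = 112L)

# Total number of assay columns before filtering
total_cols <- sum(counts)

# Columns that are not primary tumors (codes 06 and 11)
non_primary <- sum(counts[c("06", "11")])

# Columns expected to remain after keeping only code 01
remaining <- total_cols - non_primary

print(c(total = total_cols, removed = non_primary, remaining = remaining))
```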
Otherwise, you can use TCGAsampleSelect or splitAssays based on sample codes.
I hope that helps. Thanks.
Many many thanks. How could I forget about paired normal samples ... but I did.
So there are 1212 RNASeq contributions on 1093 individuals. I thought this was explained somewhere but I can't put my finger on it.