waldronlab / curatedTCGAData

Curated Data From The Cancer Genome Atlas (TCGA) as MultiAssayExperiment Objects
https://bioconductor.org/packages/curatedTCGAData

issues with `subsetByAssay` #17

Closed by LiNk-NY 3 years ago

LiNk-NY commented 6 years ago

@vjcitn writes:

Hi, this GBM dataset was constructed with curatedTCGAData:

gbmMAE
A MultiAssayExperiment object of 4 listed
 experiments with user-defined names and respective classes. 
 Containing an ExperimentList class object of length 4: 
 [1] GBM_CNASNP-20160128: RaggedExperiment with 602338 rows and 1104 columns 
 [2] GBM_mRNAArray_huex-20160128: SummarizedExperiment with 18632 rows and 431 columns 
 [3] GBM_mRNAArray_TX_g4502a-20160128: SummarizedExperiment with 17814 rows and 502 columns 
 [4] GBM_mRNAArray_TX_ht_hg_u133a-20160128: SummarizedExperiment with 12042 rows and 528 columns 
Features: 
 experiments() - obtain the ExperimentList instance 
 colData() - the primary/phenotype DataFrame 
 sampleMap() - the sample availability DataFrame 
 `$`, `[`, `[[` - extract colData columns, subset, or experiment 
 *Format() - convert into a long or wide DataFrame 
 assays() - convert ExperimentList to a SimpleList of matrices

Why does this happen:

gbmMAE[,,3]
harmonizing input:
  removing 528 sampleMap rows not in names(experiments)
  removing 597 colData rownames not in sampleMap 'primary'
A MultiAssayExperiment object of 0 listed
 experiments with no user-defined names and respective classes. 
 Containing an ExperimentList class object of length 0:  
Features: 
 experiments() - obtain the ExperimentList instance 
 colData() - the primary/phenotype DataFrame 
 sampleMap() - the sample availability DataFrame 
 `$`, `[`, `[[` - extract colData columns, subset, or experiment 
 *Format() - convert into a long or wide DataFrame 
 assays() - convert ExperimentList to a SimpleList of matrices

Why is no assay returned? If I subset using the assay name (its name in the ExperimentList), it works.

Also, do we intend for the assay component to be a DataFrame as opposed to a matrix?

class(assay(gbmMAE[,,"GBM_mRNAArray_TX_ht_hg_u133a-20160128"]))
harmonizing input:
  removing 69 colData rownames not in sampleMap 'primary'
[1] "DataFrame"
attr(,"package")
[1] "S4Vectors"

LiNk-NY commented 6 years ago

Hi Vince (@vjcitn), I've fixed the code that was causing the odd behaviour when subsetting by assay, in waldronlab/MultiAssayExperiment@6348aeaec043be5b91ce877c0b6ff2511066f66f. The problem was that the sampleMap rows and the experiments in experiments(gbm) were not in the same order, so matching them by index paired them incorrectly. I switched to matching by name instead of by index.
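To illustrate the kind of ordering mismatch described above, here is a minimal base-R sketch using hypothetical toy data. The names `experiment_names`, `sample_map`, and the assay labels are made up for illustration; this is not the actual MultiAssayExperiment code. It assumes the sample map lists assays in a different order from the stored experiments, and compares looking up the third experiment by position with resolving its name first.

```r
# Hypothetical experiment names, in the order the experiments are stored
experiment_names <- c("CNASNP", "huex", "g4502a", "u133a")

# Hypothetical sample map whose rows list the assays in a different order
sample_map <- data.frame(
  assay   = c("u133a", "u133a", "huex", "CNASNP", "g4502a"),
  primary = c("p1", "p2", "p1", "p3", "p2"),
  stringsAsFactors = FALSE
)

i <- 3  # we want the third experiment, "g4502a"

# Index-based (fragile): take the i-th assay in the order it first appears
# in the sample map, assuming that order matches experiment_names
assay_order <- unique(sample_map$assay)
by_index <- sample_map[sample_map$assay == assay_order[i], ]

# Name-based (robust): resolve the index to a name, then match on that name
by_name <- sample_map[sample_map$assay == experiment_names[i], ]

print(by_index)  # the row for "CNASNP", not the experiment we asked for
print(by_name)   # the row for "g4502a"
```

Matching on names makes the result independent of how either the experiments or the sample map rows happen to be ordered.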

As for your second question, I'm not sure why that particular assay is returned as a DataFrame; that was not the intention. I will check the pipeline for any obvious errors.

Thanks again! -Marcel

LiNk-NY commented 6 years ago

This issue was moved to waldronlab/MultiAssayExperiment#248

LiNk-NY commented 6 years ago

I still have to look into why a DataFrame is produced.

LiNk-NY commented 3 years ago

This issue was moved to #31