waldronlab / MultiAssayExperiment

Bioconductor package for management of multi-assay data
https://waldronlab.io/MultiAssayExperiment/

getWithColData colnames #279

Closed mherberg closed 3 years ago

mherberg commented 4 years ago

I noticed that the getWithColData function returns a SummarizedExperiment that is modified from the original. The new colnames match the rownames of the colData rather than the initial assay's colnames.

For example, if the assay has full TCGA barcodes as colnames, the function returns the shortened TCGA participant barcodes as colnames. Example:

> mae[[1]]
class: SummarizedExperiment 
dim: 20501 520 
metadata(0):
assays(1): ''
rownames(20501): A1BG A1CF ... psiTPTE22 tAKR
rowData names(0):
colnames(520): TCGA-4P-AA8J-01A-11R-A39I-07
  TCGA-BA-4074-01A-01R-1436-07 ... TCGA-WA-A7GZ-01A-11R-A34R-07
  TCGA-WA-A7H4-01A-21R-A34R-07
colData names(0):
> getWithColData(mae, 1)
class: SummarizedExperiment 
dim: 20501 520 
metadata(0):
assays(1): ''
rownames(20501): A1BG A1CF ... psiTPTE22 tAKR
rowData names(0):
colnames(520): TCGA-4P-AA8J TCGA-BA-4074 ... TCGA-WA-A7GZ TCGA-WA-A7H4
colData names(1445): patientID years_to_birth ... Copy.Number PARADIGM

Is this the expected behavior? Wouldn't one expect getWithColData to return exactly the same SummarizedExperiment, unchanged apart from the added colData? Adjusting the function to store the colnames and then reset them:

# keep the assay's original column names across the colData replacement
origColnames <- colnames(exObj)
colData(exObj) <- expanded
colnames(exObj) <- origColnames

fixes the problem, but I wanted to know what the intended behavior is.

mherberg commented 4 years ago

I also noticed that the assay name is lost. The code below fixed it in this case; however, it may need a check that exObj is a SummarizedExperiment. It may also belong more naturally in the getter for MultiAssayExperiment.

# take the name of the first experiment only, to match mae[[1L]]
name <- names(experiments(mae))[1L]
exObj <- mae[[1L]]
assayNames(exObj) <- name

LiNk-NY commented 4 years ago

Hi Matt, @mherberg Thanks for the report. I was out of town. I will look at this soon. Best, Marcel

mherberg commented 3 years ago

Marcel,

Were you able to look at this and do you agree it is a problem worth fixing?

Thanks, M

LiNk-NY commented 3 years ago

Hi @mherberg

Thank you for following up. Sorry this fell off my radar.

I had a look into it, and this change happens when we use the replacement function: colData(exObj) <- expanded. It is a side effect of that function, which makes sure that rownames(expanded) agree with the colnames of the object.
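To make the side effect concrete, here is a minimal sketch on a toy SummarizedExperiment. It assumes the SummarizedExperiment package is installed; the two barcodes and gene names are taken from the output above, while the count values and the object names (counts, se, patientData) are made up for illustration. It is not the getWithColData source, only the replacement step in isolation.

```r
library(SummarizedExperiment)

# Full sample-level barcodes, as in the original assay
fullBarcodes <- c("TCGA-4P-AA8J-01A-11R-A39I-07",
                  "TCGA-BA-4074-01A-01R-1436-07")

# Toy count matrix (values are arbitrary)
counts <- matrix(1:4, nrow = 2,
                 dimnames = list(c("A1BG", "A1CF"), fullBarcodes))
se <- SummarizedExperiment(assays = list(counts = counts))

# Participant-level colData: rownames are the shortened barcodes
participants <- c("TCGA-4P-AA8J", "TCGA-BA-4074")
patientData <- DataFrame(patientID = participants, row.names = participants)

# The colnames of a SummarizedExperiment are the rownames of its colData,
# so replacing the colData is expected to replace the colnames as well
colData(se) <- patientData
print(colnames(se))

# The workaround from the report: put the stored sample barcodes back
colnames(se) <- fullBarcodes
print(colnames(se))
```

Restoring the names this way keeps the participant-level columns but relabels them with sample-level barcodes, which is the trade-off discussed in this thread.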

I think this is the right behavior, since we are reducing the complexity of a MultiAssayExperiment into a SummarizedExperiment. In other words, the sample-to-patient mapping (sampleMap) is being removed.

MultiAssayExperiment provides both the sample-level and the patient-level data, but when we use getWithColData, the colData only applies to the participant layer.

Before performing this type of operation, I would be careful to first separate out the sample types that you would like to keep (TCGAutils::TCGAsampleSelect) and resolve any replicate samples (mergeReplicates) in the MultiAssayExperiment. Also note that not all variables in the colData may apply to the data after the sample separation.
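As a rough illustration of why sample types and replicates matter here, the base R sketch below parses TCGA barcodes by position. It is not how TCGAutils or mergeReplicates are implemented; it only assumes the standard barcode layout, where the first 12 characters identify the participant and characters 14 to 15 hold the sample type code ("01" is primary solid tumor, "11" is solid tissue normal). The third barcode is hypothetical, added only to show a second sample from the same participant.

```r
barcodes <- c(
    "TCGA-4P-AA8J-01A-11R-A39I-07",
    "TCGA-BA-4074-01A-01R-1436-07",
    "TCGA-BA-4074-11A-01R-1436-07"  # hypothetical normal sample, illustration only
)

# Participant-level identifier: the first 12 characters
participant <- substr(barcodes, 1, 12)

# Sample type code: characters 14 and 15
sampleCode <- substr(barcodes, 14, 15)

# Without filtering, two columns would collapse onto the same participant
print(duplicated(participant))

# Keep primary tumors only, then check that each participant appears once
isPrimaryTumor <- sampleCode == "01"
kept <- barcodes[isPrimaryTumor]
print(anyDuplicated(substr(kept, 1, 12)) == 0L)
```

Once each participant maps to a single column, relabeling the columns with participant barcodes no longer creates ambiguous names.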

Best, Marcel