waldronlab / MultiAssayExperiment

Bioconductor package for management of multi-assay data
https://waldronlab.io/MultiAssayExperiment/

Additional columns in sampleMap removed #212

Closed: frederikziebell closed this issue 5 years ago

frederikziebell commented 7 years ago

When constructing a MultiAssayExperiment object with sampleMap having additional columns, those columns are automatically removed. It would be helpful to have additional columns to further annotate samples, e.g. for batches.

LiNk-NY commented 7 years ago

The sampleMap is meant to keep track of samples and not to provide any additional information. You could add additional columns to the colData or equivalent to your data objects. I invite input from other developers such as @lwaldron, @vjcitn, @mtmorgan about adding this feature.

lwaldron commented 7 years ago

It seems to me the appropriate places to annotate batches would be in the metadata of the experiments, e.g. the colData if they are SummarizedExperiment objects. Or for batches that affect all assays of a biological specimen, in the colData of the MultiAssayExperiment. Why do you want it in the sampleMap?
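As a rough sketch of the two locations suggested above: the toy matrices, the specimen names, and the `batch` and `collection_day` columns are illustrative assumptions and do not come from this thread.

```r
library(MultiAssayExperiment)
library(SummarizedExperiment)

# Toy assays; all names and values here are made up for illustration
m1 <- matrix(1:12, nrow = 3, dimnames = list(paste0("g", 1:3), paste0("s", 1:4)))
m2 <- matrix(1:6,  nrow = 2, dimnames = list(paste0("p", 1:2), paste0("s", 1:3)))

# Assay-specific batches go in the colData of the individual experiment
se1 <- SummarizedExperiment(
    assays  = list(counts = m1),
    colData = DataFrame(batch = c("run1", "run1", "run2", "run2"),
                        row.names = colnames(m1))
)
se2 <- SummarizedExperiment(assays = list(abund = m2))

# Specimen-level annotation goes in the colData of the MultiAssayExperiment,
# whose rownames are the primary (specimen) identifiers
specimens <- DataFrame(collection_day = c("d1", "d1", "d2", "d2"),
                       row.names = paste0("s", 1:4))
mae <- MultiAssayExperiment(list(rna = se1, prot = se2), colData = specimens)

# A logical vector over colData rows subsets every assay at once
day1 <- mae[, mae$collection_day == "d1", ]
colnames(day1)
```

Here the specimen-level column applies across both assays, while `batch` stays with the one experiment it describes.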

frederikziebell commented 7 years ago

In my opinion, adding sample annotation to sampleMap would have three advantages. First, the user would not need to set up and handle colData objects for each experiment separately but could draw on the existing infrastructure of MultiAssayExperiment. Second, it would give a nice overview of all samples obtained by simply inspecting sampleMap. Finally, though perhaps this would be a feature request, it would allow subsetting by sample annotation (e.g. show all data that was obtained on a particular day).

In any case, it would be helpful to know the philosophy/recommendation of MultiAssayExperiment developers on whether sample annotation should be managed by each assay separately or in a central place like sampleMap.

lwaldron commented 7 years ago

Hi Fred, sorry for the delay. I don't think allowing extra columns in sampleMap would hurt anything (or change anything in the current API), and our general approach has been to allow more flexibility as long as it doesn't harm the ease of use or functionality too much. Two approaches here could be:

  1. allow extra columns in the sampleMap DataFrame, as long as the three required columns are valid.
  2. recommend use of a class derived from DataFrame that has a rowData() method for storing extra columns. I think this could already be done with no change to the MultiAssayExperiment code base simply by providing an object of this class instead of a data.frame or DataFrame to the sampleMap argument, although I guess such a derivative class is yet to be defined.

@mtmorgan @vjcitn thoughts?

vjcitn commented 7 years ago

I'll reply in line to Fred's remarks in the previous email.

vjcitn commented 7 years ago

On Fri, Sep 1, 2017 at 3:40 AM, Frederik Ziebell notifications@github.com wrote:

> In my opinion, adding sample annotation to sampleMap would have three advantages. One is that the user would not need to set up and handle colData objects for each experiment separately but could draw on the existing infrastructure of MultiAssayExperiment.

I am basically in agreement with Levi's liberal outlook on this, but I am concerned that this increases the number of paths to sample-level annotation that need to be accommodated. In the current model, sampleMap has a purely structural role, with no substantive content. The proposed modification means that sampleMap may be the bearer of substantive content that needs to be checked, in addition to that available in experiment-specific colData.

> The second, that it would allow one to get a nice overview of all samples obtained by simply inspecting sampleMap.

Isn't this already possible? (upSet diagram?)

> Finally, but perhaps this would be a feature request, it would allow subsetting by sample annotation (e.g. show all data that was obtained on a particular day).

This sounds like a programming problem that would not require change to data architecture.

I don't want to be needlessly negative here ... I have not looked at the example extension and would be happy to be persuaded that there is net benefit to the proposed change.

> In any case, it would be helpful to know the philosophy/recommendation of MultiAssayExperiment developers on whether sample annotation should be managed by each assay separately or in a central place like sampleMap.

lwaldron commented 7 years ago

> Finally, but perhaps this would be a feature request, it would allow subsetting by sample annotation (e.g. show all data that was obtained on a particular day).

The way I would do this, for example to keep only "ChIP" profiles in these SummarizedExperiment assays, is:

> library(MultiAssayExperiment) 
> library(SummarizedExperiment) 
> example("SummarizedExperiment")
> mae <- MultiAssayExperiment(list(cmb1=cmb1, cmb2=cmb2)) 
> keeps <- lapply(experiments(mae), function(x) x$Treatment == "ChIP") 
> mae[, keeps, ] 
harmonizing input:   
   removing 7 sampleMap rows with 'colname' not in colnames of experiments
   removing 4 colData rownames not in sampleMap 'primary' 
A MultiAssayExperiment object of 2 listed
  experiments with user-defined names and respective classes.
   Containing an ExperimentList class object of length 2:
   [1] cmb1: RangedSummarizedExperiment with 200 rows and 5 columns
   [2] cmb2: RangedSummarizedExperiment with 250 rows and 3 columns  
Features:
   experiments() - obtain the ExperimentList instance
   colData() - the primary/phenotype DataFrame
   sampleMap() - the sample availability DataFrame
   `$`, `[`, `[[` - extract colData columns, subset, or experiment
   *Format() - convert into a long or wide DataFrame
   assays() - convert ExperimentList to a SimpleList of matrices
> 

A user could do something similar to build the keeps list above from extra columns or rowData stored as part of the sampleMap: start with split(sampleMap(mae), sampleMap(mae)$assay) to create a list, then use lapply on that list to define the keeps list of logicals for subsetting samples. I suppose the advantage for Fred is that adding metadata to the sampleMap is more convenient than adding it to the experiments, either because it is already part of the sampleMap and keeps technical assay information in one easily inspected data frame, or because he is using assay classes that don't support metadata.
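A minimal sketch of that split-then-lapply idea follows. It keeps the annotated table outside the object, since the constructor drops extra sampleMap columns as reported at the top of this thread. The toy matrices, the `run_day` column, and the `day_of` lookup are hypothetical and serve only as illustration.

```r
library(MultiAssayExperiment)

# Toy assays as plain matrices; all names and values are made up
m1 <- matrix(1:12, nrow = 3, dimnames = list(paste0("g", 1:3), paste0("a", 1:4)))
m2 <- matrix(1:6,  nrow = 2, dimnames = list(paste0("p", 1:2), paste0("b", 1:3)))
mae <- MultiAssayExperiment(list(rna = m1, prot = m2))

# Annotated copy of the sample map, with one extra (hypothetical) column
annot <- as.data.frame(sampleMap(mae))
day_of <- c(a1 = "mon", a2 = "mon", a3 = "tue", a4 = "tue",
            b1 = "mon", b2 = "tue", b3 = "tue")
annot$run_day <- unname(day_of[annot$colname])

# One table per assay, then one logical vector per assay.
# Matching on column names keeps each vector aligned with its experiment.
by_assay <- split(annot, annot$assay)
assay_names <- names(experiments(mae))
keeps <- lapply(assay_names, function(nm) {
    tab <- by_assay[[nm]]
    colnames(mae[[nm]]) %in% tab$colname[tab$run_day == "mon"]
})
names(keeps) <- assay_names

monday <- mae[, keeps, ]
colnames(monday)
```

Matching on `colname` rather than relying on row order is a defensive choice: it does not assume the sample map rows are sorted the same way as the experiment columns.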

LiNk-NY commented 6 years ago

Hi @frederikziebell,

Can you provide an update on the status of this issue?

It seems like you can create a LogicalList() or list(logical()) to subset based on data found in colData, rowData or other extra columns.

You can also use mapToList(sampleMap(mae)) to create a list split by assays and then create the logical list.

Best regards, Marcel