waldronlab / MultiAssayExperiment

Bioconductor package for management of multi-assay data
https://waldronlab.io/MultiAssayExperiment/

rearrange warning + error #189

Closed lwaldron closed 7 years ago

lwaldron commented 7 years ago

Do you know what this warning is about?

source("https://gist.githubusercontent.com/lwaldron/7506e6867eaae98ade894e9864d9e75a/raw/b1984b08ac2072244b4621a6e75b8d6911037d49/myMultiAssay.R")
rearrange(myMultiAssay)

Here is the warning:

Warning message:
In rearrange(flatBox[[i]], ...) :
  'rowname' column not in 'rowData' taking first one

lwaldron commented 7 years ago

With shape="wide" it returns an error. I noticed this when adding examples of rearrange() to the vignette.

> source("https://gist.githubusercontent.com/lwaldron/7506e6867eaae98ade894e9864d9e75a/raw/b1984b08ac2072244b4621a6e75b8d6911037d49/myMultiAssay.R")
> rearrange(myMultiAssay, shape="wide")
Error in as.data.frame(outputDataFrame) : 
  object 'outputDataFrame' not found
In addition: Warning message:
In rearrange(flatBox[[i]], ...) :
  'rowname' column not in 'rowData' taking first one

Also with just matrix + ExpressionSet from the MultiAssayExperiment example:

> example("MultiAssayExperiment")
> rearrange(myMultiAssayExperiment, shape="wide")
Error in as.data.frame(outputDataFrame) : 
  object 'outputDataFrame' not found

LiNk-NY commented 7 years ago

The warning is about using the rowData of a SummarizedExperiment to get the rownames. If there is no 'rowname' column in the rowData, rearrange() takes the first column instead (usually a gene name or ID column).
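
A minimal sketch of that fallback, assuming a plain data.frame stands in for the rowData. The helper `pickRowNames` and the example columns `gene_id` and `symbol` are hypothetical; this is not the package's actual code, only an illustration of the behaviour described above:

```r
# Hypothetical helper (not from MultiAssayExperiment): choose feature names
# from a rowData-like table, falling back to the first column when there is
# no column called 'rowname'.
pickRowNames <- function(rowData) {
  if ("rowname" %in% colnames(rowData)) {
    return(as.character(rowData[["rowname"]]))
  }
  warning("'rowname' column not in 'rowData' taking first one")
  as.character(rowData[[1L]])
}

# Made-up rowData without a 'rowname' column.
rd <- data.frame(
  gene_id = c("ENST00000294241", "ENST00000355076"),
  symbol  = c("geneA", "geneB"),
  stringsAsFactors = FALSE
)

# The warning is expected here, so it is suppressed for the demonstration.
featureNames <- suppressWarnings(pickRowNames(rd))
```

Under these assumptions, adding an explicit 'rowname' column to the rowData would avoid the warning.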

LiNk-NY commented 7 years ago

I get an error when running the gist.

Error in rownames(patient.data) : object 'patient.data' not found

lwaldron commented 7 years ago

Sorry, here is a fixed gist:

> source("https://gist.githubusercontent.com/lwaldron/7506e6867eaae98ade894e9864d9e75a/raw/e19739f92700d71283893a4fbffd9672b10985c2/myMultiAssay.R")
> rearrange(myMultiAssay, shape="wide")
Error: Duplicate identifiers for rows (39, 44), (40, 45), (38, 43), (36, 41), (37, 42), (9, 11), (10, 12)
In addition: Warning message:
In rearrange(flatBox[[i]], ...) :
  'rowname' column not in 'rowData' taking first one

LiNk-NY commented 7 years ago

This error is due to non-unique identifiers in the data. I'm not sure what the wide format is supposed to look like. What should go in the colnames, and what on the left-hand side of the DataFrame?

This is what the input looks like:

DataFrame with 6 rows and 5 columns
  assay primary         rowname colname     value
  <Rle>   <Rle>     <character>   <Rle> <numeric>
1  Affy    Jack ENST00000294241  array1       101
2  Affy    Jack ENST00000355076  array1       102
3  Affy    Jill ENST00000294241  array2       103
4  Affy    Jill ENST00000355076  array2       104
5  Affy Barbara ENST00000294241  array3       105
6  Affy Barbara ENST00000355076  array3       106

Do you want the features (rowname column) to be spread out across the columns in the wide format?
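
One way to see where non-unique identifiers could come from is to check the long table for repeated keys before reshaping. The sketch below uses base R on a plain data.frame, copies four rows from the table above, and adds one hypothetical replicate (array4 for Jack) purely to trigger the duplicate case. The choice of assay, primary and rowname as the key is an assumption about the intended wide layout:

```r
# Rows copied from the table above, plus one hypothetical replicate
# (array4 for Jack, value 107) that is NOT in the original data.
long <- data.frame(
  assay   = "Affy",
  primary = c("Jack", "Jack", "Jill", "Jill", "Jack"),
  rowname = c("ENST00000294241", "ENST00000355076",
              "ENST00000294241", "ENST00000355076",
              "ENST00000294241"),
  colname = c("array1", "array1", "array2", "array2", "array4"),
  value   = c(101, 102, 103, 104, 107),
  stringsAsFactors = FALSE
)

# Assumed key: one cell of the wide table per assay/primary/feature.
key <- long[, c("assay", "primary", "rowname")]

# Flag every row whose key occurs more than once (both copies, not just
# the later one).
isDup   <- duplicated(key) | duplicated(key, fromLast = TRUE)
dupRows <- which(isDup)
```

Here `dupRows` picks out the two Jack rows for ENST00000294241, which would compete for the same cell in a wide table.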

LiNk-NY commented 7 years ago

After discussion, the wide format should have one row per biological unit.
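
A rough sketch of what "one row per biological unit" could look like, using base R `reshape()` on the six-row input shown earlier in the thread. It assumes the `primary` column identifies the biological unit; the `feature` column and the assay_feature naming scheme are illustrative choices, not the package's implementation:

```r
# Long-format input copied from the DataFrame printed earlier in the thread,
# held here in a plain data.frame.
long <- data.frame(
  assay   = "Affy",
  primary = rep(c("Jack", "Jill", "Barbara"), each = 2),
  rowname = rep(c("ENST00000294241", "ENST00000355076"), times = 3),
  colname = rep(c("array1", "array2", "array3"), each = 2),
  value   = 101:106,
  stringsAsFactors = FALSE
)

# One output column per assay/feature pair (naming scheme is an assumption).
long$feature <- paste(long$assay, long$rowname, sep = "_")

# Spread features across columns, keeping one row per 'primary'.
# The sample-level 'colname' column is dropped on purpose.
wide <- reshape(
  long[, c("primary", "feature", "value")],
  idvar     = "primary",
  timevar   = "feature",
  direction = "wide"
)
```

With this input, `wide` has three rows (Jack, Jill, Barbara) and one value column per feature. A biological unit with more than one sample in the same assay would need an extra rule, such as a column per replicate, before this reshape is well defined.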