wideFormat() expands all columns if any assay contains a duplicate

waldronlab / MultiAssayExperiment

Bioconductor package for management of multi-assay data

https://waldronlab.io/MultiAssayExperiment/


wideFormat() expands all columns if any assay contains a duplicate #229

Closed lwaldron closed 6 years ago

lwaldron commented 6 years ago

For example:

> example("MultiAssayExperiment")
> duplicated(myMultiAssayExperiment)
$Affy
LogicalList of length 0

$Methyl450k
LogicalList of length 1
[["Jack"]] TRUE TRUE FALSE FALSE FALSE

$RNASeqGene
LogicalList of length 0

$GISTIC
LogicalList of length 0

> maemerged <- mergeReplicates(myMultiAssayExperiment)
> dim(wideFormat(maemerged[1, , 2]))
[1] 4 2
> dim(wideFormat(myMultiAssayExperiment[1, , 2])) #I expected this to have 3 columns
See ?mergeReplicates to combine replicated observations
  to get one column per variable
[1] 4 6
> dim(wideFormat(myMultiAssayExperiment[1, , c(1, 3:4)]))
[1] 4 4
> dim(wideFormat(myMultiAssayExperiment[1, , c(1:4)]))  #and this to have 6?
See ?mergeReplicates to combine replicated observations
  to get one column per variable
[1]  4 17
>

In other words, I had expected wideFormat() to append the column name only to the duplicated columns, not to all columns.

lwaldron commented 6 years ago

I think the behavior I expected would usually be more useful, but I would welcome any other thoughts on the expected behavior, @LiNk-NY @vjcitn @kasperdanielhansen @mtmorgan @seandavi @ttriche @lawremi @PeteHaitch. Note that the dimensions I expected above would require leaving the variable name of one duplicate unchanged and appending the column name only to the others. The choice of which duplicate keeps the unchanged name might be arbitrary.
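To make the naming rule above concrete, here is a minimal base R sketch, not the wideFormat() implementation. It assumes the long-format rows for the first Methyl450k feature of the example data (one row per assay/primary/rowname/colname combination), and it arbitrarily lets the first occurrence of each duplicate keep the plain variable name. The names `expectedWide`, `baseName` and `laterDup` are made up for illustration.

```r
# Assumed long-format rows for the first Methyl450k feature of the example data
longDataFrame <- data.frame(
  assay   = "Methyl450k",
  primary = c("Jack", "Jack", "Jill", "Barbara", "Bob"),
  rowname = "ENST00000355076",
  colname = paste0("methyl", 1:5),
  value   = c(1, 6, 11, 16, 21),
  stringsAsFactors = FALSE
)

# duplicated() is TRUE only for the second and later occurrences of a key,
# so the first replicate keeps the plain name (an arbitrary choice)
key      <- with(longDataFrame, paste(assay, primary, rowname))
laterDup <- duplicated(key)

# Plain variable name, with the colname appended only for later replicates
baseName <- with(longDataFrame, paste(assay, rowname, sep = "_"))
longDataFrame$variable <- ifelse(
  laterDup,
  paste(baseName, longDataFrame$colname, sep = "_"),
  baseName
)

# Pivot to one row per primary; cells without a measurement become NA
m <- with(longDataFrame, tapply(value, list(primary, variable), function(x) x[1]))
expectedWide <- data.frame(
  primary = rownames(m), m,
  check.names = FALSE, row.names = NULL, stringsAsFactors = FALSE
)
dim(expectedWide)
```

Under these assumptions the result has the 4 rows and 3 columns expected above: `primary`, the plain variable, and one extra column holding the second replicate for Jack.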

LiNk-NY commented 6 years ago

Hi Levi (@lwaldron), I see what you mean. This can be done by selectively reshaping the duplicated rows so that the "colname" is included in the resulting column names (which may get a bit complicated), and then reshaping the remaining rows via the simple method. Finally, we can do a cbind to join the duplicated and non-duplicated data.frames.

For example, we can take rows c(1,2) and reshape those differently than c(3:5):

> longDataFrame
       assay primary         rowname colname value
1 Methyl450k    Jack ENST00000355076 methyl1     1
2 Methyl450k    Jack ENST00000355076 methyl2     6
3 Methyl450k    Jill ENST00000355076 methyl3    11
4 Methyl450k Barbara ENST00000355076 methyl4    16
5 Methyl450k     Bob ENST00000355076 methyl5    21
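A rough base R sketch of that split, reshape and join idea, under stated assumptions: `pivotWide()` is a hypothetical helper written for this illustration (it is not a MultiAssayExperiment function), and the final join uses merge() on `primary` instead of a plain cbind, because in this small example the two pieces cover different `primary` values.

```r
# Long-format rows copied from the example above
longDataFrame <- data.frame(
  assay   = "Methyl450k",
  primary = c("Jack", "Jack", "Jill", "Barbara", "Bob"),
  rowname = "ENST00000355076",
  colname = paste0("methyl", 1:5),
  value   = c(1, 6, 11, 16, 21),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row per primary, one column per distinct
# entry of `variable`, NA where a primary has no value for that variable
pivotWide <- function(df) {
  ids  <- unique(df$primary)
  wide <- data.frame(primary = ids, stringsAsFactors = FALSE)
  for (v in unique(df$variable)) {
    rows <- df[df$variable == v, ]
    wide[[v]] <- rows$value[match(ids, rows$primary)]
  }
  wide
}

# Flag every row whose assay/primary/rowname combination occurs more than once
key   <- with(longDataFrame, paste(assay, primary, rowname))
isDup <- key %in% key[duplicated(key)]

# Replicated rows (1 and 2): include the colname in the variable name
dupRows <- longDataFrame[isDup, ]
dupRows$variable <- with(dupRows, paste(assay, rowname, colname, sep = "_"))

# Remaining rows (3 to 5): the simple assay_rowname variable name
uniqRows <- longDataFrame[!isDup, ]
uniqRows$variable <- with(uniqRows, paste(assay, rowname, sep = "_"))

# Join the two wide pieces on primary, keeping every primary from both
wide <- merge(pivotWide(uniqRows), pivotWide(dupRows), by = "primary", all = TRUE)
dim(wide)
```

With these five rows the result should have 4 rows and 4 columns: `primary`, the plain variable (NA for Jack), and one column each for methyl1 and methyl2 (filled only for Jack).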

Regards, Marcel

LiNk-NY commented 6 years ago

As a side note, I've improved the output of the duplicated method on a MultiAssayExperiment. See 60f9bcb87df9ec190c436e90691f5997050428cf