waldronlab / MultiAssayExperiment

Bioconductor package for management of multi-assay data
https://waldronlab.io/MultiAssayExperiment/

Issue with longFormat and numeric row names #331

Closed: lgatto closed this issue 2 months ago

lgatto commented 2 months ago

Let's assume we have an SE with row names that don't match the indices, like below:

> library(MultiAssayExperiment)
> m <- matrix(1:12, nrow = 4)
> colnames(m) <- LETTERS[1:3]
> se <- SummarizedExperiment(m)
> rownames(se) <- c(1, 2, 4, 5) 
> se
class: SummarizedExperiment 
dim: 4 3 
metadata(0):
assays(1): ''
rownames(4): 1 2 4 5
rowData names(0):
colnames(3): A B C
colData names(0):

Once this SE is in an MAE and I convert it to a long DataFrame, the rowname column is an integer. This is problematic: there's no 5th row, only a row named "5".

> mae <- MultiAssayExperiment(list(x = se))
> longFormat(mae)
DataFrame with 12 rows and 5 columns
          assay     primary   rowname  colname     value
    <character> <character> <integer> <factor> <integer>
1             x           A         1        A         1
2             x           A         2        A         2
3             x           A         4        A         3
4             x           A         5        A         4
5             x           B         1        B         5
...         ...         ...       ...      ...       ...
8             x           B         5        B         8
9             x           C         1        C         9
10            x           C         2        C        10
11            x           C         4        C        11
12            x           C         5        C        12
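
To make the concern concrete, here is a minimal base R sketch of why an integer rowname column is risky when used to look rows up again. The matrix `m2` and the variable names are hypothetical and only mirror the example above; nothing here calls MultiAssayExperiment itself.

```r
# Hypothetical matrix mirroring the assay above: rows are named "1", "2", "4", "5".
m2 <- matrix(1:12, nrow = 4,
             dimnames = list(c("1", "2", "4", "5"), LETTERS[1:3]))

# Character lookup finds the row *named* "5", which is the fourth row.
by_name <- m2["5", "A"]

# Integer lookup asks for the fifth row, which does not exist, so it errors.
# tryCatch() captures the error message instead of stopping the script.
by_index <- tryCatch(m2[5, "A"], error = function(e) conditionMessage(e))

# Coercing the integer back to character restores name-based lookup.
by_coerced <- m2[as.character(5L), "A"]
```

Here `by_name` and `by_coerced` both return 4, while `by_index` holds an error message, so any downstream code that indexes with the integer column silently needs an `as.character()` step.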

There's no such problem if the row names aren't numbers:

> rownames(se) <- paste0("r", c(1, 2, 4, 5))
> se
class: SummarizedExperiment 
dim: 4 3 
metadata(0):
assays(1): ''
rownames(4): r1 r2 r4 r5
rowData names(0):
colnames(3): A B C
colData names(0):
> mae <- MultiAssayExperiment(list(x = se))
> longFormat(mae)
DataFrame with 12 rows and 5 columns
          assay     primary  rowname  colname     value
    <character> <character> <factor> <factor> <integer>
1             x           A       r1        A         1
2             x           A       r2        A         2
3             x           A       r4        A         3
4             x           A       r5        A         4
5             x           B       r1        B         5
...         ...         ...      ...      ...       ...
8             x           B       r5        B         8
9             x           C       r1        C         9
10            x           C       r2        C        10
11            x           C       r4        C        11
12            x           C       r5        C        12

I think the rowname column should always be a character.

LiNk-NY commented 2 months ago

Hi Laurent, @lgatto Thanks for reporting this and for the reprex. I've made sure that the "rowname" column is always character in the output. Best, Marcel