stemangiola / tidySummarizedExperiment

Brings SummarizedExperiment to the tidyverse
24 stars 6 forks source link

Changing how a SumarizedExperiment gets displayed should not be a side effect of loading the package #97

Open hpages opened 3 weeks ago

hpages commented 3 weeks ago

Hi @stemangiola,

This is how a SummarizedExperiment object gets displayed before loading tidySummarizedExperiment:

library(SummarizedExperiment)
example(SummarizedExperiment)
se0
# class: SummarizedExperiment 
# dim: 200 6 
# metadata(0):
# assays(1): counts
# rownames: NULL
# rowData names(0):
# colnames(6): A B ... E F
# colData names(1): Treatment

And this is how it gets displayed after loading tidySummarizedExperiment:

library(tidySummarizedExperiment)
Loading required package: ttservice
se0
# # A SummarizedExperiment-tibble abstraction: 1,200 × 4
# # Features=200 | Samples=6 | Assays=counts
#    .feature .sample counts Treatment
#    <chr>    <chr>    <dbl> <chr>    
#  1 1        A         9.66 ChIP     
#  2 2        A         9.71 ChIP     
#  3 3        A         9.50 ChIP     
#  4 4        A         8.71 ChIP     
#  5 5        A         9.72 ChIP     
#  6 6        A         9.37 ChIP     
#  7 7        A         6.34 ChIP     
#  8 8        A         9.89 ChIP     
#  9 9        A         9.62 ChIP     
# 10 10       A         9.14 ChIP     
# # ℹ 40 more rows
# # ℹ Use `print(n = ...)` to see more rows

This is not good behavior. See here and here for why.

Please address at your earliest convenience. I would recommend coordinating with the biocmask authors to avoid duplicated effort and ensure consistency between tidySummarizedExperiment and biocmask.

Thanks, H.

sessionInfo():

R version 4.4.0 (2024-04-24)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 23.10

Matrix products: default
BLAS:   /home/hpages/R/R-4.4.0/lib/libRblas.so 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.11.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/Los_Angeles
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] ggplot2_3.5.1                   tidyr_1.3.1                    
 [3] dplyr_1.1.4                     tidySummarizedExperiment_1.15.1
 [5] ttservice_0.4.1                 SummarizedExperiment_1.35.3    
 [7] Biobase_2.65.1                  GenomicRanges_1.57.1           
 [9] GenomeInfoDb_1.41.1             IRanges_2.39.2                 
[11] S4Vectors_0.43.2                BiocGenerics_0.51.3            
[13] MatrixGenerics_1.17.0           matrixStats_1.4.1              

loaded via a namespace (and not attached):
 [1] plotly_4.10.4           utf8_1.2.4              generics_0.1.3         
 [4] SparseArray_1.5.41      stringi_1.8.4           lattice_0.22-6         
 [7] digest_0.6.37           magrittr_2.0.3          grid_4.4.0             
[10] fastmap_1.2.0           jsonlite_1.8.9          Matrix_1.7-0           
[13] httr_1.4.7              purrr_1.0.2             fansi_1.0.6            
[16] viridisLite_0.4.2       UCSC.utils_1.1.0        scales_1.3.0           
[19] lazyeval_0.2.2          abind_1.4-8             cli_3.6.3              
[22] rlang_1.1.4             crayon_1.5.3            XVector_0.45.0         
[25] ellipsis_0.3.2          munsell_0.5.1           withr_3.0.1            
[28] DelayedArray_0.31.13    S4Arrays_1.5.10         tools_4.4.0            
[31] colorspace_2.1-1        GenomeInfoDbData_1.2.13 vctrs_0.6.5            
[34] R6_2.5.1                lifecycle_1.0.4         stringr_1.5.1          
[37] zlibbioc_1.51.1         htmlwidgets_1.6.4       pkgconfig_2.0.3        
[40] pillar_1.9.0            gtable_0.3.5            glue_1.8.0             
[43] data.table_1.16.0       tibble_3.2.1            tidyselect_1.2.1       
[46] htmltools_0.5.8.1       compiler_4.4.0         
stemangiola commented 3 weeks ago

Hello Herve, one solution would be to create an overall display package, with the explicit goal of change display method.

Isolating that from tidySE and biomask, will have the nice effect of making them interoperable.

I am discussing this with Michael Love.

hpages commented 6 days ago

Hi Stefano,

See for a cleaner OO approach here: https://github.com/Bioconductor/Contributions/issues/3616#issuecomment-2423413619

The idea is to formalize the "SummarizedExperiment-tibble Abstraction" by wrapping the SummarizedExperiment object in an S4 shell. Maybe tidySummarizedExperiment, tidySingleCellExperiment, and the Bioconductor tidy stack in general, could do something like this? A SummarizedExperiment object is not a data-frame-like object so it's conceptually wrong to pass it naked to a dplyr verb.

Let me know what you think.

Best, H.

stemangiola commented 6 days ago

Thanks Herve,

it seems elegant. if I understand it would require a new class. Do you mean that through LFSummarizedExperiment dplyr would NOT know that this is a SE object? Under the hood, we define dplyr verbs to often act on colData or rowData directly to gain efficiency. For example, if I filter for feature-wise columns, I internally act on rowData, and the same for colData. I don't think I fully understand the setup and if I am missing something here.

As a note about the philosophy (even if we forget the tidy data display), the idea of the tidy stack is to function as a transparent plugin for existing analysis pipelines, which that would not change any of its behaviour. For example, changing the meaning of dim() would change the meaning of some code. Also, the idea is that if I transfer my SE object to my collaborator who does not use tidy, they would have no trace that tidyomics was used, and neither would the SE object. Does this reasoning make sense?