stemangiola / tidySummarizedExperiment

Brings SummarizedExperiment to the tidyverse

`tidySE`: manage `to_tibble` if rownames do not match or are absent #70

Closed sdgamboa closed 1 year ago

sdgamboa commented 1 year ago

I used to be able to convert my TSE into tidy format by calling tidySummarizedExperiment::as_tibble (I don't remember which version), but it no longer works. I'm wondering whether this is a bug or whether my data needs some reformatting. I'd appreciate any help. Thanks.

library(curatedMetagenomicData)
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#> 
#> Attaching package: 'MatrixGenerics'
#> The following objects are masked from 'package:matrixStats':
#> 
#>     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#>     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#>     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#>     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#>     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#>     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#>     colWeightedMeans, colWeightedMedians, colWeightedSds,
#>     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#>     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#>     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#>     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#>     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#>     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#>     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#>     rowWeightedSds, rowWeightedVars
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> 
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     anyDuplicated, aperm, append, as.data.frame, basename, cbind,
#>     colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
#>     get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
#>     match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
#>     Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
#>     table, tapply, union, unique, unsplit, which.max, which.min
#> Loading required package: S4Vectors
#> 
#> Attaching package: 'S4Vectors'
#> The following object is masked from 'package:utils':
#> 
#>     findMatches
#> The following objects are masked from 'package:base':
#> 
#>     expand.grid, I, unname
#> Loading required package: IRanges
#> Loading required package: GenomeInfoDb
#> Loading required package: Biobase
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view with
#>     'browseVignettes()'. To cite Bioconductor, see
#>     'citation("Biobase")', and for packages 'citation("pkgname")'.
#> 
#> Attaching package: 'Biobase'
#> The following object is masked from 'package:MatrixGenerics':
#> 
#>     rowMedians
#> The following objects are masked from 'package:matrixStats':
#> 
#>     anyMissing, rowMedians
#> Warning: replacing previous import 'S4Arrays::read_block' by
#> 'DelayedArray::read_block' when loading 'SummarizedExperiment'
#> Loading required package: TreeSummarizedExperiment
#> Loading required package: SingleCellExperiment
#> Loading required package: Biostrings
#> Loading required package: XVector
#> 
#> Attaching package: 'Biostrings'
#> The following object is masked from 'package:base':
#> 
#>     strsplit
library(tidySummarizedExperiment)
#> 
#> Attaching package: 'tidySummarizedExperiment'
#> The following object is masked from 'package:XVector':
#> 
#>     slice
#> The following object is masked from 'package:IRanges':
#> 
#>     slice
#> The following object is masked from 'package:S4Vectors':
#> 
#>     rename
#> The following object is masked from 'package:matrixStats':
#> 
#>     count
#> The following object is masked from 'package:stats':
#> 
#>     filter
dataset_name <- "HallAB_2017.relative_abundance"
tse <- curatedMetagenomicData(
    pattern = dataset_name, 
    dryrun = FALSE, rownames = 'NCBI',
    counts = TRUE
)[[1]]
#> 
#> $`2021-10-14.HallAB_2017.relative_abundance`
#> dropping rows without rowTree matches:
#>   k__Bacteria|p__Actinobacteria|c__Coriobacteriia|o__Coriobacteriales|f__Atopobiaceae|g__Olsenella|s__Olsenella_profusa
#>   k__Bacteria|p__Actinobacteria|c__Coriobacteriia|o__Coriobacteriales|f__Coriobacteriaceae|g__Collinsella|s__Collinsella_stercoris
#>   k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Carnobacteriaceae|g__Granulicatella|s__Granulicatella_elegans
#>   k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Ruminococcaceae|g__Ruminococcus|s__Ruminococcus_champanellensis
#>   k__Bacteria|p__Firmicutes|c__Erysipelotrichia|o__Erysipelotrichales|f__Erysipelotrichaceae|g__Bulleidia|s__Bulleidia_extructa
#>   k__Bacteria|p__Proteobacteria|c__Betaproteobacteria|o__Burkholderiales|f__Sutterellaceae|g__Sutterella|s__Sutterella_parvirubra
#>   k__Bacteria|p__Synergistetes|c__Synergistia|o__Synergistales|f__Synergistaceae|g__Cloacibacillus|s__Cloacibacillus_evryensis
tse 
#> class: TreeSummarizedExperiment 
#> dim: 503 259 
#> metadata(1): agglomerated_by_rank
#> assays(1): relative_abundance
#> rownames(503): 853 820 ... 172901 1262744
#> rowData names(7): superkingdom phylum ... genus species
#> colnames(259): p8582_mo1 p8582_mo10 ... SKST041_2_G103027
#>   SKST041_3_G103028
#> colData names(24): study_name subject_id ... HBI SCCAI
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> rowLinks: a LinkDataFrame (503 rows)
#> rowTree: 1 phylo tree(s) (10430 leaves)
#> colLinks: NULL
#> colTree: NULL
class(tse)
#> [1] "TreeSummarizedExperiment"
#> attr(,"package")
#> [1] "TreeSummarizedExperiment"
tidy_tse <- tidySummarizedExperiment::as_tibble(tse)
#> Error in `map2()`:
#> ℹ In index: 1.
#> ℹ With name: relative_abundance.
#> Caused by error in `.x[rownames(se), , drop = FALSE]`:
#> ! subscript out of bounds
#> Backtrace:
#>      ▆
#>   1. ├─tidySummarizedExperiment::as_tibble(tse)
#>   2. ├─tidySummarizedExperiment:::as_tibble.SummarizedExperiment(tse)
#>   3. │ └─tidySummarizedExperiment:::.as_tibble_optimised(...)
#>   4. │   └─tidySummarizedExperiment:::get_count_datasets(x)
#>   5. │     ├─... %>% ...
#>   6. │     └─purrr::map2(...)
#>   7. │       └─purrr:::map2_("list", .x, .y, .f, ..., .progress = .progress)
#>   8. │         ├─purrr:::with_indexed_errors(...)
#>   9. │         │ └─base::withCallingHandlers(...)
#>  10. │         ├─purrr:::call_with_cleanup(...)
#>  11. │         └─tidySummarizedExperiment (local) .f(.x[[i]], .y[[i]], ...)
#>  12. ├─purrr::when(...)
#>  13. ├─purrr::when(...)
#>  14. └─purrr (local) `<fn>`(`<sbscOOBE>`)
#>  15.   └─cli::cli_abort(...)
#>  16.     └─rlang::abort(...)
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.0 (2023-04-21)
#>  os       Pop!_OS 22.04 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2023-05-15
#>  pandoc   2.19.2 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package                  * version   date (UTC) lib source
#>  AnnotationDbi              1.62.1    2023-05-02 [1] Bioconductor
#>  AnnotationHub              3.8.0     2023-04-25 [1] Bioconductor
#>  ape                        5.7-1     2023-03-13 [1] CRAN (R 4.3.0)
#>  beachmat                   2.16.0    2023-04-25 [1] Bioconductor
#>  beeswarm                   0.4.0     2021-06-01 [1] CRAN (R 4.3.0)
#>  Biobase                  * 2.60.0    2023-04-25 [1] Bioconductor
#>  BiocFileCache              2.8.0     2023-04-25 [1] Bioconductor
#>  BiocGenerics             * 0.46.0    2023-04-25 [1] Bioconductor
#>  BiocManager                1.30.20   2023-02-24 [1] CRAN (R 4.3.0)
#>  BiocNeighbors              1.18.0    2023-04-25 [1] Bioconductor
#>  BiocParallel               1.34.1    2023-05-05 [1] Bioconductor
#>  BiocSingular               1.16.0    2023-04-25 [1] Bioconductor
#>  BiocVersion                3.17.1    2022-11-04 [1] Bioconductor
#>  Biostrings               * 2.68.0    2023-04-25 [1] Bioconductor
#>  bit                        4.0.5     2022-11-15 [1] CRAN (R 4.3.0)
#>  bit64                      4.0.5     2020-08-30 [1] CRAN (R 4.3.0)
#>  bitops                     1.0-7     2021-04-24 [1] CRAN (R 4.3.0)
#>  blob                       1.2.4     2023-03-17 [1] CRAN (R 4.3.0)
#>  cachem                     1.0.8     2023-05-01 [1] CRAN (R 4.3.0)
#>  cli                        3.6.1     2023-03-23 [1] CRAN (R 4.3.0)
#>  cluster                    2.1.4     2022-08-22 [2] CRAN (R 4.3.0)
#>  codetools                  0.2-19    2023-02-01 [2] CRAN (R 4.3.0)
#>  colorspace                 2.1-0     2023-01-23 [1] CRAN (R 4.3.0)
#>  crayon                     1.5.2     2022-09-29 [1] CRAN (R 4.3.0)
#>  curatedMetagenomicData   * 3.8.0     2023-04-27 [1] Bioconductor
#>  curl                       5.0.0     2023-01-12 [1] CRAN (R 4.3.0)
#>  data.table                 1.14.8    2023-02-17 [1] CRAN (R 4.3.0)
#>  DBI                        1.1.3     2022-06-18 [1] CRAN (R 4.3.0)
#>  dbplyr                     2.3.2     2023-03-21 [1] CRAN (R 4.3.0)
#>  DECIPHER                   2.28.0    2023-04-25 [1] Bioconductor
#>  decontam                   1.20.0    2023-04-25 [1] Bioconductor
#>  DelayedArray               0.26.2    2023-05-05 [1] Bioconductor
#>  DelayedMatrixStats         1.22.0    2023-04-25 [1] Bioconductor
#>  digest                     0.6.31    2022-12-11 [1] CRAN (R 4.3.0)
#>  DirichletMultinomial       1.42.0    2023-04-25 [1] Bioconductor
#>  dplyr                      1.1.2     2023-04-20 [1] CRAN (R 4.3.0)
#>  ellipsis                   0.3.2     2021-04-29 [1] CRAN (R 4.3.0)
#>  evaluate                   0.21      2023-05-05 [1] CRAN (R 4.3.0)
#>  ExperimentHub              2.8.0     2023-04-25 [1] Bioconductor
#>  fansi                      1.0.4     2023-01-22 [1] CRAN (R 4.3.0)
#>  fastmap                    1.1.1     2023-02-24 [1] CRAN (R 4.3.0)
#>  filelock                   1.0.2     2018-10-05 [1] CRAN (R 4.3.0)
#>  fs                         1.6.2     2023-04-25 [1] CRAN (R 4.3.0)
#>  generics                   0.1.3     2022-07-05 [1] CRAN (R 4.3.0)
#>  GenomeInfoDb             * 1.36.0    2023-04-25 [1] Bioconductor
#>  GenomeInfoDbData           1.2.10    2023-04-28 [1] Bioconductor
#>  GenomicRanges            * 1.52.0    2023-04-25 [1] Bioconductor
#>  ggbeeswarm                 0.7.2     2023-04-29 [1] CRAN (R 4.3.0)
#>  ggplot2                    3.4.2     2023-04-03 [1] CRAN (R 4.3.0)
#>  ggrepel                    0.9.3     2023-02-03 [1] CRAN (R 4.3.0)
#>  glue                       1.6.2     2022-02-24 [1] CRAN (R 4.3.0)
#>  gridExtra                  2.3       2017-09-09 [1] CRAN (R 4.3.0)
#>  gtable                     0.3.3     2023-03-21 [1] CRAN (R 4.3.0)
#>  htmltools                  0.5.5     2023-03-23 [1] CRAN (R 4.3.0)
#>  htmlwidgets                1.6.2     2023-03-17 [1] CRAN (R 4.3.0)
#>  httpuv                     1.6.11    2023-05-11 [1] CRAN (R 4.3.0)
#>  httr                       1.4.6     2023-05-08 [1] CRAN (R 4.3.0)
#>  interactiveDisplayBase     1.38.0    2023-04-25 [1] Bioconductor
#>  IRanges                  * 2.34.0    2023-04-25 [1] Bioconductor
#>  irlba                      2.3.5.1   2022-10-03 [1] CRAN (R 4.3.0)
#>  jsonlite                   1.8.4     2022-12-06 [1] CRAN (R 4.3.0)
#>  KEGGREST                   1.40.0    2023-04-25 [1] Bioconductor
#>  knitr                      1.42      2023-01-25 [1] CRAN (R 4.3.0)
#>  later                      1.3.1     2023-05-02 [1] CRAN (R 4.3.0)
#>  lattice                    0.21-8    2023-04-05 [2] CRAN (R 4.3.0)
#>  lazyeval                   0.2.2     2019-03-15 [1] CRAN (R 4.3.0)
#>  lifecycle                  1.0.3     2022-10-07 [1] CRAN (R 4.3.0)
#>  magrittr                   2.0.3     2022-03-30 [1] CRAN (R 4.3.0)
#>  MASS                       7.3-60    2023-05-04 [2] CRAN (R 4.3.0)
#>  Matrix                     1.5-4     2023-04-04 [2] CRAN (R 4.3.0)
#>  MatrixGenerics           * 1.12.0    2023-04-25 [1] Bioconductor
#>  matrixStats              * 0.63.0    2022-11-18 [1] CRAN (R 4.3.0)
#>  memoise                    2.0.1     2021-11-26 [1] CRAN (R 4.3.0)
#>  mgcv                       1.8-42    2023-03-02 [2] CRAN (R 4.3.0)
#>  mia                        1.8.0     2023-04-25 [1] Bioconductor
#>  mime                       0.12      2021-09-28 [1] CRAN (R 4.3.0)
#>  MultiAssayExperiment       1.26.0    2023-04-25 [1] Bioconductor
#>  munsell                    0.5.0     2018-06-12 [1] CRAN (R 4.3.0)
#>  nlme                       3.1-162   2023-01-31 [2] CRAN (R 4.3.0)
#>  permute                    0.9-7     2022-01-27 [1] CRAN (R 4.3.0)
#>  pillar                     1.9.0     2023-03-22 [1] CRAN (R 4.3.0)
#>  pkgconfig                  2.0.3     2019-09-22 [1] CRAN (R 4.3.0)
#>  plotly                     4.10.1    2022-11-07 [1] CRAN (R 4.3.0)
#>  plyr                       1.8.8     2022-11-11 [1] CRAN (R 4.3.0)
#>  png                        0.1-8     2022-11-29 [1] CRAN (R 4.3.0)
#>  promises                   1.2.0.1   2021-02-11 [1] CRAN (R 4.3.0)
#>  purrr                      1.0.1     2023-01-10 [1] CRAN (R 4.3.0)
#>  R6                         2.5.1     2021-08-19 [1] CRAN (R 4.3.0)
#>  rappdirs                   0.3.3     2021-01-31 [1] CRAN (R 4.3.0)
#>  Rcpp                       1.0.10    2023-01-22 [1] CRAN (R 4.3.0)
#>  RCurl                      1.98-1.12 2023-03-27 [1] CRAN (R 4.3.0)
#>  reprex                     2.0.2     2022-08-17 [1] CRAN (R 4.3.0)
#>  reshape2                   1.4.4     2020-04-09 [1] CRAN (R 4.3.0)
#>  rlang                      1.1.1     2023-04-28 [1] CRAN (R 4.3.0)
#>  rmarkdown                  2.21      2023-03-26 [1] CRAN (R 4.3.0)
#>  RSQLite                    2.3.1     2023-04-03 [1] CRAN (R 4.3.0)
#>  rstudioapi                 0.14      2022-08-22 [1] CRAN (R 4.3.0)
#>  rsvd                       1.0.5     2021-04-16 [1] CRAN (R 4.3.0)
#>  S4Arrays                   1.0.4     2023-05-14 [1] Bioconductor
#>  S4Vectors                * 0.38.1    2023-05-02 [1] Bioconductor
#>  ScaledMatrix               1.8.1     2023-05-03 [1] Bioconductor
#>  scales                     1.2.1     2022-08-20 [1] CRAN (R 4.3.0)
#>  scater                     1.28.0    2023-04-25 [1] Bioconductor
#>  scuttle                    1.10.1    2023-05-02 [1] Bioconductor
#>  sessioninfo                1.2.2     2021-12-06 [1] CRAN (R 4.3.0)
#>  shiny                      1.7.4     2022-12-15 [1] CRAN (R 4.3.0)
#>  SingleCellExperiment     * 1.22.0    2023-04-25 [1] Bioconductor
#>  sparseMatrixStats          1.12.0    2023-04-25 [1] Bioconductor
#>  stringi                    1.7.12    2023-01-11 [1] CRAN (R 4.3.0)
#>  stringr                    1.5.0     2022-12-02 [1] CRAN (R 4.3.0)
#>  SummarizedExperiment     * 1.30.1    2023-05-01 [1] Bioconductor
#>  tibble                     3.2.1     2023-03-20 [1] CRAN (R 4.3.0)
#>  tidyr                      1.3.0     2023-01-24 [1] CRAN (R 4.3.0)
#>  tidyselect                 1.2.0     2022-10-10 [1] CRAN (R 4.3.0)
#>  tidySummarizedExperiment * 1.10.0    2023-04-25 [1] Bioconductor
#>  tidytree                   0.4.2     2022-12-18 [1] CRAN (R 4.3.0)
#>  treeio                     1.24.0    2023-04-25 [1] Bioconductor
#>  TreeSummarizedExperiment * 2.8.0     2023-04-25 [1] Bioconductor
#>  utf8                       1.2.3     2023-01-31 [1] CRAN (R 4.3.0)
#>  vctrs                      0.6.2     2023-04-19 [1] CRAN (R 4.3.0)
#>  vegan                      2.6-4     2022-10-11 [1] CRAN (R 4.3.0)
#>  vipor                      0.4.5     2017-03-22 [1] CRAN (R 4.3.0)
#>  viridis                    0.6.3     2023-05-03 [1] CRAN (R 4.3.0)
#>  viridisLite                0.4.2     2023-05-02 [1] CRAN (R 4.3.0)
#>  withr                      2.5.0     2022-03-03 [1] CRAN (R 4.3.0)
#>  xfun                       0.39      2023-04-20 [1] CRAN (R 4.3.0)
#>  xtable                     1.8-4     2019-04-21 [1] CRAN (R 4.3.0)
#>  XVector                  * 0.40.0    2023-04-25 [1] Bioconductor
#>  yaml                       2.3.7     2023-01-23 [1] CRAN (R 4.3.0)
#>  yulab.utils                0.0.6     2022-12-20 [1] CRAN (R 4.3.0)
#>  zlibbioc                   1.46.0    2023-04-25 [1] Bioconductor
#> 
#>  [1] /home/samuel/R/x86_64-pc-linux-gnu-library/4.3
#>  [2] /home/samuel/Apps/R-4.3.0/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

Created on 2023-05-15 with reprex v2.0.2

csoneson commented 1 year ago

I think this is caused by mismatching rownames in the TSE and the assay itself:

> head(rownames(tse))
[1] "853"    "820"    "301301" "28117"  "357276" "39491" 
> head(rownames(assay(tse, "relative_abundance", withDimnames = FALSE)))
[1] "1239_186801_186802_216572_216851_853" 
[2] "976_200643_171549_815_816_820"        
[3] "1239_186801_186802_186803_841_301301" 
[4] "976_200643_171549_171550_239759_28117"
[5] "976_200643_171549_815_909656_357276"  
[6] "1239_186801_186802_186803_NA_39491"  

which causes problems here.
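A minimal sketch of how one might detect this kind of mismatch and, as a workaround, relabel the stored assay so that it agrees with the object. It uses a small toy SummarizedExperiment rather than the HallAB_2017 data, and it assumes the rows of the assay and of the object are in the same order and differ only in their labels, which should be verified on real data before relabelling.

```r
library(SummarizedExperiment)

# Toy object (illustrative data, not from the issue)
m <- matrix(
  1:6, nrow = 3,
  dimnames = list(c("g1_long", "g2_long", "g3_long"), c("s1", "s2"))
)
se <- SummarizedExperiment(assays = list(counts = m))

# Renaming the object does not touch the dimnames stored inside the assay
rownames(se) <- c("g1", "g2", "g3")

# Compare the object's row names with the assay's own stored row names
stored <- assay(se, "counts", withDimnames = FALSE)
matches_before <- identical(rownames(se), rownames(stored))

# Workaround (assumes identical row order): relabel the stored assay
rownames(stored) <- rownames(se)
assay(se, "counts", withDimnames = FALSE) <- stored

matches_after <- identical(
  rownames(se),
  rownames(assay(se, "counts", withDimnames = FALSE))
)
```

After the relabelling, subsetting the assay by `rownames(se)` no longer goes out of bounds, so the conversion is expected to get past the step that failed in the backtrace.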

stemangiola commented 1 year ago

Is it normal to have a row name mismatch? It sounds dangerous to me. We could avoid that rowname-dependent operation, or throw an informative error.

What do you folks think?
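As a rough illustration of the "informative error" option, the check could look something like the sketch below. `check_assay_rownames()` is a hypothetical helper written for this example, not a function in tidySummarizedExperiment, and the message wording and prefix are assumptions.

```r
library(SummarizedExperiment)

# Hypothetical helper (not part of tidySummarizedExperiment):
# stop early if any assay stores row names that differ from rownames(se)
check_assay_rownames <- function(se) {
  stored <- assays(se, withDimnames = FALSE)
  for (i in seq_along(stored)) {
    assay_names <- rownames(stored[[i]])
    if (!is.null(assay_names) && !identical(assay_names, rownames(se))) {
      stop(
        "tidySummarizedExperiment says: the row names stored in assay ", i,
        " do not match rownames() of the object.",
        call. = FALSE
      )
    }
  }
  invisible(se)
}

# Toy object with mismatching labels (illustrative data)
m <- matrix(1:4, nrow = 2, dimnames = list(c("a_x", "b_y"), c("s1", "s2")))
se_bad <- SummarizedExperiment(assays = list(counts = m))
rownames(se_bad) <- c("x", "y")

# Capture the error message instead of aborting the script
msg <- tryCatch(
  {
    check_assay_rownames(se_bad)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
```

Assays without stored row names pass the check, since there is nothing to conflict with the object's own names.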

csoneson commented 1 year ago

I can't say whether it's normal 🙂 but it does indeed seem like unintended things could happen. It also seems to be an issue if the SE does not have dimnames while the assay does:

> se <- as(scuttle::mockSCE(), "SummarizedExperiment")[, 1:5]
> head(rownames(se))
NULL
> head(rownames(assay(se, "counts", withDimnames = FALSE)))
[1] "Gene_0001" "Gene_0002" "Gene_0003" "Gene_0004" "Gene_0005" "Gene_0006"

> tidySummarizedExperiment::as_tibble(se)
# A tibble: 0 × 6
# ℹ 6 variables: .feature <chr>, .sample <chr>, counts <dbl>,
#   Mutation_Status <chr>, Cell_Cycle <chr>, Treatment <chr>

> rownames(se) <- rownames(assay(se, "counts", withDimnames = FALSE))
> tidySummarizedExperiment::as_tibble(se)
# A tibble: 10,000 × 6
   .feature  .sample  counts Mutation_Status Cell_Cycle Treatment
   <chr>     <chr>     <dbl> <chr>           <chr>      <chr>    
 1 Gene_0001 Cell_001      0 positive        G0         treat2   
 2 Gene_0002 Cell_001     38 positive        G0         treat2   
 3 Gene_0003 Cell_001      0 positive        G0         treat2   
 4 Gene_0004 Cell_001     20 positive        G0         treat2   
 5 Gene_0005 Cell_001     28 positive        G0         treat2   
 6 Gene_0006 Cell_001     32 positive        G0         treat2   
 7 Gene_0007 Cell_001     62 positive        G0         treat2   
 8 Gene_0008 Cell_001      0 positive        G0         treat2   
 9 Gene_0009 Cell_001      4 positive        G0         treat2   
10 Gene_0010 Cell_001      0 positive        G0         treat2   
# ℹ 9,990 more rows
# ℹ Use `print(n = ...)` to see more rows

stemangiola commented 1 year ago

Maybe we should

Feedback and PRs are welcome!

csoneson commented 1 year ago

Sounds reasonable to me. I could give it a go (will most likely be when I'm back from holidays next week).

stemangiola commented 10 months ago

I see

"SummarizedExperiment::SummarizedExperiment(assays = my_assays %>% : the rownames and colnames of the supplied assay(s) must be NULL or identical to those of the SummarizedExperiment object (or derivative) to construct"

@csoneson did you implement this messaging? If so, could I please ask you to prefix any messages, warnings, or errors with "tidySummarizedExperiment says: "?

csoneson commented 10 months ago

@stemangiola Good question...I don't see this message anywhere in the tidy repositories; however, it looks like it may come from SummarizedExperiment:

https://github.com/Bioconductor/SummarizedExperiment/blob/9864c3601c9295f8e3a1e060fa8fe3e7f958be1d/R/RangedSummarizedExperiment-class.R#L164-L166

aodainic7 commented 10 months ago

Hey @csoneson, I get the following error when trying to convert an aggregated experiment to a SummarizedExperiment:

> CD8_adt.harmony_sct_WNN |> slice_sample(prop = 0.001) |> 
+   tidyseurat::aggregate_cells(c(sample, !!as.symbol("manual_cluster")), slot = "counts", assays=c("RNA", "ADT")) |>
+   tidybulk::as_SummarizedExperiment(.sample, .feature, any_of(c("RNA", "ADT")))
Error in SummarizedExperiment::SummarizedExperiment(assays = my_assays %>%  : 
  the rownames and colnames of the supplied assay(s) must be NULL or identical to those of
  the SummarizedExperiment object (or derivative) to construct

Even if I try it on the full dataset I get a similar error:

error in evaluating the argument '.data' in selecting a method for function 'as_SummarizedExperiment': nanny says: some of the .column specified do not exist in the input data frame.
Calls: .rs.sourceWithProgress ... same_src.data.frame -> is.data.frame -> %>% -> subset_tidyseurat

This is what my aggregate_cells output looks like: [image not preserved]

> sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] cbmc.SeuratData_3.1.4   rstatix_0.7.2           gt_0.10.0               limma_3.54.2            tidygate_0.4.9          progressr_0.14.0       
 [7] RColorBrewer_1.1-3      gitlabr_2.0.1.9000      harmony_1.0.3           Rcpp_1.0.11             glmGamPoi_1.10.2        tidyseurat_0.7.4       
[13] ttservice_0.3.8         patchwork_1.1.3         SeuratObject_4.9.9.9091 Seurat_4.4.0            lubridate_1.9.3         forcats_1.0.0          
[19] stringr_1.5.0           dplyr_1.1.3             purrr_1.0.2             readr_2.1.4             tidyr_1.3.0             tibble_3.2.1           
[25] ggplot2_3.4.4           tidyverse_2.0.0        

loaded via a namespace (and not attached):
  [1] utf8_1.2.4                  spatstat.explore_3.2-5      reticulate_1.34.0           tidyselect_1.2.0            htmlwidgets_1.6.2          
  [6] grid_4.2.1                  BiocParallel_1.32.6         Rtsne_0.16                  munsell_0.5.0               ScaledMatrix_1.6.0         
 [11] preprocessCore_1.60.2       codetools_0.2-18            ica_1.0-3                   future_1.33.0               miniUI_0.1.1.1             
 [16] withr_2.5.1                 spatstat.random_3.2-1       colorspace_2.1-0            Biobase_2.58.0              knitr_1.44                 
 [21] rstudioapi_0.15.0           stats4_4.2.1                SingleCellExperiment_1.20.1 ROCR_1.0-11                 tensor_1.5                 
 [26] listenv_0.9.0               MatrixGenerics_1.10.0       GenomeInfoDbData_1.2.9      polyclip_1.10-6             farver_2.1.1               
 [31] parallelly_1.36.0           vctrs_0.6.4                 generics_0.1.3              xfun_0.40                   timechange_0.2.0           
 [36] R6_2.5.1                    GenomeInfoDb_1.34.9         ggbeeswarm_0.7.2            graphlayouts_1.0.1          rsvd_1.0.5                 
 [41] locfit_1.5-9.8              miloR_1.9.1                 cachem_1.0.8                bitops_1.0-7                spatstat.utils_3.0-3       
 [46] DelayedArray_0.24.0         promises_1.2.1              scales_1.2.1                ggraph_2.1.0                beeswarm_0.4.0             
 [51] gtable_0.3.4                beachmat_2.14.2             globals_0.16.2              goftest_1.2-3               spam_2.10-0                
 [56] tidygraph_1.2.3             tidybulk_1.10.1             rlang_1.1.1                 splines_4.2.1               lazyeval_0.2.2             
 [61] spatstat.geom_3.2-7         broom_1.0.5                 yaml_2.3.7                  reshape2_1.4.4              abind_1.4-5                
 [66] backports_1.4.1             httpuv_1.6.11               tools_4.2.1                 SeuratData_0.2.2            ellipsis_0.3.2             
 [71] jquerylib_0.1.4             BiocGenerics_0.44.0         ggridges_0.5.4              plyr_1.8.9                  base64enc_0.1-3            
 [76] sparseMatrixStats_1.10.0    zlibbioc_1.44.0             RCurl_1.98-1.12             deldir_1.0-9                pbapply_1.7-2              
 [81] viridis_0.6.4               cowplot_1.1.1               S4Vectors_0.36.2            zoo_1.8-12                  SummarizedExperiment_1.28.0
 [86] ggrepel_0.9.4               cluster_2.1.3               magrittr_2.0.3              data.table_1.14.8           scattermore_1.2            
 [91] lmtest_0.9-40               RANN_2.6.1                  fitdistrplus_1.1-11         matrixStats_1.0.0           evaluate_0.22              
 [96] hms_1.1.3                   mime_0.12                   xtable_1.8-4                IRanges_2.32.0              gridExtra_2.3              
[101] compiler_4.2.1              scater_1.26.1               pbmc3k.SeuratData_3.1.4     crayon_1.5.2                KernSmooth_2.23-20         
[106] htmltools_0.5.6.1           later_1.3.1                 tzdb_0.4.0                  scclusteval_0.0.0.9000      tweenr_2.0.2               
[111] rappdirs_0.3.3              MASS_7.3-57                 Matrix_1.5-3                car_3.1-2                   cli_3.6.1                  
[116] arpr_0.1.2                  parallel_4.2.1              dotCall64_1.1-0             igraph_1.5.1                GenomicRanges_1.50.2       
[121] pkgconfig_2.0.3             job_0.3.0                   sp_2.1-1                    plotly_4.10.3               scuttle_1.8.4              
[126] spatstat.sparse_3.0-2       xml2_1.3.5                  bslib_0.5.1                 vipor_0.4.5                 XVector_0.38.0             
[131] digest_0.6.33               sctransform_0.4.1           RcppAnnoy_0.0.21            spatstat.data_3.0-1         rmarkdown_2.25             
[136] leiden_0.4.3                uwot_0.1.16                 edgeR_3.40.2                DelayedMatrixStats_1.20.0   shiny_1.7.5.1              
[141] gtools_3.9.4                lifecycle_1.0.3             nlme_3.1-157                jsonlite_1.8.7              carData_3.0-5              
[146] BiocNeighbors_1.16.0        viridisLite_0.4.2           fansi_1.0.5                 pillar_1.9.0                lattice_0.20-45            
[151] ggrastr_1.0.2               fastmap_1.1.1               httr_1.4.7                  survival_3.3-1              glue_1.6.2                 
[156] png_0.1-8                   sass_0.4.7                  ggforce_0.4.1               stringi_1.7.12              BiocSingular_1.14.0        
[161] irlba_2.3.5.1               future.apply_1.11.0 

Cheers!

stemangiola commented 10 months ago

@aodainic7 try reshaping the data so that RNA and ADT both become features:

# pivot_longer() and unite() are from tidyr; filter() is from dplyr
library(tidyr)
library(dplyr)

... |>
  tidyseurat::aggregate_cells(c(sample, !!as.symbol("manual_cluster")), slot = "counts", assays = c("RNA", "ADT")) |>
  # Reshape to make RNA and ADT both features
  pivot_longer(
    cols = c(RNA, ADT),
    names_to = "data_source",
    values_to = "count"
  ) |>
  # Drop feature/assay combinations that have no count
  filter(!is.na(count)) |>
  unite(".feature", c(feature, data_source), remove = FALSE) |>
  # Convert
  tidybulk::as_SummarizedExperiment(
    .sample = .sample,
    .transcript = .feature,
    .abundance = count
  )
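To see what the reshape step does on its own, here is a small sketch on a toy tibble. The column names mirror the suggestion above, but the table itself is made up for illustration; the real `aggregate_cells` output may have a different layout, since the screenshot is not available.

```r
library(tibble)
library(tidyr)
library(dplyr)

# Toy stand-in for the aggregated table (values are invented)
toy <- tibble(
  .sample = c("s1", "s1", "s2", "s2"),
  feature = c("CD3E", "CD8A", "CD3E", "CD8A"),
  RNA     = c(10, NA, 7, 3),
  ADT     = c(NA, 5, NA, 2)
)

long <- toy |>
  # One row per sample, feature and assay
  pivot_longer(
    cols = c(RNA, ADT),
    names_to = "data_source",
    values_to = "count"
  ) |>
  # Drop combinations that were not measured
  filter(!is.na(count)) |>
  # Make feature names unique across assays, e.g. "CD3E_RNA"
  unite(".feature", c(feature, data_source), remove = FALSE)
```

Suffixing each feature with its assay keeps RNA and ADT entries for the same gene or protein apart, so every sample and feature pair maps to a single count when the table is converted.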