`tidySE`: manage `to_tibble` if rownames do not match or are absent

sdgamboa commented 1 year ago

I used to be able to transform my tse into tidy format by calling tidySummarizedExperiment::as_tibble (don't remember the version), but it doesn't work anymore. Wondering if this could be a bug or just some formatting I need to do in my data. I'd appreciate any help. Thanks.

library(curatedMetagenomicData)
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#> 
#> Attaching package: 'MatrixGenerics'
#> The following objects are masked from 'package:matrixStats':
#> 
#>     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#>     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#>     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#>     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#>     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#>     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#>     colWeightedMeans, colWeightedMedians, colWeightedSds,
#>     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#>     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#>     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#>     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#>     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#>     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#>     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#>     rowWeightedSds, rowWeightedVars
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> 
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     anyDuplicated, aperm, append, as.data.frame, basename, cbind,
#>     colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
#>     get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
#>     match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
#>     Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
#>     table, tapply, union, unique, unsplit, which.max, which.min
#> Loading required package: S4Vectors
#> 
#> Attaching package: 'S4Vectors'
#> The following object is masked from 'package:utils':
#> 
#>     findMatches
#> The following objects are masked from 'package:base':
#> 
#>     expand.grid, I, unname
#> Loading required package: IRanges
#> Loading required package: GenomeInfoDb
#> Loading required package: Biobase
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view with
#>     'browseVignettes()'. To cite Bioconductor, see
#>     'citation("Biobase")', and for packages 'citation("pkgname")'.
#> 
#> Attaching package: 'Biobase'
#> The following object is masked from 'package:MatrixGenerics':
#> 
#>     rowMedians
#> The following objects are masked from 'package:matrixStats':
#> 
#>     anyMissing, rowMedians
#> Warning: replacing previous import 'S4Arrays::read_block' by
#> 'DelayedArray::read_block' when loading 'SummarizedExperiment'
#> Loading required package: TreeSummarizedExperiment
#> Loading required package: SingleCellExperiment
#> Loading required package: Biostrings
#> Loading required package: XVector
#> 
#> Attaching package: 'Biostrings'
#> The following object is masked from 'package:base':
#> 
#>     strsplit
library(tidySummarizedExperiment)
#> 
#> Attaching package: 'tidySummarizedExperiment'
#> The following object is masked from 'package:XVector':
#> 
#>     slice
#> The following object is masked from 'package:IRanges':
#> 
#>     slice
#> The following object is masked from 'package:S4Vectors':
#> 
#>     rename
#> The following object is masked from 'package:matrixStats':
#> 
#>     count
#> The following object is masked from 'package:stats':
#> 
#>     filter
dataset_name <- "HallAB_2017.relative_abundance"
tse <- curatedMetagenomicData(
    pattern = dataset_name, 
    dryrun = FALSE, rownames = 'NCBI',
    counts = TRUE
)[[1]]
#> 
#> $`2021-10-14.HallAB_2017.relative_abundance`
#> dropping rows without rowTree matches:
#>   k__Bacteria|p__Actinobacteria|c__Coriobacteriia|o__Coriobacteriales|f__Atopobiaceae|g__Olsenella|s__Olsenella_profusa
#>   k__Bacteria|p__Actinobacteria|c__Coriobacteriia|o__Coriobacteriales|f__Coriobacteriaceae|g__Collinsella|s__Collinsella_stercoris
#>   k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Carnobacteriaceae|g__Granulicatella|s__Granulicatella_elegans
#>   k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Ruminococcaceae|g__Ruminococcus|s__Ruminococcus_champanellensis
#>   k__Bacteria|p__Firmicutes|c__Erysipelotrichia|o__Erysipelotrichales|f__Erysipelotrichaceae|g__Bulleidia|s__Bulleidia_extructa
#>   k__Bacteria|p__Proteobacteria|c__Betaproteobacteria|o__Burkholderiales|f__Sutterellaceae|g__Sutterella|s__Sutterella_parvirubra
#>   k__Bacteria|p__Synergistetes|c__Synergistia|o__Synergistales|f__Synergistaceae|g__Cloacibacillus|s__Cloacibacillus_evryensis
tse 
#> class: TreeSummarizedExperiment 
#> dim: 503 259 
#> metadata(1): agglomerated_by_rank
#> assays(1): relative_abundance
#> rownames(503): 853 820 ... 172901 1262744
#> rowData names(7): superkingdom phylum ... genus species
#> colnames(259): p8582_mo1 p8582_mo10 ... SKST041_2_G103027
#>   SKST041_3_G103028
#> colData names(24): study_name subject_id ... HBI SCCAI
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> rowLinks: a LinkDataFrame (503 rows)
#> rowTree: 1 phylo tree(s) (10430 leaves)
#> colLinks: NULL
#> colTree: NULL
class(tse)
#> [1] "TreeSummarizedExperiment"
#> attr(,"package")
#> [1] "TreeSummarizedExperiment"
tidy_tse <- tidySummarizedExperiment::as_tibble(tse)
#> Error in `map2()`:
#> ℹ In index: 1.
#> ℹ With name: relative_abundance.
#> Caused by error in `.x[rownames(se), , drop = FALSE]`:
#> ! subscript out of bounds
#> Backtrace:
#>      ▆
#>   1. ├─tidySummarizedExperiment::as_tibble(tse)
#>   2. ├─tidySummarizedExperiment:::as_tibble.SummarizedExperiment(tse)
#>   3. │ └─tidySummarizedExperiment:::.as_tibble_optimised(...)
#>   4. │   └─tidySummarizedExperiment:::get_count_datasets(x)
#>   5. │     ├─... %>% ...
#>   6. │     └─purrr::map2(...)
#>   7. │       └─purrr:::map2_("list", .x, .y, .f, ..., .progress = .progress)
#>   8. │         ├─purrr:::with_indexed_errors(...)
#>   9. │         │ └─base::withCallingHandlers(...)
#>  10. │         ├─purrr:::call_with_cleanup(...)
#>  11. │         └─tidySummarizedExperiment (local) .f(.x[[i]], .y[[i]], ...)
#>  12. ├─purrr::when(...)
#>  13. ├─purrr::when(...)
#>  14. └─purrr (local) `<fn>`(`<sbscOOBE>`)
#>  15.   └─cli::cli_abort(...)
#>  16.     └─rlang::abort(...)
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.0 (2023-04-21)
#>  os       Pop!_OS 22.04 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2023-05-15
#>  pandoc   2.19.2 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package                  * version   date (UTC) lib source
#>  AnnotationDbi              1.62.1    2023-05-02 [1] Bioconductor
#>  AnnotationHub              3.8.0     2023-04-25 [1] Bioconductor
#>  ape                        5.7-1     2023-03-13 [1] CRAN (R 4.3.0)
#>  beachmat                   2.16.0    2023-04-25 [1] Bioconductor
#>  beeswarm                   0.4.0     2021-06-01 [1] CRAN (R 4.3.0)
#>  Biobase                  * 2.60.0    2023-04-25 [1] Bioconductor
#>  BiocFileCache              2.8.0     2023-04-25 [1] Bioconductor
#>  BiocGenerics             * 0.46.0    2023-04-25 [1] Bioconductor
#>  BiocManager                1.30.20   2023-02-24 [1] CRAN (R 4.3.0)
#>  BiocNeighbors              1.18.0    2023-04-25 [1] Bioconductor
#>  BiocParallel               1.34.1    2023-05-05 [1] Bioconductor
#>  BiocSingular               1.16.0    2023-04-25 [1] Bioconductor
#>  BiocVersion                3.17.1    2022-11-04 [1] Bioconductor
#>  Biostrings               * 2.68.0    2023-04-25 [1] Bioconductor
#>  bit                        4.0.5     2022-11-15 [1] CRAN (R 4.3.0)
#>  bit64                      4.0.5     2020-08-30 [1] CRAN (R 4.3.0)
#>  bitops                     1.0-7     2021-04-24 [1] CRAN (R 4.3.0)
#>  blob                       1.2.4     2023-03-17 [1] CRAN (R 4.3.0)
#>  cachem                     1.0.8     2023-05-01 [1] CRAN (R 4.3.0)
#>  cli                        3.6.1     2023-03-23 [1] CRAN (R 4.3.0)
#>  cluster                    2.1.4     2022-08-22 [2] CRAN (R 4.3.0)
#>  codetools                  0.2-19    2023-02-01 [2] CRAN (R 4.3.0)
#>  colorspace                 2.1-0     2023-01-23 [1] CRAN (R 4.3.0)
#>  crayon                     1.5.2     2022-09-29 [1] CRAN (R 4.3.0)
#>  curatedMetagenomicData   * 3.8.0     2023-04-27 [1] Bioconductor
#>  curl                       5.0.0     2023-01-12 [1] CRAN (R 4.3.0)
#>  data.table                 1.14.8    2023-02-17 [1] CRAN (R 4.3.0)
#>  DBI                        1.1.3     2022-06-18 [1] CRAN (R 4.3.0)
#>  dbplyr                     2.3.2     2023-03-21 [1] CRAN (R 4.3.0)
#>  DECIPHER                   2.28.0    2023-04-25 [1] Bioconductor
#>  decontam                   1.20.0    2023-04-25 [1] Bioconductor
#>  DelayedArray               0.26.2    2023-05-05 [1] Bioconductor
#>  DelayedMatrixStats         1.22.0    2023-04-25 [1] Bioconductor
#>  digest                     0.6.31    2022-12-11 [1] CRAN (R 4.3.0)
#>  DirichletMultinomial       1.42.0    2023-04-25 [1] Bioconductor
#>  dplyr                      1.1.2     2023-04-20 [1] CRAN (R 4.3.0)
#>  ellipsis                   0.3.2     2021-04-29 [1] CRAN (R 4.3.0)
#>  evaluate                   0.21      2023-05-05 [1] CRAN (R 4.3.0)
#>  ExperimentHub              2.8.0     2023-04-25 [1] Bioconductor
#>  fansi                      1.0.4     2023-01-22 [1] CRAN (R 4.3.0)
#>  fastmap                    1.1.1     2023-02-24 [1] CRAN (R 4.3.0)
#>  filelock                   1.0.2     2018-10-05 [1] CRAN (R 4.3.0)
#>  fs                         1.6.2     2023-04-25 [1] CRAN (R 4.3.0)
#>  generics                   0.1.3     2022-07-05 [1] CRAN (R 4.3.0)
#>  GenomeInfoDb             * 1.36.0    2023-04-25 [1] Bioconductor
#>  GenomeInfoDbData           1.2.10    2023-04-28 [1] Bioconductor
#>  GenomicRanges            * 1.52.0    2023-04-25 [1] Bioconductor
#>  ggbeeswarm                 0.7.2     2023-04-29 [1] CRAN (R 4.3.0)
#>  ggplot2                    3.4.2     2023-04-03 [1] CRAN (R 4.3.0)
#>  ggrepel                    0.9.3     2023-02-03 [1] CRAN (R 4.3.0)
#>  glue                       1.6.2     2022-02-24 [1] CRAN (R 4.3.0)
#>  gridExtra                  2.3       2017-09-09 [1] CRAN (R 4.3.0)
#>  gtable                     0.3.3     2023-03-21 [1] CRAN (R 4.3.0)
#>  htmltools                  0.5.5     2023-03-23 [1] CRAN (R 4.3.0)
#>  htmlwidgets                1.6.2     2023-03-17 [1] CRAN (R 4.3.0)
#>  httpuv                     1.6.11    2023-05-11 [1] CRAN (R 4.3.0)
#>  httr                       1.4.6     2023-05-08 [1] CRAN (R 4.3.0)
#>  interactiveDisplayBase     1.38.0    2023-04-25 [1] Bioconductor
#>  IRanges                  * 2.34.0    2023-04-25 [1] Bioconductor
#>  irlba                      2.3.5.1   2022-10-03 [1] CRAN (R 4.3.0)
#>  jsonlite                   1.8.4     2022-12-06 [1] CRAN (R 4.3.0)
#>  KEGGREST                   1.40.0    2023-04-25 [1] Bioconductor
#>  knitr                      1.42      2023-01-25 [1] CRAN (R 4.3.0)
#>  later                      1.3.1     2023-05-02 [1] CRAN (R 4.3.0)
#>  lattice                    0.21-8    2023-04-05 [2] CRAN (R 4.3.0)
#>  lazyeval                   0.2.2     2019-03-15 [1] CRAN (R 4.3.0)
#>  lifecycle                  1.0.3     2022-10-07 [1] CRAN (R 4.3.0)
#>  magrittr                   2.0.3     2022-03-30 [1] CRAN (R 4.3.0)
#>  MASS                       7.3-60    2023-05-04 [2] CRAN (R 4.3.0)
#>  Matrix                     1.5-4     2023-04-04 [2] CRAN (R 4.3.0)
#>  MatrixGenerics           * 1.12.0    2023-04-25 [1] Bioconductor
#>  matrixStats              * 0.63.0    2022-11-18 [1] CRAN (R 4.3.0)
#>  memoise                    2.0.1     2021-11-26 [1] CRAN (R 4.3.0)
#>  mgcv                       1.8-42    2023-03-02 [2] CRAN (R 4.3.0)
#>  mia                        1.8.0     2023-04-25 [1] Bioconductor
#>  mime                       0.12      2021-09-28 [1] CRAN (R 4.3.0)
#>  MultiAssayExperiment       1.26.0    2023-04-25 [1] Bioconductor
#>  munsell                    0.5.0     2018-06-12 [1] CRAN (R 4.3.0)
#>  nlme                       3.1-162   2023-01-31 [2] CRAN (R 4.3.0)
#>  permute                    0.9-7     2022-01-27 [1] CRAN (R 4.3.0)
#>  pillar                     1.9.0     2023-03-22 [1] CRAN (R 4.3.0)
#>  pkgconfig                  2.0.3     2019-09-22 [1] CRAN (R 4.3.0)
#>  plotly                     4.10.1    2022-11-07 [1] CRAN (R 4.3.0)
#>  plyr                       1.8.8     2022-11-11 [1] CRAN (R 4.3.0)
#>  png                        0.1-8     2022-11-29 [1] CRAN (R 4.3.0)
#>  promises                   1.2.0.1   2021-02-11 [1] CRAN (R 4.3.0)
#>  purrr                      1.0.1     2023-01-10 [1] CRAN (R 4.3.0)
#>  R6                         2.5.1     2021-08-19 [1] CRAN (R 4.3.0)
#>  rappdirs                   0.3.3     2021-01-31 [1] CRAN (R 4.3.0)
#>  Rcpp                       1.0.10    2023-01-22 [1] CRAN (R 4.3.0)
#>  RCurl                      1.98-1.12 2023-03-27 [1] CRAN (R 4.3.0)
#>  reprex                     2.0.2     2022-08-17 [1] CRAN (R 4.3.0)
#>  reshape2                   1.4.4     2020-04-09 [1] CRAN (R 4.3.0)
#>  rlang                      1.1.1     2023-04-28 [1] CRAN (R 4.3.0)
#>  rmarkdown                  2.21      2023-03-26 [1] CRAN (R 4.3.0)
#>  RSQLite                    2.3.1     2023-04-03 [1] CRAN (R 4.3.0)
#>  rstudioapi                 0.14      2022-08-22 [1] CRAN (R 4.3.0)
#>  rsvd                       1.0.5     2021-04-16 [1] CRAN (R 4.3.0)
#>  S4Arrays                   1.0.4     2023-05-14 [1] Bioconductor
#>  S4Vectors                * 0.38.1    2023-05-02 [1] Bioconductor
#>  ScaledMatrix               1.8.1     2023-05-03 [1] Bioconductor
#>  scales                     1.2.1     2022-08-20 [1] CRAN (R 4.3.0)
#>  scater                     1.28.0    2023-04-25 [1] Bioconductor
#>  scuttle                    1.10.1    2023-05-02 [1] Bioconductor
#>  sessioninfo                1.2.2     2021-12-06 [1] CRAN (R 4.3.0)
#>  shiny                      1.7.4     2022-12-15 [1] CRAN (R 4.3.0)
#>  SingleCellExperiment     * 1.22.0    2023-04-25 [1] Bioconductor
#>  sparseMatrixStats          1.12.0    2023-04-25 [1] Bioconductor
#>  stringi                    1.7.12    2023-01-11 [1] CRAN (R 4.3.0)
#>  stringr                    1.5.0     2022-12-02 [1] CRAN (R 4.3.0)
#>  SummarizedExperiment     * 1.30.1    2023-05-01 [1] Bioconductor
#>  tibble                     3.2.1     2023-03-20 [1] CRAN (R 4.3.0)
#>  tidyr                      1.3.0     2023-01-24 [1] CRAN (R 4.3.0)
#>  tidyselect                 1.2.0     2022-10-10 [1] CRAN (R 4.3.0)
#>  tidySummarizedExperiment * 1.10.0    2023-04-25 [1] Bioconductor
#>  tidytree                   0.4.2     2022-12-18 [1] CRAN (R 4.3.0)
#>  treeio                     1.24.0    2023-04-25 [1] Bioconductor
#>  TreeSummarizedExperiment * 2.8.0     2023-04-25 [1] Bioconductor
#>  utf8                       1.2.3     2023-01-31 [1] CRAN (R 4.3.0)
#>  vctrs                      0.6.2     2023-04-19 [1] CRAN (R 4.3.0)
#>  vegan                      2.6-4     2022-10-11 [1] CRAN (R 4.3.0)
#>  vipor                      0.4.5     2017-03-22 [1] CRAN (R 4.3.0)
#>  viridis                    0.6.3     2023-05-03 [1] CRAN (R 4.3.0)
#>  viridisLite                0.4.2     2023-05-02 [1] CRAN (R 4.3.0)
#>  withr                      2.5.0     2022-03-03 [1] CRAN (R 4.3.0)
#>  xfun                       0.39      2023-04-20 [1] CRAN (R 4.3.0)
#>  xtable                     1.8-4     2019-04-21 [1] CRAN (R 4.3.0)
#>  XVector                  * 0.40.0    2023-04-25 [1] Bioconductor
#>  yaml                       2.3.7     2023-01-23 [1] CRAN (R 4.3.0)
#>  yulab.utils                0.0.6     2022-12-20 [1] CRAN (R 4.3.0)
#>  zlibbioc                   1.46.0    2023-04-25 [1] Bioconductor
#> 
#>  [1] /home/samuel/R/x86_64-pc-linux-gnu-library/4.3
#>  [2] /home/samuel/Apps/R-4.3.0/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

Created on 2023-05-15 with reprex v2.0.2

csoneson commented 1 year ago

I think this is caused by mismatching rownames in the TSE and the assay itself:

> head(rownames(tse))
[1] "853"    "820"    "301301" "28117"  "357276" "39491" 
> head(rownames(assay(tse, "relative_abundance", withDimnames = FALSE)))
[1] "1239_186801_186802_216572_216851_853" 
[2] "976_200643_171549_815_816_820"        
[3] "1239_186801_186802_186803_841_301301" 
[4] "976_200643_171549_171550_239759_28117"
[5] "976_200643_171549_815_909656_357276"  
[6] "1239_186801_186802_186803_NA_39491"

which causes problems in the subsetting step `.x[rownames(se), , drop = FALSE]` reported in the error above.
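A minimal sketch of how such a mismatch can be detected, using a toy object rather than the curatedMetagenomicData one. The matrix, the names, and the way the mismatch is created (assigning new rownames to the SE after construction, which as far as I can tell leaves the names stored inside the assay untouched) are assumptions for illustration only:

```r
library(SummarizedExperiment)

# Toy assay whose rownames mimic the long lineage-style identifiers
m <- matrix(
    1:6, nrow = 3,
    dimnames = list(c("1239_853", "976_820", "1239_301301"), c("s1", "s2"))
)
se <- SummarizedExperiment(assays = list(relative_abundance = m))

# Rename the features on the SE only (assumption: the assay keeps its own names)
rownames(se) <- c("853", "820", "301301")

se_names <- rownames(se)
assay_names <- rownames(assay(se, "relative_abundance", withDimnames = FALSE))

# FALSE here means the two sets of names have diverged
names_match <- identical(se_names, assay_names)
names_match
```

If `names_match` is `FALSE`, any code that indexes the raw assay by `rownames(se)` is at risk of the "subscript out of bounds" error shown in the report.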

stemangiola commented 1 year ago

Is it normal to have a row name mismatch? It sounds dangerous to me. We could avoid that rowname-dependent operation, or throw an informative error.

What do you folks think?

csoneson commented 1 year ago

I can't say whether it's normal 🙂 but it does seem like unintended things could happen. It also seems to be an issue if the SE does not have dimnames while the assay does:

> se <- as(scuttle::mockSCE(), "SummarizedExperiment")[, 1:5]
> head(rownames(se))
NULL
> head(rownames(assay(se, "counts", withDimnames = FALSE)))
[1] "Gene_0001" "Gene_0002" "Gene_0003" "Gene_0004" "Gene_0005" "Gene_0006"

> tidySummarizedExperiment::as_tibble(se)
# A tibble: 0 × 6
# ℹ 6 variables: .feature <chr>, .sample <chr>, counts <dbl>,
#   Mutation_Status <chr>, Cell_Cycle <chr>, Treatment <chr>

> rownames(se) <- rownames(assay(se, "counts", withDimnames = FALSE))
> tidySummarizedExperiment::as_tibble(se)
# A tibble: 10,000 × 6
   .feature  .sample  counts Mutation_Status Cell_Cycle Treatment
   <chr>     <chr>     <dbl> <chr>           <chr>      <chr>    
 1 Gene_0001 Cell_001      0 positive        G0         treat2   
 2 Gene_0002 Cell_001     38 positive        G0         treat2   
 3 Gene_0003 Cell_001      0 positive        G0         treat2   
 4 Gene_0004 Cell_001     20 positive        G0         treat2   
 5 Gene_0005 Cell_001     28 positive        G0         treat2   
 6 Gene_0006 Cell_001     32 positive        G0         treat2   
 7 Gene_0007 Cell_001     62 positive        G0         treat2   
 8 Gene_0008 Cell_001      0 positive        G0         treat2   
 9 Gene_0009 Cell_001      4 positive        G0         treat2   
10 Gene_0010 Cell_001      0 positive        G0         treat2   
# ℹ 9,990 more rows
# ℹ Use `print(n = ...)` to see more rows

stemangiola commented 1 year ago

Maybe we should:

- throw a warning if the assay has names but the SE does not, and give the SE names based on the assay before converting to a tibble
- throw an error if everything has names but they don't overlap

Feedback and PRs are welcome!
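The two rules proposed above could be sketched roughly as follows. This is only an illustration: `harmonise_rownames()` is a hypothetical helper that does not exist in tidySummarizedExperiment, "don't overlap" is interpreted here as "share no names at all", and the toy object is made up:

```r
library(SummarizedExperiment)

# Hypothetical helper (not part of tidySummarizedExperiment): reconcile the
# rownames of an SE with the rownames stored inside its assays.
harmonise_rownames <- function(se) {
    stored <- assays(se, withDimnames = FALSE)
    for (i in seq_along(stored)) {
        assay_rn <- rownames(stored[[i]])
        if (is.null(assay_rn)) next  # assay has no names, nothing to compare
        se_rn <- rownames(se)
        if (is.null(se_rn)) {
            # Rule 1: warn, then name the SE after the assay
            warning("assay has rownames but the SE does not; ",
                    "using the assay rownames", call. = FALSE)
            rownames(se) <- assay_rn
        } else if (length(intersect(se_rn, assay_rn)) == 0) {
            # Rule 2: both are named but share no names
            stop("rownames of the SE and of the assay do not overlap",
                 call. = FALSE)
        }
    }
    se
}

# Toy SE whose own rownames have been dropped while the assay keeps them
m <- matrix(1:6, nrow = 3,
            dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))
toy <- SummarizedExperiment(assays = list(counts = m))
rownames(toy) <- NULL

fixed <- suppressWarnings(harmonise_rownames(toy))
rownames(fixed)
```

Partial overlaps are deliberately let through in this sketch; whether they should also raise an error is a design decision the proposal leaves open.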

csoneson commented 1 year ago

Sounds reasonable to me. I could give it a go (will most likely be when I'm back from holidays next week).

stemangiola commented 10 months ago

I see

"SummarizedExperiment::SummarizedExperiment(assays = my_assays %>% : the rownames and colnames of the supplied assay(s) must be NULL or identical to those of the SummarizedExperiment object (or derivative) to construct"

@csoneson did you implement this message? If so, could I please ask you to prefix messages, warnings, and errors with "tidySummarizedExperiment says: " when they are raised?
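One possible way to apply such a prefix consistently is a pair of thin wrappers around the base R condition functions. The names `tse_warning()` and `tse_stop()` are hypothetical and only meant as a sketch:

```r
# Hypothetical wrappers that prepend the package prefix to every condition
tse_prefix <- "tidySummarizedExperiment says: "

tse_warning <- function(...) {
    warning(tse_prefix, ..., call. = FALSE)
}

tse_stop <- function(...) {
    stop(tse_prefix, ..., call. = FALSE)
}

# Capture the message of an error raised through the wrapper
msg <- tryCatch(
    tse_stop("rownames of the SE and of the assay do not overlap"),
    error = function(e) conditionMessage(e)
)
msg
```

Routing all conditions through wrappers like these would make it easier to tell at a glance whether a message originates in the tidy layer or in an upstream package.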

csoneson commented 10 months ago

@stemangiola Good question...I don't see this message anywhere in the tidy repositories; however, it looks like it may come from SummarizedExperiment:

https://github.com/Bioconductor/SummarizedExperiment/blob/9864c3601c9295f8e3a1e060fa8fe3e7f958be1d/R/RangedSummarizedExperiment-class.R#L164-L166
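For illustration, a toy call that I would expect to trigger this check in recent SummarizedExperiment versions: two assays with the same samples but different feature names are passed to the constructor. The matrices and names are made up, and the exact behaviour may depend on the installed version:

```r
library(SummarizedExperiment)

# Two toy assays that share samples but not features
rna <- matrix(1:4, nrow = 2,
              dimnames = list(c("geneA", "geneB"), c("s1", "s2")))
adt <- matrix(5:8, nrow = 2,
              dimnames = list(c("protX", "protY"), c("s1", "s2")))

# Capture the error message instead of stopping the script
res <- tryCatch(
    SummarizedExperiment(assays = list(RNA = rna, ADT = adt)),
    error = function(e) conditionMessage(e)
)
res
```

If `res` comes back as a character string, the constructor rejected the assays because their rownames disagree, which is the situation the quoted message describes.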

aodainic7 commented 10 months ago

Hey @csoneson, I get the following error when trying to convert an aggregated experiment to a SummarizedExperiment:

> CD8_adt.harmony_sct_WNN |> slice_sample(prop = 0.001) |> 
+   tidyseurat::aggregate_cells(c(sample, !!as.symbol("manual_cluster")), slot = "counts", assays=c("RNA", "ADT")) |>
+   tidybulk::as_SummarizedExperiment(.sample, .feature, any_of(c("RNA", "ADT")))
Error in SummarizedExperiment::SummarizedExperiment(assays = my_assays %>%  : 
  the rownames and colnames of the supplied assay(s) must be NULL or identical to those of
  the SummarizedExperiment object (or derivative) to construct

Even if I try it on the full dataset I get a similar error:

error in evaluating the argument '.data' in selecting a method for function 'as_SummarizedExperiment': nanny says: some of the .column specified do not exist in the input data frame.
Calls: .rs.sourceWithProgress ... same_src.data.frame -> is.data.frame -> %>% -> subset_tidyseurat

This is what my aggregate_cells output looks like:

> sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] cbmc.SeuratData_3.1.4   rstatix_0.7.2           gt_0.10.0               limma_3.54.2            tidygate_0.4.9          progressr_0.14.0       
 [7] RColorBrewer_1.1-3      gitlabr_2.0.1.9000      harmony_1.0.3           Rcpp_1.0.11             glmGamPoi_1.10.2        tidyseurat_0.7.4       
[13] ttservice_0.3.8         patchwork_1.1.3         SeuratObject_4.9.9.9091 Seurat_4.4.0            lubridate_1.9.3         forcats_1.0.0          
[19] stringr_1.5.0           dplyr_1.1.3             purrr_1.0.2             readr_2.1.4             tidyr_1.3.0             tibble_3.2.1           
[25] ggplot2_3.4.4           tidyverse_2.0.0        

loaded via a namespace (and not attached):
  [1] utf8_1.2.4                  spatstat.explore_3.2-5      reticulate_1.34.0           tidyselect_1.2.0            htmlwidgets_1.6.2          
  [6] grid_4.2.1                  BiocParallel_1.32.6         Rtsne_0.16                  munsell_0.5.0               ScaledMatrix_1.6.0         
 [11] preprocessCore_1.60.2       codetools_0.2-18            ica_1.0-3                   future_1.33.0               miniUI_0.1.1.1             
 [16] withr_2.5.1                 spatstat.random_3.2-1       colorspace_2.1-0            Biobase_2.58.0              knitr_1.44                 
 [21] rstudioapi_0.15.0           stats4_4.2.1                SingleCellExperiment_1.20.1 ROCR_1.0-11                 tensor_1.5                 
 [26] listenv_0.9.0               MatrixGenerics_1.10.0       GenomeInfoDbData_1.2.9      polyclip_1.10-6             farver_2.1.1               
 [31] parallelly_1.36.0           vctrs_0.6.4                 generics_0.1.3              xfun_0.40                   timechange_0.2.0           
 [36] R6_2.5.1                    GenomeInfoDb_1.34.9         ggbeeswarm_0.7.2            graphlayouts_1.0.1          rsvd_1.0.5                 
 [41] locfit_1.5-9.8              miloR_1.9.1                 cachem_1.0.8                bitops_1.0-7                spatstat.utils_3.0-3       
 [46] DelayedArray_0.24.0         promises_1.2.1              scales_1.2.1                ggraph_2.1.0                beeswarm_0.4.0             
 [51] gtable_0.3.4                beachmat_2.14.2             globals_0.16.2              goftest_1.2-3               spam_2.10-0                
 [56] tidygraph_1.2.3             tidybulk_1.10.1             rlang_1.1.1                 splines_4.2.1               lazyeval_0.2.2             
 [61] spatstat.geom_3.2-7         broom_1.0.5                 yaml_2.3.7                  reshape2_1.4.4              abind_1.4-5                
 [66] backports_1.4.1             httpuv_1.6.11               tools_4.2.1                 SeuratData_0.2.2            ellipsis_0.3.2             
 [71] jquerylib_0.1.4             BiocGenerics_0.44.0         ggridges_0.5.4              plyr_1.8.9                  base64enc_0.1-3            
 [76] sparseMatrixStats_1.10.0    zlibbioc_1.44.0             RCurl_1.98-1.12             deldir_1.0-9                pbapply_1.7-2              
 [81] viridis_0.6.4               cowplot_1.1.1               S4Vectors_0.36.2            zoo_1.8-12                  SummarizedExperiment_1.28.0
 [86] ggrepel_0.9.4               cluster_2.1.3               magrittr_2.0.3              data.table_1.14.8           scattermore_1.2            
 [91] lmtest_0.9-40               RANN_2.6.1                  fitdistrplus_1.1-11         matrixStats_1.0.0           evaluate_0.22              
 [96] hms_1.1.3                   mime_0.12                   xtable_1.8-4                IRanges_2.32.0              gridExtra_2.3              
[101] compiler_4.2.1              scater_1.26.1               pbmc3k.SeuratData_3.1.4     crayon_1.5.2                KernSmooth_2.23-20         
[106] htmltools_0.5.6.1           later_1.3.1                 tzdb_0.4.0                  scclusteval_0.0.0.9000      tweenr_2.0.2               
[111] rappdirs_0.3.3              MASS_7.3-57                 Matrix_1.5-3                car_3.1-2                   cli_3.6.1                  
[116] arpr_0.1.2                  parallel_4.2.1              dotCall64_1.1-0             igraph_1.5.1                GenomicRanges_1.50.2       
[121] pkgconfig_2.0.3             job_0.3.0                   sp_2.1-1                    plotly_4.10.3               scuttle_1.8.4              
[126] spatstat.sparse_3.0-2       xml2_1.3.5                  bslib_0.5.1                 vipor_0.4.5                 XVector_0.38.0             
[131] digest_0.6.33               sctransform_0.4.1           RcppAnnoy_0.0.21            spatstat.data_3.0-1         rmarkdown_2.25             
[136] leiden_0.4.3                uwot_0.1.16                 edgeR_3.40.2                DelayedMatrixStats_1.20.0   shiny_1.7.5.1              
[141] gtools_3.9.4                lifecycle_1.0.3             nlme_3.1-157                jsonlite_1.8.7              carData_3.0-5              
[146] BiocNeighbors_1.16.0        viridisLite_0.4.2           fansi_1.0.5                 pillar_1.9.0                lattice_0.20-45            
[151] ggrastr_1.0.2               fastmap_1.1.1               httr_1.4.7                  survival_3.3-1              glue_1.6.2                 
[156] png_0.1-8                   sass_0.4.7                  ggforce_0.4.1               stringi_1.7.12              BiocSingular_1.14.0        
[161] irlba_2.3.5.1               future.apply_1.11.0

Cheers!

stemangiola commented 10 months ago

@aodainic7 try reshaping the data so that RNA and ADT are both features:

... |>
    tidyseurat::aggregate_cells(c(sample, !!as.symbol("manual_cluster")), slot = "counts", assays = c("RNA", "ADT")) |>
    # Reshape to make RNA and ADT both features (pivot_longer and unite are from tidyr)
    pivot_longer(
        cols = c(RNA, ADT),
        names_to = "data_source",
        values_to = "count"
    ) |>
    # Drop feature/source combinations that have no measurement
    filter(!is.na(count)) |>
    unite(".feature", c(feature, data_source), remove = FALSE) |>
    # Convert
    tidybulk::as_SummarizedExperiment(
        .sample = .sample,
        .transcript = .feature,
        .abundance = count
    )
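To show what the reshaping step does on its own, here is a small sketch on a made-up table. The column names `feature`, `RNA`, and `ADT` follow the snippet above, but the values are invented and the real `aggregate_cells` output may be laid out differently:

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the aggregated table: one row per sample and feature,
# with one abundance column per assay (NA where the assay lacks the feature)
toy <- tibble(
    .sample = c("s1", "s1", "s2", "s2"),
    feature = c("CD8A", "CD8", "CD8A", "CD8"),
    RNA     = c(10, NA, 7, NA),
    ADT     = c(NA, 120, NA, 95)
)

long <- toy |>
    pivot_longer(
        cols = c(RNA, ADT),
        names_to = "data_source",
        values_to = "count"
    ) |>
    filter(!is.na(count)) |>
    unite(".feature", c(feature, data_source), remove = FALSE)

long
```

Each remaining row now carries a single `count` and a combined `.feature` such as `CD8A_RNA`, so both assays share one feature axis, which is the shape the final conversion step expects.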

stemangiola / tidySummarizedExperiment

`tidySE`: manage `to_tibble` if rownames do not match or are absent #70