sdgamboa closed 1 year ago
I think this is caused by mismatching rownames in the TSE and the assay itself:
> head(rownames(tse))
[1] "853" "820" "301301" "28117" "357276" "39491"
> head(rownames(assay(tse, "relative_abundance", withDimnames = FALSE)))
[1] "1239_186801_186802_216572_216851_853"
[2] "976_200643_171549_815_816_820"
[3] "1239_186801_186802_186803_841_301301"
[4] "976_200643_171549_171550_239759_28117"
[5] "976_200643_171549_815_909656_357276"
[6] "1239_186801_186802_186803_NA_39491"
which causes problems here.
Is it normal to have a row name mismatch? It sounds dangerous to me. We could avoid that rowname-dependent operation, or throw an informative error.
What do you folks think?
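As a rough illustration of the "informative error" option, here is a minimal base-R sketch. The helper name `check_assay_rownames` and its arguments are hypothetical and not part of tidySummarizedExperiment; it only assumes that the object's row names and the raw assay matrices are available, for example via `rownames(tse)` and `assays(tse, withDimnames = FALSE)`.

```r
# Hypothetical helper (not part of tidySummarizedExperiment): compare the
# object's row names against the row names stored in each assay matrix.
check_assay_rownames <- function(object_rownames, assay_list) {
  for (assay_name in names(assay_list)) {
    assay_rownames <- rownames(assay_list[[assay_name]])
    # An assay without row names cannot conflict with the object.
    if (is.null(assay_rownames)) next
    if (!identical(object_rownames, assay_rownames)) {
      stop(
        "tidySummarizedExperiment says: row names of assay '", assay_name,
        "' differ from the row names of the object. ",
        "Consider aligning them before converting to a tibble.",
        call. = FALSE
      )
    }
  }
  invisible(TRUE)
}

# Toy data mimicking the mismatch reported above (made-up values).
m <- matrix(
  1:4, nrow = 2,
  dimnames = list(c("1239_853", "976_820"), c("s1", "s2"))
)
result <- tryCatch(
  check_assay_rownames(c("853", "820"), list(relative_abundance = m)),
  error = function(e) conditionMessage(e)
)
print(result)
```

Because `identical()` also returns `FALSE` when the object's row names are `NULL` and the assay's are not, the same check would flag an object without dimnames whose assay carries them.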
I can't say whether it's normal 🙂 but it does seem like unintended things could happen. It also seems to be an issue if the SE does not have dimnames while the assay does:
> se <- as(scuttle::mockSCE(), "SummarizedExperiment")[, 1:5]
> head(rownames(se))
NULL
> head(rownames(assay(se, "counts", withDimnames = FALSE)))
[1] "Gene_0001" "Gene_0002" "Gene_0003" "Gene_0004" "Gene_0005" "Gene_0006"
> tidySummarizedExperiment::as_tibble(se)
# A tibble: 0 × 6
# ℹ 6 variables: .feature <chr>, .sample <chr>, counts <dbl>,
# Mutation_Status <chr>, Cell_Cycle <chr>, Treatment <chr>
> rownames(se) <- rownames(assay(se, "counts", withDimnames = FALSE))
> tidySummarizedExperiment::as_tibble(se)
# A tibble: 10,000 × 6
.feature .sample counts Mutation_Status Cell_Cycle Treatment
<chr> <chr> <dbl> <chr> <chr> <chr>
1 Gene_0001 Cell_001 0 positive G0 treat2
2 Gene_0002 Cell_001 38 positive G0 treat2
3 Gene_0003 Cell_001 0 positive G0 treat2
4 Gene_0004 Cell_001 20 positive G0 treat2
5 Gene_0005 Cell_001 28 positive G0 treat2
6 Gene_0006 Cell_001 32 positive G0 treat2
7 Gene_0007 Cell_001 62 positive G0 treat2
8 Gene_0008 Cell_001 0 positive G0 treat2
9 Gene_0009 Cell_001 4 positive G0 treat2
10 Gene_0010 Cell_001 0 positive G0 treat2
# ℹ 9,990 more rows
# ℹ Use `print(n = ...)` to see more rows
Maybe we should
Feedback and PRs are welcome!
Sounds reasonable to me. I could give it a go (will most likely be when I'm back from holidays next week).
I see
"SummarizedExperiment::SummarizedExperiment(assays = my_assays %>% : the rownames and colnames of the supplied assay(s) must be NULL or identical to those of the SummarizedExperiment object (or derivative) to construct"
@csoneson did you implement this message? If so, could I please ask that we prefix messages/warnings/errors with "tidySummarizedExperiment says: " when they are raised?
@stemangiola Good question... I don't see this message anywhere in the tidy repositories; however, it looks like it may come from SummarizedExperiment.
Hey @csoneson, I have the following issue when trying to convert an aggregated experiment to a SummarizedExperiment:
> CD8_adt.harmony_sct_WNN |> slice_sample(prop = 0.001) |>
+ tidyseurat::aggregate_cells(c(sample, !!as.symbol("manual_cluster")), slot = "counts", assays=c("RNA", "ADT")) |>
+ tidybulk::as_SummarizedExperiment(.sample, .feature, any_of(c("RNA", "ADT")))
Error in SummarizedExperiment::SummarizedExperiment(assays = my_assays %>% :
the rownames and colnames of the supplied assay(s) must be NULL or identical to those of
the SummarizedExperiment object (or derivative) to construct
Even if I try it on the full dataset I get a similar error:
error in evaluating the argument '.data' in selecting a method for function 'as_SummarizedExperiment': nanny says: some of the .column specified do not exist in the input data frame.
Calls: .rs.sourceWithProgress ... same_src.data.frame -> is.data.frame -> %>% -> subset_tidyseurat
This is what my aggregate_cells output looks like:
> sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] cbmc.SeuratData_3.1.4 rstatix_0.7.2 gt_0.10.0 limma_3.54.2 tidygate_0.4.9 progressr_0.14.0
[7] RColorBrewer_1.1-3 gitlabr_2.0.1.9000 harmony_1.0.3 Rcpp_1.0.11 glmGamPoi_1.10.2 tidyseurat_0.7.4
[13] ttservice_0.3.8 patchwork_1.1.3 SeuratObject_4.9.9.9091 Seurat_4.4.0 lubridate_1.9.3 forcats_1.0.0
[19] stringr_1.5.0 dplyr_1.1.3 purrr_1.0.2 readr_2.1.4 tidyr_1.3.0 tibble_3.2.1
[25] ggplot2_3.4.4 tidyverse_2.0.0
loaded via a namespace (and not attached):
[1] utf8_1.2.4 spatstat.explore_3.2-5 reticulate_1.34.0 tidyselect_1.2.0 htmlwidgets_1.6.2
[6] grid_4.2.1 BiocParallel_1.32.6 Rtsne_0.16 munsell_0.5.0 ScaledMatrix_1.6.0
[11] preprocessCore_1.60.2 codetools_0.2-18 ica_1.0-3 future_1.33.0 miniUI_0.1.1.1
[16] withr_2.5.1 spatstat.random_3.2-1 colorspace_2.1-0 Biobase_2.58.0 knitr_1.44
[21] rstudioapi_0.15.0 stats4_4.2.1 SingleCellExperiment_1.20.1 ROCR_1.0-11 tensor_1.5
[26] listenv_0.9.0 MatrixGenerics_1.10.0 GenomeInfoDbData_1.2.9 polyclip_1.10-6 farver_2.1.1
[31] parallelly_1.36.0 vctrs_0.6.4 generics_0.1.3 xfun_0.40 timechange_0.2.0
[36] R6_2.5.1 GenomeInfoDb_1.34.9 ggbeeswarm_0.7.2 graphlayouts_1.0.1 rsvd_1.0.5
[41] locfit_1.5-9.8 miloR_1.9.1 cachem_1.0.8 bitops_1.0-7 spatstat.utils_3.0-3
[46] DelayedArray_0.24.0 promises_1.2.1 scales_1.2.1 ggraph_2.1.0 beeswarm_0.4.0
[51] gtable_0.3.4 beachmat_2.14.2 globals_0.16.2 goftest_1.2-3 spam_2.10-0
[56] tidygraph_1.2.3 tidybulk_1.10.1 rlang_1.1.1 splines_4.2.1 lazyeval_0.2.2
[61] spatstat.geom_3.2-7 broom_1.0.5 yaml_2.3.7 reshape2_1.4.4 abind_1.4-5
[66] backports_1.4.1 httpuv_1.6.11 tools_4.2.1 SeuratData_0.2.2 ellipsis_0.3.2
[71] jquerylib_0.1.4 BiocGenerics_0.44.0 ggridges_0.5.4 plyr_1.8.9 base64enc_0.1-3
[76] sparseMatrixStats_1.10.0 zlibbioc_1.44.0 RCurl_1.98-1.12 deldir_1.0-9 pbapply_1.7-2
[81] viridis_0.6.4 cowplot_1.1.1 S4Vectors_0.36.2 zoo_1.8-12 SummarizedExperiment_1.28.0
[86] ggrepel_0.9.4 cluster_2.1.3 magrittr_2.0.3 data.table_1.14.8 scattermore_1.2
[91] lmtest_0.9-40 RANN_2.6.1 fitdistrplus_1.1-11 matrixStats_1.0.0 evaluate_0.22
[96] hms_1.1.3 mime_0.12 xtable_1.8-4 IRanges_2.32.0 gridExtra_2.3
[101] compiler_4.2.1 scater_1.26.1 pbmc3k.SeuratData_3.1.4 crayon_1.5.2 KernSmooth_2.23-20
[106] htmltools_0.5.6.1 later_1.3.1 tzdb_0.4.0 scclusteval_0.0.0.9000 tweenr_2.0.2
[111] rappdirs_0.3.3 MASS_7.3-57 Matrix_1.5-3 car_3.1-2 cli_3.6.1
[116] arpr_0.1.2 parallel_4.2.1 dotCall64_1.1-0 igraph_1.5.1 GenomicRanges_1.50.2
[121] pkgconfig_2.0.3 job_0.3.0 sp_2.1-1 plotly_4.10.3 scuttle_1.8.4
[126] spatstat.sparse_3.0-2 xml2_1.3.5 bslib_0.5.1 vipor_0.4.5 XVector_0.38.0
[131] digest_0.6.33 sctransform_0.4.1 RcppAnnoy_0.0.21 spatstat.data_3.0-1 rmarkdown_2.25
[136] leiden_0.4.3 uwot_0.1.16 edgeR_3.40.2 DelayedMatrixStats_1.20.0 shiny_1.7.5.1
[141] gtools_3.9.4 lifecycle_1.0.3 nlme_3.1-157 jsonlite_1.8.7 carData_3.0-5
[146] BiocNeighbors_1.16.0 viridisLite_0.4.2 fansi_1.0.5 pillar_1.9.0 lattice_0.20-45
[151] ggrastr_1.0.2 fastmap_1.1.1 httr_1.4.7 survival_3.3-1 glue_1.6.2
[156] png_0.1-8 sass_0.4.7 ggforce_0.4.1 stringi_1.7.12 BiocSingular_1.14.0
[161] irlba_2.3.5.1 future.apply_1.11.0
Cheers!
@aodainic7 try reshaping the data so that RNA and ADT are both features:
... |>
tidyseurat::aggregate_cells(c(sample, !!as.symbol("manual_cluster")), slot = "counts", assays=c("RNA", "ADT")) |>
# Reshape to make RNA and ADT both features
pivot_longer(
cols = c(RNA, ADT),
names_to = "data_source",
values_to = "count"
) |>
filter(!is.na(count)) |>
unite(".feature", c(feature, data_source), remove = FALSE) |>
# Convert to a SummarizedExperiment
tidybulk::as_SummarizedExperiment(
.sample = .sample,
.transcript = .feature,
.abundance = count
)
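To make the reshaping step above easier to follow, here is a small sketch on a made-up tibble. The column names mirror the snippet above (`feature`, `RNA`, `ADT`), but the values and the toy object are assumptions for illustration only, not real `aggregate_cells` output.

```r
library(tidyr)
library(dplyr)

# Made-up stand-in for the aggregated output: one row per sample/feature,
# with one count column per assay and NA where a feature is absent.
toy <- tibble::tibble(
  .sample = c("s1", "s1"),
  feature = c("CD3E", "CD3"),
  RNA     = c(5, NA),
  ADT     = c(NA, 12)
)

long <- toy |>
  # Stack the two assay columns into a single count column
  pivot_longer(
    cols = c(RNA, ADT),
    names_to = "data_source",
    values_to = "count"
  ) |>
  # Drop feature/assay combinations that do not exist
  filter(!is.na(count)) |>
  # Build a feature ID that is unique across assays, e.g. "CD3_ADT"
  unite(".feature", c(feature, data_source), remove = FALSE)

print(long)
```

After this step each row carries a single abundance value and an assay-specific feature name, which is the shape the `as_SummarizedExperiment()` call above is given.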
I used to be able to transform my tse into tidy format by calling tidySummarizedExperiment::as_tibble (don't remember the version), but it doesn't work anymore. Wondering if this could be a bug or just some formatting I need to do in my data. I'd appreciate any help. Thanks.
Created on 2023-05-15 with reprex v2.0.2