Error in SketchData when analyzing CyTOF data

juicejulia commented 6 months ago

Hello, thank you for developing and continuing to improve the wonderful Seurat package.

We are trying to use the sketch integration method on our CyTOF data. We have ~40 million cells from 47 donors, with 33 antibody channels. We got the following error when calling SketchData:

CyTOF_combined <- SketchData(object = CyTOF_combined, ncells = 500, method = "LeverageScore", sketched.assay = "sketch")

Calcuating Leverage Score
Error in irlba(A = object, nv = 50, nu = 0, verbose = FALSE) :
  max(nu, nv) must be strictly less than min(nrow(A), ncol(A))

I think this error comes from the fact that we only have 33 dimensions, but I don't know whether there is a way to change the default value used by the internal irlba call. What is also confusing is that we were able to run the whole dataset a couple of months ago with the beta release of Seurat V5, so I am not sure what has changed. Below is my session info:

R version 4.2.3 (2023-03-15)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 18.04.6 LTS

Matrix products: default
BLAS/LAPACK: /home/jwang/anaconda3/envs/r_4.3.0/lib/libopenblasp-r0.3.26.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] SeuratWrappers_0.3.19 Azimuth_0.5.0 shinyBS_0.61.1
[4] patchwork_1.2.0 ggrepel_0.9.5 ggplot2_3.4.4
[7] dplyr_1.1.4 BPCells_0.1.0 Seurat_5.0.2
[10] SeuratObject_5.0.1 sp_2.1-3

loaded via a namespace (and not attached):
[1] rappdirs_0.3.3 rtracklayer_1.58.0
[3] scattermore_1.2 R.methodsS3_1.8.2
[5] tidyr_1.3.1 JASPAR2020_0.99.10
[7] bit64_4.0.5 irlba_2.3.5.1
[9] DelayedArray_0.24.0 R.utils_2.12.3
[11] data.table_1.15.0 KEGGREST_1.38.0
[13] TFBSTools_1.36.0 RCurl_1.98-1.14
[15] AnnotationFilter_1.22.0 generics_0.1.3
[17] BiocGenerics_0.44.0 GenomicFeatures_1.50.4
[19] callr_3.7.3 cowplot_1.1.3
[21] usethis_2.2.2 RSQLite_2.3.5
[23] RANN_2.6.1 future_1.33.1
[25] bit_4.0.5 tzdb_0.4.0
[27] spatstat.data_3.0-4 xml2_1.3.6
[29] httpuv_1.6.14 SummarizedExperiment_1.28.0
[31] DirichletMultinomial_1.40.0 gargle_1.5.2
[33] hms_1.1.3 promises_1.2.1
[35] fansi_1.0.6 restfulr_0.0.15
[37] progress_1.2.3 caTools_1.18.2
[39] dbplyr_2.4.0 igraph_1.5.1
[41] DBI_1.2.1 htmlwidgets_1.6.4
[43] spatstat.geom_3.2-8 googledrive_2.1.1
[45] stats4_4.2.3 purrr_1.0.2
[47] ellipsis_0.3.2 RSpectra_0.16-1
[49] annotate_1.76.0 biomaRt_2.54.1
[51] deldir_2.0-2 MatrixGenerics_1.10.0
[53] vctrs_0.6.5 Biobase_2.58.0
[55] remotes_2.4.2.1 SeuratDisk_0.0.0.9021
[57] ensembldb_2.22.0 ROCR_1.0-11
[59] abind_1.4-5 cachem_1.0.8
[61] withr_3.0.0 BSgenome.Hsapiens.UCSC.hg38_1.4.5
[63] BSgenome_1.66.3 progressr_0.14.0
[65] presto_1.0.0 sctransform_0.4.1
[67] GenomicAlignments_1.34.1 prettyunits_1.2.0
[69] goftest_1.2-3 cluster_2.1.6
[71] dotCall64_1.1-1 lazyeval_0.2.2
[73] seqLogo_1.64.0 crayon_1.5.2
[75] hdf5r_1.3.9 spatstat.explore_3.2-5
[77] pkgconfig_2.0.3 GenomeInfoDb_1.34.9
[79] pkgload_1.3.4 nlme_3.1-164
[81] ProtGenerics_1.30.0 devtools_2.4.5
[83] rlang_1.1.3 globals_0.16.2
[85] lifecycle_1.0.4 miniUI_0.1.1.1
[87] filelock_1.0.3 fastDummies_1.7.3
[89] BiocFileCache_2.6.1 rsvd_1.0.5
[91] SeuratData_0.2.2.9001 cellranger_1.1.0
[93] polyclip_1.10-6 RcppHNSW_0.5.0
[95] matrixStats_1.1.0 lmtest_0.9-40
[97] Matrix_1.6-5 Rhdf5lib_1.20.0
[99] zoo_1.8-12 processx_3.8.3
[101] ggridges_0.5.6 googlesheets4_1.1.1
[103] png_0.1-8 viridisLite_0.4.2
[105] rjson_0.2.21 bitops_1.0-7
[107] shinydashboard_0.7.2 R.oo_1.26.0
[109] KernSmooth_2.23-22 spam_2.10-0
[111] rhdf5filters_1.10.1 Biostrings_2.66.0
[113] blob_1.2.4 stringr_1.5.1
[115] parallelly_1.36.0 spatstat.random_3.2-2
[117] readr_2.1.5 S4Vectors_0.36.2
[119] CNEr_1.34.0 scales_1.3.0
[121] memoise_2.0.1 magrittr_2.0.3
[123] plyr_1.8.9 ica_1.0-3
[125] zlibbioc_1.44.0 compiler_4.2.3
[127] BiocIO_1.8.0 RColorBrewer_1.1-3
[129] fitdistrplus_1.1-11 Rsamtools_2.14.0
[131] cli_3.6.2 urlchecker_1.0.1
[133] XVector_0.38.0 listenv_0.9.1
[135] ps_1.7.6 pbapply_1.7-2
[137] MASS_7.3-60.0.1 tidyselect_1.2.0
[139] stringi_1.8.3 yaml_2.3.8
[141] grid_4.2.3 fastmatch_1.1-4
[143] EnsDb.Hsapiens.v86_2.99.0 tools_4.2.3
[145] future.apply_1.11.1 parallel_4.2.3
[147] TFMPvalue_0.0.9 gridExtra_2.3
[149] Rtsne_0.17 BiocManager_1.30.22
[151] digest_0.6.34 shiny_1.8.0
[153] pracma_2.4.4 Rcpp_1.0.12
[155] GenomicRanges_1.50.2 later_1.3.2
[157] RcppAnnoy_0.0.22 httr_1.4.7
[159] AnnotationDbi_1.60.2 colorspace_2.1-0
[161] XML_3.99-0.16.1 fs_1.6.3
[163] tensor_1.5 reticulate_1.34.0
[165] IRanges_2.32.0 splines_4.2.3
[167] uwot_0.1.16 RcppRoll_0.3.0
[169] spatstat.utils_3.0-4 sessioninfo_1.2.2
[171] plotly_4.10.4 xtable_1.8-4
[173] jsonlite_1.8.8 poweRlaw_0.80.0
[175] R6_2.5.1 profvis_0.3.8
[177] pillar_1.9.0 htmltools_0.5.7
[179] mime_0.12 glue_1.7.0
[181] fastmap_1.1.1 DT_0.31
[183] BiocParallel_1.32.6 codetools_0.2-19
[185] pkgbuild_1.4.3 Signac_1.12.0
[187] utf8_1.2.4 lattice_0.22-5
[189] spatstat.sparse_3.0-3 tibble_3.2.1
[191] curl_5.2.0 leiden_0.4.3.1
[193] gtools_3.9.5 shinyjs_2.1.0
[195] GO.db_3.16.0 survival_3.5-7
[197] desc_1.4.3 munsell_0.5.0
[199] rhdf5_2.42.1 GenomeInfoDbData_1.2.9
[201] reshape2_1.4.4 gtable_0.3.4
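
For anyone reading along, the condition quoted in the error message can be checked by hand. The snippet below is only a sketch in base R: `rank_request_ok` is a hypothetical helper written for this illustration, not a function from irlba or Seurat, and it assumes the matrix passed to irlba has the 33 antibody channels as one of its dimensions.

```r
# Hypothetical helper (not part of irlba or Seurat): evaluates the condition
# from the error message, max(nu, nv) < min(nrow(A), ncol(A)).
rank_request_ok <- function(n_rows, n_cols, nv = 50, nu = 0) {
  max(nu, nv) < min(n_rows, n_cols)
}

# Assumption: 33 antibody channels form one dimension of the matrix,
# so the smaller dimension can be at most 33.
rank_request_ok(n_rows = 33, n_cols = 40e6)           # FALSE: 50 is not < 33
rank_request_ok(n_rows = 33, n_cols = 40e6, nv = 32)  # TRUE: 32 < 33
```

With nv fixed at 50, as in the call shown in the error, the condition cannot hold for any matrix whose smaller dimension is 33.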

juicejulia commented 6 months ago

Update: the error seems to arise because some of the samples had far fewer cells (< 5,000) than the rest (> 50,000). Removing those low-cell-count samples lets sketching and the downstream steps run successfully. I'm not exactly sure why...
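
A quick way to spot such samples is to tabulate cells per sample before sketching. This is a minimal base-R sketch on toy data: the `meta` data frame stands in for the object's cell-level metadata, and the column name `donor` and the donor labels are assumptions made up for the example.

```r
# Toy metadata standing in for the real cell-level metadata; the column
# name "donor" and the labels D01..D03 are hypothetical.
meta <- data.frame(
  donor = rep(c("D01", "D02", "D03"), times = c(60000, 4200, 75000))
)

# Count cells per donor and flag those below the cut-off seen in this thread.
cells_per_donor <- table(meta$donor)
threshold <- 5000

small_donors <- names(cells_per_donor)[cells_per_donor < threshold]
print(cells_per_donor)
print(small_donors)
```

Any donor listed in `small_donors` is a candidate for the samples that triggered the error here.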

igrabski commented 2 months ago

Hi Julia, by default, SketchData's ncells parameter is set to 5,000. Since you have samples with fewer than 5,000 cells, that is likely the issue. I would recommend setting ncells to a lower value, depending on the size of your data.
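
One possible way to pick a lower value is to derive it from the smallest sample. The sketch below is only an illustration: the cell counts are made up, and capping ncells at half the smallest sample is an arbitrary heuristic assumed for this example, not a Seurat recommendation. The SketchData call is left as a comment and reuses the arguments from the original post.

```r
# Hypothetical per-sample cell counts; replace with the real counts.
cells_per_sample <- c(D01 = 60000, D02 = 4200, D03 = 75000)

default_ncells <- 5000  # default mentioned above

# Assumed heuristic: stay well below the smallest sample (half its size),
# and never exceed the default.
ncells_to_use <- min(default_ncells, floor(min(cells_per_sample) / 2))
print(ncells_to_use)

# Then, with Seurat loaded and the object in memory:
# CyTOF_combined <- SketchData(object = CyTOF_combined, ncells = ncells_to_use,
#                              method = "LeverageScore", sketched.assay = "sketch")
```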

satijalab / seurat

Error in SketchData when analyzing CyTOF data #8596