Closed ankushs0128 closed 1 year ago
Hey Ankush,
Number skewing in the clonalDiversity()
function is usually a result of the downsampling. The function will automatically identify the lowest number of clonotypes across all samples and then bootstrap the diversity estimates for that minimum number. If a sample has very low diversity or recovered reads. Is this maybe what is going on?
If not, maybe you can provide the version of scRepertoire you are using. What do you mean by subsetting 1000 cells?
Also if you are comfortable emailing me the exported table from the function call - that would help immensely when it comes to troubleshooting.
Thanks, Nick
Hi Nick,
Thanks again for all your work with this wonderful package. I am having a problem similar to Ankush's, and figured I'd add here (but if it's better for me to open a new issue, let me know).
I have a multiplexed experiment with samples from many patients, three different phenotypes (Sp, St, Ctrl) and three days (D0, D1, D3). All steps have worked, including generating a single Seurat object with all the clonotypic information attached. However, I attempted to compare clonal diversity for a subset of the data (eg. CD4 T cells at Day 0) but I’m getting Shannon and other indices all close to zero. As per below, I alternatively tried using either the Single Cell Object (cohort3_VDJCD4_D0) or the df from expression2List (ExpressionList_VDJCD4_D0_byPhenotype), but got the same results.
clonaldiversity_CD4D0 <- clonalDiversity(ExpressionList_VDJCD4_D0_byPhenotype.Patient, cloneCall = "strict", group.by = "Phenotype.Patient", x.axis = "Phenotype", exportTable = FALSE)
clonaldiversity_CD4D0 <- clonalDiversity(cohort3_VDJCD4_D0, cloneCall = "strict", split.by = "Phenotype.Patient", exportTable = FALSE)
There is an “NA” grouping generated when the plot is split by “Phenotype”, though I doublechecked that there are no NAs in the original dataset for that metadata slot. Could this be the issue? Also, if as you’ve noted above, this is the result of some samples with very low diversity – is there a strategy for dealing with that you’d recommend?
In addition, I’m getting the following error when I set exportTable = TRUE: Error in FUN(left, right) : non-numeric argument to binary operator
Your help is, as always, very much appreciated. My session info is below.
sessionInfo() R version 4.2.2 (2022-10-31) Platform: x86_64-apple-darwin17.0 (64-bit) Running under: macOS Monterey 12.6
Matrix products: default LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages: [1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] SeuratData_0.2.2 SeuratDisk_0.0.0.9020 SeuratObject_4.1.3 Seurat_4.2.1
[5] data.table_1.14.4 cowplot_1.1.1 reshape2_1.4.4 dplyr_1.0.10
[9] knitr_1.40 hdf5r_1.3.7 tinytex_0.42 rmarkdown_2.18
[13] scales_1.2.1 circlize_0.4.15 devtools_2.4.5 usethis_2.1.6
[17] remotes_2.4.2 BiocManager_1.30.19 scRepertoire_1.6.0 ggplot2_3.4.0
loaded via a namespace (and not attached):
[1] utf8_1.2.2 spatstat.explore_3.0-5 reticulate_1.26
[4] tidyselect_1.2.0 htmlwidgets_1.5.4 grid_4.2.2
[7] Rtsne_0.16 munsell_0.5.0 codetools_0.2-18
[10] ica_1.0-3 future_1.29.0 miniUI_0.1.1.1
[13] withr_2.5.0 spatstat.random_3.0-1 colorspace_2.0-3
[16] progressr_0.11.0 Biobase_2.58.0 ggalluvial_0.12.3
[19] rstudioapi_0.14 stats4_4.2.2 SingleCellExperiment_1.20.0
[22] ROCR_1.0-11 tensor_1.5 listenv_0.8.0
[25] labeling_0.4.2 MatrixGenerics_1.10.0 GenomeInfoDbData_1.2.9
[28] polyclip_1.10-4 bit64_4.0.5 farver_2.1.1
[31] rprojroot_2.0.3 parallelly_1.32.1 vctrs_0.5.1
[34] generics_0.1.3 xfun_0.34 R6_2.5.1
[37] doParallel_1.0.17 GenomeInfoDb_1.34.3 graphlayouts_0.8.3
[40] VGAM_1.1-7 bitops_1.0-7 spatstat.utils_3.0-1
[43] cachem_1.0.6 DelayedArray_0.24.0 assertthat_0.2.1
[46] promises_1.2.0.1 ggraph_2.1.0 gtable_0.3.1
[49] globals_0.16.1 goftest_1.2-3 processx_3.8.0
[52] tidygraph_1.2.2 rlang_1.0.6 GlobalOptions_0.1.2
[55] splines_4.2.2 lazyeval_0.2.2 spatstat.geom_3.0-3
[58] abind_1.4-5 httpuv_1.6.6 tools_4.2.2
[61] cubature_2.0.4.5 ellipsis_0.3.2 RColorBrewer_1.1-3
[64] BiocGenerics_0.44.0 sessioninfo_1.2.2 clustree_0.5.0
[67] ggridges_0.5.4 Rcpp_1.0.9 plyr_1.8.8
[70] zlibbioc_1.44.0 purrr_0.3.5 RCurl_1.98-1.9
[73] ps_1.7.2 prettyunits_1.1.1 deldir_1.0-6
[76] pbapply_1.6-0 viridis_0.6.2 urlchecker_1.0.1
[79] S4Vectors_0.36.0 zoo_1.8-11 SummarizedExperiment_1.28.0
[82] ggrepel_0.9.2 cluster_2.1.4 fs_1.5.2
[85] magrittr_2.0.3 scattermore_0.8 SparseM_1.81
[88] lmtest_0.9-40 RANN_2.6.1 truncdist_1.0-2
[91] fitdistrplus_1.1-8 matrixStats_0.62.0 pkgload_1.3.2
[94] gsl_2.1-7.1 patchwork_1.1.2 mime_0.12
[97] evaluate_0.18 xtable_1.8-4 IRanges_2.32.0
[100] gridExtra_2.3 shape_1.4.6 compiler_4.2.2
[103] tibble_3.1.8 KernSmooth_2.23-20 crayon_1.5.2
[106] htmltools_0.5.3 mgcv_1.8-41 later_1.3.0
[109] tidyr_1.2.1 powerTCR_1.18.0 DBI_1.1.3
[112] tweenr_2.0.2 rappdirs_0.3.3 MASS_7.3-58.1
[115] Matrix_1.5-3 permute_0.9-7 cli_3.4.1
[118] parallel_4.2.2 evd_2.3-6.1 igraph_1.3.5
[121] GenomicRanges_1.50.1 pkgconfig_2.0.3 sp_1.5-1
[124] plotly_4.10.1 spatstat.sparse_3.0-0 foreach_1.5.2
[127] stringdist_0.9.10 XVector_0.38.0 stringr_1.4.1
[130] callr_3.7.3 digest_0.6.30 sctransform_0.3.5
[133] RcppAnnoy_0.0.20 vegan_2.6-4 spatstat.data_3.0-0
[136] leiden_0.4.3 uwot_0.1.14 curl_4.3.3
[139] evmix_2.12 shiny_1.7.3 lifecycle_1.0.3
[142] nlme_3.1-160 jsonlite_1.8.3 viridisLite_0.4.1
[145] fansi_1.0.3 pillar_1.8.1 lattice_0.20-45
[148] fastmap_1.1.0 httr_1.4.4 pkgbuild_1.3.1
[151] survival_3.4-0 glue_1.6.2 png_0.1-7
[154] iterators_1.0.14 bit_4.0.5 ggforce_0.4.1
[157] stringi_1.7.8 profvis_0.3.7 memoise_2.0.1
[160] irlba_2.3.5.1 future.apply_1.10.0
Hey Pierre,
Thanks for reaching out and excellent run down. Based on the error message you are getting, I am betting there is something internal that is causing the NA values. Would you mind emailing me the output of expression2List()
so I can troubleshoot the issue? Or a subsample of the data if you are more comfortable with that?
In terms of handling low numbers of clones - I think this is a persistent problem so I will add a parameter in the function to disable boot strapping (and I will push a commit when we figure out your issue).
Thanks, Nick
Thanks for the quick response Nick! Emailed you the output.
Best, Pierre
Hey Pierre,
Thanks so much! I will get back to you as soon as possible with a fix!
Nick
Hey Pierre,
I think I identified the issue and it is indeed due to low TCR per grouping variable. I am going to partially remove the outputs from the code below for concerns with privacy.
#Loading data and merging it all together
df <- readRDS("~/Documents/ExpressionList.Rds")
dat <- bind_rows(df)
#Stripped out version of expression2List()
#df is now a list grouped by your Phenotype.Patient variable
unique <- str_sort(as.character(unique(dat$Phenotype.Patient)), numeric = TRUE)
df <- NULL
for (i in seq_along(unique)) {
subset <- subset(dat, dat[,split.by] == unique[i])
subset <- subset(subset, !is.na(cloneType))
df[[i]] <- subset
}
output <- clonalDiversity(df, cloneCall = "strict", split.by = "Phenotype.Patient", exportTable = FALSE)
Tthe input to the clonalDiversity()
function (below) is now a list with the smallest repertoire = 4. So the clonalDiversity()
is now subsampling 4 TCRs per list element to calculate the diversity metrics.
This is why you are getting repetitive values and values close to 0 for the diversity functions.
I have added the new parameter for clonalDiversity(...,skip.boots = TRUE)
. When set to TRUE, this will skip the downsampling and bootstrapping of the function to allow for the calculation of diversity in your situation. However, just be cautious using the skip.boots parameter as large variances in repertoire size really has major influences on the diversity calculations.
You can install the version with this change, using:
devtools::install_github("ncborcherding/scRepertoire@dev")
Hope the helps and let me know if you have any additional problems (or suggestions).
Thanks, Nick
Thanks Nick , Best Regards Ankush Sharma
On Mon, Nov 21, 2022 at 4:21 PM theHumanBorch @.***> wrote:
Hey Pierre,
I think I identified the issue and it is indeed due to low TCR per grouping variable. I am going to partially remove the outputs from the code below for concerns with privacy.
Loading data and merging it all together
df <- readRDS("~/Documents/ExpressionList.Rds") dat <- bind_rows(df)
Stripped out version of expression2List()
df is now a list grouped by your Phenotype.Patient variable
unique <- str_sort(as.character(unique(dat$Phenotype.Patient)), numeric = TRUE) df <- NULL for (i in seq_along(unique)) { subset <- subset(dat, dat[,split.by] == unique[i]) subset <- subset(subset, !is.na(cloneType)) df[[i]] <- subset }
output <- clonalDiversity(df, cloneCall = "strict", split.by = "Phenotype.Patient", exportTable = FALSE)
Tthe input to the clonalDiversity() function (below) is now a list with the smallest repertoire = 4. So the clonalDiversity()is now subsampling 4 TCRs per list element to calculate the diversity metrics.
[image: Screen Shot 2022-11-21 at 9 10 52 AM] https://user-images.githubusercontent.com/22754118/203090057-f9887ab3-0f74-4318-be02-8a8ec9c081ad.png
This is why you are getting repetitive values and values close to 0 for the diversity functions.
I have added the new parameter for clonalDiversity(...,skip.boots = TRUE). When set to TRUE, this will skip the downsampling and bootstrapping of the function to allow for the calculation of diversity in your situation. However, just be cautious using the skip.boots parameter as large variances in repertoire size really has major influences on the diversity calculations.
You can install the version with this change, using:
@.***")
Hope the helps and let me know if you have any additional problems (or suggestions).
Thanks, Nick
— Reply to this email directly, view it on GitHub https://github.com/ncborcherding/scRepertoire/issues/203#issuecomment-1322224861, or unsubscribe https://github.com/notifications/unsubscribe-auth/AHWCZGE32CACE7YC3LISXP3WJOHOJANCNFSM6AAAAAAR2WUKXY . You are receiving this because you authored the thread.Message ID: @.***>
Thanks very much Nick. That's super helpful!
Best, Pierre
Hi Nick,
I installed the dev version described above, but when I try to use the skip.boots parameter like this, I get the error below:
clonalDiversity(combined,
cloneCall = "aa",
group.by = "sample",
skip.boots = TRUE)
Error in `[<-`(`*tmp*`, , group.by, value = str_split(names(df), "[.]", :
subscript out of bounds
If I run clonalDiversity without the skip.boots=TRUE, it runs properly. Any idea as to what the issue is? Thanks!
Here is my session info:
sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.6
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] SeuratObject_4.1.3 Seurat_4.3.0 scRepertoire_1.7.4 ggplot2_3.3.6
loaded via a namespace (and not attached):
[1] VGAM_1.1-7 plyr_1.8.7 igraph_1.3.5
[4] lazyeval_0.2.2 sp_1.5-0 splines_4.2.0
[7] powerTCR_1.18.0 listenv_0.9.0 scattermore_0.8
[10] usethis_2.1.6 GenomeInfoDb_1.34.0 digest_0.6.30
[13] foreach_1.5.2 htmltools_0.5.4 viridis_0.6.2
[16] ggalluvial_0.12.3 fansi_1.0.3 magrittr_2.0.3
[19] truncdist_1.0-2 memoise_2.0.1 tensor_1.5
[22] cluster_2.1.4 doParallel_1.0.17 ROCR_1.0-11
[25] remotes_2.4.2 globals_0.16.2 graphlayouts_0.8.3
[28] matrixStats_0.62.0 spatstat.sparse_3.0-0 prettyunits_1.1.1
[31] colorspace_2.0-3 ggrepel_0.9.1 dplyr_1.0.10
[34] jsonlite_1.8.3 callr_3.7.3 crayon_1.5.2
[37] RCurl_1.98-1.9 spatstat.data_3.0-0 progressr_0.13.0
[40] zoo_1.8-11 survival_3.4-0 iterators_1.0.14
[43] glue_1.6.2 polyclip_1.10-4 gtable_0.3.1
[46] zlibbioc_1.44.0 XVector_0.38.0 leiden_0.4.3
[49] DelayedArray_0.24.0 pkgbuild_1.4.0 evd_2.3-6.1
[52] future.apply_1.10.0 SingleCellExperiment_1.20.0 BiocGenerics_0.44.0
[55] abind_1.4-5 SparseM_1.81 scales_1.2.1
[58] DBI_1.1.3 spatstat.random_3.0-1 miniUI_0.1.1.1
[61] Rcpp_1.0.9 viridisLite_0.4.1 xtable_1.8-4
[64] reticulate_1.27 stats4_4.2.0 profvis_0.3.7
[67] htmlwidgets_1.6.1 httr_1.4.4 RColorBrewer_1.1-3
[70] ellipsis_0.3.2 ica_1.0-3 urlchecker_1.0.1
[73] pkgconfig_2.0.3 farver_2.1.1 uwot_0.1.14
[76] deldir_1.0-6 utf8_1.2.2 labeling_0.4.2
[79] tidyselect_1.2.0 rlang_1.0.6 reshape2_1.4.4
[82] later_1.3.0 munsell_0.5.0 tools_4.2.0
[85] cachem_1.0.6 cli_3.4.1 generics_0.1.3
[88] devtools_2.4.5 ggridges_0.5.4 stringr_1.4.1
[91] fastmap_1.1.0 goftest_1.2-3 evmix_2.12
[94] processx_3.8.0 fs_1.5.2 fitdistrplus_1.1-8
[97] tidygraph_1.2.2 purrr_0.3.5 RANN_2.6.1
[100] ggraph_2.1.0 pbapply_1.6-0 future_1.30.0
[103] nlme_3.1-160 mime_0.12 compiler_4.2.0
[106] rstudioapi_0.14 plotly_4.10.1 png_0.1-7
[109] spatstat.utils_3.0-1 tibble_3.1.8 tweenr_2.0.2
[112] stringi_1.7.8 gsl_2.1-7.1 ps_1.7.2
[115] cubature_2.0.4.5 lattice_0.20-45 Matrix_1.5-1
[118] vegan_2.6-4 permute_0.9-7 vctrs_0.5.0
[121] stringdist_0.9.10 pillar_1.8.1 lifecycle_1.0.3
[124] spatstat.geom_3.0-3 lmtest_0.9-40 RcppAnnoy_0.0.20
[127] data.table_1.14.4 irlba_2.3.5.1 cowplot_1.1.1
[130] bitops_1.0-7 patchwork_1.1.2 httpuv_1.6.7
[133] GenomicRanges_1.50.0 R6_2.5.1 promises_1.2.0.1
[136] KernSmooth_2.23-20 gridExtra_2.3 IRanges_2.32.0
[139] parallelly_1.33.0 sessioninfo_1.2.2 codetools_0.2-18
[142] MASS_7.3-58.1 assertthat_0.2.1 pkgload_1.3.2
[145] SummarizedExperiment_1.28.0 withr_2.5.0 sctransform_0.3.5
[148] S4Vectors_0.36.0 GenomeInfoDbData_1.2.9 mgcv_1.8-41
[151] parallel_4.2.0 grid_4.2.0 tidyr_1.2.1
[154] MatrixGenerics_1.10.0 Rtsne_0.16 spatstat.explore_3.0-5
[157] ggforce_0.4.1 Biobase_2.58.0 shiny_1.7.4
@sloanlewis
Thanks for finding a bug - will take a look and update you this week.
Nick
@sloanlewis
Found the bug, tested fix and committed in the most recent dev version.
Nick
That worked, thanks!
Hi
I'm still getting an error when trying to set skip.boots=TRUE
with v1.7.4 development version, commit c3d3ddd5. any ideas?
@bobermayer
Did you restart your R session after the update?
Nick
ah, that fixed it. thanks a lot! looks like with skip.boots=TRUE
the clonalDiversity
function gives otherwise identical results to what I had a while ago in version 1.1.2, this is very helpful
Hi, we are experiencing issues with the clonalDiversity function on our scTRCseq/scRNAseq dataset. We have 9 samples, and 3 of them seem to be computing erroneous results, ie the Shannon index is around 0, and we know that the samples have polyclonal populations with little expansion.
Any pointers or information to rectify the error would be greatly appreciated.
Thanks in advance Ankush