ncborcherding / scRepertoire

A toolkit for single-cell immune profiling
https://www.borch.dev/uploads/screpertoire/
MIT License
301 stars 49 forks source link

Clonal Diversity for Randomly Subsetting 1000 cells #203

Closed ankushs0128 closed 1 year ago

ankushs0128 commented 1 year ago

Hi, we are experiencing issues with the clonalDiversity function on our scTRCseq/scRNAseq dataset. We have 9 samples, and 3 of them seem to be computing erroneous results, ie the Shannon index is around 0, and we know that the samples have polyclonal populations with little expansion.

D1_DIV  <- combineTCR(D1_Div, samples = c("ptg1","ptg2","ptg3"), ID = c("d1","d2","d3"), cells ="T-AB")
D1_DIV$ptg1_d1$barcode <- gsub("_d1","", D1_DIV$ptg1_d1$barcode)
D1_DIV$ptg2_d2$barcode <- gsub("_d2","", D1_DIV$ptg2_d2$barcode)
D1_DIV$ptg3_d3$barcode <- gsub("_d3","", D1_DIV$ptg3_d3$barcode)
D1_Comb <– combineExpression(D1_DIV, integrated_seurat, cloneCall="CTstrict", proportion = FALSE,  cloneTypes=c(Single=1, Small=5, Medium=20, Large=100, Hyperexpanded=500))
DimPlot(D1_Comb , group.by = "cloneType", na.value="darkgrey",pt.size = 0.8, cols = c("#D6476B", "#1F6272", "#D3D3D3", "#C29947","#265473"), shuffle=TRUE) 
si_d1_comb<-clonalDiversity((D1_Comb),
cloneCall = "CTstrict", 
exportTable = TRUE, split.by= "Celltypes")
write.table(si_d1_comb,"D1_clonal_shanondiversity.txt", sep= "\t", row.names = FALSE,quote= FALSE)
D1.subsampled <- D1_Comb[, sample(colnames(D1_Comb), size =1000, replace=F)]
si_d1_comb_random<-clonalDiversity((D1.subsampled),
cloneCall = "CTstrict", 
exportTable = TRUE, split.by= "Celltypes")
write.table(si_d1_comb_random,"_Random_D1_Clonal_ShanonDiversity.txt", sep= "\t", row.names = FALSE,quote= FALSE)

Any pointers or information to rectify the error would be greatly appreciated.

Thanks in advance Ankush

ncborcherding commented 1 year ago

Hey Ankush,

Number skewing in the clonalDiversity() function is usually a result of the downsampling. The function will automatically identify the lowest number of clonotypes across all samples and then bootstrap the diversity estimates for that minimum number. If a sample has very low diversity or recovered reads. Is this maybe what is going on?

If not, maybe you can provide the version of scRepertoire you are using. What do you mean by subsetting 1000 cells?

Also if you are comfortable emailing me the exported table from the function call - that would help immensely when it comes to troubleshooting.

Thanks, Nick

pankomah commented 1 year ago

Hi Nick,

Thanks again for all your work with this wonderful package. I am having a problem similar to Ankush's, and figured I'd add here (but if it's better for me to open a new issue, let me know).

I have a multiplexed experiment with samples from many patients, three different phenotypes (Sp, St, Ctrl) and three days (D0, D1, D3). All steps have worked, including generating a single Seurat object with all the clonotypic information attached. However, I attempted to compare clonal diversity for a subset of the data (eg. CD4 T cells at Day 0) but I’m getting Shannon and other indices all close to zero. As per below, I alternatively tried using either the Single Cell Object (cohort3_VDJCD4_D0) or the df from expression2List (ExpressionList_VDJCD4_D0_byPhenotype), but got the same results.

clonaldiversity_CD4D0 <- clonalDiversity(ExpressionList_VDJCD4_D0_byPhenotype.Patient, cloneCall = "strict", group.by = "Phenotype.Patient", x.axis = "Phenotype", exportTable = FALSE)

clonaldiversity_CD4D0 <- clonalDiversity(cohort3_VDJCD4_D0, cloneCall = "strict", split.by = "Phenotype.Patient", exportTable = FALSE)

There is an “NA” grouping generated when the plot is split by “Phenotype”, though I doublechecked that there are no NAs in the original dataset for that metadata slot. Could this be the issue? Also, if as you’ve noted above, this is the result of some samples with very low diversity – is there a strategy for dealing with that you’d recommend?

In addition, I’m getting the following error when I set exportTable = TRUE: Error in FUN(left, right) : non-numeric argument to binary operator

Your help is, as always, very much appreciated. My session info is below.

sessionInfo() R version 4.2.2 (2022-10-31) Platform: x86_64-apple-darwin17.0 (64-bit) Running under: macOS Monterey 12.6

Matrix products: default LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] SeuratData_0.2.2 SeuratDisk_0.0.0.9020 SeuratObject_4.1.3 Seurat_4.2.1
[5] data.table_1.14.4 cowplot_1.1.1 reshape2_1.4.4 dplyr_1.0.10
[9] knitr_1.40 hdf5r_1.3.7 tinytex_0.42 rmarkdown_2.18
[13] scales_1.2.1 circlize_0.4.15 devtools_2.4.5 usethis_2.1.6
[17] remotes_2.4.2 BiocManager_1.30.19 scRepertoire_1.6.0 ggplot2_3.4.0

loaded via a namespace (and not attached): [1] utf8_1.2.2 spatstat.explore_3.0-5 reticulate_1.26
[4] tidyselect_1.2.0 htmlwidgets_1.5.4 grid_4.2.2
[7] Rtsne_0.16 munsell_0.5.0 codetools_0.2-18
[10] ica_1.0-3 future_1.29.0 miniUI_0.1.1.1
[13] withr_2.5.0 spatstat.random_3.0-1 colorspace_2.0-3
[16] progressr_0.11.0 Biobase_2.58.0 ggalluvial_0.12.3
[19] rstudioapi_0.14 stats4_4.2.2 SingleCellExperiment_1.20.0 [22] ROCR_1.0-11 tensor_1.5 listenv_0.8.0
[25] labeling_0.4.2 MatrixGenerics_1.10.0 GenomeInfoDbData_1.2.9
[28] polyclip_1.10-4 bit64_4.0.5 farver_2.1.1
[31] rprojroot_2.0.3 parallelly_1.32.1 vctrs_0.5.1
[34] generics_0.1.3 xfun_0.34 R6_2.5.1
[37] doParallel_1.0.17 GenomeInfoDb_1.34.3 graphlayouts_0.8.3
[40] VGAM_1.1-7 bitops_1.0-7 spatstat.utils_3.0-1
[43] cachem_1.0.6 DelayedArray_0.24.0 assertthat_0.2.1
[46] promises_1.2.0.1 ggraph_2.1.0 gtable_0.3.1
[49] globals_0.16.1 goftest_1.2-3 processx_3.8.0
[52] tidygraph_1.2.2 rlang_1.0.6 GlobalOptions_0.1.2
[55] splines_4.2.2 lazyeval_0.2.2 spatstat.geom_3.0-3
[58] abind_1.4-5 httpuv_1.6.6 tools_4.2.2
[61] cubature_2.0.4.5 ellipsis_0.3.2 RColorBrewer_1.1-3
[64] BiocGenerics_0.44.0 sessioninfo_1.2.2 clustree_0.5.0
[67] ggridges_0.5.4 Rcpp_1.0.9 plyr_1.8.8
[70] zlibbioc_1.44.0 purrr_0.3.5 RCurl_1.98-1.9
[73] ps_1.7.2 prettyunits_1.1.1 deldir_1.0-6
[76] pbapply_1.6-0 viridis_0.6.2 urlchecker_1.0.1
[79] S4Vectors_0.36.0 zoo_1.8-11 SummarizedExperiment_1.28.0 [82] ggrepel_0.9.2 cluster_2.1.4 fs_1.5.2
[85] magrittr_2.0.3 scattermore_0.8 SparseM_1.81
[88] lmtest_0.9-40 RANN_2.6.1 truncdist_1.0-2
[91] fitdistrplus_1.1-8 matrixStats_0.62.0 pkgload_1.3.2
[94] gsl_2.1-7.1 patchwork_1.1.2 mime_0.12
[97] evaluate_0.18 xtable_1.8-4 IRanges_2.32.0
[100] gridExtra_2.3 shape_1.4.6 compiler_4.2.2
[103] tibble_3.1.8 KernSmooth_2.23-20 crayon_1.5.2
[106] htmltools_0.5.3 mgcv_1.8-41 later_1.3.0
[109] tidyr_1.2.1 powerTCR_1.18.0 DBI_1.1.3
[112] tweenr_2.0.2 rappdirs_0.3.3 MASS_7.3-58.1
[115] Matrix_1.5-3 permute_0.9-7 cli_3.4.1
[118] parallel_4.2.2 evd_2.3-6.1 igraph_1.3.5
[121] GenomicRanges_1.50.1 pkgconfig_2.0.3 sp_1.5-1
[124] plotly_4.10.1 spatstat.sparse_3.0-0 foreach_1.5.2
[127] stringdist_0.9.10 XVector_0.38.0 stringr_1.4.1
[130] callr_3.7.3 digest_0.6.30 sctransform_0.3.5
[133] RcppAnnoy_0.0.20 vegan_2.6-4 spatstat.data_3.0-0
[136] leiden_0.4.3 uwot_0.1.14 curl_4.3.3
[139] evmix_2.12 shiny_1.7.3 lifecycle_1.0.3
[142] nlme_3.1-160 jsonlite_1.8.3 viridisLite_0.4.1
[145] fansi_1.0.3 pillar_1.8.1 lattice_0.20-45
[148] fastmap_1.1.0 httr_1.4.4 pkgbuild_1.3.1
[151] survival_3.4-0 glue_1.6.2 png_0.1-7
[154] iterators_1.0.14 bit_4.0.5 ggforce_0.4.1
[157] stringi_1.7.8 profvis_0.3.7 memoise_2.0.1
[160] irlba_2.3.5.1 future.apply_1.10.0

ncborcherding commented 1 year ago

Hey Pierre,

Thanks for reaching out and excellent run down. Based on the error message you are getting, I am betting there is something internal that is causing the NA values. Would you mind emailing me the output of expression2List() so I can troubleshoot the issue? Or a subsample of the data if you are more comfortable with that?

In terms of handling low numbers of clones - I think this is a persistent problem so I will add a parameter in the function to disable boot strapping (and I will push a commit when we figure out your issue).

Thanks, Nick

pankomah commented 1 year ago

Thanks for the quick response Nick! Emailed you the output.

Best, Pierre

ncborcherding commented 1 year ago

Hey Pierre,

Thanks so much! I will get back to you as soon as possible with a fix!

Nick

ncborcherding commented 1 year ago

Hey Pierre,

I think I identified the issue and it is indeed due to low TCR per grouping variable. I am going to partially remove the outputs from the code below for concerns with privacy.

#Loading data and merging it all together
df <- readRDS("~/Documents/ExpressionList.Rds")
dat <- bind_rows(df)

#Stripped out version of expression2List() 
#df is now a list grouped by your Phenotype.Patient variable
unique <- str_sort(as.character(unique(dat$Phenotype.Patient)), numeric = TRUE)
df <- NULL
for (i in seq_along(unique)) {
      subset <- subset(dat, dat[,split.by] == unique[i])
      subset <- subset(subset, !is.na(cloneType))
      df[[i]] <- subset
}

output <- clonalDiversity(df, cloneCall = "strict", split.by = "Phenotype.Patient", exportTable = FALSE)

Tthe input to the clonalDiversity() function (below) is now a list with the smallest repertoire = 4. So the clonalDiversity()is now subsampling 4 TCRs per list element to calculate the diversity metrics.

Screen Shot 2022-11-21 at 9 10 52 AM

This is why you are getting repetitive values and values close to 0 for the diversity functions.

I have added the new parameter for clonalDiversity(...,skip.boots = TRUE). When set to TRUE, this will skip the downsampling and bootstrapping of the function to allow for the calculation of diversity in your situation. However, just be cautious using the skip.boots parameter as large variances in repertoire size really has major influences on the diversity calculations.

You can install the version with this change, using:

devtools::install_github("ncborcherding/scRepertoire@dev")

Hope the helps and let me know if you have any additional problems (or suggestions).

Thanks, Nick

ankushs0128 commented 1 year ago

Thanks Nick , Best Regards Ankush Sharma

On Mon, Nov 21, 2022 at 4:21 PM theHumanBorch @.***> wrote:

Hey Pierre,

I think I identified the issue and it is indeed due to low TCR per grouping variable. I am going to partially remove the outputs from the code below for concerns with privacy.

Loading data and merging it all together

df <- readRDS("~/Documents/ExpressionList.Rds") dat <- bind_rows(df)

Stripped out version of expression2List()

df is now a list grouped by your Phenotype.Patient variable

unique <- str_sort(as.character(unique(dat$Phenotype.Patient)), numeric = TRUE) df <- NULL for (i in seq_along(unique)) { subset <- subset(dat, dat[,split.by] == unique[i]) subset <- subset(subset, !is.na(cloneType)) df[[i]] <- subset }

output <- clonalDiversity(df, cloneCall = "strict", split.by = "Phenotype.Patient", exportTable = FALSE)

Tthe input to the clonalDiversity() function (below) is now a list with the smallest repertoire = 4. So the clonalDiversity()is now subsampling 4 TCRs per list element to calculate the diversity metrics.

[image: Screen Shot 2022-11-21 at 9 10 52 AM] https://user-images.githubusercontent.com/22754118/203090057-f9887ab3-0f74-4318-be02-8a8ec9c081ad.png

This is why you are getting repetitive values and values close to 0 for the diversity functions.

I have added the new parameter for clonalDiversity(...,skip.boots = TRUE). When set to TRUE, this will skip the downsampling and bootstrapping of the function to allow for the calculation of diversity in your situation. However, just be cautious using the skip.boots parameter as large variances in repertoire size really has major influences on the diversity calculations.

You can install the version with this change, using:

@.***")

Hope the helps and let me know if you have any additional problems (or suggestions).

Thanks, Nick

— Reply to this email directly, view it on GitHub https://github.com/ncborcherding/scRepertoire/issues/203#issuecomment-1322224861, or unsubscribe https://github.com/notifications/unsubscribe-auth/AHWCZGE32CACE7YC3LISXP3WJOHOJANCNFSM6AAAAAAR2WUKXY . You are receiving this because you authored the thread.Message ID: @.***>

pankomah commented 1 year ago

Thanks very much Nick. That's super helpful!

Best, Pierre

sloanlewis commented 1 year ago

Hi Nick,

I installed the dev version described above, but when I try to use the skip.boots parameter like this, I get the error below:

clonalDiversity(combined, 
                 cloneCall = "aa", 
               group.by = "sample", 
                 skip.boots = TRUE)

Error in `[<-`(`*tmp*`, , group.by, value = str_split(names(df), "[.]",  : 
  subscript out of bounds

If I run clonalDiversity without the skip.boots=TRUE, it runs properly. Any idea as to what the issue is? Thanks!

Here is my session info:

sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.6

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] SeuratObject_4.1.3 Seurat_4.3.0       scRepertoire_1.7.4 ggplot2_3.3.6     

loaded via a namespace (and not attached):
  [1] VGAM_1.1-7                  plyr_1.8.7                  igraph_1.3.5               
  [4] lazyeval_0.2.2              sp_1.5-0                    splines_4.2.0              
  [7] powerTCR_1.18.0             listenv_0.9.0               scattermore_0.8            
 [10] usethis_2.1.6               GenomeInfoDb_1.34.0         digest_0.6.30              
 [13] foreach_1.5.2               htmltools_0.5.4             viridis_0.6.2              
 [16] ggalluvial_0.12.3           fansi_1.0.3                 magrittr_2.0.3             
 [19] truncdist_1.0-2             memoise_2.0.1               tensor_1.5                 
 [22] cluster_2.1.4               doParallel_1.0.17           ROCR_1.0-11                
 [25] remotes_2.4.2               globals_0.16.2              graphlayouts_0.8.3         
 [28] matrixStats_0.62.0          spatstat.sparse_3.0-0       prettyunits_1.1.1          
 [31] colorspace_2.0-3            ggrepel_0.9.1               dplyr_1.0.10               
 [34] jsonlite_1.8.3              callr_3.7.3                 crayon_1.5.2               
 [37] RCurl_1.98-1.9              spatstat.data_3.0-0         progressr_0.13.0           
 [40] zoo_1.8-11                  survival_3.4-0              iterators_1.0.14           
 [43] glue_1.6.2                  polyclip_1.10-4             gtable_0.3.1               
 [46] zlibbioc_1.44.0             XVector_0.38.0              leiden_0.4.3               
 [49] DelayedArray_0.24.0         pkgbuild_1.4.0              evd_2.3-6.1                
 [52] future.apply_1.10.0         SingleCellExperiment_1.20.0 BiocGenerics_0.44.0        
 [55] abind_1.4-5                 SparseM_1.81                scales_1.2.1               
 [58] DBI_1.1.3                   spatstat.random_3.0-1       miniUI_0.1.1.1             
 [61] Rcpp_1.0.9                  viridisLite_0.4.1           xtable_1.8-4               
 [64] reticulate_1.27             stats4_4.2.0                profvis_0.3.7              
 [67] htmlwidgets_1.6.1           httr_1.4.4                  RColorBrewer_1.1-3         
 [70] ellipsis_0.3.2              ica_1.0-3                   urlchecker_1.0.1           
 [73] pkgconfig_2.0.3             farver_2.1.1                uwot_0.1.14                
 [76] deldir_1.0-6                utf8_1.2.2                  labeling_0.4.2             
 [79] tidyselect_1.2.0            rlang_1.0.6                 reshape2_1.4.4             
 [82] later_1.3.0                 munsell_0.5.0               tools_4.2.0                
 [85] cachem_1.0.6                cli_3.4.1                   generics_0.1.3             
 [88] devtools_2.4.5              ggridges_0.5.4              stringr_1.4.1              
 [91] fastmap_1.1.0               goftest_1.2-3               evmix_2.12                 
 [94] processx_3.8.0              fs_1.5.2                    fitdistrplus_1.1-8         
 [97] tidygraph_1.2.2             purrr_0.3.5                 RANN_2.6.1                 
[100] ggraph_2.1.0                pbapply_1.6-0               future_1.30.0              
[103] nlme_3.1-160                mime_0.12                   compiler_4.2.0             
[106] rstudioapi_0.14             plotly_4.10.1               png_0.1-7                  
[109] spatstat.utils_3.0-1        tibble_3.1.8                tweenr_2.0.2               
[112] stringi_1.7.8               gsl_2.1-7.1                 ps_1.7.2                   
[115] cubature_2.0.4.5            lattice_0.20-45             Matrix_1.5-1               
[118] vegan_2.6-4                 permute_0.9-7               vctrs_0.5.0                
[121] stringdist_0.9.10           pillar_1.8.1                lifecycle_1.0.3            
[124] spatstat.geom_3.0-3         lmtest_0.9-40               RcppAnnoy_0.0.20           
[127] data.table_1.14.4           irlba_2.3.5.1               cowplot_1.1.1              
[130] bitops_1.0-7                patchwork_1.1.2             httpuv_1.6.7               
[133] GenomicRanges_1.50.0        R6_2.5.1                    promises_1.2.0.1           
[136] KernSmooth_2.23-20          gridExtra_2.3               IRanges_2.32.0             
[139] parallelly_1.33.0           sessioninfo_1.2.2           codetools_0.2-18           
[142] MASS_7.3-58.1               assertthat_0.2.1            pkgload_1.3.2              
[145] SummarizedExperiment_1.28.0 withr_2.5.0                 sctransform_0.3.5          
[148] S4Vectors_0.36.0            GenomeInfoDbData_1.2.9      mgcv_1.8-41                
[151] parallel_4.2.0              grid_4.2.0                  tidyr_1.2.1                
[154] MatrixGenerics_1.10.0       Rtsne_0.16                  spatstat.explore_3.0-5     
[157] ggforce_0.4.1               Biobase_2.58.0              shiny_1.7.4    
ncborcherding commented 1 year ago

@sloanlewis

Thanks for finding a bug - will take a look and update you this week.

Nick

ncborcherding commented 1 year ago

@sloanlewis

Found the bug, tested fix and committed in the most recent dev version.

Nick

sloanlewis commented 1 year ago

That worked, thanks!

bobermayer commented 1 year ago

Hi I'm still getting an error when trying to set skip.boots=TRUE with v1.7.4 development version, commit c3d3ddd5. any ideas?

ncborcherding commented 1 year ago

@bobermayer

Did you restart your R session after the update?

Nick

bobermayer commented 1 year ago

ah, that fixed it. thanks a lot! looks like with skip.boots=TRUE the clonalDiversity function gives otherwise identical results to what I had a while ago in version 1.1.2, this is very helpful