ncborcherding / scRepertoire

A toolkit for single-cell immune profiling
https://www.borch.dev/uploads/screpertoire/
MIT License
311 stars 54 forks source link

The 2.0.5 clonalCluster example code on the vignette doesn't work on dev branch #436

Closed Qile0317 closed 3 weeks ago

Qile0317 commented 3 weeks ago

The clonalCluster example on the current 2.0.5 vignette page when ran on the current dev branch does not work. Here are the steps to reproduce it on my machine:

data("scRep_example")

scRep_example$Patient <- substr(scRep_example$orig.ident, 1,3)
scRep_example$Type <- substr(scRep_example$orig.ident, 4,4)
colorblind_vector <- hcl.colors(n=7, palette = "inferno", fixup = TRUE)

scRep_example <- clonalCluster(scRep_example, 
                               chain = "TRA", 
                               sequence = "aa", 
                               threshold = 0.85, 
                               group.by = "Patient")

This produces the following error and traceback (from the second last):

Error in eval(e, x, parent.frame()) : object 'cloneSize' not found
> traceback()
8: eval(e, x, parent.frame())
7: eval(e, x, parent.frame())
6: subset.data.frame(subset, !is.na(cloneSize))
5: subset(subset, !is.na(cloneSize)) at utils.R#557
4: .expression2List(df, split.by) at utils.R#185
3: .list.input.return(df, split.by) at utils.R#567
2: .data.wrangle(input.data, group.by, "CTgene", chain) at clonalCluster.R#67

Additionally, I am having issues with certain other tests but getting other related errors. I am unsure if this is linked to any recent PRs that unintentionally changed certain behaviour and will investigate. I have also yet to investigate whether this issue exists on a previous patch version. This is probably the best way to investigate this issue.

SessionInfo:

R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.1 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

time zone: America/Los_Angeles
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] scRepertoire_2.0.8 ggplot2_3.5.1      testthat_3.2.1.1  

loaded via a namespace (and not attached):
  [1] fs_1.6.4                    matrixStats_1.4.1          
  [3] spatstat.sparse_3.1-0       devtools_2.4.5             
  [5] lubridate_1.9.3             httr_1.4.7                 
  [7] RColorBrewer_1.1-3          profvis_0.4.0              
  [9] tools_4.4.1                 sctransform_0.4.1          
 [11] utf8_1.2.4                  R6_2.5.1                   
 [13] lazyeval_0.2.2              uwot_0.2.2                 
 [15] urlchecker_1.0.1            withr_3.0.2                
 [17] sp_2.1-4                    gridExtra_2.3              
 [19] progressr_0.14.0            quantreg_5.99              
 [21] cli_3.6.3                   Biobase_2.64.0             
 [23] spatstat.explore_3.3-3      fastDummies_1.7.4          
 [25] iNEXT_3.0.1                 labeling_0.4.3             
 [27] Seurat_5.1.0                spatstat.data_3.1-2        
 [29] ggridges_0.5.6              pbapply_1.7-2              
 [31] commonmark_1.9.2            stringdist_0.9.12          
 [33] parallelly_1.38.0           sessioninfo_1.2.2          
 [35] VGAM_1.1-12                 rstudioapi_0.17.1          
 [37] generics_0.1.3              ica_1.0-3                  
 [39] spatstat.random_3.3-2       dplyr_1.1.4                
 [41] Matrix_1.7-1                waldo_0.5.3                
 [43] fansi_1.0.6                 S4Vectors_0.42.1           
 [45] abind_1.4-8                 lifecycle_1.0.4            
 [47] SummarizedExperiment_1.34.0 SparseArray_1.4.8          
 [49] Rtsne_0.17                  grid_4.4.1                 
 [51] promises_1.3.0              crayon_1.5.3               
 [53] miniUI_0.1.1.1              lattice_0.22-6             
 [55] cowplot_1.1.3               pillar_1.9.0               
 [57] knitr_1.48                  GenomicRanges_1.56.2       
 [59] rjson_0.2.23                future.apply_1.11.3        
 [61] codetools_0.2-20            leiden_0.4.3.1             
 [63] glue_1.8.0                  spatstat.univar_3.0-1      
 [65] data.table_1.16.2           remotes_2.5.0              
 [67] vctrs_0.6.5                 png_0.1-8                  
 [69] spam_2.11-0                 gtable_0.3.6               
 [71] assertthat_0.2.1            cachem_1.1.0               
 [73] xfun_0.48                   S4Arrays_1.4.1             
 [75] mime_0.12                   tidygraph_1.3.1            
 [77] survival_3.7-0              SingleCellExperiment_1.26.0
 [79] ellipsis_0.3.2              fitdistrplus_1.2-1         
 [81] ROCR_1.0-11                 nlme_3.1-166               
 [83] usethis_3.0.0               RcppAnnoy_0.0.22           
 [85] evd_2.3-7.1                 GenomeInfoDb_1.40.1        
 [87] rprojroot_2.0.4             irlba_2.3.5.1              
 [89] KernSmooth_2.23-24          colorspace_2.1-1           
 [91] BiocGenerics_0.50.0         tidyselect_1.2.1           
 [93] processx_3.8.4              compiler_4.4.1             
 [95] vdiffr_1.0.7                SparseM_1.84-2             
 [97] xml2_1.3.6                  desc_1.4.3                 
 [99] ggdendro_0.2.0              DelayedArray_0.30.1        
[101] plotly_4.10.4               scales_1.3.0               
[103] lmtest_0.9-40               callr_3.7.6                
[105] stringr_1.5.1               digest_0.6.37              
[107] goftest_1.2-3               spatstat.utils_3.1-0       
[109] XVector_0.44.0              htmltools_0.5.8.1          
[111] pkgconfig_2.0.3             MatrixGenerics_1.16.0      
[113] fastmap_1.2.0               rlang_1.1.4                
[115] htmlwidgets_1.6.4           UCSC.utils_1.0.0           
[117] shiny_1.9.1                 farver_2.1.2               
[119] zoo_1.8-12                  jsonlite_1.8.9             
[121] magrittr_2.0.3              GenomeInfoDbData_1.2.12    
[123] dotCall64_1.2               patchwork_1.3.0            
[125] munsell_0.5.1               Rcpp_1.0.13                
[127] evmix_2.12                  viridis_0.6.5              
[129] reticulate_1.39.0           truncdist_1.0-2            
[131] stringi_1.8.4               ggalluvial_0.12.5          
[133] ggraph_2.2.1                brio_1.1.5                 
[135] zlibbioc_1.50.0             MASS_7.3-61                
[137] plyr_1.8.9                  pkgbuild_1.4.5             
[139] parallel_4.4.1              listenv_0.9.1              
[141] ggrepel_0.9.6               deldir_2.0-4               
[143] graphlayouts_1.2.0          splines_4.4.1              
[145] tensor_1.5                  hash_2.2.6.3               
[147] ps_1.8.1                    igraph_2.1.1               
[149] spatstat.geom_3.3-3         cubature_2.1.1             
[151] RcppHNSW_0.6.0              reshape2_1.4.4             
[153] stats4_4.4.1                pkgload_1.4.0              
[155] SeuratObject_5.0.2          tweenr_2.0.3               
[157] httpuv_1.6.15               MatrixModels_0.5-3         
[159] RANN_2.6.2                  tidyr_1.3.1                
[161] purrr_1.0.2                 polyclip_1.10-7            
[163] future_1.34.0               scattermore_1.2            
[165] ggforce_0.4.2               xtable_1.8-4               
[167] RSpectra_0.16-2             roxygen2_7.3.2             
[169] later_1.3.2                 viridisLite_0.4.2          
[171] gsl_2.1-8                   tibble_3.2.1               
[173] memoise_2.0.1               IRanges_2.38.1             
[175] cluster_2.1.6               timechange_0.3.0           
[177] globals_0.16.3 
Qile0317 commented 3 weeks ago

Also if my hypothesis that some recent PR accidentally changed some intended behaviour of certain functions is correct, then this may or may not be related to #433. Also a high chance that these are completely unrelated though as I am unsure which branch they even used.

ncborcherding commented 3 weeks ago

There is only one place that the following phrase exists - which is in the internal function .expression2List()

subset.data.frame(subset, !is.na(cloneSize))

I think this means that the clones have not been added to the object scRep_example. Especially because the following code does not have a discrete combineExpression() call.

data("scRep_example")

scRep_example$Patient <- substr(scRep_example$orig.ident, 1,3)
scRep_example$Type <- substr(scRep_example$orig.ident, 4,4)
colorblind_vector <- hcl.colors(n=7, palette = "inferno", fixup = TRUE)

scRep_example <- clonalCluster(scRep_example, 
                               chain = "TRA", 
                               sequence = "aa", 
                               threshold = 0.85, 
                               group.by = "Patient")
Qile0317 commented 3 weeks ago

@ncborcherding Ok - combined the contig list but now the TRA_clusters metadata column is almost all NAs which does not line up with the visualization showing many clusters:

data("contig_list")
combined <- combineTCR(contig_list, 
                           samples = c("P17B", "P17L", "P18B", "P18L", 
                                            "P19B","P19L", "P20B", "P20L"),
                           removeNA = FALSE, 
                           removeMulti = FALSE, 
                           filterMulti = FALSE)

data("scRep_example")
scRep_example <- combineExpression(combined, scRep_example)

scRep_example$Patient <- substr(scRep_example$orig.ident, 1,3)

#Adding type information
scRep_example$Type <- substr(scRep_example$orig.ident, 4,4)

#Define color palette 
colorblind_vector <- hcl.colors(n=7, palette = "inferno", fixup = TRUE)

scRep_example <- clonalCluster(scRep_example, 
                               chain = "TRA", 
                               sequence = "aa", 
                               threshold = 0.85, 
                               group.by = "Patient")

mean(!is.na(scRep_example$TRA_cluster))
# outputs 0.008 (only 4 non-NA rows)

Perhaps I missed some prior step in the vignette where scRep example was further modified? Just trying to get to the bottom of this so I can ensure correct behaviour for #425

Probably also will add an internal assert checker that an object has been combined.

ncborcherding commented 3 weeks ago

@Qile0317

The issue is you are using the small scRep_example Seurat object that has 500 randomly sampled cells (~350 have clones). The vignette on the website uses the full Seurat object.

If you just look at the output of combined.TCR(), you can get a better approximation:

combined <- addVariable(combined, 
                        variable.name = "Patient", 
                        variables = c("P17", "P17", "P18", "P18", 
                                      "P19","P19", "P20", "P20"))
combined <- clonalCluster(combined, 
                               chain = "TRA", 
                               sequence = "aa", 
                               threshold = 0.85, 
                               group.by = "Patient")

mean(!is.na(do.call(rbind, combined)))

[1] 0.9462471

I will work on troubleshooting #425 tomorrow/next week and see if I can help find the issue.

Nick