How to Assess Clonal Diversity at the Subgroup Level?

lisch7 commented 3 months ago

Hi，

First of all, thank you for developing this package! It has been incredibly helpful.

I am pretty new to analysis scTCR-seq data. I have been following your tutorial, which seems to focus on calculations for individual samples. I am interested in extending these analyses to the subgroup level. Could you provide some guidance on how to perform basic analyses for subgroups, such as assessing clonal diversity, analyzing overlaps, etc.?

Thank you for your time and assistance!

Best regards

lisch7 commented 3 months ago

I believe adding a tutorial on analyzing cell subpopulations would greatly assist newcomers in understanding these concepts more easily. Such a tutorial could make the learning curve much smoother. Thank you for considering this suggestion!

ncborcherding commented 3 months ago

Hey @lisch7,

Thanks for reaching out - when you say subgroup analysis what are you trying to do exactly?

For the most part you can combine multiple groupings together and that will let you do a lot of things. For instance, say I wanted to look clonal trends in a patient and cluster specific manner:

seuratObj$new.variable <- paste0(seuratObj$patient, ".", seuratObj$seurat_clusters)

From there I can use group.by in most of the functions above to do the analysis.

Happy to work with you more, just let me know. I think it would be helpful for other users as well.

Nick

lisch7 commented 3 months ago

Hi Nick,

Thank you for your prompt response. I apologize for any confusion in my previous message; I’d like to clarify with an example.

The tutorial outlines how to use combineTCR to merge different output files and addVariable to append variables such as sample type and patient ID to the combined list. Subsequent steps like clonalQuant are demonstrated for clonal analysis. However, the tutorial mainly showcases clonal analysis across different samples.

In my view, a critical aspect of TCR analysis involves detailed cell clustering. For instance, I am interested in comparing the clonal share of Cluster A before and after treatment to identify key changes during the treatment. Typically, detailed annotation would be handled with tools like Seurat or Scanpy. The tutorial guides on integrating the combineTCR object with a Seurat object, but it’s unclear how to apply clonalQuant, clonalDiversity, or similar functions to assess the state of cell clusters.

I’ve noticed discussions about the expression2list function, which seems to split the integrated object into a list. However, it appears that direct usage of this function isn’t recommended, suggesting there might be a better approach that isn't covered in the tutorial.

Could you provide guidance on how to proceed after having a fully annotated Seurat object and a merged combineTCR object to effectively analyze and correlate clones across different conditions and subgroups?

Thank you for your assistance.

Best regards

ncborcherding commented 3 months ago

Hey @lisch7,

Sorry for the confusion, functions like clonalQuant() and clonalDiversity() allow as inputs the output of combineTCR()/combineBCR() and combineExpression() - so you can feed the single-cell object directly into the function clonalQuant(). By default if a seurat object, the function will then use the active identity, but you can select whatever you need in terms of variables using the group.by function.

There use to be an option of either using the suerat object directly or expression2List(), this was confusing and I had to handle a lot of questions/github issues. It is still present, just an internal function now assists in converting single-cell meta data into plotly formats.

Hope that helps clarify, Nick

lisch7 commented 3 months ago

Hi Nick,

Thank you for your kind and prompt response. I successfully utilized the clonalOverlap function with my combineExpression object, and it worked well. However, I encountered issues with other functions like clonalQuant and clonalAbundance. For instance, when I execute clonalQuant(sce_TCR, cloneCall="strict", chain = "both", group.by = c("Minor_anno")), I receive the following output:

And some information about my object was as follows:

> sce_TCR
An object of class Seurat 
36601 features across 116854 samples within 1 assay 
Active assay: RNA (36601 features, 0 variable features)
 2 layers present: counts, data
 1 dimensional reduction calculated: umap

> sce_TCR$Minor_anno %>% unique()
 [1] CD8T_08_ZNF683  CD8T_07_GZMK    CD8T_02_ANXA1   CD8T_09_XCL2    CD8T_06_GZMB    CD8T_10_CXCL13  CD8T_03_SELL    CD8T_04_SLC4A10 CD8T_11_HAVCR2 
[10] CD8T_12_STMN1   CD8T_01_CCR7    CD8T_05_IFIT1  
12 Levels: CD8T_01_CCR7 CD8T_02_ANXA1 CD8T_03_SELL CD8T_04_SLC4A10 CD8T_05_IFIT1 CD8T_06_GZMB CD8T_07_GZMK CD8T_08_ZNF683 CD8T_09_XCL2 ... CD8T_12_STMN1

> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Asia/Shanghai
tzcode source: system (glibc)

attached base packages:
[1] stats4    grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] RColorBrewer_1.1-3          scRepertoire_2.0.0          clusterProfiler_4.11.0.001  tidyHeatmap_1.8.1           scDblFinder_1.16.0         
 [6] SingleCellExperiment_1.24.0 SummarizedExperiment_1.32.0 Biobase_2.62.0              GenomicRanges_1.54.1        GenomeInfoDb_1.38.6        
[11] IRanges_2.36.0              S4Vectors_0.40.2            BiocGenerics_0.48.1         MatrixGenerics_1.14.0       matrixStats_1.3.0          
[16] ggVolcano_0.0.2             ComplexHeatmap_2.18.0       corrplot_0.92               Rmisc_1.5.1                 plyr_1.8.9                 
[21] lattice_0.22-5              rstatix_0.7.2               Hmisc_5.1-1                 data.table_1.15.4           reticulate_1.36.0          
[26] Azimuth_0.5.0               shinyBS_0.61.1              BPCells_0.1.0               Seurat_5.0.3                SeuratObject_5.0.1         
[31] sp_2.1-3                    pacman_0.5.1                lubridate_1.9.3             forcats_1.0.0               stringr_1.5.1              
[36] dplyr_1.1.4                 purrr_1.0.2                 readr_2.1.5                 tidyr_1.3.1                 tibble_3.2.1               
[41] ggplot2_3.5.0               tidyverse_2.0.0            

loaded via a namespace (and not attached):
  [1] DBI_1.2.2                         lpSolve_5.6.20                    httr_1.4.7                        registry_0.5-1                   
  [5] TFMPvalue_0.0.9                   BiocParallel_1.36.0               prettyunits_1.2.0                 yulab.utils_0.1.4                
  [9] gsl_2.1-8                         ProtGenerics_1.34.0               ggplotify_0.1.2                   GenomicAlignments_1.38.2         
 [13] sparseMatrixStats_1.14.0          spatstat.geom_3.2-9               survMisc_0.5.6                    pillar_1.9.0                     
 [17] R6_2.5.1                          mime_0.12                         edgeR_4.0.14                      uwot_0.2.2                       
 [21] viridis_0.6.5                     Rhdf5lib_1.24.2                   vegan_2.6-4                       SeuratData_0.2.2.9001            
 [25] ROCR_1.0-11                       fields_15.2                       ggpubr_0.6.0                      KMsurv_0.1-5                     
 [29] limma_3.58.1                      parallelly_1.37.1                 iNEXT_3.0.0                       GlobalOptions_0.1.2              
 [33] caTools_1.18.2                    mgcv_1.9-0                        polyclip_1.10-6                   NMF_0.27                         
 [37] beachmat_2.18.1                   htmltools_0.5.8.1                 fansi_1.0.6                       e1071_1.7-14                     
 [41] googledrive_2.1.1                 remotes_2.4.2.1                   ggrepel_0.9.5                     BSgenome.Hsapiens.UCSC.hg38_1.4.5
 [45] car_3.1-2                         stringfish_0.16.0                 fgsea_1.28.0                      scuttle_1.12.0                   
 [49] spatstat.utils_3.0-5              HDO.db_0.99.1                     maps_3.4.2                        pec_2023.04.12                   
 [53] survminer_0.4.9                   rpart_4.1.23                      clue_0.3-65                       scatterpie_0.2.1                 
 [57] fitdistrplus_1.1-11               goftest_1.2-3                     tidyselect_1.2.1                  sessioninfo_1.2.2                
 [61] RSQLite_2.3.5                     truncdist_1.0-2                   cowplot_1.1.3                     GenomeInfoDbData_1.2.11          
 [65] utf8_1.2.4                        ScaledMatrix_1.10.0               scattermore_1.2                   spatstat.data_3.0-4              
 [69] gridExtra_2.3                     fs_1.6.3                          xgboost_1.7.7.1                   timereg_2.0.5                    
 [73] sctransform_0.4.1                 graph_1.80.0                      future.apply_1.11.2               R.oo_1.26.0                      
 [77] CNEr_1.38.0                       Signac_1.12.0                     RcppHNSW_0.6.0                    vipor_0.4.7                      
 [81] rtracklayer_1.62.0                Rtsne_0.17                        DelayedMatrixStats_1.24.0         lazyeval_0.2.2                   
 [85] scales_1.3.0                      carData_3.0-5                     munsell_0.5.1                     lava_1.7.3                       
 [89] treeio_1.26.0                     R.utils_2.12.3                    profvis_0.3.8                     pracma_2.4.4                     
 [93] bitops_1.0-7                      labeling_0.4.3                    R.methodsS3_1.8.2                 KEGGREST_1.42.0                  
 [97] promises_1.3.0                    shape_1.4.6                       rhdf5filters_1.14.1               poweRlaw_0.80.0                  
[101] zoo_1.8-12                        princurve_2.1.6                   GSVA_1.50.0                       GenomicFeatures_1.54.3           
[105] locfit_1.5-9.8                    DelayedArray_0.28.0               RSpectra_0.16-1                   SeuratDisk_0.0.0.9021            
[109] tools_4.3.1                       ape_5.7-1                         ensembldb_2.26.0                  scater_1.30.1                    
[113] shiny_1.8.1.1                     BiocFileCache_2.10.1              rlang_1.1.3                       generics_0.1.3                   
[117] BiocSingular_1.18.0               ggridges_0.5.6                    evaluate_0.23                     TrajectoryUtils_1.10.0           
[121] stringdist_0.9.12                 BiocIO_1.12.0                     devtools_2.4.5                    reshape2_1.4.4                   
[125] colorspace_2.1-0                  limSolve_1.5.7.1                  ellipsis_0.3.2                    withr_3.0.0                      
[129] gargle_1.5.2                      RCurl_1.98-1.14                   presto_1.0.0                      restfulr_0.0.15                  
[133] xtable_1.8-4                      aplot_0.2.2                       MatrixModels_0.5-3                ks_1.14.2                        
[137] mclust_6.0.1                      httpuv_1.6.15                     cubature_2.1.0                    rmarkdown_2.26                   
[141] metapod_1.10.1                    EnsDb.Hsapiens.v86_2.99.0         MASS_7.3-60                       dqrng_0.3.2                      
[145] broom_1.0.5                       deldir_2.0-4                      GO.db_3.18.0                      rhdf5_2.46.1                     
[149] tensor_1.5                        googlesheets4_1.1.1               SCP_0.5.6                         vctrs_0.6.5                      
[153] lifecycle_1.0.4                   readxl_1.4.3                      proxy_0.4-27                      codetools_0.2-19                 
[157] fastDummies_1.7.3                 DT_0.31                           nlme_3.1-163                      future_1.33.2                    
[161] DESeq2_1.42.0                     hash_2.2.6.3                      progress_1.2.3                    dbplyr_2.4.0                     
[165] pkgload_1.3.4                     cellranger_1.1.0                  Rcpp_1.0.12                       shinydashboard_0.7.2             
[169] rstudioapi_0.15.0                 patchwork_1.2.0                   stringi_1.8.3                     AnnotationFilter_1.26.0          
[173] sscVis_0.1.0                      VGAM_1.1-9                        hms_1.1.3                         pbapply_1.7-2                    
[177] cachem_1.0.8                      hdf5r_1.3.11                      tidytree_0.4.6                    listenv_0.9.1                    
[181] XVector_0.42.0                    urlchecker_1.0.1                  ggrastr_1.0.2                     plotly_4.10.4                    
[185] ggtree_3.10.0                     enrichplot_1.22.0                 pkgbuild_1.4.3                    GetoptLong_1.0.5                 
[189] ggfun_0.1.4                       HDF5Array_1.30.1                  SparseArray_1.2.4                 htmlwidgets_1.6.4                
[193] Formula_1.2-5                     TFBSTools_1.40.0                  dendextend_1.17.1                 class_7.3-22                     
[197] memoise_2.0.1                     crayon_1.5.2                      gridGraphics_0.5-1                rappdirs_0.3.3                   
[201] S4Arrays_1.2.0                    xml2_1.3.6                        filelock_1.0.3                    preprocessCore_1.64.0            
[205] GOSemSim_2.28.1                   png_0.1-8                         progressr_0.14.0                  tzdb_0.4.0                       
[209] evd_2.3-6.1                       GSEABase_1.64.0                   fastmap_1.1.1                     tidygraph_1.3.1                  
[213] pkgconfig_2.0.3                   cli_3.6.2                         beeswarm_0.4.0                    DOSE_3.28.2                      
[217] ggforce_0.4.1                     IOBR_0.99.8                       prodlim_2023.08.28                Startrac_0.1.0                   
[221] ggsignif_0.6.4                    nnet_7.3-19                       DirichletMultinomial_1.44.0       gridBase_0.4-7                   
[225] ggalluvial_0.12.5                 JASPAR2020_0.99.10                lmtest_0.9-40                     latex2exp_0.9.6                  
[229] usethis_2.2.2                     RcppAnnoy_0.0.22                  evmix_2.12                        qs_0.25.7                        
[233] timechange_0.3.0                  viridisLite_0.4.2                 scran_1.30.2                      ggdendro_0.1.23                  
[237] foreign_0.8-86                    splines_4.3.1                     blob_1.2.4                        timeROC_0.4                      
[241] impute_1.76.0                     annotate_1.80.0                   XML_3.99-0.16.1                   numDeriv_2016.8-1.1              
[245] globals_0.16.3                    ggbeeswarm_0.7.2                  permute_0.9-7                     knitr_1.46                       
[249] ggprism_1.0.4                     RcppRoll_0.3.0                    ica_1.0-3                         spam_2.10-0                      
[253] compiler_4.3.1                    rjson_0.2.21                      RcppParallel_5.1.7                biomaRt_2.58.2                   
[257] bit_4.0.5                         slingshot_2.10.0                  BiocNeighbors_1.20.2              glue_1.7.0                       
[261] R.cache_0.16.0                    digest_0.6.35                     quadprog_1.5-8                    irlba_2.3.5.1                    
[265] leiden_0.4.3.1                    glmnet_4.1-8                      graphlayouts_1.1.0                foreach_1.5.2                    
[269] RApiSerialize_0.1.2               seqLogo_1.68.0                    spatstat.random_3.2-3             SparseM_1.81                     
[273] zlibbioc_1.48.0                   dotCall64_1.1-1                   tweenr_2.0.2                      proxyC_0.3.4                     
[277] statmod_1.5.0                     ggraph_2.1.0                      rsvd_1.0.5                        gson_0.1.0                       
[281] igraph_2.0.3                      mvtnorm_1.2-4                     yaml_2.3.8                        ggnewscale_0.4.9                 
[285] qvalue_2.34.0                     BSgenome_1.70.1                   later_1.3.2                       backports_1.4.1                  
[289] moduleColor_1.8-4                 shadowtext_0.1.3                  Rsamtools_2.18.0                  AnnotationDbi_1.64.1             
[293] parallel_4.3.1                    quantreg_5.97                     miniUI_0.1.1.1                    gtable_0.3.4                     
[297] abind_1.4-5                       xfun_0.43                         Biostrings_2.70.2                 curl_5.2.1                       
[301] doParallel_1.0.17                 dynamicTreeCut_1.63-1             KernSmooth_2.23-22                survival_3.5-7                   
[305] jsonlite_1.8.8                    magrittr_2.0.3                    base64enc_0.1-3                   iterators_1.0.14                 
[309] Matrix_1.6-5                      RhpcBLASctl_0.23-42               km.ci_0.5-6                       fastmatch_1.1-4                  
[313] checkmate_2.3.1                   gtools_3.9.5                      shinyjs_2.1.0                     htmlTable_2.4.2                  
[317] spatstat.sparse_3.0-3             rngtools_1.5.2                    RANN_2.6.1                        bluster_1.12.0                   
[321] writexl_1.4.2                     circlize_0.4.15                   spatstat.explore_3.2-7            bit64_4.0.5                      
[325] cluster_2.1.4                     farver_2.1.1

Additionally, I'm seeking guidance on how to apply the clonalCompare function effectively within a combineExpression object. The example in the documentation combines different samples as shown below:

combined <- combineTCR(contig_list, 
                       samples =c("P17B", "P17L", "P18B", "P18L", 
                                   "P19B","P19L", "P20B", "P20L"))
clonalCompare(combined, 
              top.clones = 5, 
              samples = c("P17B", "P17L"), 
              cloneCall="aa")

In this scenario, the combineTCR function labels the samples within the combined list, allowing for direct comparisons, such as between P18B and P17L. However, the documentation also mentions that a combinedExpression object can be used. Could you please explain how this applies to analyzing cell clusters?

Thank you for your guidance.

ncborcherding commented 3 months ago

I am going to use the built-in version of data to make a reproducible example:

Here is the way I am making the data:

data("contig_list") 
combined.TCR <- combineTCR(contig_list, 
                           samples = c("P17B", "P17L", "P18B", "P18L", 
                                            "P19B","P19L", "P20B", "P20L"))

#Loading a local copy of the Seurat Object I have
scRep_example <- readRDS("scRep_example_full.rds")
scRep_example <- combineExpression(combined.TCR, 
                                   scRep_example, 
                                   cloneCall="gene", 
                                   group.by = "sample", 
                                   proportion = FALSE, 
                                   cloneSize=c(Single=1, Small=5, Medium=20, Large=100, Hyperexpanded=500))

clonalQuant

clonalQuant(scRep_example, cloneCall="strict", chain = "both", group.by = c("orig.ident"))

It looks like I did find a small bug in the code if the input is a factor. I did just push a fix for this in the dev branch - which will hopefully be up in the master branch of the github repo this week or early next week. I'm then planning to get it on bioconductor.

scRep_example$factor.orig.ident <- as.factor(scRep_example$orig.ident)
clonalQuant(scRep_example, cloneCall="strict", chain = "both", group.by = "factor.orig.ident")

After implementing the change:

clonalQuant(scRep_example, cloneCall="strict", chain = "both", group.by = "factor.orig.ident")

clonalCompare

You should be able to state the variable you would like to use with the group.by parameter. Here is an example using the output of combineTCR()

clonalCompare(combined.TCR, 
              top.clones = 5, 
              samples = c("P17B", "P17L"), 
              cloneCall="aa")

Here is a similar graph using the output of combineExpression(). Notice 2 things - 1) I state group.by = "orig.ident" and I call the specific groups I want using samples and 2) The graphs is slightly different from the one above - this is due to the clonal information only adding to cells with RNA values (so the numbers and ranks are slightly different).

clonalCompare(scRep_example,
              group.by = "orig.ident",
              top.clones = 5, 
              samples = c("P17B", "P17L"), 
              cloneCall="aa")

I hope that helps and let me know if you have any other questions or suggestions.

Thanks, Nick

lisch7 commented 3 months ago

Hi Nick,

Thank you so much for your prompt and detailed response! I am genuinely grateful for your enthusiasm and professionalism. Your attentiveness and dedication truly stand out. Having interacted with numerous projects on GitHub, I must say that your response time is the fastest I've encountered. I deeply appreciate your contributions to the community. Wishing you all the best in your ongoing and future endeavors!

Best regards

ncborcherding commented 3 months ago

Thanks for the kind words - the best part for me about scRepertoire has been seeing how people use the package.

ncborcherding / scRepertoire

How to Assess Clonal Diversity at the Subgroup Level? #394

clonalQuant

clonalCompare