Closed lisch7 closed 3 months ago
I believe adding a tutorial on analyzing cell subpopulations would greatly assist newcomers in understanding these concepts more easily. Such a tutorial could make the learning curve much smoother. Thank you for considering this suggestion!
Hey @lisch7,
Thanks for reaching out - when you say subgroup analysis what are you trying to do exactly?
For the most part you can combine multiple groupings together and that will let you do a lot of things. For instance, say I wanted to look clonal trends in a patient and cluster specific manner:
seuratObj$new.variable <- paste0(seuratObj$patient, ".", seuratObj$seurat_clusters)
From there I can use group.by in most of the functions above to do the analysis.
Happy to work with you more, just let me know. I think it would be helpful for other users as well.
Nick
Hi Nick,
Thank you for your prompt response. I apologize for any confusion in my previous message; I’d like to clarify with an example.
The tutorial outlines how to use combineTCR
to merge different output files and addVariable
to append variables such as sample type and patient ID to the combined list. Subsequent steps like clonalQuant
are demonstrated for clonal analysis. However, the tutorial mainly showcases clonal analysis across different samples.
In my view, a critical aspect of TCR analysis involves detailed cell clustering. For instance, I am interested in comparing the clonal share of Cluster A before and after treatment to identify key changes during the treatment. Typically, detailed annotation would be handled with tools like Seurat
or Scanpy
. The tutorial guides on integrating the combineTCR
object with a Seurat
object, but it’s unclear how to apply clonalQuant
, clonalDiversity
, or similar functions to assess the state of cell clusters.
I’ve noticed discussions about the expression2list
function, which seems to split the integrated object into a list. However, it appears that direct usage of this function isn’t recommended, suggesting there might be a better approach that isn't covered in the tutorial.
Could you provide guidance on how to proceed after having a fully annotated Seurat object and a merged combineTCR object to effectively analyze and correlate clones across different conditions and subgroups?
Thank you for your assistance.
Best regards
Hey @lisch7,
Sorry for the confusion, functions like clonalQuant()
and clonalDiversity()
allow as inputs the output of combineTCR()/combineBCR()
and combineExpression()
- so you can feed the single-cell object directly into the function clonalQuant()
. By default if a seurat object, the function will then use the active identity, but you can select whatever you need in terms of variables using the group.by function.
There use to be an option of either using the suerat object directly or expression2List()
, this was confusing and I had to handle a lot of questions/github issues. It is still present, just an internal function now assists in converting single-cell meta data into plotly formats.
Hope that helps clarify, Nick
Hi Nick,
Thank you for your kind and prompt response. I successfully utilized the clonalOverlap
function with my combineExpression object, and it worked well. However, I encountered issues with other functions like clonalQuant
and clonalAbundance
. For instance, when I execute clonalQuant(sce_TCR, cloneCall="strict", chain = "both", group.by = c("Minor_anno"))
, I receive the following output:
And some information about my object was as follows:
> sce_TCR
An object of class Seurat
36601 features across 116854 samples within 1 assay
Active assay: RNA (36601 features, 0 variable features)
2 layers present: counts, data
1 dimensional reduction calculated: umap
> sce_TCR$Minor_anno %>% unique()
[1] CD8T_08_ZNF683 CD8T_07_GZMK CD8T_02_ANXA1 CD8T_09_XCL2 CD8T_06_GZMB CD8T_10_CXCL13 CD8T_03_SELL CD8T_04_SLC4A10 CD8T_11_HAVCR2
[10] CD8T_12_STMN1 CD8T_01_CCR7 CD8T_05_IFIT1
12 Levels: CD8T_01_CCR7 CD8T_02_ANXA1 CD8T_03_SELL CD8T_04_SLC4A10 CD8T_05_IFIT1 CD8T_06_GZMB CD8T_07_GZMK CD8T_08_ZNF683 CD8T_09_XCL2 ... CD8T_12_STMN1
> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Asia/Shanghai
tzcode source: system (glibc)
attached base packages:
[1] stats4 grid stats graphics grDevices utils datasets methods base
other attached packages:
[1] RColorBrewer_1.1-3 scRepertoire_2.0.0 clusterProfiler_4.11.0.001 tidyHeatmap_1.8.1 scDblFinder_1.16.0
[6] SingleCellExperiment_1.24.0 SummarizedExperiment_1.32.0 Biobase_2.62.0 GenomicRanges_1.54.1 GenomeInfoDb_1.38.6
[11] IRanges_2.36.0 S4Vectors_0.40.2 BiocGenerics_0.48.1 MatrixGenerics_1.14.0 matrixStats_1.3.0
[16] ggVolcano_0.0.2 ComplexHeatmap_2.18.0 corrplot_0.92 Rmisc_1.5.1 plyr_1.8.9
[21] lattice_0.22-5 rstatix_0.7.2 Hmisc_5.1-1 data.table_1.15.4 reticulate_1.36.0
[26] Azimuth_0.5.0 shinyBS_0.61.1 BPCells_0.1.0 Seurat_5.0.3 SeuratObject_5.0.1
[31] sp_2.1-3 pacman_0.5.1 lubridate_1.9.3 forcats_1.0.0 stringr_1.5.1
[36] dplyr_1.1.4 purrr_1.0.2 readr_2.1.5 tidyr_1.3.1 tibble_3.2.1
[41] ggplot2_3.5.0 tidyverse_2.0.0
loaded via a namespace (and not attached):
[1] DBI_1.2.2 lpSolve_5.6.20 httr_1.4.7 registry_0.5-1
[5] TFMPvalue_0.0.9 BiocParallel_1.36.0 prettyunits_1.2.0 yulab.utils_0.1.4
[9] gsl_2.1-8 ProtGenerics_1.34.0 ggplotify_0.1.2 GenomicAlignments_1.38.2
[13] sparseMatrixStats_1.14.0 spatstat.geom_3.2-9 survMisc_0.5.6 pillar_1.9.0
[17] R6_2.5.1 mime_0.12 edgeR_4.0.14 uwot_0.2.2
[21] viridis_0.6.5 Rhdf5lib_1.24.2 vegan_2.6-4 SeuratData_0.2.2.9001
[25] ROCR_1.0-11 fields_15.2 ggpubr_0.6.0 KMsurv_0.1-5
[29] limma_3.58.1 parallelly_1.37.1 iNEXT_3.0.0 GlobalOptions_0.1.2
[33] caTools_1.18.2 mgcv_1.9-0 polyclip_1.10-6 NMF_0.27
[37] beachmat_2.18.1 htmltools_0.5.8.1 fansi_1.0.6 e1071_1.7-14
[41] googledrive_2.1.1 remotes_2.4.2.1 ggrepel_0.9.5 BSgenome.Hsapiens.UCSC.hg38_1.4.5
[45] car_3.1-2 stringfish_0.16.0 fgsea_1.28.0 scuttle_1.12.0
[49] spatstat.utils_3.0-5 HDO.db_0.99.1 maps_3.4.2 pec_2023.04.12
[53] survminer_0.4.9 rpart_4.1.23 clue_0.3-65 scatterpie_0.2.1
[57] fitdistrplus_1.1-11 goftest_1.2-3 tidyselect_1.2.1 sessioninfo_1.2.2
[61] RSQLite_2.3.5 truncdist_1.0-2 cowplot_1.1.3 GenomeInfoDbData_1.2.11
[65] utf8_1.2.4 ScaledMatrix_1.10.0 scattermore_1.2 spatstat.data_3.0-4
[69] gridExtra_2.3 fs_1.6.3 xgboost_1.7.7.1 timereg_2.0.5
[73] sctransform_0.4.1 graph_1.80.0 future.apply_1.11.2 R.oo_1.26.0
[77] CNEr_1.38.0 Signac_1.12.0 RcppHNSW_0.6.0 vipor_0.4.7
[81] rtracklayer_1.62.0 Rtsne_0.17 DelayedMatrixStats_1.24.0 lazyeval_0.2.2
[85] scales_1.3.0 carData_3.0-5 munsell_0.5.1 lava_1.7.3
[89] treeio_1.26.0 R.utils_2.12.3 profvis_0.3.8 pracma_2.4.4
[93] bitops_1.0-7 labeling_0.4.3 R.methodsS3_1.8.2 KEGGREST_1.42.0
[97] promises_1.3.0 shape_1.4.6 rhdf5filters_1.14.1 poweRlaw_0.80.0
[101] zoo_1.8-12 princurve_2.1.6 GSVA_1.50.0 GenomicFeatures_1.54.3
[105] locfit_1.5-9.8 DelayedArray_0.28.0 RSpectra_0.16-1 SeuratDisk_0.0.0.9021
[109] tools_4.3.1 ape_5.7-1 ensembldb_2.26.0 scater_1.30.1
[113] shiny_1.8.1.1 BiocFileCache_2.10.1 rlang_1.1.3 generics_0.1.3
[117] BiocSingular_1.18.0 ggridges_0.5.6 evaluate_0.23 TrajectoryUtils_1.10.0
[121] stringdist_0.9.12 BiocIO_1.12.0 devtools_2.4.5 reshape2_1.4.4
[125] colorspace_2.1-0 limSolve_1.5.7.1 ellipsis_0.3.2 withr_3.0.0
[129] gargle_1.5.2 RCurl_1.98-1.14 presto_1.0.0 restfulr_0.0.15
[133] xtable_1.8-4 aplot_0.2.2 MatrixModels_0.5-3 ks_1.14.2
[137] mclust_6.0.1 httpuv_1.6.15 cubature_2.1.0 rmarkdown_2.26
[141] metapod_1.10.1 EnsDb.Hsapiens.v86_2.99.0 MASS_7.3-60 dqrng_0.3.2
[145] broom_1.0.5 deldir_2.0-4 GO.db_3.18.0 rhdf5_2.46.1
[149] tensor_1.5 googlesheets4_1.1.1 SCP_0.5.6 vctrs_0.6.5
[153] lifecycle_1.0.4 readxl_1.4.3 proxy_0.4-27 codetools_0.2-19
[157] fastDummies_1.7.3 DT_0.31 nlme_3.1-163 future_1.33.2
[161] DESeq2_1.42.0 hash_2.2.6.3 progress_1.2.3 dbplyr_2.4.0
[165] pkgload_1.3.4 cellranger_1.1.0 Rcpp_1.0.12 shinydashboard_0.7.2
[169] rstudioapi_0.15.0 patchwork_1.2.0 stringi_1.8.3 AnnotationFilter_1.26.0
[173] sscVis_0.1.0 VGAM_1.1-9 hms_1.1.3 pbapply_1.7-2
[177] cachem_1.0.8 hdf5r_1.3.11 tidytree_0.4.6 listenv_0.9.1
[181] XVector_0.42.0 urlchecker_1.0.1 ggrastr_1.0.2 plotly_4.10.4
[185] ggtree_3.10.0 enrichplot_1.22.0 pkgbuild_1.4.3 GetoptLong_1.0.5
[189] ggfun_0.1.4 HDF5Array_1.30.1 SparseArray_1.2.4 htmlwidgets_1.6.4
[193] Formula_1.2-5 TFBSTools_1.40.0 dendextend_1.17.1 class_7.3-22
[197] memoise_2.0.1 crayon_1.5.2 gridGraphics_0.5-1 rappdirs_0.3.3
[201] S4Arrays_1.2.0 xml2_1.3.6 filelock_1.0.3 preprocessCore_1.64.0
[205] GOSemSim_2.28.1 png_0.1-8 progressr_0.14.0 tzdb_0.4.0
[209] evd_2.3-6.1 GSEABase_1.64.0 fastmap_1.1.1 tidygraph_1.3.1
[213] pkgconfig_2.0.3 cli_3.6.2 beeswarm_0.4.0 DOSE_3.28.2
[217] ggforce_0.4.1 IOBR_0.99.8 prodlim_2023.08.28 Startrac_0.1.0
[221] ggsignif_0.6.4 nnet_7.3-19 DirichletMultinomial_1.44.0 gridBase_0.4-7
[225] ggalluvial_0.12.5 JASPAR2020_0.99.10 lmtest_0.9-40 latex2exp_0.9.6
[229] usethis_2.2.2 RcppAnnoy_0.0.22 evmix_2.12 qs_0.25.7
[233] timechange_0.3.0 viridisLite_0.4.2 scran_1.30.2 ggdendro_0.1.23
[237] foreign_0.8-86 splines_4.3.1 blob_1.2.4 timeROC_0.4
[241] impute_1.76.0 annotate_1.80.0 XML_3.99-0.16.1 numDeriv_2016.8-1.1
[245] globals_0.16.3 ggbeeswarm_0.7.2 permute_0.9-7 knitr_1.46
[249] ggprism_1.0.4 RcppRoll_0.3.0 ica_1.0-3 spam_2.10-0
[253] compiler_4.3.1 rjson_0.2.21 RcppParallel_5.1.7 biomaRt_2.58.2
[257] bit_4.0.5 slingshot_2.10.0 BiocNeighbors_1.20.2 glue_1.7.0
[261] R.cache_0.16.0 digest_0.6.35 quadprog_1.5-8 irlba_2.3.5.1
[265] leiden_0.4.3.1 glmnet_4.1-8 graphlayouts_1.1.0 foreach_1.5.2
[269] RApiSerialize_0.1.2 seqLogo_1.68.0 spatstat.random_3.2-3 SparseM_1.81
[273] zlibbioc_1.48.0 dotCall64_1.1-1 tweenr_2.0.2 proxyC_0.3.4
[277] statmod_1.5.0 ggraph_2.1.0 rsvd_1.0.5 gson_0.1.0
[281] igraph_2.0.3 mvtnorm_1.2-4 yaml_2.3.8 ggnewscale_0.4.9
[285] qvalue_2.34.0 BSgenome_1.70.1 later_1.3.2 backports_1.4.1
[289] moduleColor_1.8-4 shadowtext_0.1.3 Rsamtools_2.18.0 AnnotationDbi_1.64.1
[293] parallel_4.3.1 quantreg_5.97 miniUI_0.1.1.1 gtable_0.3.4
[297] abind_1.4-5 xfun_0.43 Biostrings_2.70.2 curl_5.2.1
[301] doParallel_1.0.17 dynamicTreeCut_1.63-1 KernSmooth_2.23-22 survival_3.5-7
[305] jsonlite_1.8.8 magrittr_2.0.3 base64enc_0.1-3 iterators_1.0.14
[309] Matrix_1.6-5 RhpcBLASctl_0.23-42 km.ci_0.5-6 fastmatch_1.1-4
[313] checkmate_2.3.1 gtools_3.9.5 shinyjs_2.1.0 htmlTable_2.4.2
[317] spatstat.sparse_3.0-3 rngtools_1.5.2 RANN_2.6.1 bluster_1.12.0
[321] writexl_1.4.2 circlize_0.4.15 spatstat.explore_3.2-7 bit64_4.0.5
[325] cluster_2.1.4 farver_2.1.1
Additionally, I'm seeking guidance on how to apply the clonalCompare
function effectively within a combineExpression object. The example in the documentation combines different samples as shown below:
combined <- combineTCR(contig_list,
samples =c("P17B", "P17L", "P18B", "P18L",
"P19B","P19L", "P20B", "P20L"))
clonalCompare(combined,
top.clones = 5,
samples = c("P17B", "P17L"),
cloneCall="aa")
In this scenario, the combineTCR
function labels the samples within the combined list, allowing for direct comparisons, such as between P18B and P17L. However, the documentation also mentions that a combinedExpression object can be used. Could you please explain how this applies to analyzing cell clusters?
Thank you for your guidance.
I am going to use the built-in version of data to make a reproducible example:
Here is the way I am making the data:
data("contig_list")
combined.TCR <- combineTCR(contig_list,
samples = c("P17B", "P17L", "P18B", "P18L",
"P19B","P19L", "P20B", "P20L"))
#Loading a local copy of the Seurat Object I have
scRep_example <- readRDS("scRep_example_full.rds")
scRep_example <- combineExpression(combined.TCR,
scRep_example,
cloneCall="gene",
group.by = "sample",
proportion = FALSE,
cloneSize=c(Single=1, Small=5, Medium=20, Large=100, Hyperexpanded=500))
clonalQuant(scRep_example, cloneCall="strict", chain = "both", group.by = c("orig.ident"))
It looks like I did find a small bug in the code if the input is a factor. I did just push a fix for this in the dev branch - which will hopefully be up in the master branch of the github repo this week or early next week. I'm then planning to get it on bioconductor.
scRep_example$factor.orig.ident <- as.factor(scRep_example$orig.ident)
clonalQuant(scRep_example, cloneCall="strict", chain = "both", group.by = "factor.orig.ident")
After implementing the change:
clonalQuant(scRep_example, cloneCall="strict", chain = "both", group.by = "factor.orig.ident")
You should be able to state the variable you would like to use with the group.by parameter. Here is an example using the output of combineTCR()
clonalCompare(combined.TCR,
top.clones = 5,
samples = c("P17B", "P17L"),
cloneCall="aa")
Here is a similar graph using the output of combineExpression()
. Notice 2 things - 1) I state group.by = "orig.ident" and I call the specific groups I want using samples and 2) The graphs is slightly different from the one above - this is due to the clonal information only adding to cells with RNA values (so the numbers and ranks are slightly different).
clonalCompare(scRep_example,
group.by = "orig.ident",
top.clones = 5,
samples = c("P17B", "P17L"),
cloneCall="aa")
I hope that helps and let me know if you have any other questions or suggestions.
Thanks, Nick
Hi Nick,
Thank you so much for your prompt and detailed response! I am genuinely grateful for your enthusiasm and professionalism. Your attentiveness and dedication truly stand out. Having interacted with numerous projects on GitHub, I must say that your response time is the fastest I've encountered. I deeply appreciate your contributions to the community. Wishing you all the best in your ongoing and future endeavors!
Best regards
Thanks for the kind words - the best part for me about scRepertoire has been seeing how people use the package.
Hi,
First of all, thank you for developing this package! It has been incredibly helpful.
I am pretty new to analysis scTCR-seq data. I have been following your tutorial, which seems to focus on calculations for individual samples. I am interested in extending these analyses to the subgroup level. Could you provide some guidance on how to perform basic analyses for subgroups, such as assessing clonal diversity, analyzing overlaps, etc.?
Thank you for your time and assistance!
Best regards