overlap contigs table interpretation

Brawni commented 1 year ago

Hello!

Thanks for developing this great package! This is not an issue on the package but more like a question on the data so i apologize in advance. I have a dataset with 3 patient samples from tumor and pbmc. When i run the clonalOverlap function I can only see small overlaps between pairs that dont make any sense to me. My question is, can this low overlap be considered noise or suggests something is clearly wrong? Attached the heatmap. Thanks so much in advance for any comments on this!!

clonalOverlap(combined, 
              cloneCall = "strict", 
              method = "raw")

ncborcherding commented 1 year ago

Hey Bruno,

Thanks for reaching out! Agreed that this is highly suspect, unless you are matching genes/single chains, I don't think you should see that much crossover between samples (plus the lack of overlap in the paired samples). I have seen something similar before and the most likely explanation is mislabeled samples. Make sure your combineTCR() call samples and id match the order of the contig list exactly.

If that is not the case - let me know and I would be happy to troubleshoot with you. For that - sending along the code used for making 1) making the contig list, 2) combineTCR(), and 3) the output of sessionInfo() would help immensely!

Thanks again, Nick

Brawni commented 1 year ago

Hi Nick,

Thanks for your quick reply! I checked that the order is preserved so i dont think that's the problem. If you are willing to take a look at the code that would be awesome! The PBMC data was hashed so i used the previously hashed seurat object to split the contigs by sample. I'm also happy to share the data in case. Following is the code and..thanks so much for your help!


### Compare tumor and pbmc clonotypes ####

# Load seurat obj to dehash TCR contigs ####
srt_pbmc = readRDS ('srt.rds')
srt_pbmc = srt_pbmc[,srt_pbmc$batch == 'batch2']
meta_pbmc = srt_pbmc@meta.data

# Import TCR_contgis of PBMCs ####
data.path_pbmc1 = '2023-03-29-1-1/version=1/per_sample_outs/cellranger_multi_run/vdj_t/filtered_contig_annotations.csv'
data.path_pbmc2 = '2023-03-29-1-2/version=1/per_sample_outs/cellranger_multi_run/vdj_t/filtered_contig_annotations.csv'
vdj.dirs = c (data.path_pbmc1, data.path_pbmc2)

# Add hash pools info to seurat metadata ####
meta_pbmc$hash_pool = sapply (rownames(meta_pbmc), function(x) unlist(strsplit (x, '\\-'))[2])

# Strip barcodes to match TCR data ####
meta_pbmc$barcode = sapply (rownames (meta_pbmc), function(x) unlist (strsplit (x, '_'))[1])
head (meta_pbmc$barcode)
[1] "AAACCTGAGACAAAGG-1" "AAACCTGAGATAGTCA-1" "AAACCTGAGCGGCTTC-1"
[4] "AAACCTGCAAACGCGA-1" "AAACCTGCAAAGGTGC-1" "AAACCTGCAAGACGTG-1"

# Check barcodes per pool are unique ####
lapply (split (meta_pbmc$barcode,meta_pbmc$hash_pool), function(x) length(x) == length(unique(x)))
$`1_1_2`
[1] TRUE

$`1_2_2`
[1] TRUE

# Split TCR barcodes by pool map barcodes and split by sample ####
tcrL_pbmc = lapply (vdj.dirs, function(x) read.csv(x))
names (tcrL_pbmc) = unique (meta_pbmc$hash_pool)
tcrL_pbmc = lapply (names(tcrL_pbmc), function(x)
  {
  tcr_hashed = tcrL_pbmc[[x]][tcrL_pbmc[[x]]$barcode %in% meta_pbmc$barcode[meta_pbmc$hash_pool == x],]
  tcr_hashed$sampleID3 = meta_pbmc$sampleID3[match (tcr_hashed$barcode, meta_pbmc$barcode)]
  tcr_hashed
  })
tcrL_pbmc = do.call (rbind, tcrL_pbmc)
tcrL_pbmc = split (tcrL_pbmc, tcrL_pbmc$sampleID3)
names (tcrL_pbmc) = paste0(names(tcrL_pbmc) ,'_pbmc')

# Import TCR of tumors ####
data.path_p7_tumor = 'filtered_contig_annotations.csv'
data.path_p2_tumor = 'filtered_contig_annotations.csv'
data.path_p9_tumor = 'filtered_contig_annotations.csv'

vdj.dirs = c (data.path_p7_tumor,data.path_p2_tumor, data.path_p9_tumor)
tcrL_tumor = lapply (vdj.dirs, function(x) read.csv(x))
names (tcrL_tumor) = c('P7_tumor','P2_tumor','P9_tumor')

contig.list = c(tcrL_pbmc, tcrL_tumor)

combined <- combineTCR (contig_list, 
                samples = names (contig.list), 
                #ID = c("P", "P", "P", "T", "T", "T"), 
                cells ="T-AB")

clonalOverlap(combined, 
              cloneCall = "strict", 
              method = "raw")

sessionInfo()
R version 4.2.3 (2023-03-15)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.9 (Maipo)

Matrix products: default
BLAS/LAPACK: /ahg/regevdata/projects/ICA_Lung/Bruno/conda/scrnatools/lib/libopenblasp-r0.3.21.so

locale:
[1] C

attached base packages:
[1] stats4    grid      stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] scRepertoire_1.8.0          rstatix_0.7.2
 [3] lubridate_1.9.2             forcats_1.0.0
 [5] stringr_1.5.0               purrr_1.0.1
 [7] readr_2.1.4                 tibble_3.2.1
 [9] tidyverse_2.0.0             igraph_1.4.3
[11] clusterProfiler_4.6.0       ggridges_0.5.4
[13] BiocParallel_1.32.5         viridis_0.6.3
[15] viridisLite_0.4.1           org.Mm.eg.db_3.16.0
[17] GO.db_3.16.0                AnnotationDbi_1.60.0
[19] plotly_4.10.1               scater_1.26.0
[21] scuttle_1.8.0               SingleCellExperiment_1.20.0
[23] SummarizedExperiment_1.28.0 Biobase_2.58.0
[25] GenomicRanges_1.50.0        GenomeInfoDb_1.34.9
[27] IRanges_2.32.0              S4Vectors_0.36.0
[29] BiocGenerics_0.44.0         MatrixGenerics_1.10.0
[31] matrixStats_0.63.0          harmony_0.1.0
[33] Rcpp_1.0.8.3                tidyr_1.3.0
[35] fgsea_1.24.0                ggrepel_0.9.3
[37] RColorBrewer_1.1-3          ComplexHeatmap_2.14.0
[39] patchwork_1.1.2             biomaRt_2.54.0
[41] ggpubr_0.6.0                Matrix_1.5-4.1
[43] dplyr_1.1.2                 ggtree_3.6.0
[45] gplots_3.1.3                ggplot2_3.4.2
[47] SeuratObject_4.1.3          Seurat_4.3.0

loaded via a namespace (and not attached):
  [1] rappdirs_0.3.3            SparseM_1.81
  [3] scattermore_1.1           evmix_2.12
  [5] bit64_4.0.5               irlba_2.3.5.1
  [7] DelayedArray_0.24.0       data.table_1.14.8
  [9] KEGGREST_1.38.0           RCurl_1.98-1.12
 [11] doParallel_1.0.17         generics_0.1.3
 [13] ScaledMatrix_1.6.0        cowplot_1.1.1
 [15] RSQLite_2.3.1             VGAM_1.1-8
 [17] shadowtext_0.1.2          RANN_2.6.1
 [19] future_1.32.0             tzdb_0.4.0
 [21] bit_4.0.5                 enrichplot_1.18.0
 [23] spatstat.data_3.0-1       xml2_1.3.4
 [25] httpuv_1.6.11             hms_1.1.3
 [27] promises_1.2.0.1          fansi_1.0.4
 [29] progress_1.2.2            caTools_1.18.2
 [31] dbplyr_2.3.2              DBI_1.1.3
 [33] htmlwidgets_1.6.2         powerTCR_1.18.0
 [35] spatstat.geom_3.2-1       stringdist_0.9.10
 [37] ellipsis_0.3.2            backports_1.4.1
 [39] permute_0.9-7             deldir_1.0-9
 [41] sparseMatrixStats_1.10.0  ggalluvial_0.12.5
 [43] vctrs_0.6.2               ROCR_1.0-11
 [45] abind_1.4-5               cachem_1.0.8
 [47] withr_2.5.0               ggforce_0.4.1
 [49] HDO.db_0.99.1             progressr_0.13.0
 [51] sctransform_0.3.5         vegan_2.6-4
 [53] treeio_1.22.0             prettyunits_1.1.1
 [55] goftest_1.2-3             cluster_2.1.4
 [57] DOSE_3.24.0               gsl_2.1-8
 [59] ape_5.7-1                 lazyeval_0.2.2
 [61] crayon_1.5.2              spatstat.explore_3.2-1
 [63] labeling_0.4.2            pkgconfig_2.0.3
 [65] tweenr_2.0.2              nlme_3.1-162
 [67] vipor_0.4.5               rlang_1.1.1
 [69] globals_0.16.2            lifecycle_1.0.3
 [71] miniUI_0.1.1.1            downloader_0.4
 [73] filelock_1.0.2            BiocFileCache_2.6.0
 [75] rsvd_1.0.5                polyclip_1.10-4
 [77] lmtest_0.9-40             aplot_0.1.10
 [79] carData_3.0-5             zoo_1.8-12
 [81] beeswarm_0.4.0            GlobalOptions_0.1.2
 [83] png_0.1-8                 rjson_0.2.21
 [85] bitops_1.0-7              gson_0.1.0
 [87] KernSmooth_2.23-21        Biostrings_2.66.0
 [89] blob_1.2.4                DelayedMatrixStats_1.20.0
 [91] shape_1.4.6               qvalue_2.30.0
 [93] parallelly_1.35.0         spatstat.random_3.1-5
 [95] gridGraphics_0.5-1        ggsignif_0.6.4
 [97] beachmat_2.14.0           scales_1.2.1
 [99] memoise_2.0.1             magrittr_2.0.3
[101] plyr_1.8.8                ica_1.0-3
[103] zlibbioc_1.44.0           compiler_4.2.3
[105] scatterpie_0.1.9          clue_0.3-64
[107] fitdistrplus_1.1-11       cli_3.6.1
[109] XVector_0.38.0            listenv_0.9.0
[111] pbapply_1.7-0             mgcv_1.8-42
[113] MASS_7.3-60               tidyselect_1.2.0
[115] stringi_1.7.12            GOSemSim_2.24.0
[117] BiocSingular_1.14.0       timechange_0.2.0
[119] fastmatch_1.1-3           tools_4.2.3
[121] future.apply_1.11.0       parallel_4.2.3
[123] circlize_0.4.15           foreach_1.5.2
[125] cubature_2.1.0            gridExtra_2.3
[127] farver_2.1.1              Rtsne_0.16
[129] ggraph_2.1.0              digest_0.6.31
[131] shiny_1.7.4               car_3.1-2
[133] broom_1.0.4               later_1.3.1
[135] RcppAnnoy_0.0.20          httr_1.4.6
[137] colorspace_2.1-0          XML_3.99-0.14
[139] tensor_1.5                reticulate_1.26
[141] splines_4.2.3             uwot_0.1.14
[143] yulab.utils_0.0.6         tidytree_0.4.2
[145] spatstat.utils_3.0-3      graphlayouts_1.0.0
[147] sp_1.6-0                  ggplotify_0.1.0
[149] xtable_1.8-4              truncdist_1.0-2
[151] jsonlite_1.8.4            tidygraph_1.2.3
[153] ggfun_0.0.9               R6_2.5.1
[155] pillar_1.9.0              htmltools_0.5.5
[157] mime_0.12                 glue_1.6.2
[159] fastmap_1.1.1             BiocNeighbors_1.16.0
[161] codetools_0.2-19          utf8_1.2.3
[163] lattice_0.21-8            spatstat.sparse_3.0-1
[165] evd_2.3-6.1               curl_4.3.3
[167] ggbeeswarm_0.7.2          leiden_0.4.3
[169] gtools_3.9.4              survival_3.5-5
[171] munsell_0.5.0             GetoptLong_1.0.5
[173] GenomeInfoDbData_1.2.9    iterators_1.0.14
[175] reshape2_1.4.4            gtable_0.3.3

ncborcherding commented 1 year ago

Hey Bruno,

Nice code!! I don't see any issue in the context of the code. For clonalOverlap() there is really no shuffle going on in terms of sample ID - it is counting the clones in each category. What is also kind of weird is that 2 of the 3 overlaps are occurring in the same tissue type (peripheral blood) and 1 is occurring across tissues.

If you are willing to share - more than happy to take a look at the hashed data (email: ncborch@gmail.com). If not, I'll keep brainstorming.

Nick

Brawni commented 1 year ago

Hey Nick,

Thanks! yes weird indeed, the PBMC data was hashed so i thought the overlap could be caused by noise in the dehashing step but the tumor samples have been sequenced separately. I'll send you the data right away and thanks so much for looking into this, really appreciate!

ncborcherding commented 1 year ago

Hey Bruno,

Thanks for sending me the data - again I don't see anything suspicious from your code or data. One thing you might want to check is the relative number of clones/clonotypes that are attaching to the single-cell object (or relative number of barcodes per sample that batch the GEX and TCR data). I think this would be a good sanity check - it could be as simple as:

seuratObj <- combineExpression(combined, seuratObj)
occupiedscRepertoire(seuratObj, x.axis = "orig.ident")

Nick

Brawni commented 1 year ago

Followed your suggestions and nothing popped up only to find back in the code that I had passed sample labels from another object called contig.list and not contig_list. Dang.

combined <- combineTCR (contig_list, 
                samples = names (contig.list), 
                #ID = c("P", "P", "P", "T", "T", "T"), 
                cells ="T-AB")

Now overlap looks interesting for sample P9 i would say

Thanks a lot for your support!

ncborcherding commented 1 year ago

Oh great to know - thanks so much for following up and being incredibly responsive!

Nick

ncborcherding / scRepertoire

overlap contigs table interpretation #241