Closed Brawni closed 1 year ago
Hey Bruno,
Thanks for reaching out! Agreed that this is highly suspect, unless you are matching genes/single chains, I don't think you should see that much crossover between samples (plus the lack of overlap in the paired samples). I have seen something similar before and the most likely explanation is mislabeled samples. Make sure your combineTCR()
call samples and id match the order of the contig list exactly.
If that is not the case - let me know and I would be happy to troubleshoot with you. For that - sending along the code used for making 1) making the contig list, 2) combineTCR()
, and 3) the output of sessionInfo()
would help immensely!
Thanks again, Nick
Hi Nick,
Thanks for your quick reply! I checked that the order is preserved so i dont think that's the problem. If you are willing to take a look at the code that would be awesome! The PBMC data was hashed so i used the previously hashed seurat object to split the contigs by sample. I'm also happy to share the data in case. Following is the code and..thanks so much for your help!
### Compare tumor and pbmc clonotypes ####
# Load seurat obj to dehash TCR contigs ####
srt_pbmc = readRDS ('srt.rds')
srt_pbmc = srt_pbmc[,srt_pbmc$batch == 'batch2']
meta_pbmc = srt_pbmc@meta.data
# Import TCR_contgis of PBMCs ####
data.path_pbmc1 = '2023-03-29-1-1/version=1/per_sample_outs/cellranger_multi_run/vdj_t/filtered_contig_annotations.csv'
data.path_pbmc2 = '2023-03-29-1-2/version=1/per_sample_outs/cellranger_multi_run/vdj_t/filtered_contig_annotations.csv'
vdj.dirs = c (data.path_pbmc1, data.path_pbmc2)
# Add hash pools info to seurat metadata ####
meta_pbmc$hash_pool = sapply (rownames(meta_pbmc), function(x) unlist(strsplit (x, '\\-'))[2])
# Strip barcodes to match TCR data ####
meta_pbmc$barcode = sapply (rownames (meta_pbmc), function(x) unlist (strsplit (x, '_'))[1])
head (meta_pbmc$barcode)
[1] "AAACCTGAGACAAAGG-1" "AAACCTGAGATAGTCA-1" "AAACCTGAGCGGCTTC-1"
[4] "AAACCTGCAAACGCGA-1" "AAACCTGCAAAGGTGC-1" "AAACCTGCAAGACGTG-1"
# Check barcodes per pool are unique ####
lapply (split (meta_pbmc$barcode,meta_pbmc$hash_pool), function(x) length(x) == length(unique(x)))
$`1_1_2`
[1] TRUE
$`1_2_2`
[1] TRUE
# Split TCR barcodes by pool map barcodes and split by sample ####
tcrL_pbmc = lapply (vdj.dirs, function(x) read.csv(x))
names (tcrL_pbmc) = unique (meta_pbmc$hash_pool)
tcrL_pbmc = lapply (names(tcrL_pbmc), function(x)
{
tcr_hashed = tcrL_pbmc[[x]][tcrL_pbmc[[x]]$barcode %in% meta_pbmc$barcode[meta_pbmc$hash_pool == x],]
tcr_hashed$sampleID3 = meta_pbmc$sampleID3[match (tcr_hashed$barcode, meta_pbmc$barcode)]
tcr_hashed
})
tcrL_pbmc = do.call (rbind, tcrL_pbmc)
tcrL_pbmc = split (tcrL_pbmc, tcrL_pbmc$sampleID3)
names (tcrL_pbmc) = paste0(names(tcrL_pbmc) ,'_pbmc')
# Import TCR of tumors ####
data.path_p7_tumor = 'filtered_contig_annotations.csv'
data.path_p2_tumor = 'filtered_contig_annotations.csv'
data.path_p9_tumor = 'filtered_contig_annotations.csv'
vdj.dirs = c (data.path_p7_tumor,data.path_p2_tumor, data.path_p9_tumor)
tcrL_tumor = lapply (vdj.dirs, function(x) read.csv(x))
names (tcrL_tumor) = c('P7_tumor','P2_tumor','P9_tumor')
contig.list = c(tcrL_pbmc, tcrL_tumor)
combined <- combineTCR (contig_list,
samples = names (contig.list),
#ID = c("P", "P", "P", "T", "T", "T"),
cells ="T-AB")
clonalOverlap(combined,
cloneCall = "strict",
method = "raw")
sessionInfo()
R version 4.2.3 (2023-03-15)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.9 (Maipo)
Matrix products: default
BLAS/LAPACK: /ahg/regevdata/projects/ICA_Lung/Bruno/conda/scrnatools/lib/libopenblasp-r0.3.21.so
locale:
[1] C
attached base packages:
[1] stats4 grid stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] scRepertoire_1.8.0 rstatix_0.7.2
[3] lubridate_1.9.2 forcats_1.0.0
[5] stringr_1.5.0 purrr_1.0.1
[7] readr_2.1.4 tibble_3.2.1
[9] tidyverse_2.0.0 igraph_1.4.3
[11] clusterProfiler_4.6.0 ggridges_0.5.4
[13] BiocParallel_1.32.5 viridis_0.6.3
[15] viridisLite_0.4.1 org.Mm.eg.db_3.16.0
[17] GO.db_3.16.0 AnnotationDbi_1.60.0
[19] plotly_4.10.1 scater_1.26.0
[21] scuttle_1.8.0 SingleCellExperiment_1.20.0
[23] SummarizedExperiment_1.28.0 Biobase_2.58.0
[25] GenomicRanges_1.50.0 GenomeInfoDb_1.34.9
[27] IRanges_2.32.0 S4Vectors_0.36.0
[29] BiocGenerics_0.44.0 MatrixGenerics_1.10.0
[31] matrixStats_0.63.0 harmony_0.1.0
[33] Rcpp_1.0.8.3 tidyr_1.3.0
[35] fgsea_1.24.0 ggrepel_0.9.3
[37] RColorBrewer_1.1-3 ComplexHeatmap_2.14.0
[39] patchwork_1.1.2 biomaRt_2.54.0
[41] ggpubr_0.6.0 Matrix_1.5-4.1
[43] dplyr_1.1.2 ggtree_3.6.0
[45] gplots_3.1.3 ggplot2_3.4.2
[47] SeuratObject_4.1.3 Seurat_4.3.0
loaded via a namespace (and not attached):
[1] rappdirs_0.3.3 SparseM_1.81
[3] scattermore_1.1 evmix_2.12
[5] bit64_4.0.5 irlba_2.3.5.1
[7] DelayedArray_0.24.0 data.table_1.14.8
[9] KEGGREST_1.38.0 RCurl_1.98-1.12
[11] doParallel_1.0.17 generics_0.1.3
[13] ScaledMatrix_1.6.0 cowplot_1.1.1
[15] RSQLite_2.3.1 VGAM_1.1-8
[17] shadowtext_0.1.2 RANN_2.6.1
[19] future_1.32.0 tzdb_0.4.0
[21] bit_4.0.5 enrichplot_1.18.0
[23] spatstat.data_3.0-1 xml2_1.3.4
[25] httpuv_1.6.11 hms_1.1.3
[27] promises_1.2.0.1 fansi_1.0.4
[29] progress_1.2.2 caTools_1.18.2
[31] dbplyr_2.3.2 DBI_1.1.3
[33] htmlwidgets_1.6.2 powerTCR_1.18.0
[35] spatstat.geom_3.2-1 stringdist_0.9.10
[37] ellipsis_0.3.2 backports_1.4.1
[39] permute_0.9-7 deldir_1.0-9
[41] sparseMatrixStats_1.10.0 ggalluvial_0.12.5
[43] vctrs_0.6.2 ROCR_1.0-11
[45] abind_1.4-5 cachem_1.0.8
[47] withr_2.5.0 ggforce_0.4.1
[49] HDO.db_0.99.1 progressr_0.13.0
[51] sctransform_0.3.5 vegan_2.6-4
[53] treeio_1.22.0 prettyunits_1.1.1
[55] goftest_1.2-3 cluster_2.1.4
[57] DOSE_3.24.0 gsl_2.1-8
[59] ape_5.7-1 lazyeval_0.2.2
[61] crayon_1.5.2 spatstat.explore_3.2-1
[63] labeling_0.4.2 pkgconfig_2.0.3
[65] tweenr_2.0.2 nlme_3.1-162
[67] vipor_0.4.5 rlang_1.1.1
[69] globals_0.16.2 lifecycle_1.0.3
[71] miniUI_0.1.1.1 downloader_0.4
[73] filelock_1.0.2 BiocFileCache_2.6.0
[75] rsvd_1.0.5 polyclip_1.10-4
[77] lmtest_0.9-40 aplot_0.1.10
[79] carData_3.0-5 zoo_1.8-12
[81] beeswarm_0.4.0 GlobalOptions_0.1.2
[83] png_0.1-8 rjson_0.2.21
[85] bitops_1.0-7 gson_0.1.0
[87] KernSmooth_2.23-21 Biostrings_2.66.0
[89] blob_1.2.4 DelayedMatrixStats_1.20.0
[91] shape_1.4.6 qvalue_2.30.0
[93] parallelly_1.35.0 spatstat.random_3.1-5
[95] gridGraphics_0.5-1 ggsignif_0.6.4
[97] beachmat_2.14.0 scales_1.2.1
[99] memoise_2.0.1 magrittr_2.0.3
[101] plyr_1.8.8 ica_1.0-3
[103] zlibbioc_1.44.0 compiler_4.2.3
[105] scatterpie_0.1.9 clue_0.3-64
[107] fitdistrplus_1.1-11 cli_3.6.1
[109] XVector_0.38.0 listenv_0.9.0
[111] pbapply_1.7-0 mgcv_1.8-42
[113] MASS_7.3-60 tidyselect_1.2.0
[115] stringi_1.7.12 GOSemSim_2.24.0
[117] BiocSingular_1.14.0 timechange_0.2.0
[119] fastmatch_1.1-3 tools_4.2.3
[121] future.apply_1.11.0 parallel_4.2.3
[123] circlize_0.4.15 foreach_1.5.2
[125] cubature_2.1.0 gridExtra_2.3
[127] farver_2.1.1 Rtsne_0.16
[129] ggraph_2.1.0 digest_0.6.31
[131] shiny_1.7.4 car_3.1-2
[133] broom_1.0.4 later_1.3.1
[135] RcppAnnoy_0.0.20 httr_1.4.6
[137] colorspace_2.1-0 XML_3.99-0.14
[139] tensor_1.5 reticulate_1.26
[141] splines_4.2.3 uwot_0.1.14
[143] yulab.utils_0.0.6 tidytree_0.4.2
[145] spatstat.utils_3.0-3 graphlayouts_1.0.0
[147] sp_1.6-0 ggplotify_0.1.0
[149] xtable_1.8-4 truncdist_1.0-2
[151] jsonlite_1.8.4 tidygraph_1.2.3
[153] ggfun_0.0.9 R6_2.5.1
[155] pillar_1.9.0 htmltools_0.5.5
[157] mime_0.12 glue_1.6.2
[159] fastmap_1.1.1 BiocNeighbors_1.16.0
[161] codetools_0.2-19 utf8_1.2.3
[163] lattice_0.21-8 spatstat.sparse_3.0-1
[165] evd_2.3-6.1 curl_4.3.3
[167] ggbeeswarm_0.7.2 leiden_0.4.3
[169] gtools_3.9.4 survival_3.5-5
[171] munsell_0.5.0 GetoptLong_1.0.5
[173] GenomeInfoDbData_1.2.9 iterators_1.0.14
[175] reshape2_1.4.4 gtable_0.3.3
Hey Bruno,
Nice code!! I don't see any issue in the context of the code. For clonalOverlap()
there is really no shuffle going on in terms of sample ID - it is counting the clones in each category. What is also kind of weird is that 2 of the 3 overlaps are occurring in the same tissue type (peripheral blood) and 1 is occurring across tissues.
If you are willing to share - more than happy to take a look at the hashed data (email: ncborch@gmail.com). If not, I'll keep brainstorming.
Nick
Hey Nick,
Thanks! yes weird indeed, the PBMC data was hashed so i thought the overlap could be caused by noise in the dehashing step but the tumor samples have been sequenced separately. I'll send you the data right away and thanks so much for looking into this, really appreciate!
Hey Bruno,
Thanks for sending me the data - again I don't see anything suspicious from your code or data. One thing you might want to check is the relative number of clones/clonotypes that are attaching to the single-cell object (or relative number of barcodes per sample that batch the GEX and TCR data). I think this would be a good sanity check - it could be as simple as:
seuratObj <- combineExpression(combined, seuratObj)
occupiedscRepertoire(seuratObj, x.axis = "orig.ident")
Nick
Followed your suggestions and nothing popped up only to find back in the code that I had passed sample labels from another object called contig.list and not contig_list. Dang.
combined <- combineTCR (contig_list,
samples = names (contig.list),
#ID = c("P", "P", "P", "T", "T", "T"),
cells ="T-AB")
Now overlap looks interesting for sample P9 i would say
Thanks a lot for your support!
Oh great to know - thanks so much for following up and being incredibly responsive!
Nick
Hello!
Thanks for developing this great package! This is not an issue on the package but more like a question on the data so i apologize in advance. I have a dataset with 3 patient samples from tumor and pbmc. When i run the
clonalOverlap
function I can only see small overlaps between pairs that dont make any sense to me. My question is, can this low overlap be considered noise or suggests something is clearly wrong? Attached the heatmap. Thanks so much in advance for any comments on this!!