Closed HaiqiaoSun closed 3 weeks ago
Hey @HaiqiaoSun
Thanks for reaching out.
Please make a reproducible example for this issue so I know what you are experiencing.
Also with the example, please make sure to include the output of sessionInfo().
Thanks a lot, Nick
Thank you for your prompt response. Here is the data and example I used.
https://gist.github.com/HaiqiaoSun/698bc2d3e14950feb085fa19a6d56077
> library(scRepertoire)
> csv1 <- read.csv("E:/R/filtered_contig_annotations.csv", stringsAsFactors = F)
> combined <- combineBCR(csv1, samples = c("sp_exp"),
+ ID = c(NULL))
> head(combined[[1]])
barcode sample IGH cdr3_aa1
1 sp_exp_ATCCACGCCGCAGTCCAGTA sp_exp IGHV3-33.IGHD3-3.IGHJ6.IGHM CARDRQVLRFLEWLLPYYYYGMDVW
2 sp_exp_AGTATCTAACCACTTGTAAG sp_exp IGHV3-7.NA.IGHJ4.IGHM CARDGGDDGYW
3 sp_exp_AACTGGCGAGCAATAACACT sp_exp IGHV3-21.IGHD3-10.IGHJ4.IGHM CARIPPVTMVRGVIKFFDYW
4 sp_exp_GACAGCGAATGGAGTTAAGG sp_exp IGHV3-53.NA.IGHJ4.IGHM CARDSYSSHYW
5 sp_exp_TTGTCGAAGGCTCATACGCT sp_exp IGHV4-34.NA.IGHJ6.IGHM CASGIKQQLAYYYYGMDVW
6 sp_exp_TAGCGTTCGTAACGATTCAA sp_exp <NA> <NA>
cdr3_nt1 IGLC cdr3_aa2
1 TGTGCGAGAGATCGTCAAGTATTACGATTTTTGGAGTGGTTATTACCCTACTACTACTACGGTATGGACGTCTGG IGKV2-24.IGKJ2.IGKC CTQATQSPYTF
2 TGTGCGAGAGATGGCGGTGACGACGGGTACTGG IGKV2-24.IGKJ2.IGKC CTQATQFPYTF
3 TGTGCGAGAATACCTCCGGTTACTATGGTTCGGGGAGTTATTAAATTCTTTGACTACTGG IGKV2-24.IGKJ2.IGKC CTQATQFPYTF
4 TGTGCGAGAGATTCGTATAGCAGCCATTACTGG IGKV2-24.IGKJ1.IGKC CTQATQFPWTF
5 TGTGCGAGTGGAATTAAGCAGCAGCTGGCCTACTACTACTACGGTATGGACGTCTGG IGKV2-24.IGKJ4.IGKC CTQATQFPHTF
6 <NA> IGKV2-24.IGKJ5.IGKC CTQATQFPHF
cdr3_nt2 CTgene
1 TGCACGCAAGCTACACAATCCCCGTACACTTTT IGHV3-33.IGHD3-3.IGHJ6.IGHM_IGKV2-24.IGKJ2.IGKC
2 TGCACGCAAGCTACACAATTTCCGTACACTTTT IGHV3-7.NA.IGHJ4.IGHM_IGKV2-24.IGKJ2.IGKC
3 TGCACGCAAGCTACACAATTTCCGTACACTTTT IGHV3-21.IGHD3-10.IGHJ4.IGHM_IGKV2-24.IGKJ2.IGKC
4 TGCACGCAAGCTACACAATTTCCGTGGACGTTC IGHV3-53.NA.IGHJ4.IGHM_IGKV2-24.IGKJ1.IGKC
5 TGCACGCAAGCTACACAATTTCCTCACACTTTC IGHV4-34.NA.IGHJ6.IGHM_IGKV2-24.IGKJ4.IGKC
6 TGCACGCAAGCTACACAATTTCCTCACTTC NA_IGKV2-24.IGKJ5.IGKC
CTnt
1 TGTGCGAGAGATCGTCAAGTATTACGATTTTTGGAGTGGTTATTACCCTACTACTACTACGGTATGGACGTCTGG_TGCACGCAAGCTACACAATCCCCGTACACTTTT
2 TGTGCGAGAGATGGCGGTGACGACGGGTACTGG_TGCACGCAAGCTACACAATTTCCGTACACTTTT
3 TGTGCGAGAATACCTCCGGTTACTATGGTTCGGGGAGTTATTAAATTCTTTGACTACTGG_TGCACGCAAGCTACACAATTTCCGTACACTTTT
4 TGTGCGAGAGATTCGTATAGCAGCCATTACTGG_TGCACGCAAGCTACACAATTTCCGTGGACGTTC
5 TGTGCGAGTGGAATTAAGCAGCAGCTGGCCTACTACTACTACGGTATGGACGTCTGG_TGCACGCAAGCTACACAATTTCCTCACACTTTC
6 NA_TGCACGCAAGCTACACAATTTCCTCACTTC
CTaa CTstrict
1 CARDRQVLRFLEWLLPYYYYGMDVW_CTQATQSPYTF IGH:Cluster.32.IGHV3-33_IGLC:Cluster.14.IGKV2-24
2 CARDGGDDGYW_CTQATQFPYTF IGH.403.IGHV3-7_IGLC:Cluster.14.IGKV2-24
3 CARIPPVTMVRGVIKFFDYW_CTQATQFPYTF IGH.33.IGHV3-21_IGLC:Cluster.14.IGKV2-24
4 CARDSYSSHYW_CTQATQFPWTF IGH.1202.IGHV3-53_IGLC:Cluster.14.IGKV2-24
5 CASGIKQQLAYYYYGMDVW_CTQATQFPHTF IGH:Cluster.56.IGHV4-34_IGLC:Cluster.14.IGKV2-24
6 NA_CTQATQFPHF NA.NA_IGLC:Cluster.14.IGKV2-24
> #igraph
> igraph.object <- clonalCluster(combined,
+ chain = "IGH",
+ sequence = "aa",
+ threshold = 0.85,
+ samples = NULL,
+ exportGraph = TRUE)
错误于clonalCluster(combined, chain = "IGH", sequence = "aa", threshold = 0.85, :
No clusters detected with current parameters.`
sessionInfo() R version 4.4.1 (2024-06-14 ucrt) Platform: x86_64-w64-mingw32/x64 Running under: Windows 10 x64 (build 19045)
Matrix products: default
locale:
[1] LC_COLLATE=Chinese (Simplified)_China.utf8 LC_CTYPE=Chinese (Simplified)_China.utf8
[3] LC_MONETARY=Chinese (Simplified)_China.utf8 LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_China.utf8
time zone: Asia/Shanghai tzcode source: internal
attached base packages: [1] grid stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggraph_2.2.1 scater_1.32.1 scuttle_1.14.0 ggsci_3.2.0
[5] scRepertoire_2.0.7 plot1cell_0.0.0.9000 ComplexHeatmap_2.20.0 wordcloud_2.6
[9] simplifyEnrichment_1.14.1 GEOquery_2.72.0 rlang_1.1.4 cowplot_1.1.3
[13] EnsDb.Hsapiens.v86_2.99.0 ensembldb_2.28.1 AnnotationFilter_1.28.0 GenomicFeatures_1.56.0
[17] AnnotationDbi_1.66.0 loomR_0.2.0 itertools_0.1-3 iterators_1.0.14
[21] R6_2.5.1 hdf5r_1.3.11 data.table_1.16.2 DoubletFinder_2.0.4
[25] ComplexUpset_1.3.3 purrr_1.0.2 ggbeeswarm_0.7.2 reshape2_1.4.4
[29] biomaRt_2.60.1 RColorBrewer_1.1-3 progress_1.2.3 scales_1.3.0
[33] MASS_7.3-61 ggh4x_0.2.8 circlize_0.4.16 plotly_4.10.4
[37] ggrastr_1.0.2 shiny_1.9.1 Matrix_1.7-1 SeuratWrappers_0.3.0
[41] msigdbr_7.5.1 SCPA_1.6.2 Nebulosa_1.14.0 ggplot2_3.5.1
[45] viridis_0.6.5 viridisLite_0.4.2 patchwork_1.3.0 Seurat_5.1.0
[49] SeuratObject_5.0.2 sp_2.1-4 dplyr_1.1.4 monocle3_1.2.9
[53] SingleCellExperiment_1.26.0 SummarizedExperiment_1.34.0 GenomicRanges_1.56.2 GenomeInfoDb_1.40.1
[57] IRanges_2.38.1 S4Vectors_0.42.1 MatrixGenerics_1.16.0 matrixStats_1.4.1
[61] Biobase_2.64.0 BiocGenerics_0.50.0
loaded via a namespace (and not attached):
[1] igraph_2.1.1 hash_2.2.6.3 ica_1.0-3 Formula_1.2-5
[5] zlibbioc_1.50.0 tidyselect_1.2.1 bit_4.5.0 doParallel_1.0.17
[9] clue_0.3-65 tm_0.7-14 lattice_0.22-6 rjson_0.2.23
[13] evmix_2.12 blob_1.2.4 stringr_1.5.1 S4Arrays_1.4.1
[17] parallel_4.4.1 png_0.1-8 cli_3.6.3 ProtGenerics_1.36.0
[21] goftest_1.2-3 textshaping_0.4.0 BiocIO_1.14.0 BiocNeighbors_1.22.0
[25] stringdist_0.9.12 uwot_0.2.2 curl_5.2.3 mime_0.12
[29] evaluate_1.0.1 leiden_0.4.3.1 stringi_1.8.4 backports_1.5.0
[33] gsl_2.1-8 XML_3.99-0.17 httpuv_1.6.15 magrittr_2.0.3
[37] rappdirs_0.3.3 splines_4.4.1 mclust_6.1.1 org.Hs.eg.db_3.19.1
[41] sctransform_0.4.1 DBI_1.2.3 terra_1.7-83 jquerylib_0.1.4
[45] withr_3.0.2 systemfonts_1.1.0 lmtest_0.9-40 tidygraph_1.3.1
[49] rtracklayer_1.64.0 BiocManager_1.30.25 htmlwidgets_1.6.4 fs_1.6.4
[53] ggrepel_0.9.6 labeling_0.4.3 SparseArray_1.4.8 reticulate_1.39.0
[57] zoo_1.8-12 XVector_0.44.0 knitr_1.48 UCSC.utils_1.0.0
[61] foreach_1.5.2 fansi_1.0.6 quantreg_5.99 R.oo_1.26.0
[65] RSpectra_0.16-2 irlba_2.3.5.1 fastDummies_1.7.4 lazyeval_0.2.2
[69] yaml_2.3.10 survival_3.7-0 scattermore_1.2 crayon_1.5.3
[73] RcppAnnoy_0.0.22 tidyr_1.3.1 progressr_0.14.0 tweenr_2.0.3
[77] later_1.3.2 ggridges_0.5.6 codetools_0.2-20 base64enc_0.1-3
[81] GlobalOptions_0.1.2 KEGGREST_1.44.1 Rtsne_0.17 shape_1.4.6.1
[85] limma_3.60.6 Rsamtools_2.20.0 filelock_1.0.3 foreign_0.8-87
[89] pkgconfig_2.0.3 xml2_1.3.6 spatstat.univar_3.0-1 GenomicAlignments_1.40.0
[93] iNEXT_3.0.1 evd_2.3-7.1 spatstat.sparse_3.1-0 xtable_1.8-4
[97] highr_0.11 plyr_1.8.9 httr_1.4.7 tools_4.4.1
[101] globals_0.16.3 beeswarm_0.4.0 htmlTable_2.4.3 checkmate_2.3.2
[105] nlme_3.1-166 dbplyr_2.5.0 MatrixModels_0.5-3 assertthat_0.2.1
[109] lme4_1.1-35.5 digest_0.6.37 farver_2.1.2 tzdb_0.4.0
[113] ks_1.14.3 yulab.utils_0.1.7 rpart_4.1.23 glue_1.8.0
[117] cachem_1.1.0 BiocFileCache_2.12.0 nbpMatching_1.5.6 polyclip_1.10-7
[121] Hmisc_5.1-3 generics_0.1.3 proxyC_0.4.1 Biostrings_2.72.1
[125] ggalluvial_0.12.5 mvtnorm_1.3-1 parallelly_1.38.0 multicross_2.1.0
[129] statmod_1.5.0 RcppHNSW_0.6.0 ragg_1.3.3 crossmatch_1.4-0
[133] ScaledMatrix_1.12.0 minqa_1.2.8 pbapply_1.7-2 httr2_1.0.5
[137] spam_2.11-0 utf8_1.2.4 graphlayouts_1.2.0 gridExtra_2.3
[141] GenomeInfoDbData_1.2.12 R.utils_2.12.3 RCurl_1.98-1.16 memoise_2.0.1
[145] rmarkdown_2.28 R.methodsS3_1.8.2 future_1.34.0 RANN_2.6.2
[149] Cairo_1.6-2 spatstat.data_3.1-2 rstudioapi_0.17.1 cluster_2.1.6
[153] spatstat.utils_3.1-0 hms_1.1.3 fitdistrplus_1.2-1 munsell_0.5.1
[157] colorspace_2.1-1 ggdendro_0.2.0 DelayedMatrixStats_1.26.0 sparseMatrixStats_1.16.0
[161] truncdist_1.0-2 dotCall64_1.2 ggforce_0.4.2 xfun_0.48
[165] remotes_2.5.0 abind_1.4-8 GOSemSim_2.30.2 tibble_3.2.1
[169] readr_2.1.5 bitops_1.0-9 promises_1.3.0 RSQLite_2.3.7
[173] leidenbase_0.1.31 DelayedArray_0.30.1 proxy_0.4-27 GO.db_3.19.1
[177] compiler_4.4.1 prettyunits_1.2.0 boot_1.3-31 beachmat_2.20.0
[181] SparseM_1.84-2 listenv_0.9.1 Rcpp_1.0.13 BiocSingular_1.20.0
[185] tensor_1.5 BiocParallel_1.38.0 cubature_2.1.1 babelgene_22.9
[189] spatstat.random_3.3-2 fastmap_1.2.0 vipor_0.4.7 ROCR_1.0-11
[193] rsvd_1.0.5 nnet_7.3-19 gtable_0.3.6 KernSmooth_2.23-24
[197] miniUI_0.1.1.1 deldir_2.0-4 htmltools_0.5.8.1 bit64_4.5.2
[201] spatstat.explore_3.3-3 lifecycle_1.0.4 NLP_0.3-0 nloptr_2.1.1
[205] restfulr_0.0.15 sass_0.4.9 vctrs_0.6.5 isoband_0.2.7
[209] rsconnect_1.3.2 VGAM_1.1-12 slam_0.1-54 spatstat.geom_3.3-3
[213] future.apply_1.11.3 pracma_2.4.4 bslib_0.8.0 pillar_1.9.0
[217] jsonlite_1.8.9 GetoptLong_1.0.5
I found that setting the exportGraph parameter of the clonalCluster function to F allows the function to run and generate the igraph.object, but it does not produce the clusters.
To generate an igraph similar to the one which Nick did in https://www.bioconductor.org/packages/devel/bioc/vignettes/scRepertoire/inst/doc/vignette.html#1_Introduction, I tried using the cluster generated by combineBCR function to draw the clone connection plot, which produces a similar effect, though not as visually appealing as Nick's example. Below is the code I used. Note that my dataset does not include multiple groups, and I did not adjust the node size based on the cluster size.
#igraph BCR connection plot
library(scRepertoire)
library(igraph)
library(ggraph)
library(tidyverse)
# Load data
csv1 <- read.csv("E:/R/filtered_contig_annotations.csv", stringsAsFactors = F)
combined <- combineBCR(csv1, samples = c("sp_exp"), ID = NULL,
threshold = 0.95)
# Create clonal graph object
igraph.object <- clonalCluster(combined,
chain = "IGH",
sequence = "aa",
threshold = 0.95,
samples = NULL,
exportGraph = F)
# Convert graph object to a dataframe
df <- as.data.frame(igraph.object)
if (!is.data.frame(df)) {
df <- as.data.frame(df)
}
# Extract IGH:Cluster part from sp_exp.CTstrict column to define each node's cluster
df <- df %>%
dplyr::mutate(IGH_cluster = sapply(strsplit(as.character(sp_exp.CTstrict), "_"), function(x) x[1]))
# Filter rows containing IGH cluster information and remove NA values
df_igh <- df %>% dplyr::filter(!is.na(IGH_cluster))
# Keep only clusters with at least 3 nodes
df_filtered <- df_igh %>%
group_by(IGH_cluster) %>%
dplyr::filter(n() >= 3) %>% # Keep only clusters with at least 3 nodes
ungroup()
# Create an edge list, connecting samples within the same cluster
edge_list <- df_filtered %>%
group_by(IGH_cluster) %>%
summarise(pairs = list(combn(sp_exp.barcode, 2, simplify = FALSE))) %>%
unnest_longer(pairs) %>%
mutate(from = sapply(pairs, `[`, 1),
to = sapply(pairs, `[`, 2)) %>%
dplyr::select(from, to)
# Remove possible NA values
edge_list <- edge_list %>% dplyr::filter(!is.na(from) & !is.na(to))
# Create graph object, only keep nodes that are in the filtered edge list
graph <- graph_from_data_frame(edge_list, directed = TRUE)
# Update graph node attributes
V(graph)$cluster <- df_filtered$IGH_cluster[match(V(graph)$name, df_filtered$sp_exp.barcode)]
V(graph)$size <- 1.5 # Node size
# Assign a unique color to each cluster
library(viridis)
unique_clusters <- unique(V(graph)$cluster)
color_map <- setNames(viridis(length(unique_clusters)), unique_clusters)
col_samples <- color_map[V(graph)$cluster]
# Use MDS layout for graph plotting
layout <- layout_with_mds(graph)
plot(
graph,
layout = layout,
vertex.size = V(graph)$size,
vertex.label = NA,
edge.arrow.size = 0.1,
vertex.color = col_samples,
vertex.frame.width = 0.2
)
@HaiqiaoSun
Thank you so much for the excellent followup and rundown of the problem. I was able to find the issue - evaluation statement during exportGraph = TRUE assumed that there would be more than 1 sample - so feeding in a single sequencing run broke it.
We now combine the entire graph before we evaluate: https://github.com/ncborcherding/scRepertoire/blob/f9a4a123dce119158f3073962dedd5daf703b9ed/R/clonalCluster.R#L134
Here is a rundown of the fixed function:
BCR <- read.csv("https://www.borch.dev/uploads/contigs/b_contigs.csv")
combined <- combineBCR(BCR,
samples = "Patient1",
threshold = 0.85)
igraph.object <- clonalCluster(combined,
chain = "IGH",
sequence = "aa",
threshold = 0.85,
samples = NULL,
exportGraph = TRUE)
plot(igraph.object)
Fix is up in the dev branch - @Qile0317 and I are working on some minor changes before we get this version shipped and pushed to bioconductor/main branch of github
Thanks again, Nick
@ncborcherding Thank you for your response and for the continuous updates to scRepertoire. The updated clonalCluster function runs perfectly on my single sequencing dataset. Not only does it facilitate research on T cells, but scRepertoire also provides convenience for B cell researchers like me. Thanks again.
Haiqiao
My combined <- combineBCR(csv1, samples = c("sp_exp"), ID = c(NULL)) successfully created Cluster in the CTstrict column with the default threshold of 0.85, which seems perfectly correct.
combined[["sp_exp"]][["CTstrict"]] [1] "IGH.1376.IGHV3-43_IGLC:Cluster.12.IGKV2-24" "IGH.1376.IGHV3-43_IGLC:Cluster.12.IGKV2-24"
[3] "IGH.1505.IGHV1-18_IGLC:Cluster.12.IGKV2-24" "IGH.1741.IGHV3-64D_IGLC:Cluster.12.IGKV2-24"
However, when using the clonalCluster function as follows:
igraph.object <- clonalCluster(combined, chain = "IGH", sequence = "aa", threshold = 0.85, samples = NULL, exportGraph = TRUE) it returns an error: clonalCluster(combined, chain = "IGH", sequence = "aa", threshold = 0.85, : No clusters detected with current parameters.
Do you have any ideas about this? Thank you.