ncborcherding / scRepertoire

A toolkit for single-cell immune profiling
https://www.borch.dev/uploads/screpertoire/
MIT License
297 stars 48 forks source link

subscript out of bounds error #159

Closed beginner984 closed 2 years ago

beginner984 commented 2 years ago

Hi Likely I have a lot of NAs in my TCR

quantContig(combined, cloneCall="gene+nt", scale = T)

Rplot

When I try to remove my NAs I get this error

> combined <- combineTCR(contig_list, 
+                        samples = c("p1", "p2"),removeNA=TRUE)
> quantContig(combined, cloneCall="gene+nt", scale = T)
Error in df[[i]] : subscript out of bounds
>

Thank you for any idea

ncborcherding commented 2 years ago

Hey Beginner984,

Thanks for reaching out - would you mind giving me the version of scRepertoire you are using?

Also what is the structure of your contig_list? How many samples, etc?

I ask because your quantContig(combined, cloneCall="gene+nt", scale = T) should not have NA samples unless there might be 3 contigs?

What is the output of:

lapply(contig_list, dim)
lapply(combined, dim)

In terms of the issue you are posting:

> combined <- combineTCR(contig_list, 
+                        samples = c("p1", "p2"),removeNA=TRUE)
> quantContig(combined, cloneCall="gene+nt", scale = T)
Error in df[[i]] : subscript out of bounds

That is an error in either reading the combinedTCR() in the quantContig() function or from having an issue with removing all the data during filtering of NAs (this shouldn't happen as there should be a warning message)

beginner984 commented 2 years ago

Thank you so much for replying me

I have 5' GEX + V(J)D for two healthy PBMCs

I have attached my TCR and BCR for one sample here

b.csv t.csv

> lapply(contig_list, dim)
[[1]]
[1] 5318   31

[[2]]
[1] 5773   31

> lapply(combined, dim)
$p1
[1] 2231   13

$p2
[1] 2306   13

>
> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.2.3

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] sp_1.4-7           SeuratObject_4.1.0 Seurat_4.1.0       scRepertoire_1.5.3
[5] ggplot2_3.3.6     

Thanks for any help

ncborcherding commented 2 years ago

Hey Beginner984,

Everything looks correct there - I tested your code with the t.csv sample and functioned correctly with v 1.5.3.

What was your code to load and form the contig_list?

Thanks for bearing with me.

Nick

beginner984 commented 2 years ago

I should be thankful

S1 <- read.csv("/Users/angel/Downloads/p1/t.csv")
S2 <- read.csv("/Users/angel/Downloads/p2/t.csv")

contig_list <- list(S1, S2)

combined <- combineTCR(contig_list,samples = c("p1", "p2"),removeNA=TRUE)
beginner984 commented 2 years ago

Sorry I have made a Seurat object from GEX but I get error in combining that with TCR

> unique(pbmc@meta.data$sample)
[1] "PN0340_0001_control" "PN0340_0002_control"
> seurat <- combineExpression(combined, pbmc, 
                              +                             cloneCall="gene", group.by = "sample", proportion = FALSE, 
                              +                             cloneTypes=c(Single=1, Small=5, Medium=20, Large=100, Hyperexpanded=500))
Error in data.frame(sc[[]], slot(sc, "active.ident")) : 
  arguments imply differing number of rows: 11977, 10825
> 
> pbmc
An object of class Seurat 
19381 features across 10825 samples within 1 assay 
Active assay: RNA (19381 features, 2000 variable features)
2 dimensional reductions calculated: pca, umap
>

Thank you any idea

ncborcherding commented 2 years ago

Hey Beginner984,

S1 <- read.csv("/Users/angel/Downloads/p1/t.csv") S2 <- read.csv("/Users/angel/Downloads/p2/t.csv") contig_list <- list(S1, S2) combined <- combineTCR(contig_list,samples = c("p1", "p2"),removeNA=TRUE)

Would you mind emailing me you p2 t.csv sample - I would like to match exactly what you are doing in order to debug.

Error in data.frame(sc[[]], slot(sc, "active.ident")) : arguments imply differing number of rows: 11977, 10825

Sorry this seems like a very strange error message stemming from pulling the meta data from the single-cell object. What it is saying is the number of rows (or number of cells) in seurat@meta.data does not match seurat@active.ident for some reason - are there NAs in the active Identity that you are currently using?

Nick

beginner984 commented 2 years ago

Sorry for some reason I get error to email you

I have attached filtered_contig_annotations for TCR for my both samples as p1 and p2

Thanks for any help

p1.csv p2.csv

beginner984 commented 2 years ago

Sorry I removed NAs in Seurat metadata but this time I get this error

> combineExpression(combined, pbmc, 
                      +                   cloneCall="gene", group.by = "sample", proportion = FALSE, 
                      +                   cloneTypes=c(Single=1, Small=5, Medium=20, Large=100, Hyperexpanded=500))
Error in `$<-.data.frame`(`*tmp*`, "cloneType", value = NA) : 
  replacement has 1 row, data has 0
>

This is the link to my full R image including Seurat object and S1, S2, combined

https://www.dropbox.com/s/qv9b1djonv9fzs2/pbmc.RData?dl=0

Thanks for any help

ncborcherding commented 2 years ago

Hey Beginner984,

Thanks for providing everything!! I really appreciate that.

I am not sure what is going on when it comes to the in quantContig issue (specifically how you got a NA column), here is what I get with your code:

S1 <- read.csv("p1.csv")
S2 <- read.csv("p2.csv")
contig.list <- list(S1, S2)
combined <- combineTCR(contig.list,samples = c("p1", "p2"),removeNA=TRUE)
quantContig(combined, cloneCall="gene+nt", scale = TRUE)

image

One thing I have noticed with users is sometimes when using contig_list, there might be issues with overwriting your data with the built-in scRepertoire data. I would suggest probably avoiding storing your data as contig_list - here is what happens if I take your code and accidentally use the built-in data:

combined <- combineTCR(contig_list,samples = c("p1", "p2"),removeNA=TRUE)
quantContig(combined, cloneCall="gene+nt", scale = TRUE)

Error in df[[i]] : subscript out of bounds

In terms of the error during the combining of the Seurat and TCR data:

combineExpression(combined, pbmc, 
                      cloneCall="gene", group.by = "sample", proportion = FALSE, 
                      cloneTypes=c(Single=1, Small=5, Medium=20, Large=100, Hyperexpanded=500))

Error in $<-.data.frame(*tmp*, "cloneType", value = NA) : replacement has 1 row, data has 0

This is due to issues with matching the cell barcodes between the Seurat Object and the combined file.

contig.list <- list(S1, S2)
combined <- combineTCR(contig.list,samples = c("p1", "p2"),removeNA=TRUE)
combined$p1$barcode[1:10] #This is looking at the barcodes in the first combinedTCR sample

[1] "p1_AAACCTGCAGGACGTA-1" "p1_AAACCTGGTACGACCC-1" "p1_AAACCTGGTAGAGTGC-1" "p1_AAACCTGGTATGCTTG-1" "p1_AAACCTGGTCTAGCGC-1" [6] "p1_AAACCTGGTGGTACAG-1" "p1_AAACCTGTCCTAAGTG-1" "p1_AAACCTGTCCTACAGA-1" "p1_AAACCTGTCTGTTTGT-1" "p1_AAACGGGAGTTGTAGA-1"

rownames(pbmc[[]])[1:10] #Barcodes in the Seurat Object

"AAACCTGCAACAACCT-1" "AAACCTGCAAGCCGTC-1" "AAACCTGCAGGACGTA-1" "AAACCTGGTACGACCC-1" "AAACCTGGTAGAGTGC-1" [6] "AAACCTGGTATGCTTG-1" "AAACCTGGTCTAGCGC-1" "AAACCTGGTGGTACAG-1" "AAACCTGTCAAAGTAG-1" "AAACCTGTCCTAAGTG-1"

beginner984 commented 2 years ago

Thank you so much

When I matched the barcodes in Seurat and combined I get this error


> DimPlot(pbmc, reduction = "umap")
Error: Must request at least one colour from a hue palette.
> 
ncborcherding commented 2 years ago

Hey Beginner984,

Glad you got the barcode match figured out.

In terms of your error - that is an issue from Seurat itself. There are several similar issues reported on their github repo (see here, which might help you diagnose and solve the issue.

Hope that helps and thanks again for being so forthcoming with the data sharing, it really quickens the troubleshooting and I appreciate it!

Nick

beginner984 commented 2 years ago

Thank you so much, I had two healthy PBMC using 10x (5' + TCR +BCR)

I have analyzed my data like this

Rplot18 Rplot07 Rplot11 Rplot15 Rplot17

Looking at these plots are you seeing any abnormality in my data please?

My main concern is if I have a good TCR results but by these plots I am nor sure how I could know if my data is good or bad

Thanks for any point

ncborcherding commented 2 years ago

Hey the data itself looks good - there are always going to be TCRs recovered in non-Tcell clusters (I've noticed it, particularly in mono/macrophage cells). Your data looks well integrated. My one piece of advice would be to check the small cell clusters that are somewhat independent for doublets and mitochondria reads -

167892178-490d9dd0-5e25-43b3-9630-6f19f62951c6

I've noticed in my work these small clusters have a tendency to contain artifact. That being said, they may also be rare cell types.

Nick

beginner984 commented 2 years ago

Hello @ncborcherding

I have analysed my TCR + 5' GEX with your software

I got a few questions please

In the below plot, I have two samples s1 and s2 but what does the unique clone in Y axis means? For instance TRBV18 is common between two samples? Picture 1

ncborcherding commented 2 years ago

Hey beginner984,

I am unfamiliar with what you are plotting? How did you get the graph from scRepertoire?

Nick

beginner984 commented 2 years ago

Sorry I had attached a wrong image

Picture 1

In Y axis I could not figure out what unique clones means?

Please can you help me with that?

Thank you so much in advance

ncborcherding commented 2 years ago

Hey beginner984,

The y-axis in this instance is:

(number of unique clonotypes / total number of cells with TCR data)*100

So in essence it is the percent of unique clonotypes across the whole sample repertoire.

Nick

beginner984 commented 2 years ago

Thank you much

In this figure

Picture 1

Thank you so much for answering me

I see for a given TRB some times we have more than one TRA (I do not know if biologically this makes sense ) like image with image and image

How I know which of these two TRA is an actual pair? and which one is an artefact?

ncborcherding commented 2 years ago

The presence of 2 alpha chains in the field of TCR is controversial, but in a given pathological state/model there seems to range from 5-20% dual-TCRA. Some of these likely code for functional dual alphas (this is a pretty good review on the topic). Having looked at a lot of 10x data, there are also up to ~5% dual TCRB - this does not make sense to me due to allelic exclusion and I am not sure if anyone has convincingly shown dual TCRB.

If you don't like the dual TCRA, you can filter for the highest expressing chain using:

combineTCR(..., filterMulti = TRUE)
beginner984 commented 2 years ago

Thank you very much

How one can know if the CDR3 sequences in the above plot are novel or known?

ncborcherding commented 2 years ago

That is going to vary - the best resource is probably is the AIRR-Seq group which has a giant repository of sequences. Another great resource is VDJdb - however there is no centralized database so determining novelty is difficult.

beginner984 commented 2 years ago

Thanks a lot to be this helpful

Might be out of questioning regarding your software, but I need your thoughts please

With a Library Preparation Kit : 10X Genomics Next GEM Single Cell 5' v2 (Dual Index) targeting 8000 cells per PBMC sample and when in total for GEX:TCR:BCR, we would need 300M read pairs per sample.

Please can you give me a clue how likely we can capture a T cell clone? Rare or expanded

Thanks in advance for any guidance

ncborcherding commented 2 years ago

Hey beginner984,

This is going to vary by disease process or model, with clonal stimulus being the key. I am in the tumor field and that estimate sounds correct for my interests - but I am not focused on rare clones. I've had discussions with people in the neuro autoimmune field and they feel that they need to target 1e5-1e6 T cells to detect possibly pathogenic clones (via the chromium X system). You might check the literature in the field and see what the level of targeting has been.

beginner984 commented 2 years ago

Thank you very much

That was extremely helpful

I am in cancer sciences too, comparing people who remain cancer free for years after treatment and people who can live with cancer for long time

beginner984 commented 1 year ago

Hi

I have PBMC 5' GEX from 10x on which I run TRUST4

I have attached my TRUST4 outputs here

I got error by running your software

Any help please?

S1.csv S2.csv

> S1 <- read.csv("t_0001_GEX_S3_L001_R2_001_barcode_report.csv")

> S2 <- read.csv("t_0002_GEX_S3_L001_R2_001_barcode_report.csv")
> contig_list <- list(S1, S2)
> combined <- combineTCR(contig_list,samples = c("p1", "p2"),removeNA=TRUE)
Error in `mutate()`:
! Problem while computing `TCR1 = ifelse(...)`.
Caused by error in `ifelse()`:
! object 'chain' not found
Run `rlang::last_error()` to see where the error occurred.

Thanks for any help

ncborcherding commented 1 year ago

You'll need to either use combineTRUST4() to generate a list of clonotype by barcode or check out loadContigs() in the dev version to make the TRUST4 output compatible with combineTCR()

beginner984 commented 1 year ago

Thanks a lot I got this error

> combined <- combineTRUST4(contig_list,samples = c("p1", "p2"),removeNA=TRUE)
Error in checkList(df) : could not find function "checkList"
> 
ncborcherding commented 1 year ago

What version of scRepertoire are you currently using? What is the output of sessionInfo()?

beginner984 commented 1 year ago

Thanks a lot

> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.2.3

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] scRepertoire_1.7.2 ggplot2_3.3.6     

loaded via a namespace (and not attached):
  [1] cubature_2.0.4.4            VGAM_1.1-7                 
  [3] colorspace_2.0-3            ellipsis_0.3.2             
  [5] rprojroot_2.0.3             evd_2.3-6.1                
  [7] XVector_0.34.0              GenomicRanges_1.46.1       
  [9] fs_1.5.2                    rstudioapi_0.13            
 [11] listenv_0.8.0               farver_2.1.1               
 [13] remotes_2.4.2               graphlayouts_0.8.0         
 [15] gsl_2.1-7.1                 ggrepel_0.9.1              
 [17] fansi_1.0.3                 codetools_0.2-18           
 [19] splines_4.1.2               doParallel_1.0.17          
 [21] cachem_1.0.6                polyclip_1.10-0            
 [23] pkgload_1.3.0               evmix_2.12                 
 [25] cluster_2.1.3               rgeos_0.5-9                
 [27] ggforce_0.3.3               compiler_4.1.2             
 [29] assertthat_0.2.1            SeuratObject_4.1.0         
 [31] Matrix_1.4-1                fastmap_1.1.0              
 [33] cli_3.3.0                   tweenr_1.0.2               
 [35] prettyunits_1.1.1           tools_4.1.2                
 [37] igraph_1.3.2                gtable_0.3.0               
 [39] glue_1.6.2                  GenomeInfoDbData_1.2.7     
 [41] reshape2_1.4.4              dplyr_1.0.9                
 [43] Rcpp_1.0.8.3                Biobase_2.54.0             
 [45] vctrs_0.4.1                 nlme_3.1-158               
 [47] progressr_0.10.1            iterators_1.0.14           
 [49] ggraph_2.0.6                ggalluvial_0.12.3          
 [51] stringr_1.4.0               globals_0.15.1             
 [53] ps_1.7.1                    lifecycle_1.0.1            
 [55] devtools_2.4.3              stringdist_0.9.8           
 [57] future_1.26.1               zlibbioc_1.40.0            
 [59] MASS_7.3-57                 scales_1.2.0               
 [61] tidygraph_1.2.1             MatrixGenerics_1.6.0       
 [63] parallel_4.1.2              SummarizedExperiment_1.24.0
 [65] SparseM_1.81                curl_4.3.2                 
 [67] SingleCellExperiment_1.16.0 memoise_2.0.1              
 [69] gridExtra_2.3               truncdist_1.0-2            
 [71] stringi_1.7.6               S4Vectors_0.32.4           
 [73] foreach_1.5.2               permute_0.9-7              
 [75] BiocGenerics_0.40.0         pkgbuild_1.3.1             
 [77] GenomeInfoDb_1.30.1         rlang_1.0.3                
 [79] pkgconfig_2.0.3             matrixStats_0.62.0         
 [81] bitops_1.0-7                lattice_0.20-45            
 [83] purrr_0.3.4                 tidyselect_1.1.2           
 [85] processx_3.6.1              parallelly_1.32.0          
 [87] plyr_1.8.7                  magrittr_2.0.3             
 [89] R6_2.5.1                    IRanges_2.28.0             
 [91] generics_0.1.3              DelayedArray_0.20.0        
 [93] DBI_1.1.3                   powerTCR_1.14.0            
 [95] pillar_1.7.0                withr_2.5.0                
 [97] mgcv_1.8-40                 RCurl_1.98-1.7             
 [99] sp_1.5-0                    tibble_3.1.7               
[101] future.apply_1.9.0          crayon_1.5.1               
[103] utf8_1.2.2                  viridis_0.6.2              
[105] usethis_2.1.6               grid_4.1.2                 
[107] callr_3.7.0                 vegan_2.6-2                
[109] digest_0.6.29               tidyr_1.2.0                
[111] stats4_4.1.2                munsell_0.5.0              
[113] viridisLite_0.4.0           sessioninfo_1.2.2      
ncborcherding commented 1 year ago

Hey beginner984,

Both the dev and master branch have the checkList function, so I am not sure how you are getting the error? Can you show me the code you are using to load the contigs (I am assuming with loadContigs() because you are using combineTCR())

Thanks Nick

apal6 commented 1 year ago

Hey! I wonder why is my clonotype slot empty after integration?

When I simply try to merge using barcode in my seurat object which are barcodes from the combined TCR, I get this error:

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 971, 7901

ncborcherding commented 1 year ago

Hey @apal6,

Could you give me your:

1) sessionInfo() 2) What specific code you are getting the error call from. 3) Any additional information on the workflow to create the combineTCR() object?

Thanks, Nick

apal6 commented 1 year ago
Screenshot 2023-04-16 at 1 58 53 PM

Hey Nick,

I was able to figure out my error. Thank you. However, when I am trying to plot the two conditions that I have by defining the group.by = "group" which have my two timepoints, I get this plot with NA. Not sure why my other time point is missing from the plot.

Code: slot(seurat, "meta.data")$cloneType <- factor(slot(seurat, "meta.data")$cloneType, levels = c("Hyperexpanded (100 < X <= 500)", "Large (20 < X <= 100)", "Medium (5 < X <= 20)", "Small (1 < X <= 5)", "Single (0 < X <= 1)", NA)) DimPlot(seurat, group.by = "group") + scale_color_manual(values = colorblind_vector(5), na.value="grey") + theme(plot.title = element_blank())

unique(seurat@meta.data$group) [1] "Day_16" "Infusion"

I do have several NA's in my object.

Thanks, Aastha

ncborcherding commented 1 year ago

Hey Asstha,

Good to hear that you figured out the problem. Interesting problem with the NA values on DimPlot(). Have you tried to re-organize the levels of the variable group? (like the cloneType code above?) Or added the group variable using another source? The meta data will amend matching columns when using AddMetaData() or combineExpression()

What is the output of:

table(seurat@meta.data$group, useNA = "ifany")

I do not think this is a scRepertoire issue per say - but happy to help debug if you are interested. If you wouldn't mind emailing me your code or we can troubleshoot via email to keep the issue area cleared up.

Thanks, Nick

apal6 commented 1 year ago

Sure, I will email.

Code: table(seurat@meta.data$group, useNA = "ifany")

output:
`Day_16` Infusion 
    1287     8574 

Thanks, Aastha

apal6 commented 1 year ago

Hey Nick,

I have emailed my script. I also just realized that my cloneType is entirely empty. Not sure why though

beginner984 commented 1 year ago

Sorry @ncborcherding how I can tackle this problem please

> seurat <- combineExpression(list.receptors, 
                              +                            seurat, 
                              +                             cloneCall="gene", 
                              +                             proportion = TRUE)
Warning message:
  In combineExpression(list.receptors, seurat, cloneCall = "gene",  :
                         < 1% of barcodes match: Ensure the barcodes in 
                       the Seurat object match the 
                       barcodes in the combined immune receptor list from 
                       scRepertoire - most common issue is the addition of the 
                       prefixes corresponding to `samples` and 'ID' in the combineTCR/BCR() 
                       functions
                       >
TCR$HC1_C$barcode
[1] "HC1_C_AAACCTGCAGGACGTA-1" "HC1_C_AAACCTGGTACGACCC-1" "HC1_C_AAACCTGGTAGAGTGC-1" "HC1_C_AAACCTGGTATGCTTG-1"

> rownames(seurat@meta.data)
[1] "AAACCTGAGACAGGCT-1_1" "AAACCTGAGAGACTAT-1_1" "AAACCTGAGCCCAGCT-1_1" "AAACCTGAGTACGTTC-1_1" "AAACCTGCAAGTACCT-1_1"

Thanks for any help

ncborcherding commented 1 year ago

Good rundown of the solution is #69