ncborcherding / scRepertoire

A toolkit for single-cell immune profiling
https://www.borch.dev/uploads/screpertoire/
MIT License

Clonaldiversity plot with seurat object not same as combineTCR #439

Closed bapoorva closed 2 weeks ago

bapoorva commented 2 weeks ago

Hi,

First of all, thanks for the great package. However, I do have some questions and issues, and I hope you can help me with them.

I have TCR-seq and single-cell RNA-seq data from 12 samples (3 controls, 6 from disease condition A (sepsis), and 3 from disease condition B (non-sepsis)), and I want to look at diversity in a few different ways. I followed the tutorial and ran the basic functions:

# Paths to the per-sample contig annotation files
vdj <- paste0(files, "/outs/per_sample_outs/", files, "/vdj_t/filtered_contig_annotations.csv")
# Pre-allocate a named list; assigning names() to an empty list() would error
contig.list <- setNames(vector("list", length(samples)), samples)

# Load data
for (i in seq_along(samples)) {
  contig.list[[samples[i]]] <- read.csv(vdj[i])
}

for(i in seq_along(contig.list)) {
  contig.list[[i]]$barcode <- paste0(contig.list[[i]]$barcode, "_", i)
}
combined.TCR <- combineTCR(contig.list, 
                           samples = samples,
                           removeNA = FALSE, 
                           removeMulti = FALSE, 
                           filterMulti = FALSE)
combined.TCR <- addVariable(combined.TCR, 
                            variable.name = "Group", 
                            variables = c("Sepsis","Sepsis","Non-sepsis","Sepsis","Sepsis","Non-sepsis","Healthy","Sepsis","Non-sepsis","Sepsis","Healthy","Healthy"))

scrna <- readRDS("Tcellsubset.RDS")
scrna@meta.data$sample=scrna@meta.data$orig.ident

#make sure rownames of seurat metadata and tcr data match
scrna=RenameCells(scrna, new.names = paste0(scrna@meta.data$orig.ident,"_",rownames(scrna@meta.data),sep=""))

#Combine Expression
scrna <- combineExpression(combined.TCR, scrna, cloneCall="gene", group.by = "sample", proportion = TRUE)

So far so good. I want to look at the diversity indices at the sample level, by condition/group, and by cell type. To look at diversity by cell type specifically, I need to use the Seurat object. When I run clonalDiversity() with the same arguments, once with the combineTCR() output as input and once with the Seurat object, I get different results. Even when I export the table, I get Shannon indices for only 4 of the 12 samples. Why is that?

clonalDiversity(scrna, cloneCall = "gene",metrics = "shannon",x.axis="sample") +clonalDiversity(combined.TCR, cloneCall = "gene",metrics = "shannon",x.axis="sample")

[Image: Screenshot 2024-11-06 at 10 26 02 AM]

This seems to be a problem with the Seurat object in general. I wanted to see differences in the diversity index between control and diseased samples in specific cell groups:

scrna@meta.data$Group=factor(scrna@meta.data$Group,levels = unique(scrna@meta.data$Group))
scrna@meta.data$celltype_condition=paste0(scrna@meta.data$celltype,scrna@meta.data$Group,sep="_")
scrna@meta.data$celltype_condition=factor(scrna@meta.data$celltype_condition,levels = unique(scrna@meta.data$celltype_condition))
clonalDiversity(scrna, cloneCall = "gene",group.by  = "celltype_condition",metrics = "shannon",x.axis="Group")

[Image: Screenshot 2024-11-06 at 10 38 58 AM]

In this case, I'm getting a numeric x-axis despite making the group a factor, and all the non-sepsis levels are missing.

Finally, is there a way to subset the data? For instance, I am only interested in the CD8 populations and not the others. If I subset the single-cell data by cell type and look at diversity, the indices get recalculated and they look different. All I want is the same figure with just the selected groups. Is it possible to do that?

ncborcherding commented 2 weeks ago

Hey @bapoorva,

Thanks for reaching out.

> So far so good. I want to look at the diversity indices at the sample level, by condition/group, and by cell type. To look at diversity by cell type specifically, I need to use the Seurat object. When I run clonalDiversity() with the same arguments, once with the combineTCR() output as input and once with the Seurat object, I get different results. Even when I export the table, I get Shannon indices for only 4 of the 12 samples. Why is that?

You will get different results for two major reasons:

  1. The overlap of clones between the single-cell object and the output of combineTCR() is not perfect. Typically ~85% of clones map onto the single-cell object after combineExpression(), and this will alter the diversity.
  2. clonalDiversity() uses bootstrapping, so there will always be small differences in the resulting values.
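To illustrate the second point, here is a minimal base-R sketch of a bootstrapped Shannon index. It is not scRepertoire's internal code: the function names `shannon()` and `boot_shannon()`, the resample size, and the simulated clone labels are all assumptions made for illustration. It only shows why two runs on identical input can return slightly different values.

```r
# Shannon index H = -sum(p * log(p)) over clone frequencies
shannon <- function(clones) {
  p <- table(clones) / length(clones)
  -sum(p * log(p))
}

# Hypothetical helper: average the index over resamples drawn with replacement
boot_shannon <- function(clones, size, n_boots = 100) {
  mean(replicate(n_boots, shannon(sample(clones, size, replace = TRUE))))
}

set.seed(1)
# Simulated clone labels, for illustration only
clones <- sample(paste0("clone", 1:50), 500, replace = TRUE)

run1 <- boot_shannon(clones, size = 200)
run2 <- boot_shannon(clones, size = 200)

# The two estimates are close but not identical
print(c(run1 = run1, run2 = run2, difference = abs(run1 - run2)))
```

Calling `set.seed()` immediately before each run makes the two estimates reproducible, which can help when comparing inputs.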

As for your missing samples, my guess is that not all of your clones are being attached during the combineExpression() step. I would inspect your metadata to confirm this and make sure your barcodes match.
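One way to run the barcode check suggested above is to compute, per sample, the fraction of TCR barcodes that appear among the cell names of the single-cell object. The sketch below uses toy stand-ins so it runs on its own. With the objects from this thread, `tcr_list` would be `combined.TCR` and `cell_names` would be `colnames(scrna)`. The helper name `barcode_overlap()` and the toy barcodes are assumptions, and whether a suffix mismatch is the actual cause here is not known.

```r
# Toy stand-ins for combined.TCR (list of data frames with a barcode column)
tcr_list <- list(
  S1 = data.frame(barcode = c("S1_AAAC-1_1", "S1_AAAG-1_1")),
  S2 = data.frame(barcode = c("S2_AAAC-1_2", "S2_AAAT-1_2"))
)
# Toy stand-in for colnames(scrna); S2 cells carry a different suffix
cell_names <- c("S1_AAAC-1_1", "S1_AAAG-1_1", "S2_AAAC-1_3", "S2_AAAT-1_3")

# Hypothetical helper: fraction of each sample's barcodes found in the object
barcode_overlap <- function(tcr_list, cell_names) {
  vapply(tcr_list, function(df) mean(df$barcode %in% cell_names), numeric(1))
}

overlap <- barcode_overlap(tcr_list, cell_names)
print(overlap)

# Show a few barcodes from each side for the samples with no overlap
for (s in names(overlap)[overlap == 0]) {
  cat(s, "TCR barcodes:", head(tcr_list[[s]]$barcode, 2), "\n")
}
cat("Object cell names:", head(cell_names, 4), "\n")
```

Samples with an overlap near zero would contribute no clones after combineExpression(), which would be consistent with indices appearing for only some samples.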

> In this case, I'm getting a numeric x-axis despite making the group a factor, and all the non-sepsis levels are missing.

This is hard to troubleshoot, as I do not know what your metadata and variables look like.
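A quick way to show what the metadata looks like is to print the class and level counts of the grouping columns. The base-R sketch below uses a toy data frame in place of `scrna@meta.data`, and its values are invented for illustration. It also shows one thing visible in the question's code: `paste0()` has no `sep` argument, so `sep = "_"` is pasted on as an extra string instead of separating the values. This does not by itself explain the numeric axis.

```r
# Toy stand-in for scrna@meta.data
meta <- data.frame(
  celltype = c("CD8", "CD8", "CD4"),
  Group = factor(c("Sepsis", "Non-sepsis", "Healthy"))
)

# Inspect the grouping variable: class and counts per level, including NA
print(class(meta$Group))
print(table(meta$Group, useNA = "ifany"))

# paste0() treats sep = "_" as one more string to append
with_paste0 <- paste0(meta$celltype, meta$Group, sep = "_")
# paste() uses sep as the separator between the values
with_paste <- paste(meta$celltype, meta$Group, sep = "_")

print(with_paste0)
print(with_paste)
```

Sharing the output of the first two `print()` calls for the real metadata would make the axis problem much easier to diagnose.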

> Finally, is there a way to subset the data? For instance, I am only interested in the CD8 populations and not the others. If I subset the single-cell data by cell type and look at diversity, the indices get recalculated and they look different. All I want is the same figure with just the selected groups. Is it possible to do that?

Subsetting should not affect the overall magnitude and trend of the indices, though the final values may shift slightly (again due to bootstrapping). There is no subsetting feature within the functions themselves; it might be a good future feature.
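Until such a feature exists, one possible workaround is to compute the indices once on the full object, export the table (for instance with `exportTable = TRUE`, if your scRepertoire version supports it), and filter the rows before plotting. The sketch below uses a toy table. Its column names and values are made up for illustration, and the layout of the real exported table may differ.

```r
# Toy stand-in for an exported diversity table; values are invented
div_table <- data.frame(
  celltype_condition = c("CD8_Sepsis", "CD8_Healthy", "CD4_Sepsis", "CD4_Healthy"),
  shannon = c(3.1, 4.2, 4.8, 5.0)
)

# Keep only CD8 groups; the indices themselves are not recalculated
cd8_only <- div_table[grepl("^CD8", div_table$celltype_condition), ]
print(cd8_only)

# Plot the retained groups with base graphics
barplot(cd8_only$shannon,
        names.arg = cd8_only$celltype_condition,
        ylab = "Shannon index")
```

Because the filter runs after the indices are computed, the remaining values are identical to those in the full figure.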

Hope that helps and let me know if you have any other questions.

Nick