ncborcherding / scRepertoire

A toolkit for single-cell immune profiling
https://www.borch.dev/uploads/screpertoire/
MIT License
301 stars 49 forks source link

Error in highlighing clonotypes #162

Closed MaryGoAround closed 2 years ago

MaryGoAround commented 2 years ago

Hi

I want to highlight important clonotypes, but I do not know how to find the most important clonotype to highlight

I tried one randomely

seurat <- highlightClonotypes(seurat, 
              cloneCall= "aa", 
              sequence = c("CAVNGGSQGNLIF_CSAEREDTDTQYF"))
DimPlot(seurat, group.by = "highlight") + 
  theme(plot.title = element_blank())

I get this error

Error: Must request at least one colour from a hue palette.

How I can have a list of the most important clones to plot?

ncborcherding commented 2 years ago

Hey MaryGoAround,

Thanks for reaching out.

Did highlightClonotypes() make a column in your metadata with the selected clone? Also what version of scRepertoire are you using?

In terms of your error - that is an issue from Seurat itself. There are several similar issues reported on their github repo (see here, which might help you diagnose and solve the issue.

Nick

MaryGoAround commented 2 years ago

Thank you I get this

> highlightClonotypes(seurat)
 An object of class Seurat 
 19381 features across 10861 samples within 1 assay 
 Active assay: RNA (19381 features, 2000 variable features)
 2 dimensional reductions calculated: pca, umap
 Warning messages:
         1: In if (x %in% c("CTnt", "CTgene", "CTaa", "CTstrict")) { :
                         the condition has length > 1 and only the first element will be used
                 2: In if (x == "gene" | x == "genes") { :
                                 the condition has length > 1 and only the first element will be used
ncborcherding commented 2 years ago

What does the following look like:

head(seurat@meta.data)

MaryGoAround commented 2 years ago

scRepertoire_1.5.3

 > head(seurat@meta.data)

 sample orig.ident nCount_RNA nFeature_RNA percent.mt RNA_snn_res.0.5 seurat_clusters RNA_snn_res.0.4            barcode
 AAACCTGCAACAACCT-1     p1     pbmc3k       1113          575   3.593890               4               4               4               <NA>
         AAACCTGCAAGCCGTC-1     p1     pbmc3k       7510         2433   6.631158               1               1               1               <NA>
         AAACCTGCAGGACGTA-1     p1     pbmc3k       2461          876   5.891914               0               0               0 AAACCTGCAGGACGTA-1
 AAACCTGGTACGACCC-1     p1     pbmc3k       2284          968   4.290718               0               0               0 AAACCTGGTACGACCC-1
 AAACCTGGTAGAGTGC-1     p1     pbmc3k       2962         1091   4.017556               2               2               2 AAACCTGGTAGAGTGC-1
 AAACCTGGTATGCTTG-1     p1     pbmc3k       2853         1094   2.804066               0               0               0 AAACCTGGTATGCTTG-1
 CTgene
 AAACCTGCAACAACCT-1                                                              <NA>
         AAACCTGCAAGCCGTC-1                                                              <NA>
         AAACCTGCAGGACGTA-1                     TRAV13-1.TRAJ48.TRAC_TRBV5-1.NA.TRBJ1-5.TRBC1
 AAACCTGGTACGACCC-1                       TRAV35.TRAJ58.TRAC_TRBV5-5.NA.TRBJ2-7.TRBC2
 AAACCTGGTAGAGTGC-1                       TRAV30.TRAJ45.TRAC_TRBV5-1.NA.TRBJ2-1.TRBC2
 AAACCTGGTATGCTTG-1 TRAV12-1.TRAJ43.TRAC;TRAV8-4.TRAJ10.TRAC_TRBV4-2.NA.TRBJ2-3.TRBC2
 CTnt
 AAACCTGCAACAACCT-1                                                                                                                                <NA>
         AAACCTGCAAGCCGTC-1                                                                                                                                <NA>
         AAACCTGCAGGACGTA-1                                            TGTGCAGCTCCTTACGTGGGGTTTGGAAATGAGAAATTAACCTTT_TGCGCCAGCAGCTTAGCCGGTATCAATCAGCCCCAGCATTTT
 AAACCTGGTACGACCC-1                                            TGTGCTGGGCAGCCTTCAGGAGCCAGTGGCTCTAGGTTGACCTTT_TGTGCCAGCAGCTTAACAGGGAAGTCCTACGAGCAGTACTTC
 AAACCTGGTAGAGTGC-1                                               TGCGGCACAGCTCCAGGAGGAGGTGCTGACGGACTCACCTTT_TGCGCCAGCAGCGGTGGGGGTTCCTACAATGAGCAGTTCTTC
 AAACCTGGTATGCTTG-1 TGTGTGGTGATGCCCTACAATAACAATGACATGCGCTTT;TGTGCTGTGAGTACATTCACGGGAGGAGGAAACAAACTCACCTTT_TGTGCCAGCAGCGTAGGGCGGGGGGACACAGATACGCAGTATTTT
 CTaa
 AAACCTGCAACAACCT-1                                          <NA>
         AAACCTGCAAGCCGTC-1                                          <NA>
         AAACCTGCAGGACGTA-1                CAAPYVGFGNEKLTF_CASSLAGINQPQHF
 AAACCTGGTACGACCC-1                CAGQPSGASGSRLTF_CASSLTGKSYEQYF
 AAACCTGGTAGAGTGC-1                 CGTAPGGGADGLTF_CASSGGGSYNEQFF
 AAACCTGGTATGCTTG-1 CVVMPYNNNDMRF;CAVSTFTGGGNKLTF_CASSVGRGDTDTQYF
 CTstrict
 AAACCTGCAACAACCT-1                                                                                                                                                                                                  <NA>
         AAACCTGCAAGCCGTC-1                                                                                                                                                                                                  <NA>
         AAACCTGCAGGACGTA-1                                                                TRAV13-1.TRAJ48.TRAC_TGTGCAGCTCCTTACGTGGGGTTTGGAAATGAGAAATTAACCTTT_TRBV5-1.NA.TRBJ1-5.TRBC1_TGCGCCAGCAGCTTAGCCGGTATCAATCAGCCCCAGCATTTT
 AAACCTGGTACGACCC-1                                                                  TRAV35.TRAJ58.TRAC_TGTGCTGGGCAGCCTTCAGGAGCCAGTGGCTCTAGGTTGACCTTT_TRBV5-5.NA.TRBJ2-7.TRBC2_TGTGCCAGCAGCTTAACAGGGAAGTCCTACGAGCAGTACTTC
 AAACCTGGTAGAGTGC-1                                                                     TRAV30.TRAJ45.TRAC_TGCGGCACAGCTCCAGGAGGAGGTGCTGACGGACTCACCTTT_TRBV5-1.NA.TRBJ2-1.TRBC2_TGCGCCAGCAGCGGTGGGGGTTCCTACAATGAGCAGTTCTTC
 AAACCTGGTATGCTTG-1 TRAV12-1.TRAJ43.TRAC;TRAV8-4.TRAJ10.TRAC_TGTGTGGTGATGCCCTACAATAACAATGACATGCGCTTT;TGTGCTGTGAGTACATTCACGGGAGGAGGAAACAAACTCACCTTT_TRBV4-2.NA.TRBJ2-3.TRBC2_TGTGCCAGCAGCGTAGGGCGGGGGGACACAGATACGCAGTATTTT
 Frequency cloneType highlight
 AAACCTGCAACAACCT-1           NA      <NA>        NA
 AAACCTGCAAGCCGTC-1           NA      <NA>        NA
 AAACCTGCAGGACGTA-1 0.0004608295      <NA>        NA
 AAACCTGGTACGACCC-1 0.0004608295      <NA>        NA
 AAACCTGGTAGAGTGC-1 0.0004608295      <NA>        NA
 AAACCTGGTATGCTTG-1 0.0004608295      <NA>        NA
 > 

For instance I tried this clone and I got this plot

 seurat <- highlightClonotypes(seurat, 
                               cloneCall= "aa", 
                               sequence = c("CAAPYVGFGNEKLTF_CASSLAGINQPQHF"))
 DimPlot(seurat, group.by = "highlight") + 
         theme(plot.title = element_blank())

Rplot27

ncborcherding commented 2 years ago

Hey MaryGoAround,

It looks like highlightClonotypes() is adding the column to your seurat object. And in your test case with the code:

seurat <- highlightClonotypes(seurat, 
                               cloneCall= "aa", 
                               sequence = c("CAAPYVGFGNEKLTF_CASSLAGINQPQHF"))
 DimPlot(seurat, group.by = "highlight") + 
         theme(plot.title = element_blank())

You are generating a DimPlot effectively - how many copies of the "CAAPYVGFGNEKLTF_CASSLAGINQPQHF" do you have in your data?

Nick

MaryGoAround commented 2 years ago

how many copies of the "CAAPYVGFGNEKLTF_CASSLAGINQPQHF" do you have in your data?

I do not know this

How I could discover this please?

My main goal is identifying main clones in my data

ncborcherding commented 2 years ago

Apologies should have included some code to help out - I think this should work

length(which(seurat@meta.data$CTaa == "CAAPYVGFGNEKLTF_CASSLAGINQPQHF"))
MaryGoAround commented 2 years ago

Thank you so much

> length(which(seurat@meta.data$CTaa == "CAAPYVGFGNEKLTF_CASSLAGINQPQHF"))
 [1] 1
 >

Sorry, is there any piece of code to print a list of clones with the highest copy number?

ncborcherding commented 2 years ago

Hey MaryGoAround,

Ok that makes sense - the DimPlot() is working but we are essentially looking for 1 red dot (which may be obscured by the grey dots)

The easiest way to look at relative frequency is to use the meta data of your Seurat object and look at the column "Frequency".

meta <- seurat@meta.data

Then you can just view the meta object in Rstudio

MaryGoAround commented 2 years ago

Thank you so much

So I should look at columns 11 and 13 , right? CTaa and Frequency ?

> meta <- seurat@meta.data
 > View(meta)
 > colnames(meta)
 [1] "sample"          "orig.ident"      "nCount_RNA"      "nFeature_RNA"    "percent.mt"      "RNA_snn_res.0.5" "seurat_clusters"
 [8] "RNA_snn_res.0.4" "barcode"         "CTgene"          "CTnt"            "CTaa"            "CTstrict"        "Frequency"      
 [15] "cloneType"       "highlight"      
 >

I have tried this for the frequency of 0.5 and I got two dots

But both belong to B cells highlighted in CD4 T and Monocyte clusters :(

seurat <- highlightClonotypes(seurat, 
                               cloneCall= "aa", 
                               sequence = c("CARSVDKTVGTGPSYYGMAVW_CQQLDTYPWTF","CARVSHFSAFDPW_CQSYDSSLSGPYVF"))
 DimPlot(seurat, group.by = "highlight",pt.size = 1) + 
         theme(plot.title = element_blank()) 

Rplot01

ncborcherding commented 2 years ago

Hey MaryGoAround,

Yes columns 11 and 13.

Um based on your results and the frequency you are quoting - I am thinking there is an issue in the number of TCRs in your seurat object. Default coimbineExpression() will add frequency based on the proportion of the sample, so if you are getting 0.5 and plotting 2 TCRs, do you only have 2 TCRs in your seurat object? I am able to see a single, red dot in the most recent picture down in the lower right cluster.

MaryGoAround commented 2 years ago

Thank you

I just sorted frequency column and copied pasted CTaa with the frequency of 0.5 :(

I got worry about my TCR number

I have attached 10x report here

Screenshot 2022-05-12 at 16 06 28

Screenshot 2022-05-12 at 16 07 09

In the report I see frequencies of 0.71 :(

I do not know what is going on here

From 10x report when I tried two top clones by

seurat <- highlightClonotypes(seurat, 
                               cloneCall= "aa", 
                               sequence = c("CALAQDNYGQNFVF_CASRDYNEQFF","CAMREGGGGYSTLTF_CASSTSGQGDYEQYF"))
 DimPlot(seurat, group.by = "highlight",pt.size = 1) + 
         theme(plot.title = element_blank()) 

I got this plot Rplot02

The frequency of these two is almost 0.005069124

So I am not sure how 10x selected the tops (which criteria) and how your software does

ncborcherding commented 2 years ago

Hey MaryGoAround,

So it does look like you are getting TCR attached to your seurat object. 0.005069 sounds like a reasonable value unless you are expecting a clonal or malignant process.

Proportion (stored under Frequency in the meta data of the seurat object) for scRepertoire is calculated using the number of each clone divided by the total size of the repertoire. The repertoire itself is defined by the group.by parameter in combineExpression(), as a default, each sequencing run is used.

I am going to close this issue as we have gotten a little far from the original technical issue. Please feel free to keep posting or email me if you have further questions.

MaryGoAround commented 2 years ago

Thank you so much for bearing with me

I have attached cellranger report of ten top clones

Screenshot 2022-05-13 at 12 17 49

For instance, the report says that there are 23 clones of CALAQDNYGQNFVF_CASRDYNEQFF

By your software I get this figure

compareClonotypes(combined, numbers = 10, samples = c("sample1", "sample2"), 
                    cloneCall="aa", graph = "alluvial")

Picture 1

But by this plot CALAQDNYGQNFVF_CASRDYNEQFF does not seem to be the most top clone anymore

Please can you give me a hand how I can have a list of truely top clones?

Thank you once more

ncborcherding commented 2 years ago

The clone numbers after combining to a seurat object are going to be heavily dependent on the cells that passed the quality control steps for the gene expression. It is common to have a 5-15% difference between the TCR barcodes and Seurat object depending on your filtering criteria and recovery from the sequencing runs itself.

MaryGoAround commented 1 year ago

Sorry, I am right that scRepertoire can not visualize TRUST4 data comes from 10x BCR kit?

> S1 <- read.delim("BCR1.tsv")
> S2 <- read.delim("BCR2.tsv")
> contig_list <- list(S1, S2)
> combined <- combineTRUST4(contig_list,samples = c("bcr1","bcr2"),removeNA=TRUE)
> quantContig(combined, cloneCall="gene+nt", scale = TRUE)

Rplot01


bcr=read.delim("BCR1.tsv")
> head(bcr[1:2,1:4])
X.barcode cell_type chain1
1 ATGAGCCACACGATGT         B      *
  2 CTACTTTGTTCAACCA         B      *
  chain2
1 IGLV1-40*01,*,IGLJ3*02,*,TGCCAGTCCTTTGACAACAGCCTGAGTGGTCCGGTGTTT,CQSFDNSLSGPVF,1.00,ATGAGCCACACGATGT_28117,94.29,0
2            IGKV3-11*01,*,IGKJ3*01,*,TGTCAGCAGCGTAGCAACTGGTTCACTTTC,CQQRSNWFTF,1.97,CTACTTTGTTCAACCA_53325,100.00,0
>

Thanks for any idea

ncborcherding commented 1 year ago

Hey MaryGoRound,

I think the first step troubleshooting is making sure things got loaded properly (thanks for the head()) call. It looks like your TRUST4 data is comma separated - try:

bcr=read.csv("BCR1.tsv")

Nick

MaryGoAround commented 1 year ago

Thanks a lot for answering me

Please have a look at what I have done so far

I am using tsv file

> bcr=read.delim("BCR1.tsv")
> S1 <- read.delim("BCR1.tsv")
> S2 <- read.delim("BCR2.tsv")
> contig_list <- list(S1, S2)
> combined <- combineTRUST4(contig_list,samples = c("bcr1","bcr2"),removeNA=TRUE)
Warning messages:
  1: In if (cells == "T-AB") { :
      the condition has length > 1 and only the first element will be used
    2: In if (cells == "T-AB") { :
        the condition has length > 1 and only the first element will be used
      3: In if (cells == "B") { :
          the condition has length > 1 and only the first element will be used
        4: In if (cells == "T-AB") { :
            the condition has length > 1 and only the first element will be used
          5: In if (cells != "B") { :
              the condition has length > 1 and only the first element will be used
            6: In if (cells != "B") { :
                the condition has length > 1 and only the first element will be used
              7: In df[[i]]$cell_type == cells :
                longer object length is not a multiple of shorter object length
              8: In df[[i]]$cell_type == cells :
                longer object length is not a multiple of shorter object length
   > quantContig(combined, cloneCall="gene+nt", scale = TRUE)
   >

Which gave me an empty plot again

ncborcherding commented 1 year ago

Hey MaryGoAround,

I think you are loading the data incorrectly - instead of S1 <- read.delim("BCR1.tsv") try S1 <- read.csv("BCR1.tsv")

Nick

MaryGoAround commented 1 year ago

Thank you very much

"BCR1.tsv" is the exact output of TRUST4 I tried with read.csv but gives error

>  S1 <- read.csv("BCR1.tsv")
Error in read.table(file = file, header = header, sep = sep, quote = quote,  : 
  more columns than column names
>  S2 <- read.csv("BCR2.tsv")
Error in read.table(file = file, header = header, sep = sep, quote = quote,  : 
  more columns than column names