Open beginner984 opened 1 year ago
It is expected to only get hundreds of TCRs from bulk RNA-seq data. This is because TCRs are not as highly expressed as many other genes, and it also depends on the infiltration level of the immune cells. You can also try TRUST4 on the bulk RNA-seq data, which I think usually is more sensitive than MiXCR.
In the single-cell data, since there is a good amount of T cells, so we are guaranteed to find thousands of T cells, and TRUST4 can reconstruct TCRs for many of them.
You can think that in single-cell data, the gene expression within a cell is not confounded by other cells, so it is easier to get the cell-specific sequences.
Based on the number, it feels like the single-cell data has higher clonality. In bulk RNA-seq data, the clonotype count is the expression value. In scRNA-seq data, the clonotype count is the number of cells bearing that clonotype, which is a more accurate estimation.
I'm not sure I get your questions, and hope this helps.
It's very similar to the command of running the 10X data, except that you don't need the --barcode option. A typical command can be:
{TRUST4_PATH}/run-trust4 -f {TRUST4_PATH}/human_IMGT+C.fa --ref {TRUST4_PATH}/human_IMGT+C.fa -1 P_0001_S19_L001_R1_001.fastq.gz -2 P_0001_S19_L001_R2_001.fastq.gz -o P_0001_S19 -t 8
Thank you very much
As I had mentioned I have had two PBMCs
on them we profiled with conventional Bulk RNA-seq
and 10X 5' scRNA-seq (GEX)
This time, I run TRUST4
on my bulk RNA-seq
and scRNA-seq
separately and tried to compared the results
I have attached some visualisation here
My interpretation is: scRNA-seq
gives more richness while bulk RNA-seq
gives more evenness (and clonality ?)
Please correct me if my interpretation is wrong
Thanks a lot in advance
Just want to confirm, is the single-cell data filtered by Seurat QC'ed cells (or other method you used)? Because of the leaked mRNAs to the empty droplets, there can be null cells with TCRs/BCRs. These cells should not be considered valid.
Based on the current results you showed, scRNA-seq has higher richness. It seems there are some super-expanded clonotypes in scRNA-seq, so it also has higher clonality (lower evenness). So your interpretation is correct.
Thanks a lot
I have used fastq files as input for TRUST4 so I have not filtered anything Im affraid🙁
Sorry I got confused here, as I googled, eveness is the same as clonality so having clonally expanded T cells in single cell means a lower eveness, that means clonality negatively correlated with eveness, right? Please correct me if I am wrong
Then I guess the current clonality from scRNA-seq might be inflated if we considered those null cells. This is because the null cells are usually contaminated by relatively highly expressed mRNAs, so if one T cell's TCR has high expression, we will see it in multiple null cells. So we need to filter the cells with too few gene expressed or other QC criteria to make sure the clonotype analysis is accurate.
Evenness is defined as the normalized Shannon Entropy, and clonality is 1-evenness. So they are perfectly negatively correlated.
Thank you very much
At the moment I have fastq/bam files from 10x 5' single cell RNA-seq (GEX)
I really do not know how to filter those null cells before feeding my fastq files into TRUST4
Any clue/intuition please to get rid of droplets
I need an accurate comparision between bulk RNA-seq and scRNA-seq
Thank you for any help
You can run Seurat to conduct standard scRNA-seq analysis. The barcodes kept in the clusters are the valid cells. I think once you have processed the data with Seurat, you can use tools like scRepertoire to compute the diversities.
Thank you very much But the question is, which input I should feed into TRUST4 to construct clonotype structure after filtering? I can not understand the relashionship between Seurat and TRUST4 here. From cellranger output, I can do Seurat on my 5' scRNA-seq, but, after that, how I could feed that into TRUST4?
You shall have the trust4_barcode_report.tsv or trust4_barcode_airr.tsv files in your output for the scRNA-seq data. These files have barcode information in them, where each barcode(cell) has the assembled TCR or BCR in it. Then you can filter the cells not retained in Seurat analysis.
Sorry I need your kindly point of view in making some conclusion from my data
I have had two healthy PBMCs
; before running my main experiments on more samples
I profiled those two samples with conventional bulk RNA-seq
, 10x 5' GEX
, 10x BCR
and 10x TCR
I then run TRUST4
on my fastq files (paired end)
And I finally visualized TRUST4 report.tsv
with Immunarch
Does my conclusion is right that with 10x TCR kit I get more clonotypes so 5' GEX can not compensate 10x TCR in getting full length CDR3 even by running TRUST4 on my data?
To be honest, I read your paper, but I do not know yet having these data how I could know if 5' GEX is enough? Do I still need dedicated TCR/BCR kits to get full length CDRs
Although likely bulk RNA-seq gives less clonotypes
Actually I do not know how to compare the results to my make a conclusion
Thanks for any help in advance
For the single-cell data, you shall run TRUST4 with "--barcode" option, and use the "trust_barcode_report.tsv" file for the analysis. Otherwise, it will treat the sample as a pseudobulk and you lose the cell information. Running as pseudobulk also makes the results incomparable.
If you are comparing GEX vs 10x TCR/BCR kit, the TCR/BCR kit definitely has higher sensitivity as it amplified the data experimentally. However, the advantage of 5' GEX is the cost. The 5' GEX costs the same as the 3' scRNA-seq data, and it also provides a decent information of immune repertoire. So in sum, if you have enough funding, 10X TCR/BCR will give you a nice coverage of immune repertoire of the good-quality cells. If the cost is a big concern, you can use 5' GEX and run TRUST4 to extract the CDR3s, where the sensitivity can be about half of the TCR/BCR kit (could be higher for certain cell types).
Thank you very much
Regarding the Seurat filtering
For 5' GEX
> list.files()
[1] "barcodes.tsv.gz" "features.tsv.gz" "matrix.mtx.gz"
>
# Load the PBMC dataset
pbmc.data <- Read10X(data.dir = getwd())
# Initialize the Seurat object with the raw (non-normalized data).
pbmc <- CreateSeuratObject(counts = pbmc.data, project = "pbmc3k", min.cells = 3, min.features = 200)
pbmc
# The [[ operator can add columns to object metadata. This is a great place to stash QC stats
pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")
pbmc <- subset(pbmc, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 10)
So, I should use the remained barcodes in Seurat object to only filter TRUST_GEX_S6_L001_R2_001_barcode_report.tsv
Or I should also filter TRUST_TCR_S6_L001_R2_001_barcode_report.tsv
and TRUST_BCR_S6_L001_R2_001_barcode_report.tsv
files by the remained barcodes in Seurat object?
I have done like this
gex1=read.delim("TRUST_GEX_S6_L001_R2_001_barcode_report.tsv")
> dim(gex1)
[1] 2344 6
gex1$X.barcode <- paste0(gex1$X.barcode, "-1")
I then filtered TRUST4_barcode.tsv
by remained barcodes in Seurat after filtering by Seurat criteria
`
gex11=gex1[!gex1$X.barcode%in%colnames(pbmc@assays$RNA@data),]
gex11$X.barcode = substr(gex11$X.barcode,1,nchar(gex11$X.barcode)-2)
write.table(gex11,"GEX11.tsv",quote = FALSE,sep = "\t") `
> dim(gex11)
[1] 743 6
You are seeing from 2344 clonotypes, 743 remained
Am I doing right please? Shall I repeat this for BCT and TCR barcode.tsv too?
I have attached the resulted file after some manipulation here
But Immunarch can not read this anymore
> immdata <- repLoad(.path="/data/file/folder/TRUST4-master/scripts/TCR/a/")
== Step 1/3: loading repertoire files... ==
Processing "/data/Continuum/Angel/TRUST4-master/scripts/TCR/a/" ...
-- [1/1] Parsing "/data/Continuum/Angel/TRUST4-master/scripts/TCR/a//GEX11.tsv" -- unsupported format, skipping
== Step 2/3: checking metadata files and merging files... ==
Processing "/data/Continuum/Angel/TRUST4-master/scripts/TCR/a/" ...
-- Metadata file not found; creating a dummy metadata...
== Step 3/3: processing paired chain data... ==
Done!
Warning messages:
1: Unknown or uninitialised column: `Sample`.
2: Unknown or uninitialised column: `Sample`.
I do not know in which part of filtering I am doing wrong that Immunarch does not recognise that anymore
Thank you once more
I think what you did is right. You shall also filter the TCR, BCR barcode report files because TCR/BCR kit also has information on the null cells.
Immuarch is for bulk data. For the single-cell data analysis, you can use tools like scRepertoire or Platypus for the analysis.
Sorry you meant I should use trust_barcode_report.tsv
for scRepertoire or Platypus as an input? Shall I use trust-barcoderep-to-10X.pl
to convert trust_barcode_report.tsv
first? Although I tried and I got error please have a look at my error
fi1d18@Server2:/TRUST4-master/scripts/TCR/scripts/scripts$ ls
AddSequenceToCDR3File.pl GetFullLengthAssembly.pl trust-cluster.py
barcoderep-expand.py README.md trust-stats.py
barcoderep-filter.py _t.csv
_b.csv trust-barcoderep-to-10X.pl
fi1d18@Server2:/TRUST4-master/scripts/TCR/scripts/scripts$ perl trust-barcoderep-to-10X.pl /data/TRUST4-master/scripts/TCR/scripts/TCR/TRUST_PN0340_0002_TCR_S4_L001_R2_001_barcode_report.tsv
Use of uninitialized value $prefix in concatenation (.) or string at trust-barcoderep-to-10X.pl line 54.
Use of uninitialized value $prefix in concatenation (.) or string at trust-barcoderep-to-10X.pl line 55.
Any help please?
They surrport trust_barcode_report.tsv format I believe, so there is no need to convert it to 10X format.
The trust-barcoderep-to-10X.pl requires a second arguments to specify the output file prefix, and it outputs as "XXX_t.csv" and "XXX_b.csv"
Thank you, specifying the output file prefix, and it outputs as "XXX_t.csv" did not give error
fi1d18@Server2:/data//TRUST4-master/scripts/TCR/scripts/scripts$ perl trust-barcoderep-to-10X.pl /dataTRUST4-master/scripts/TCR/scripts/TCR/TRUST_PN0340_0002_TCR_S4_L001_R2_001_barcode_report.tsv /data/TRUST4-master/scripts/TCR/scripts/TCR/TRUST_PN0340_0002_TCR_S4_L001_R2_001_barcode_t.csv
But I am confused why from TRUST_PN0340_0002_TCR_S4_L001_R2_001_barcode_report.tsv
input which is from TCR kit, I have BCR results too
I got these two from TCR
TRUST_PN0340_0002_TCR_S4_L001_R2_001_barcode_t.csv_t.csv
and
TRUST_PN0340_0002_TCR_S4_L001_R2_001_barcode_t.csv_b.csv
There might be some false positive BCRs in the TCR data, but I believe the fraction will be quite tiny and these could be from the null/ambient cells. They usually will be filtered by matching with Seurat. You can safely ignore the _b.csv file for the TCR kit file.
Thank you very much, although Immunarch was working very smooth with TRUST4_report.tsv file but likely does not work with barcode file even after conversion with the perl script (or might be I am making a mistake somewhere)
> immdata <- repLoad(.path="/data/TRUST4-master/scripts/TCR/scripts/TCR/a/")
== Step 1/3: loading repertoire files... ==
Processing "/data/Continuum/Angel/TRUST4-master/scripts/TCR/scripts/TCR/a/" ...
-- [1/2] Parsing "/data/Continuum/Angel/TRUST4-master/scripts/TCR/scripts/TCR/a//TRUST_PN0340_0002_TCR_S4_L001_R2_001_barcode_t.csv_b.csv" -- -- [2/2] Parsing "/data/Continuum/Angel/TRUST4-master/scripts/TCR/scripts/TCR/a//TRUST_PN0340_0002_TCR_S4_L001_R2_001_barcode_t.csv_t.csv" --
== Step 2/3: checking metadata files and merging files... ==
Processing "/data/Continuum/Angel/TRUST4-master/scripts/TCR/scripts/TCR/a/" ...
-- Metadata file not found; creating a dummy metadata...
== Step 3/3: processing paired chain data... ==
Done!
Warning messages:
1: Unknown or uninitialised column: `Sample`.
2: Unknown or uninitialised column: `Sample`.
> list.files()
[1] "TRUST_PN0340_0002_TCR_S4_L001_R2_001_barcode_t.csv_b.csv" "TRUST_PN0340_0002_TCR_S4_L001_R2_001_barcode_t.csv_t.csv"
> immdata$data
named list()
>
As you are seeing immunarch object is empty :(
Immuarch is not for single-cell analysis, so it does not read any single-cell format. You can use other methods specific for single-cell immune repertoire analysis, such as scRepertoire. If you just want to get some simple statistics values, you can use the "trust-stats.py" in the scripts folder. You can specify the format through the "-f" option there.
Thank you very much for bearing with users
I think those statistics would be very useful
I tried this
fi1d18@Server2:/data/TRUST4-master/scripts/TCR/scripts/scripts$ python trust-stats.py -f csv /data/TRUST4-master/scripts/TCR/scripts/TCR/TRUST_PN0340_0002_TCR_S4_L001_R2_001_barcode_report.tsv /data/TRUST4-master/scripts/TCR/scripts/TCR/TRUST_PN0340_0002_TCR_S4_L001_R2_001_barcode_t1.csv
usage: trust-stats.py [-h] -r REPFILE [-f FORMAT] [--ntaa NTAA]
trust-stats.py: error: argument -r is required
You need to use -r option to specify the report file. If it is trust_barcode_report file you need to add "-f TRUST4_barcode_report". Note that this script will NOT filter the barcodes with Seurat object. It is still better to use scRepertoire for more rigorous analysis on the single-cell data.
Thanks a million
After filtering by Seurat object and conversion with the perl script, I have taken intersection of clonotypes from TCR/BCR kit and 5' GEX, like below
I looked at VDJdb
database, from those 65 unique clonotypes specific to 5' GE
X, only one was found in VDJdb' related to
B2M, and from those 229 clonotypes common between 5' GEX and TCR kit, 95 found in
VDJdb` database, does this mean if the rest are novel TCRs or just false positive/ambient/null cells please?
Thank you so much for sharing your ideas with us
From my experience, those clonotypes only found in GEX are still likely to be real in PBMC. On example is that TRUST4 can find gdT cells and the 10X kit will miss those (I think). On tumor samples, there will be some false positives, especially in BCR space due to the overexpression of BCR mRNAs from plasma cells. In that case, you can run the "barcoderep-filter.py" in the scripts folder. The VDJdb annotation is very sparse, so most of the TCRs from a study will be novel TCRs. You can also check the 4368 clonotypes from TCR1.
It seems the sensitivity for the TCRs from GEX part is surprisingly low, are you using TCR alpha/beta pairing as the definition of clonotype? Since the capture rate in GEX for TCR/BCR is low, I usually only use one chain to represent the clonotype, TRB/IGH.
I used cdr3 sequences from GEX and TCR for Venn diagram creation Am I right, please? Thank you very much
Yes, this is right. Are they from TRB and IGH, respectively?
Thanks a lot For these Venn diagram I have used TRB/IGH only
Although samples are healthy PBMCs (not tumour), for BCR filtering I used
$ python3 barcoderep-filter.py -b /data/TRUST4-master/scripts/TCR/BCR1.tsv > newreport.tsv
Which gives
Shall I filter my other files (GEX, TCR) too? Am I doing right please? I am just wondering why I am getting more BCRs after filtering
Also, I noticed scRepertoire does not visualise BCR data (might be I am wring)
> S1 <- read.delim("BCR1.tsv")
> S2 <- read.delim("BCR2.tsv")
> contig_list <- list(S1, S2)
> combined <- combineTRUST4(contig_list,samples = c("bcr1","bcr2"),removeNA=TRUE)
> quantContig(combined, cloneCall="gene+nt", scale = TRUE)
bcr=read.delim("BCR1.tsv")
> head(bcr[1:2,1:4])
X.barcode cell_type chain1
1 ATGAGCCACACGATGT B *
2 CTACTTTGTTCAACCA B *
chain2
1 IGLV1-40*01,*,IGLJ3*02,*,TGCCAGTCCTTTGACAACAGCCTGAGTGGTCCGGTGTTT,CQSFDNSLSGPVF,1.00,ATGAGCCACACGATGT_28117,94.29,0
2 IGKV3-11*01,*,IGKJ3*01,*,TGTCAGCAGCGTAGCAACTGGTTCACTTTC,CQQRSNWFTF,1.97,CTACTTTGTTCAACCA_53325,100.00,0
>
Thanks a lot in advance
Hi
I need your personal point of view please
I have two
bulk RNA-seq
patients (PBMC) on which I runmixcr
For the same two patients I have
5' single cell
on which I runTRUST4
I am seeing more clones derived by
sc RNA-seq
and more clonality derived byBulk RNA-seq
I named my bulk RNA-seq samples as TCR and my sc RNA-seq as TRUST
Please, personally do you have any idea what could be an interpretation for this?
Thank you for any help