liulab-dfci / TRUST4

TCR and BCR assembly from RNA-seq data
MIT License

Inquiry about the Usage of TRUST4 in 10X 3' scRNA-seq Data Analysis #224

Open yuyang3 opened 1 year ago

yuyang3 commented 1 year ago

I hope you're doing well. I recently came across your paper, 'TRUST4: immune repertoire reconstruction from bulk and single-cell RNA-seq data,' published in Nature Methods. I was highly impressed by the TRUST4 tool's capabilities and have successfully applied it in my own research project.

I am writing to ask for your advice on using TRUST4 with 10X 3' scRNA-seq data. Your paper demonstrates TRUST4's strong performance on 10X 5' scRNA-seq data, which prompted me to use it to assemble B-cell receptor (BCR) sequences from both 10X 3' and 10X 5' scRNA-seq FASTQ files in my project. I then ran a series of analyses on the assembled BCR sequences. When quantifying somatic hypermutation (SHM) rates, I observed an unexpectedly high SHM rate in B cells from the 10X 3' data compared with those from the 10X 5' data, particularly among naive B cells. One possible explanation is that the assembled BCR V sequences differ notably in length between the 10X 3' and 10X 5' data. Because SHM rates are calculated mainly from mutations in the V sequence, this length difference could affect the estimates.
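One way the V length difference could matter can be sketched as follows, assuming the SHM rate is computed as mismatches divided by the aligned V length. The function name `shm_rate`, the `min_aligned_len` threshold of 150, and the example numbers are hypothetical choices for illustration; none of them come from TRUST4 or from the data described here.

```python
from typing import Optional


def shm_rate(mismatches: int, aligned_len: int,
             min_aligned_len: int = 150) -> Optional[float]:
    """Return mismatches per aligned V base.

    Returns None when the aligned V segment is shorter than
    min_aligned_len (a hypothetical cutoff), so that short anchors
    can be excluded from SHM analysis instead of inflating the rate.
    """
    if aligned_len < min_aligned_len:
        return None
    return mismatches / aligned_len


# Illustrative numbers only: the same single mismatch weighs far more
# on a short V anchor than on a near full-length V sequence.
full_length = shm_rate(1, 290)                    # long V alignment
short_anchor = shm_rate(1, 40, min_aligned_len=0)  # short V anchor, no filter
filtered = shm_rate(1, 40)                         # excluded by the cutoff

print(full_length, short_anchor, filtered)
```

With a short anchor, a single mismatch (for example, a sequencing or assembly error) shifts the rate by several percentage points, which is one reason a length filter or exclusion of short contigs may be reasonable.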

In light of this, I plan to exclude the 10X 3' data from SHM-related analyses in my upcoming work. For other analyses, such as clonal expansion levels and IGHC gene selection, I intend to retain some of the 10X 3' data. I would greatly appreciate your thoughts and suggestions on how best to handle the 10X 3' data in this context.

Thank you very much for your time and consideration. Your expertise and insights would be immensely valuable to me as I proceed with my research.

Warm regards

mourisl commented 1 year ago

Thank you for using TRUST4. I agree that the short V gene anchor in the assembled contig consensus may lead to inaccurate SHM estimates. The CDR3 sequence is more robust, so I think you can use it, along with the isotypes associated with each CDR3, for downstream analysis.
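A CDR3-based analysis along these lines might look like the minimal sketch below. It assumes the per-cell results have already been parsed into `(barcode, cdr3, isotype)` tuples; that record layout, the function name `summarize_clones`, and the example sequences are hypothetical and do not reflect TRUST4's actual output format.

```python
from collections import Counter, defaultdict


def summarize_clones(cells):
    """Group cells by CDR3 and tally clone size and isotype counts.

    cells: iterable of (barcode, cdr3, isotype) tuples (assumed layout).
    Each barcode is counted once per CDR3.
    """
    barcodes = defaultdict(set)
    isotypes = defaultdict(Counter)
    for barcode, cdr3, isotype in cells:
        if barcode not in barcodes[cdr3]:
            barcodes[cdr3].add(barcode)
            isotypes[cdr3][isotype] += 1
    return {
        cdr3: {"size": len(seen), "isotypes": dict(isotypes[cdr3])}
        for cdr3, seen in barcodes.items()
    }


# Made-up records for illustration only.
cells = [
    ("AAAC-1", "CARDYW", "IGHM"),
    ("AAAG-1", "CARDYW", "IGHG1"),
    ("AAAT-1", "CARDYW", "IGHG1"),
    ("AACC-1", "CAKGGFDYW", "IGHA1"),
]
summary = summarize_clones(cells)
for cdr3, info in summary.items():
    print(cdr3, info["size"], info["isotypes"])
```

Because this relies only on the CDR3 and the constant gene call, it does not depend on how much of the V gene was assembled.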