liulab-dfci / TRUST4

TCR and BCR assembly from RNA-seq data
MIT License

Should I use the airr or the barcode_airr tsv file? #280

Open xingyongma opened 5 months ago

xingyongma commented 5 months ago

I am analyzing data in a format similar to 10X (seekone DD_5) with TRUST4; it is single-cell VDJ data with barcodes and UMIs. The TRUST4 output includes both airr and barcode_airr tsv files. I also analyzed the same data with cellranger. The contig count in the airr file is close to cellranger's, while the contig count in barcode_airr is about 3-4 times as high. So, what is the difference between airr and barcode_airr? The sequence_id in airr also contains the barcode; does barcode_airr apply some additional conditions? From your perspective, which should I use, the airr or the barcode_airr tsv results?

mourisl commented 5 months ago

For single-cell data, you should use the barcode_airr file; the barcode is in the cell_id column. The "airr" file groups the same clonotype from multiple cells into one entry and uses one of those cells' sequences to fill in the airr columns. This file can be useful for a quick examination of clonal expansion. Hope this helps.
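To make the distinction concrete, here is a minimal sketch of a per-cell table versus a per-clonotype view. It only assumes the cell_id column mentioned above; using junction_aa as a stand-in for "same clonotype" is an assumption for illustration (how TRUST4 actually groups clonotypes is not described in this thread), and the rows are made-up toy data.

```python
import csv
import io


def summarize(tsv_text, clonotype_col="junction_aa"):
    """Return (row count, distinct cells, distinct clonotypes) for a per-cell table."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    cells = {row["cell_id"] for row in rows}
    # Assumption: one column is enough to identify a clonotype in this toy example.
    clonotypes = {row[clonotype_col] for row in rows}
    return len(rows), len(cells), len(clonotypes)


# Toy data: three cells share one clonotype, a fourth cell carries another.
toy = "\n".join([
    "sequence_id\tcell_id\tjunction_aa",
    "contig_a\tAAAC\tCASSLGTF",
    "contig_b\tAAAG\tCASSLGTF",
    "contig_c\tAAAT\tCASSLGTF",
    "contig_d\tCCCA\tCAVRDSNYQLIW",
])

n_rows, n_cells, n_clonotypes = summarize(toy)
print(n_rows, n_cells, n_clonotypes)
```

In this toy table the per-cell view has four rows while a clonotype-level view would have two, which is the kind of gap that grows with clonal expansion.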

xingyongma commented 5 months ago

Thank you very much for your prompt response. Currently, I keep only the contigs where both complete_vdj and productive are True. Do you think it is necessary to apply other criteria (for example, the number of UMIs) for further filtering?
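For reference, the filter described here could be sketched as follows. This is not the exact script used; it assumes the boolean columns are written as "T"/"F" (the AIRR TSV convention) or "True"/"False", and the rows are toy data.

```python
import csv
import io

# Assumption: true values appear as "T" or "True" in any letter case.
TRUE_VALUES = {"t", "true"}


def is_true(value):
    """Interpret an AIRR-style boolean cell."""
    return value.strip().lower() in TRUE_VALUES


def keep_complete_productive(rows):
    """Keep rows where both complete_vdj and productive are true."""
    return [
        row for row in rows
        if is_true(row["complete_vdj"]) and is_true(row["productive"])
    ]


# Toy data, not real TRUST4 output.
toy = "\n".join([
    "sequence_id\tcell_id\tproductive\tcomplete_vdj",
    "contig_a\tAAAC\tT\tT",
    "contig_b\tAAAG\tT\tF",
    "contig_c\tAAAT\tF\tT",
    "contig_d\tCCCA\tTrue\tTrue",
])

rows = list(csv.DictReader(io.StringIO(toy), delimiter="\t"))
kept = keep_complete_productive(rows)
print([row["sequence_id"] for row in kept])
```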

mourisl commented 5 months ago

I think that depends. Do you have paired scRNA-seq gene expression data? If so, you can check whether the cells with a TCR/BCR match the expected cell types, and then decide whether further QC is needed.
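One way such a check could look, as a rough sketch: classify each cell's receptor from the gene-name prefix in v_call (TR* for TCR, IG* for BCR), then measure how often it agrees with the cell type assigned from gene expression. The cell-type labels, the EXPECTED mapping, and the assumption that barcodes match exactly between the two assays are all hypothetical.

```python
def chain_type(v_call):
    """Classify a V gene call as TCR or BCR by its gene-name prefix."""
    if v_call.startswith("TR"):
        return "TCR"
    if v_call.startswith("IG"):
        return "BCR"
    return "unknown"


# Hypothetical annotation labels; replace with the labels your clustering uses.
EXPECTED = {"TCR": {"T cell"}, "BCR": {"B cell"}}


def concordance(receptor_by_cell, cell_type_by_barcode, expected=EXPECTED):
    """Fraction of annotated receptor-bearing cells whose type fits the receptor."""
    annotated = [b for b in receptor_by_cell if b in cell_type_by_barcode]
    if not annotated:
        return None  # no barcode overlap, nothing to compare
    matched = sum(
        cell_type_by_barcode[b] in expected.get(receptor_by_cell[b], set())
        for b in annotated
    )
    return matched / len(annotated)


# Toy data: one V call per barcode, and cell types from a hypothetical clustering.
v_calls = {"AAAC": "TRBV7-9", "AAAG": "TRAV12-1", "AAAT": "IGHV3-23", "CCCA": "TRBV5-1"}
receptors = {barcode: chain_type(v) for barcode, v in v_calls.items()}
cell_types = {"AAAC": "T cell", "AAAG": "B cell", "AAAT": "B cell"}

print(concordance(receptors, cell_types))
```

A low fraction would suggest that stricter filtering is worth trying; barcodes missing from the expression data (CCCA here) are left out of the denominator.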

xingyongma commented 5 months ago

Great suggestion. I have paired scRNA-seq gene expression data, and in the next few days, I will conduct tests on some samples. Then, I will provide feedback on the test results.

xingyongma commented 4 months ago

The results obtained after filtering for contigs with more than 5 UMIs (umi > 5) are very consistent with the paired transcriptome results.
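A sketch of that kind of threshold filter, with a small sweep to see how many contigs survive at each cutoff. The column name umi_count is a placeholder (this thread does not say which column holds the UMI count, so pass the name used in your own table), and the counts are toy values.

```python
def filter_by_umi(rows, umi_col, min_umis):
    """Keep rows whose UMI count is strictly greater than min_umis."""
    return [row for row in rows if float(row[umi_col]) > min_umis]


def retention_by_threshold(rows, umi_col, thresholds):
    """Map each threshold to the number of rows that pass it."""
    return {t: len(filter_by_umi(rows, umi_col, t)) for t in thresholds}


# Toy rows; "umi_count" is a hypothetical column name.
rows = [
    {"cell_id": "AAAC", "umi_count": "1"},
    {"cell_id": "AAAG", "umi_count": "3"},
    {"cell_id": "AAAT", "umi_count": "6"},
    {"cell_id": "CCCA", "umi_count": "12"},
    {"cell_id": "CCCG", "umi_count": "40"},
]

kept = filter_by_umi(rows, "umi_count", 5)
print([row["cell_id"] for row in kept])
print(retention_by_threshold(rows, "umi_count", [0, 2, 5, 10]))
```

The sweep makes it easier to compare several cutoffs against the paired expression data before settling on one; 5 is simply the value that worked for this dataset.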