liulab-dfci / TRUST4

TCR and BCR assembly from RNA-seq data
MIT License

Size of the *_report.tsv files varies a lot across different samples #312

Open binbinZhao2017 opened 3 weeks ago

binbinZhao2017 commented 3 weeks ago

Thanks for the wonderful tool. I ran it on our tumor bulk RNA-seq dataset and found that the size of the *_report.tsv files varies a lot. Some samples have many more TCRs (both types and reads), while others have only 1-2 TCRs. I know this may be due to fewer T cells in those samples, but there are other possibilities; for example, those samples may have fewer reads overall. I ran some deconvolution tools on the same data to estimate the immune cell types and found that samples with almost the same T cell fraction have different numbers of TCR reads and types. So my question is: do you do some kind of QC before analyzing the TCRs and BCRs, so that samples with too few reads do not go on to the next step? Or do you have any suggestions on how I can make the different samples comparable? Thanks.

mourisl commented 3 weeks ago

It depends on the application. For example, if you are focusing on tracking clonotypes, even a sample with only a few clonotypes can be informative. If you are calculating diversity, a sample with only 1-2 TCRs may make the estimate highly skewed. In that case, I usually only consider samples with at least 10 distinct clonotypes (other thresholds should also work).
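
For anyone who wants to apply this kind of filter, here is a minimal sketch in Python. It is not part of TRUST4. It assumes the report is tab-separated with a header line naming the columns `CDR3nt`, `V`, `J` and `C` (the first header field may start with `#`), and it treats each distinct `CDR3nt` as one clonotype, which is a simplification. The function names and the `chain_prefix` parameter are made up for illustration; check the column names against your own *_report.tsv before relying on it.

```python
import csv
import io

# Threshold from the reply above; other values may work just as well.
MIN_CLONOTYPES = 10


def count_distinct_clonotypes(report_text, chain_prefix="TR"):
    """Count distinct CDR3 nucleotide sequences for one receptor family.

    Assumption: report_text is tab-separated with a header row containing
    the columns CDR3nt, V, J and C. A row is counted when any of its
    V/J/C gene names starts with chain_prefix ("TR" for TCR, "IG" for BCR).
    """
    reader = csv.reader(io.StringIO(report_text), delimiter="\t")
    header = next(reader, None)
    if header is None:
        return 0
    # Drop a leading '#' from header fields, e.g. "#count" -> "count".
    names = [h.lstrip("#") for h in header]
    col = {name: i for i, name in enumerate(names)}

    seen = set()
    for row in reader:
        if len(row) < len(names):
            continue  # skip truncated or blank lines
        genes = (row[col["V"]], row[col["J"]], row[col["C"]])
        if any(g.startswith(chain_prefix) for g in genes):
            seen.add(row[col["CDR3nt"]])
    return len(seen)


def samples_passing_qc(reports, min_clonotypes=MIN_CLONOTYPES, chain_prefix="TR"):
    """Return the names of samples with at least min_clonotypes distinct clonotypes.

    reports maps a sample name to the text content of its report file.
    """
    return [
        name
        for name, text in reports.items()
        if count_distinct_clonotypes(text, chain_prefix) >= min_clonotypes
    ]
```

To use it, read each report into a string (for example with `open(path).read()`), build a dict keyed by sample name, and pass it to `samples_passing_qc`. Samples that fall below the threshold are still usable for clonotype tracking; the filter is only meant for diversity-type analyses.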