Question about consensus_count

liulab-dfci / TRUST4

TCR and BCR assembly from RNA-seq data

MIT License

Question about consensus_count #306

Open Henrik-huang opened 1 week ago

Henrik-huang commented 1 week ago

Hello TRUST4 Team,

Thank you for developing such a useful tool. I’ve encountered an issue and don’t know how to deal with it. I have 8 bulk RNA-seq samples, and their FASTQ files are approximately the same size, with two samples being 1 GB larger than the others. However, the proportion of assembled reads for these two larger samples is significantly higher compared to the rest.

Additionally, I’ve observed that some sequences have unusually high consensus counts. Is this a normal occurrence?

Is this something that typically happens with the other six samples?

mourisl commented 1 week ago

The higher number of assembled reads could simply mean that there are more T cells or B cells in the sample (higher immune cell infiltration).

The high consensus counts mean there are more reads from this chain. It usually comes from clonal expansion and plasma B cells.

Henrik-huang commented 1 week ago

Thank you for your answer. So, is it possible that the two samples have abnormal clone amplification? Should I exclude these obviously increased clones when calculating VDJ frequencies?

mourisl commented 1 week ago

No need to filter those. It looks like the top clonotypes have relatively higher abundance in the two samples than in the others. It could be that there are multiple clonally expanded clonotypes, or that the library somehow captures BCR reads better. Nevertheless, once you normalize the counts into frequencies or fractions, I think the samples are comparable.
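The normalization suggested above can be sketched roughly as follows. This is only an illustration, not TRUST4's own code: the function name `clonotype_fractions`, the sample names, the clonotype labels, and the counts are all hypothetical, and the sketch assumes you have already extracted (sample, clonotype, consensus count) triples from each sample's output. In practice you would probably normalize within each chain (for example IGH separately from IGK/IGL) rather than across all chains at once.

```python
from collections import defaultdict


def clonotype_fractions(records):
    """Convert raw consensus counts into per-sample fractions.

    records: list of (sample, clonotype, consensus_count) tuples.
    Returns a dict mapping (sample, clonotype) to its share of that
    sample's total count.
    """
    per_clonotype = defaultdict(float)  # summed count per (sample, clonotype)
    per_sample = defaultdict(float)     # summed count per sample
    for sample, clonotype, count in records:
        per_clonotype[(sample, clonotype)] += count
        per_sample[sample] += count

    # Divide each clonotype's count by its sample total; skip empty samples.
    return {
        key: count / per_sample[key[0]]
        for key, count in per_clonotype.items()
        if per_sample[key[0]] > 0
    }


# Hypothetical toy data: sample_A has far more reads in total than sample_B.
records = [
    ("sample_A", "IGHV3-23/CARDYW", 900.0),
    ("sample_A", "IGHV1-69/CARGGW", 100.0),
    ("sample_B", "IGHV3-23/CARDYW", 30.0),
    ("sample_B", "IGHV4-34/CARSSW", 10.0),
]

fractions = clonotype_fractions(records)
for (sample, clonotype), fraction in sorted(fractions.items()):
    print(f"{sample}\t{clonotype}\t{fraction:.3f}")
```

After this step each sample's fractions sum to 1, so a sample with more assembled reads no longer dominates the comparison just because of its depth.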