The question about the single-cell data results

liulab-dfci / TRUST4

TCR and BCR assembly from RNA-seq data

MIT License

The question about the single-cell data results #32

Closed toddey closed 3 years ago

toddey commented 3 years ago

Hi, we want to analyze the TCR repertoire of a naive T cell cluster in our single-cell data, but some BCRs are identified in the TRUST4 results. Can we treat these BCRs as wrong results and discard them?

mourisl commented 3 years ago

That mostly depends on your analysis. The BCRs could be real, for example from doublets or mis-clustered cells, but I think it's safe to ignore them if their proportion is small.
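
To judge whether the proportion really is small, one option is to count the barcodes labelled as B cells before dropping them. The following is only a sketch: the inline sample data, the `cell_type` column name, and the `B`/`abT` labels are assumptions for illustration and should be checked against your own trust_barcode_report.tsv.

```python
import csv
import io

# Hypothetical excerpt of a barcode report. The header names and the
# "B" / "abT" cell-type labels are assumptions, not taken from this thread.
SAMPLE_REPORT = (
    "#barcode\tcell_type\tchain1\tchain2\n"
    "CELL_1\tabT\tplaceholder\tplaceholder\n"
    "CELL_2\tabT\tplaceholder\tplaceholder\n"
    "CELL_3\tB\tplaceholder\tplaceholder\n"
    "CELL_4\tabT\tplaceholder\tplaceholder\n"
)


def split_by_cell_type(handle, bcr_label="B"):
    """Return (kept, dropped) rows, where dropped rows carry the BCR label."""
    reader = csv.DictReader(handle, delimiter="\t")
    kept, dropped = [], []
    for row in reader:
        (dropped if row["cell_type"] == bcr_label else kept).append(row)
    return kept, dropped


kept, dropped = split_by_cell_type(io.StringIO(SAMPLE_REPORT))
total = len(kept) + len(dropped)
bcr_fraction = len(dropped) / total
print(f"BCR barcodes: {len(dropped)} of {total} ({bcr_fraction:.0%})")
```

For a real file, pass an opened file handle instead of the `io.StringIO` wrapper. What counts as a "small" proportion is a judgment call for the analysis at hand.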

toddey commented 3 years ago

Thank you very much for your reply. I have another small question: in the trust_barcode_report.tsv file, the chain1 and chain2 columns hold the most abundant pair of chains, so the secondary chains are the second most abundant. Do I understand correctly?

mourisl commented 3 years ago

Yes. The secondary column contains all other CDR3s that are less abundant than the primary CDR3.
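
As a rough illustration of that relationship, the sketch below parses one primary chain field and one secondary field and checks that no secondary entry is more abundant than the primary one. The example strings are made up. The layout (comma-separated fields ending in read count and CDR3 germline similarity, `;` between secondary entries, `*` for a missing value) is an assumption that should be verified against your own report.

```python
def parse_chain(field):
    """Split one chain entry; the last two fields are taken to be
    read count and CDR3 germline similarity (assumed layout)."""
    parts = field.split(",")
    return {
        "fields": parts[:-2],
        "read_count": float(parts[-2]),
        "similarity": float(parts[-1]),
    }


def parse_secondary(field, sep=";"):
    """Parse a secondary-chain column holding zero or more entries.
    The ';' separator and the '*' placeholder are assumptions."""
    if field in ("", "*"):
        return []
    return [parse_chain(entry) for entry in field.split(sep)]


# Made-up values for illustration only.
primary = parse_chain("V_x,*,J_x,C_x,NT_SEQ_1,AA_SEQ_1,12.50,98.00")
secondary = parse_secondary(
    "V_y,*,J_y,C_y,NT_SEQ_2,AA_SEQ_2,3.00,100.00;"
    "V_z,*,J_z,C_z,NT_SEQ_3,AA_SEQ_3,1.50,95.00"
)

# Every secondary CDR3 should be no more abundant than the primary one.
consistent = all(s["read_count"] <= primary["read_count"] for s in secondary)
print(primary["read_count"], [s["read_count"] for s in secondary], consistent)
```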

toddey commented 3 years ago

Thanks again for your patience. I saw that Readme.txt says that, for the chain information in the CSV file, the last two numbers stand for read count and CDR3 germline similarity. I am a little puzzled that some read counts are not integers?

mourisl commented 3 years ago

A read can be compatible with multiple CDR3s (when it only partially overlaps them). TRUST4 applies the expectation-maximization algorithm to assign those ambiguous reads, so the abundance/read count can be a non-integer value.
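
To see how fractional counts can arise, here is a toy expectation-maximization loop. It is not TRUST4's actual implementation, only a minimal sketch of the general idea: each read is split among its compatible CDR3s in proportion to their current abundance, and the abundances are then recomputed from those fractional assignments. The function name, the CDR3 labels, and the read sets are all hypothetical.

```python
def em_read_counts(reads, n_iter=50):
    """Toy EM: `reads` is a list of tuples naming the CDR3s each read
    is compatible with. Returns fractional read counts per CDR3."""
    cdr3s = sorted({c for read in reads for c in read})
    abundance = {c: 1.0 for c in cdr3s}  # uniform starting point
    for _ in range(n_iter):
        updated = {c: 0.0 for c in cdr3s}
        for read in reads:
            # E-step: share the read according to current abundances.
            total = sum(abundance[c] for c in read)
            for c in read:
                updated[c] += abundance[c] / total
        # M-step: the new abundance is the sum of fractional assignments.
        abundance = updated
    return abundance


# 3 reads unique to CDR3_A, 1 unique to CDR3_B, 2 compatible with both.
reads = (
    [("CDR3_A",)] * 3
    + [("CDR3_B",)]
    + [("CDR3_A", "CDR3_B")] * 2
)
counts = em_read_counts(reads)
for name, value in counts.items():
    print(f"{name}: {value:.2f}")
# CDR3_A: 4.50
# CDR3_B: 1.50
```

The six reads are still all accounted for, but the two ambiguous ones are shared 3:1 between the CDR3s, which is why neither count is an integer.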