liulab-dfci / TRUST4

TCR and BCR assembly from RNA-seq data
MIT License

CDR3 clustering among multiple samples (bulk RNA-seq) #260

Closed: 322029 closed this issue 5 months ago

322029 commented 5 months ago

Thank you for your great tool. I have three questions about clustering.

  1. As indicated in the title, is it possible to cluster CDR3nt sequences across multiple samples? "trust-cluster.py" can cluster the CDR3nt within one sample, but it does not accept multiple trust_report.tsv files as input. I suppose concatenating the trust_report.tsv files into a single file and running trust-cluster.py on it would work. Is that correct?

  2. Alternatively, the trust_report.tsv files provide a "cid" (assemble~), and the cids seem to be consistent across samples. So, can I use the cid in place of the cluster?

  3. I'm wondering whether CDR3nt sequences in the same cluster share similar antigen specificity, or whether only their sequences are similar.

I'd appreciate it if you could reply.

mourisl commented 5 months ago

  1. Yes, you can put them together and then run the cluster script.

  2. The cluster script does not use the cid column, but you should still rename the assembled contig names in that column so that you can track which sample each row came from.

  3. The clustering is based only on sequence similarity. For BCR, it is mainly meant to group clonotypes from the same lineage that have diverged through somatic hypermutation (SHM).
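
A minimal sketch of the merging step from points 1 and 2, not part of TRUST4 itself. It assumes each trust_report.tsv begins with a tab-separated header line (with a leading `#`) that contains a column named `cid`; the function name `merge_reports` and the `sample_cid` naming scheme are hypothetical choices for illustration.

```python
def merge_reports(reports, cid_column="cid"):
    """Merge several report files into one list of lines.

    reports: iterable of (sample_name, iterable_of_lines) pairs.
    The cid of every data row is prefixed with its sample name so that
    rows can be traced back to their sample after clustering.
    Assumption: the first non-empty line of each report is the header.
    """
    merged = []
    header_written = False
    for sample, lines in reports:
        cid_idx = None
        for raw in lines:
            line = raw.rstrip("\n")
            if not line:
                continue
            fields = line.split("\t")
            if cid_idx is None:
                # Header line: find the cid column by name, ignoring a leading '#'.
                names = [name.lstrip("#") for name in fields]
                cid_idx = names.index(cid_column)  # raises ValueError if absent
                if not header_written:
                    merged.append(line)  # keep a single header for the merged file
                    header_written = True
                continue
            fields[cid_idx] = f"{sample}_{fields[cid_idx]}"
            merged.append("\t".join(fields))
    return merged
```

To use it, read each report with `open(path).read().splitlines()`, pass the `(sample_name, lines)` pairs to `merge_reports`, write the result to one file, and give that file to the cluster script. All other columns are left untouched, so the frequency values remain relative to each original sample.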

322029 commented 5 months ago

Thank you for your prompt response! As for cid, I found that some cids appear on several lines, each with a slightly different CDR3nt. So I'm interested in the difference between cid and cluster. Could you explain it?

mourisl commented 5 months ago

The cid is the consensus ID. Each consensus sequence may encode multiple CDR3 sequences, and those will automatically fall into the same cluster.
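
To see this relationship in a report, one could group rows by cid and list the distinct CDR3nt sequences under each. The following is only an illustrative sketch: it assumes a tab-separated header line containing columns named `cid` and `CDR3nt`, and the helper names `cdr3_by_cid` and `multi_cdr3_cids` are hypothetical.

```python
from collections import defaultdict


def cdr3_by_cid(lines, cid_column="cid", cdr3_column="CDR3nt"):
    """Map each cid to the distinct CDR3nt sequences reported for it.

    Assumption: the first non-empty line is a header naming the columns.
    """
    groups = defaultdict(list)
    cid_idx = cdr3_idx = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if cid_idx is None:
            # Header line: locate both columns by name, ignoring a leading '#'.
            names = [name.lstrip("#") for name in fields]
            cid_idx = names.index(cid_column)
            cdr3_idx = names.index(cdr3_column)
            continue
        cid, cdr3 = fields[cid_idx], fields[cdr3_idx]
        if cdr3 not in groups[cid]:
            groups[cid].append(cdr3)
    return dict(groups)


def multi_cdr3_cids(groups):
    """Keep only the cids that carry more than one distinct CDR3nt."""
    return {cid: seqs for cid, seqs in groups.items() if len(seqs) > 1}
```

The cids returned by `multi_cdr3_cids` are the ones that appear on several lines of the report.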

322029 commented 5 months ago

I understand. Thank you so much!