s175573 / GIANA

Ultrafast TCR clustering algorithm based on geometric isometry

Subsequent filtering steps? #16

Open Liz-m57 opened 3 weeks ago

Liz-m57 commented 3 weeks ago

Hi! I want to identify credible antigen-specific TCRs from disease samples, so I ran GIANA on each sample. Now I wonder whether any further filtering steps are required after obtaining sample--RotationEncodingBL62.txt. I would appreciate it if you could answer my questions below.

  1. "Small world effect": Your paper mentions this effect: "For each sample, we first removed TCR clusters with more than 100 samples, as these TCRs were likely generated from small-world connections and not informative to disease specificity". Does this step improve the results? When clusters are treated as identical across samples for removal, does "identical" mean matching in both CDR3 sequence and V gene, or in CDR3 sequence alone? Also, how should I choose the number of samples above which a cluster is removed?

  2. MergeClusters.py: This script does not appear to filter out any clusters; it only merges some and keeps the rest. Since I only need the specific TCRs and don't care about the cluster assignments, this step doesn't seem to be required for me. Is that correct?
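
To make question 1 concrete, here is a minimal Python sketch of the kind of sample-count filter I have in mind. It is not taken from GIANA or the paper: the function name `filter_widely_shared`, the `use_v_gene` switch, and the in-memory input (a dict mapping sample names to lists of (CDR3, V gene) pairs) are my own assumptions, and the sequences are toy values for illustration only. It works on individual TCRs rather than parsing the GIANA output file, since I am not sure which columns should define "identical".

```python
from collections import defaultdict


def filter_widely_shared(sample_to_tcrs, max_samples=100, use_v_gene=True):
    """Drop TCRs that occur in more than `max_samples` distinct samples.

    sample_to_tcrs: dict of sample name -> list of (cdr3, v_gene) tuples
                    (hypothetical layout, not the GIANA file format).
    use_v_gene:     if True, identity is (CDR3, V gene); if False, CDR3 only.
    """
    def key_of(cdr3, v_gene):
        return (cdr3, v_gene) if use_v_gene else cdr3

    # Record the set of samples in which each TCR key is observed.
    samples_per_key = defaultdict(set)
    for sample, tcrs in sample_to_tcrs.items():
        for cdr3, v_gene in tcrs:
            samples_per_key[key_of(cdr3, v_gene)].add(sample)

    # Keep only TCRs whose sample count does not exceed the threshold.
    return {
        sample: [
            (cdr3, v_gene)
            for cdr3, v_gene in tcrs
            if len(samples_per_key[key_of(cdr3, v_gene)]) <= max_samples
        ]
        for sample, tcrs in sample_to_tcrs.items()
    }


# Toy input: made-up sequences, threshold lowered to 2 for illustration.
toy = {
    "s1": [("CASSLGQAYEQYF", "TRBV5-1"), ("CASSPTGGYEQYF", "TRBV7-9")],
    "s2": [("CASSLGQAYEQYF", "TRBV5-1")],
    "s3": [("CASSLGQAYEQYF", "TRBV6-2")],
}
by_cdr3 = filter_widely_shared(toy, max_samples=2, use_v_gene=False)
by_cdr3_and_v = filter_widely_shared(toy, max_samples=2, use_v_gene=True)
```

In this toy case the two definitions give different results: matching on CDR3 alone removes the shared sequence from all three samples, while matching on CDR3 plus V gene keeps it everywhere, which is why I would like to know which definition the paper used.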

If you have any other suggestions about filtering, I would be grateful to hear them. Thanks a lot!