phbradley / conga

Clonotype Neighbor Graph Analysis
MIT License

columns in `obs.tsv`: good_score_mask, tcr_clumping_pvalues, test #15

Closed markgene closed 2 years ago

markgene commented 3 years ago

Hi there,

Can you clarify what these columns in the result *obs.tsv mean: good_score_mask, tcr_clumping_pvalues, test? Below is an excerpt of my obs.tsv file. Thanks!

    va  ja  cdr3a   cdr3a_nucseq    vb  jb  cdr3b   cdr3b_nucseq    n_genes percent_mito    n_counts    clone_sizes gex_variation   louvain_gex clusters_gex    louvain_tcr clusters_tcr    nndists_gex nndists_tcr is_invariant    test    tcr_clumping_pvalues    conga_scores    good_score_mask genex_clusters
    GCTGCGAAGAAGAAGC-5  TRAV1*01    TRAJ17*01   CAVREDSAGNKLTF  tgtgctgtgagggaagacagtgcagggaacaagctaactttt  TRBV17*01   TRBJ1-1*01  CASSSGTEVFF tgtgctagcagtagcgggacagaagtcttcttt   2239    0.018585505 8017.0  4   13.633640174552784  0   0   0   0   13.410091180067798  0.567562858634753   False   none    2600.0  142.84692726858285  False   0
    AATCGGTCAATTGCTG-5  TRAV1*01    TRAJ17*01   CAVRTNSAGNKLTF  tgtgctgtgaggactaacagtgcagggaacaagctaactttt  TRBV10*01   TRBJ2-3*01  CASSPGGASAETLYF tgtgccagcagccctgggggggcgagtgcagaaacgctgtatttt   1715    0.004017467 5725.0  2   14.277908034334514  2   2   12  12  11.087446585000412  0.613808748480102   False   none    2600.0  553.1331204646976   False   2

phbradley commented 3 years ago

Hi Mark, thanks for the question. We are working on building out a minimal set of documentation for the outputs (tables and plots) in time for the manuscript publication. It should be done in the next couple of weeks. I'm sorry that it's not in place yet!

To your question:

good_score_mask -- a boolean array that should be True where conga_scores <= 1
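For anyone who wants to check this rule against their own file, here is a minimal sketch that recomputes the mask from `conga_scores` and compares it with the stored column. It uses only the standard library. The first two rows are taken from the excerpt above; the third row, the `barcode` column name, and the helper `recompute_good_score_mask` are made up for illustration and are not part of conga.

```python
import csv
import io

# Two rows copied from the excerpt in this thread, reduced to the relevant
# columns. The HYPOTHETICAL-1 row is invented only to show a passing score.
# The "barcode" header name is an assumption; the real file may label the
# index column differently or leave it blank.
SAMPLE_TSV = (
    "barcode\tconga_scores\tgood_score_mask\n"
    "GCTGCGAAGAAGAAGC-5\t142.84692726858285\tFalse\n"
    "AATCGGTCAATTGCTG-5\t553.1331204646976\tFalse\n"
    "HYPOTHETICAL-1\t0.5\tTrue\n"
)


def recompute_good_score_mask(rows, threshold=1.0):
    """Return True for each row whose conga_scores value is <= threshold."""
    return [float(row["conga_scores"]) <= threshold for row in rows]


rows = list(csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t"))
recomputed = recompute_good_score_mask(rows)

# The stored column is text in the TSV, so convert it before comparing.
stored = [row["good_score_mask"] == "True" for row in rows]

print(recomputed == stored)
```

To run it on a real file, replace `io.StringIO(SAMPLE_TSV)` with an opened `obs.tsv` handle and adjust the column names if they differ.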

tcr_clumping_pvalues -- these are the per-clonotype adjusted P-values for the TCR clumping analysis, which looks for TCRs that have more TCR sequence neighbors than you would expect by chance. This TCR clumping analysis does not use the GEX information at all. There should also be some plots (*tcr_clumping*png) that show the results in graphical form. Clonotypes with significant clumping P-values are grouped based on TCR similarity, and groups of size at least 3 (this can be changed) are shown in *tcr_clumping_logos.png
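As a rough illustration of how this column could be used downstream, the sketch below picks out clonotypes whose adjusted clumping P-value falls at or below a cutoff. The cutoff of 1.0, the `barcode` column name, the `HYPOTHETICAL-1` row, and the helper `significant_clumpers` are all assumptions for the example; the threshold conga applies internally is not stated in this thread. The grouping by TCR similarity is not reproduced here.

```python
import csv
import io

# The first two rows come from the excerpt in this thread; the third row is
# invented so that the filter has something to return.
SAMPLE_TSV = (
    "barcode\ttcr_clumping_pvalues\n"
    "GCTGCGAAGAAGAAGC-5\t2600.0\n"
    "AATCGGTCAATTGCTG-5\t2600.0\n"
    "HYPOTHETICAL-1\t0.01\n"
)


def significant_clumpers(rows, cutoff):
    """Return barcodes whose adjusted clumping P-value is <= cutoff."""
    return [
        row["barcode"]
        for row in rows
        if float(row["tcr_clumping_pvalues"]) <= cutoff
    ]


rows = list(csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t"))

# cutoff=1.0 is a placeholder chosen for this example, not a conga default.
hits = significant_clumpers(rows, cutoff=1.0)
print(hits)
```

With the two real rows from the excerpt, nothing passes the placeholder cutoff, which is why only the invented row is returned.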

test -- this is a temporary tag used during the differential expression analysis; it can be safely ignored. I should remove it.

Let me know if that's not clear.

markgene commented 3 years ago

Thank you so much, @phbradley!

phbradley commented 2 years ago

There's a bit more info in the README describing the core conga data slots, so I'm closing this issue for now.