phbradley / conga

Clonotype Neighbor Graph Analysis
MIT License

KeyError: 'graph_vs_graph_logos' #23

Closed: chris-rands closed this issue 2 years ago

chris-rands commented 2 years ago

Thanks for the nice-looking tool! I'm trying to run simple_conga_pipeline.ipynb locally, and I get KeyError: 'graph_vs_graph_logos' here:

tag = conga.tags.GRAPH_VS_GRAPH_LOGOS
pngfile = adata.uns['conga_results'][tag]

I had already run conga.plotting.make_graph_vs_graph_logos(), but it seems it didn't populate the dict:

print(conga.tags.GRAPH_VS_GRAPH_LOGOS, '#', adata.uns['conga_results'].keys())
# graph_vs_graph_logos # dict_keys(['graph_vs_graph', 'graph_vs_graph_help'])
phbradley commented 2 years ago

Hi Chris, thanks for trying conga out! Quick question: is this running on your own data or on the dataset from the notebook? I'm wondering whether conga found any graph-vs-graph clusters at all. In the output of the make_graph_vs_graph_logos() cell, do you see lines like "making cluster logos: 0 4 tmp_hs_pbmc_graph_vs_graph_logos.png" (as in the original notebook on GitHub)?

Either way, it sounds like we should add a check for this. Take care, Phil
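A minimal sketch of the kind of check mentioned above, assuming only what this thread shows: results live in a dict-like adata.uns['conga_results'] keyed by tag strings such as 'graph_vs_graph_logos'. The helper name get_result_or_none is hypothetical and not part of conga; a plain dict stands in for adata.uns['conga_results'] so the example runs without conga installed.

```python
from typing import Mapping, Optional


def get_result_or_none(conga_results: Mapping[str, str], tag: str) -> Optional[str]:
    """Return the stored result for `tag`, or None if nothing was stored.

    Hypothetical helper (not part of conga): it avoids a bare KeyError when
    an upstream step, such as logo making, produced no output.
    """
    if tag not in conga_results:
        print(f"No '{tag}' entry found; available keys: {sorted(conga_results)}")
        return None
    return conga_results[tag]


# Stand-in for adata.uns['conga_results'], using the keys reported in this issue.
conga_results = {"graph_vs_graph": "<results table>", "graph_vs_graph_help": "<help text>"}

pngfile = get_result_or_none(conga_results, "graph_vs_graph_logos")
if pngfile is None:
    print("Skipping logo display: no logo file was stored.")
```

In the notebook itself this would be called as get_result_or_none(adata.uns['conga_results'], conga.tags.GRAPH_VS_GRAPH_LOGOS) before trying to display the PNG.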

sschattgen commented 2 years ago

This error occurs because no conga clusters pass the min_cluster_size threshold, so nothing is stored in adata.uns['conga_results']['graph_vs_graph_logos']. You could try lowering the threshold to 2, the minimum, and see whether that helps.
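To illustrate why the key can end up missing, here is a toy model (explicitly not conga's actual code): conga hits are grouped by a (gex_cluster, tcr_cluster) label, only groups with at least min_cluster_size members get a logo, and the results entry is written only when at least one group passes. The function toy_make_logos and its grouping rule are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Hashable, List


def toy_make_logos(hit_labels: List[Hashable], min_cluster_size: int,
                   results: Dict[str, object]) -> List[Hashable]:
    """Toy stand-in for a logo-making step (NOT conga's implementation).

    Counts how many hits share each cluster label and keeps the labels whose
    count reaches min_cluster_size. The results entry is written only when
    at least one label passes, so otherwise the key is simply absent.
    """
    counts = Counter(hit_labels)
    passing = [label for label, n in counts.items() if n >= min_cluster_size]
    if passing:
        results["graph_vs_graph_logos"] = f"logos for {len(passing)} cluster(s)"
    return passing


# Two hits whose (gex_cluster, tcr_cluster) labels differ: every "cluster" has size 1.
sparse_hits = [(2, 7), (0, 6)]
results = {}
print(toy_make_logos(sparse_hits, min_cluster_size=2, results=results))  # []
print("graph_vs_graph_logos" in results)  # False

# Three hits, two of which share a label: one cluster of size 2 passes.
denser_hits = [(2, 7), (2, 7), (0, 6)]
print(toy_make_logos(denser_hits, min_cluster_size=2, results=results))  # [(2, 7)]
print("graph_vs_graph_logos" in results)  # True
```

In this toy model, lowering the threshold only helps if at least two hits share a label; with all-distinct labels the entry stays absent even at the minimum of 2.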

phbradley commented 2 years ago

Thanks Stefan, great suggestion!

chris-rands commented 2 years ago

Thank you both for the quick and helpful responses! To clarify, I am using my own data (not the test data). There is no output from make_graph_vs_graph_logos(): the cell runs without error but prints nothing (unlike in the tutorial notebook). Running make_graph_vs_graph_logos() with min_cluster_size=2 leads to the same error. Here are the conga stats:

OrderedDict([('num_cells_w_gex', 3865),
             ('num_features_start', 36601),
             ('num_cells_w_tcr', 2589),
             ('min_genes_per_cell', 200),
             ('max_genes_per_cell', 2500),
             ('max_percent_mito', 0.1),
             ('num_filt_max_genes_per_cell', 98),
             ('num_filt_max_percent_mito', 4),
             ('num_antibody_features', 0),
             ('num_TR_genes', 91),
             ('num_TR_genes_in_hvg_set', 87),
             ('num_highly_variable_genes', 1579),
             ('num_cells_after_filtering', 2487),
             ('num_clonotypes', 1972),
             ('max_clonotype_size', 127),
             ('num_singleton_clonotypes', 1879)])
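A quick arithmetic pass over these stats, in plain Python. The derived ratios are computed here from the numbers above; they are not conga outputs. One thing it surfaces: 2589 - 98 - 4 = 2487, so the two filtering counts appear to account exactly for the drop from cells with a TCR to cells after filtering.

```python
from collections import OrderedDict

# Subset of the conga stats pasted above.
stats = OrderedDict([
    ("num_cells_w_gex", 3865),
    ("num_cells_w_tcr", 2589),
    ("num_filt_max_genes_per_cell", 98),
    ("num_filt_max_percent_mito", 4),
    ("num_cells_after_filtering", 2487),
    ("num_clonotypes", 1972),
    ("num_singleton_clonotypes", 1879),
])

# Cells removed by the two GEX filters vs. the drop from TCR+ cells to filtered cells.
filtered = stats["num_filt_max_genes_per_cell"] + stats["num_filt_max_percent_mito"]
drop = stats["num_cells_w_tcr"] - stats["num_cells_after_filtering"]
print(filtered, drop)  # 102 102

tcr_fraction = stats["num_cells_w_tcr"] / stats["num_cells_w_gex"]
singleton_fraction = stats["num_singleton_clonotypes"] / stats["num_clonotypes"]
cells_per_clonotype = stats["num_cells_after_filtering"] / stats["num_clonotypes"]

print(f"{tcr_fraction:.2f}")         # 0.67
print(f"{singleton_fraction:.2f}")   # 0.95
print(f"{cells_per_clonotype:.2f}")  # 1.26
```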

The conga score plot above is not very colourful: [image]

I'm guessing something upstream failed or performed poorly? I can dig into it.

phbradley commented 2 years ago

Huh, no, that's not very colorful at all! It's certainly possible that conga is not finding much correlation between GEX and TCR in the dataset. But it would be nice to rule out a technical issue, since the software is still relatively new. What does the graph-vs-graph results dataframe look like (shape, range of conga scores)? This would be shown in the block after the code:

results = conga.correlations.run_graph_vs_graph(
    adata, all_nbrs, outfile_prefix=outfile_prefix)

It's also stored in adata.uns['conga_results']['graph_vs_graph'].

Also, I'm curious whether the other analyses are returning anything, to rule out a failure in neighbor finding or the like.

chris-rands commented 2 years ago

Thanks again! adata.uns['conga_results']['graph_vs_graph'] contains:

| conga_score | num_neighbors_tcr | cluster_size | overlap | overlap_corrected | mait_fraction | clone_index | nbr_frac | graph_overlap_type | gex_cluster | tcr_cluster | va | ja | cdr3a | vb | jb | cdr3b |
| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| 0.570129 | 19 | 324 | 10 | 10 | 0.0 | 884 | 0.01 | gex_cluster_vs_tcr_nbr | 2 | 7 | TRAV26-1*01 | TRAJ48*01 | CIVCGFGNEKLTF | TRBV10-3*01 | TRBJ2-3*01 | CAISARAAGEDTQYF |
| 0.973314 | 19 | 345 | 10 | 10 | 0.0 | 705 | 0.01 | gex_cluster_vs_tcr_nbr | 0 | 6 | TRAV22*01 | TRAJ42*01 | CAVDYGGSQGNLIF | TRBV19*01 | TRBJ2-5*01 | CASRGQGRGPETQYF |
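To report the shape and conga-score range asked about earlier in one step, something like the following pandas sketch could be run on adata.uns['conga_results']['graph_vs_graph'], assuming it is a pandas DataFrame (as the pasted table suggests). Here the two rows above are rebuilt by hand with only a few of their columns, so the example runs without conga; summarize_hits is a hypothetical helper, not a conga function.

```python
import pandas as pd


def summarize_hits(df: pd.DataFrame) -> dict:
    """Report the table's shape, its conga_score range, and the cluster pairs hit."""
    return {
        "shape": df.shape,
        "min_conga_score": float(df["conga_score"].min()),
        "max_conga_score": float(df["conga_score"].max()),
        # Distinct (gex_cluster, tcr_cluster) pairs, cast to plain ints for readability.
        "cluster_pairs": sorted({(int(g), int(t))
                                 for g, t in zip(df["gex_cluster"], df["tcr_cluster"])}),
    }


# Subset of the two rows pasted above.
hits = pd.DataFrame({
    "conga_score": [0.570129, 0.973314],
    "cluster_size": [324, 345],
    "gex_cluster": [2, 0],
    "tcr_cluster": [7, 6],
    "clone_index": [884, 705],
})

print(summarize_hits(hits))
```

On the real object this would be summarize_hits(adata.uns['conga_results']['graph_vs_graph']).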

I'm guessing there should be more rows? Maybe there just isn't much correlation between GEX and TCR in this data, but I'm surprised, since these are just 'vanilla' PBMCs. I'm getting plenty of other plots, like UMAPs and phylogenies, although I haven't yet digested what they mean. I could upload the full notebook if that would help and you have time to look at it.

phbradley commented 2 years ago

Sure, I'd be happy to take a look. If you'd rather keep it semi-private, you could also send me a Dropbox link (pbradley@fredhutch.org). I can forward it to Stefan and see whether either of us has any ideas. I agree that PBMCs usually have some "positive controls" for conga, like MAITs or CD4 vs CD8 TCR sequence differences, but we haven't looked at that many datasets... Thanks for exploring this with us!