phbradley / conga

Clonotype Neighbor Graph Analysis
MIT License

AssertionError in conga.plotting.make_graph_vs_graph_logos() #49

Open cr2106 opened 1 year ago

cr2106 commented 1 year ago

Hi!

I'm trying to run CoNGA on some B cell data. I get an AssertionError when I run conga.plotting.make_graph_vs_graph_logos:

~/conga/conga/plotting.py in make_logo_plots(adata, nbrs_gex, nbrs_tcr, min_cluster_size, logo_pngfile, logo_genes, gene_logo_width, clusters_gex, clusters_tcr, clusters_gex_names, clusters_tcr_names, ignore_tcr_cluster_colors, show_real_clusters_gex, good_bicluster_tcr_scores, rank_genes_uns_tag, include_alphadist_in_tcr_feature_logos, max_expn_for_gene_logo, show_pmhc_info_in_logos, nocleanup, conga_scores, conga_scores_name, good_score_mask, make_batch_bars, batch_keys, make_cluster_gex_logos, draw_edges_between_conga_hits, add_conga_scores_colorbar, add_gex_logos_colorbar, pretty, gex_header_genes, make_gex_header, make_gex_header_raw, make_gex_header_nbrZ, gex_header_tcr_score_names, include_full_tcr_cluster_names_in_logo_lines, lit_matches)
   1336             #     old_make_tcr_logo( [ tcrs[x] for x in nodes ], ab, organism, pngfile )
   1337             # else: # new way
-> 1338             make_tcr_logo_for_tcrs( [ tcrs[x] for x in nodes ], ab, organism, pngfile,
   1339                                     tcrdist_calculator=tcrdist_calculator )
   1340             image = mpimg.imread(pngfile)

~/conga/conga/tcrdist/make_tcr_logo.py in make_tcr_logo_for_tcrs(tcrs, chain, organism, pngfile, tcrdist_calculator)
    504             for ivj, vj in enumerate('vj'):
    505                 gene = tcr[iab][ivj]
--> 506                 assert gene in all_genes.all_genes[organism]
    507                 for tag in 'gene genes rep reps'.split():
    508                     info[f'{vj}{ab}_{tag}'] = gene

AssertionError: 

The logo plot is only produced up to the TCR logo step.

adata.uns['conga_stats']

OrderedDict([('num_cells_w_gex', 191798),
             ('num_features_start', 21587),
             ('num_cells_w_tcr', 8849),
             ('min_genes_per_cell', 200),
             ('max_genes_per_cell', 2500),
             ('max_percent_mito', 0.1714),
             ('num_filt_max_genes_per_cell', 388),
             ('num_filt_max_percent_mito', 490),
             ('num_antibody_features', 0),
             ('num_TR_genes', 85),
             ('num_TR_genes_in_hvg_set', 33),
             ('num_highly_variable_genes', 976),
             ('num_cells_after_filtering', 7968),
             ('num_clonotypes', 7846),
             ('max_clonotype_size', 8),
             ('num_singleton_clonotypes', 7744)])

Thanks a lot for your help and great package!

phbradley commented 1 year ago

Hi there, thanks for trying conga! It looks like there's an "unrecognized" TCR gene name: the assertion that fails checks that each V and J gene name is in the gene set for the given organism. That could mean that there's a mismatch between the "TCR" (or BCR) type and the organism. Can you double-check that the organism being passed into the plotting function is "human_ig"? Are you running this from the notebook or the command line?

You could investigate with a snippet of code like this:

import conga

# Assumes `adata` is the AnnData object already loaded in your session.
organism = 'human_ig'
known_genes = conga.tcrdist.all_genes.all_genes[organism]
tcrs = conga.preprocess.retrieve_tcrs_from_adata(adata)
for tcr in tcrs:
    # The last tuple printed before the AssertionError is the offending one.
    print(tcr)
    va, ja = tcr[0][:2]
    vb, jb = tcr[1][:2]
    # Each assert fails if the gene name is not known for this organism.
    assert va in known_genes
    assert ja in known_genes
    assert vb in known_genes
    assert jb in known_genes
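
If stopping at the first failure is not enough, a variation on the same check can collect every unrecognized gene name and count how often each occurs. This is only a minimal sketch: the helper name `find_unrecognized_genes` is hypothetical (it is not part of conga), and the gene names and CDR3 strings in the toy data are made up for illustration. It assumes each entry is a pair of chains whose first two elements are the V and J gene names, as in the snippet above.

```python
from collections import Counter


def find_unrecognized_genes(tcrs, known_genes):
    """Count V/J gene names that are not present in known_genes.

    tcrs: iterable of (chain1, chain2) tuples, where each chain starts
          with (v_gene, j_gene, ...).
    known_genes: any container that supports `in`, e.g. a dict keyed
          by gene name or a set of gene names.
    """
    missing = Counter()
    for tcr in tcrs:
        for chain in tcr[:2]:
            for gene in chain[:2]:
                if gene not in known_genes:
                    missing[gene] += 1
    return missing


# Toy, made-up data for illustration only (not real conga output).
toy_known = {"IGHV1-2*01", "IGHJ4*01", "IGKV1-5*01", "IGKJ1*01"}
toy_tcrs = [
    (("IGKV1-5*01", "IGKJ1*01", "CQQ"), ("IGHV1-2*01", "IGHJ4*01", "CAR")),
    (("IGKV1-5*01", "IGKJ1*01", "CQQ"), ("IGHV9-9*99", "IGHJ4*01", "CAR")),
]

missing = find_unrecognized_genes(toy_tcrs, toy_known)
print(missing.most_common())
```

On real data the two arguments would presumably be `conga.preprocess.retrieve_tcrs_from_adata(adata)` and `conga.tcrdist.all_genes.all_genes[organism]`, as used above. If nearly every gene shows up as missing, that points to an organism mismatch; a handful of missing names points to individual alleles that are not in the gene database.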