prabhakarlab / Banksy_py

GNU General Public License v3.0
Python version of Banksy does not produce scaled_gaussian result (anymore) #14

Open bmanzato opened 1 month ago

bmanzato commented 1 month ago

Hi,

I wrote a notebook a while back to run Banksy on brain spatial data (based on the slideseqv2_analysis.ipynb vignette). It worked without any issues. I re-ran it today, and the only result I get is the clustering that uses no spatial information: results_df has only one row, the decay=nonspatial one. The scaled_gaussian row is missing entirely, and so is the scaled_gaussian scatterplot in the output.

My code:

# set params
plot_graph_weights = True
k_geom = 8 # n neighbors
max_m = 1 # azimuthal transform up to kth order
nbr_weight_decay = "scaled_gaussian" # can also be "reciprocal", "uniform" or "ranked"

# Find median distance to closest neighbours, the median distance will be `sigma`
nbrs = median_dist_to_nearest_neighbour(adata, key = coord_keys[2])

banksy_dict = initialize_banksy(
    adata,
    coord_keys,
    k_geom,
    nbr_weight_decay=nbr_weight_decay,
    max_m=max_m,
    plt_edge_hist=True,
    plt_nbr_weights=True,
    plt_agf_angles=False, # takes long time to plot
    plt_theta=True,
)

# Main hyperparameters for BANKSY:
resolutions = [0.5] # resolution for Leiden clustering
pca_dims = [20] # number of dimensions PCA reduces to
lambda_list = [1] # list of lambda parameters

banksy_dict, banksy_matrix = generate_banksy_matrix(adata, banksy_dict, lambda_list, max_m)

banksy_dict["nonspatial"] = {
    # Here we simply append the nonspatial matrix (adata.X) to obtain the nonspatial clustering results
    0.0: {"adata": concatenate_all([adata.X], 0, adata=adata), }
}

pca_umap(banksy_dict,
         pca_dims = pca_dims,
         add_umap = True,
         plt_remaining_var = False)

results_df, max_num_labels = run_Leiden_partition(
        banksy_dict,
        resolutions,
        num_nn = 50,
        num_iterations = -1,
        partition_seed = seed,
        match_labels = True)

output_path = '../banksy_output/'

c_map =  'tab20' # specify color map
weights_graph =  banksy_dict['scaled_gaussian']['weights'][0]

plot_results(
        results_df,
        weights_graph,
        c_map,
        match_labels = True,
        coord_keys = coord_keys,
        max_num_labels = max_num_labels,
        save_path = output_path,
        save_fig = True, # save the spatial map of all clusters
        save_seperate_fig = True, # save the figure of all clusters plotted separately
)

chousn commented 1 month ago

Hi Benedetta, could you let us know if there were any error messages or warnings as you ran the code, and also what banksy_dict looks like (1) after initialize_banksy and (2) after generate_banksy_matrix (i.e. did it have a "scaled_gaussian" entry or is it an empty dictionary)?
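
For a compact view of the dictionary, something like the following might help. It is only a minimal sketch: `summarize_keys` is a hypothetical helper, not part of Banksy_py, and `example` is a stand-in with placeholder values that should be replaced by the real `banksy_dict`. It lists each key together with the key's own type, which can matter because `1` and `1.0` behave as the same dictionary key in Python while still being an `int` and a `float` respectively.

```python
from collections.abc import Mapping


def summarize_keys(d, depth=0, max_depth=2):
    """Return one line per key: the key, the key's type, and the value's type."""
    lines = []
    for key, value in d.items():
        indent = "  " * depth
        lines.append(
            f"{indent}{key!r} (key type: {type(key).__name__}) "
            f"-> {type(value).__name__}"
        )
        # Recurse into nested dictionaries, up to max_depth levels below the top
        if isinstance(value, Mapping) and depth < max_depth:
            lines.extend(summarize_keys(value, depth + 1, max_depth))
    return lines


# Stand-in nested dictionary with placeholder values (not real BANKSY output)
example = {
    "scaled_gaussian": {"weights": {0: "m0", 1: "m1"}, 1: {"adata": None}},
    "nonspatial": {0.0: {"adata": None}},
}

print("\n".join(summarize_keys(example)))
```

Calling `summarize_keys(banksy_dict)` once after initialize_banksy and once after generate_banksy_matrix would show which entries each step added, without printing the full matrices.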

bmanzato commented 1 month ago

banksy_dict after initialize_banksy:

{'scaled_gaussian': {'weights': {0: <37068x37068 sparse matrix of type '<class 'numpy.float64'>'
    with 889632 stored elements in Compressed Sparse Row format>,
   1: <37068x37068 sparse matrix of type '<class 'numpy.complex128'>'
    with 1779264 stored elements in Compressed Sparse Row format>}}}

banksy_dict after generate_banksy_matrix:

{'scaled_gaussian': {'weights': {0: <37068x37068 sparse matrix of type '<class 'numpy.float64'>'
    with 889632 stored elements in Compressed Sparse Row format>,
   1: <37068x37068 sparse matrix of type '<class 'numpy.complex128'>'
    with 1779264 stored elements in Compressed Sparse Row format>},
  'norm_counts_concatenated': array([[0.        , 0.        , 0.        , ..., 0.        , 0.05099043,
          0.04596571],
         [1.        , 0.        , 0.        , ..., 0.        , 0.02055614,
          0.        ],
         [1.        , 0.        , 0.        , ..., 0.01948217, 0.07366589,
          0.03091624],
         ...,
         [0.        , 0.        , 0.        , ..., 0.00473737, 0.10097207,
          0.01808154],
         [0.        , 0.        , 0.        , ..., 0.03501979, 0.06134723,
          0.02725116],
         [1.        , 1.        , 0.        , ..., 0.06179583, 0.24296682,
          0.01280443]]),
  1: {'adata': AnnData object with n_obs × n_vars = 37068 × 3366
       obs: 'brain_section_label', 'feature_matrix_label', 'donor_label', 'donor_genotype', 'donor_sex', 'cluster_alias', 'x', 'y', 'z', 'subclass_confidence_score', 'cluster_confidence_score', 'high_quality_transfer', 'neurotransmitter', 'class', 'subclass', 'supertype', 'cluster', 'neurotransmitter_color', 'class_color', 'subclass_color', 'supertype_color', 'cluster_color'
       var: 'gene_symbol', 'name', 'mapped_ncbi_identifier', 'is_nbr', 'k'}},
 'nonspatial': {0.0: {'adata': AnnData object with n_obs × n_vars = 37068 × 1122
       obs: 'brain_section_label', 'feature_matrix_label', 'donor_label', 'donor_genotype', 'donor_sex', 'cluster_alias', 'x', 'y', 'z', 'subclass_confidence_score', 'cluster_confidence_score', 'high_quality_transfer', 'neurotransmitter', 'class', 'subclass', 'supertype', 'cluster', 'neurotransmitter_color', 'class_color', 'subclass_color', 'supertype_color', 'cluster_color'
       var: 'gene_symbol', 'name', 'mapped_ncbi_identifier', 'is_nbr', 'k'}}} 

No errors or warnings, but results_df then looks like this:

index: nonspatial_pc20_nc0.00_r0.50
decay: nonspatial
lambda_param: 0.0
num_pcs: 20
resolution: 0.5
num_labels: 19
labels: Label object:\nNumber of labels: 19, number of...
adata: [[[View of AnnData object with n_obs × n_vars ...
relabeled: Label object:\nNumber of labels: 19, number of...