rajewsky-lab / novosparc


novosparc exhausts memory when anndata.X in sparse matrix format (CSC or CSR) #54

Open · sztankatt opened this issue 2 years ago

sztankatt commented 2 years ago

Hi, I've been trying to run novosparc with an anndata object whose matrix '.X' is stored in a sparse format. I tried both 'scipy.sparse.csc_matrix' and 'scipy.sparse.csr_matrix', and with either one the 'tissue.reconstruct' function hangs and requests enormous amounts of memory (I killed the process after it passed 250GB). If I first convert the sparse matrix into a regular 'numpy' array, there is no problem and novosparc runs.
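
(As a quick sanity check that the reproducer below really exercises the sparse path, 'scipy.sparse.issparse' reports how '.X' is stored. This check is my addition, not part of the original script:)

from scipy.sparse import issparse
print(issparse(dataset.X))  # True for csr/csc matrices, False once .X is a dense numpy array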

Below is a code example:


import numpy as np
import pandas as pd
import scanpy as sc

import novosparc

data_dir = '/data/local/rajewsky/home/tsztank/projects/spatial/repos/sts-paper/'
data_dir += 'projects/slideseq_v2/processed_data/slideseq_v2_mouse_hippo_new/'
data_dir += 'illumina/complete_data/automated_analysis/slideseq/umi_cutoff_50'

dataset = sc.read(f'{data_dir}/results.h5ad')
num_cells = 2000

# sc.pp.subsample(dataset, n_obs=num_cells)
# keep only barcodes with more than 1000 total counts
dataset = dataset[dataset.obs.total_counts > 1000, :]

gene_names = dataset.var.index.tolist()

num_cells, num_genes = dataset.shape

is_var_gene = dataset.var['highly_variable']
# keep only the first 100 highly variable genes
var_genes = list(is_var_gene.index[is_var_gene])[:100]

dge_rep = dataset.to_df()[var_genes]

from scipy.sparse import csc_matrix, csr_matrix

# if you uncomment this, dataset.X will be stored as a regular numpy array
# instead of a sparse matrix. with this in place, no error is thrown and
# reconstruction runs fine
# M = dataset.X.toarray()
# del dataset.X
# dataset.X = M

num_locations = 1000
locations_circle = novosparc.gm.construct_circle(num_locations=num_locations)

# build the Tissue object; dataset.X is still a scipy sparse matrix at this point
tissue = novosparc.cm.Tissue(dataset=dataset, locations=locations_circle)

num_neighbors_s = num_neighbors_t = 5

tissue.setup_smooth_costs(dge_rep=dge_rep, num_neighbors_s=num_neighbors_s,
                          num_neighbors_t=num_neighbors_t)

# this is the call that hangs and exhausts memory when dataset.X is sparse
tissue.reconstruct(alpha_linear=0, epsilon=5e-3)
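
For completeness, the workaround I'm using in the meantime is to densify '.X' up front, so novosparc never sees a sparse matrix. A minimal sketch (my own code, not part of the original script; it assumes the same 'dataset', 'locations_circle' and 'dge_rep' as above):

from scipy.sparse import issparse

# convert .X to a dense numpy array before building the Tissue
if issparse(dataset.X):
    dataset.X = dataset.X.toarray()

tissue = novosparc.cm.Tissue(dataset=dataset, locations=locations_circle)
tissue.setup_smooth_costs(dge_rep=dge_rep, num_neighbors_s=num_neighbors_s,
                          num_neighbors_t=num_neighbors_t)
tissue.reconstruct(alpha_linear=0, epsilon=5e-3)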