jianhuupenn / SpaGCN

SpaGCN: Integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network
MIT License

High memory demands with large samples #64

Open vladimirkovacevic opened 1 year ago

vladimirkovacevic commented 1 year ago

I'm trying to process a sample with 90,000 observations and the run gets killed (out of memory) during execution of the `search_l` function. The machine I'm using has 36 CPUs and 72 GB of RAM. Do you have any suggestions on how to optimize it? Would it help not to use "easy mode"?
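For context, a dense spot-by-spot adjacency matrix over 90,000 observations in float64 already needs on the order of 60 GiB, so a single extra temporary (e.g. an exponentiated copy inside the `l` search) is enough to exhaust 72 GB of RAM. A quick back-of-the-envelope check (not SpaGCN code, just arithmetic):

```python
# Memory estimate for a dense n_spots x n_spots float64 adjacency matrix,
# the shape of the matrix that calculate_adj_matrix returns.
n_spots = 90_000
bytes_per_float64 = 8
dense_bytes = n_spots * n_spots * bytes_per_float64
print(f"{dense_bytes / 1024**3:.1f} GiB")  # ≈ 60.3 GiB for the matrix alone
```

Any step that materializes a transformed copy of this matrix roughly doubles the footprint, which matches the observed kill during the `l` search.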

```python
import random

import numpy as np
import scanpy as sc
import torch
from SpaGCN import (SpaGCN, calculate_adj_matrix, search_l, search_res,
                    prefilter_genes, prefilter_specialgenes)

def detect_spatial_domains_ez_mode(adata, img, x_array, y_array, x_pixel, y_pixel,
                                   n_clusters, histology=True, s=1, b=49, p=0.5,
                                   r_seed=100, t_seed=100, n_seed=100):
    # Dense n_spots x n_spots adjacency matrix -- the main memory hotspot
    adj = calculate_adj_matrix(x=x_pixel, y=y_pixel, x_pixel=x_pixel, y_pixel=y_pixel,
                               image=img, beta=b, alpha=s, histology=histology)
    prefilter_genes(adata, min_cells=3)  # drop genes expressed in fewer than 3 cells
    prefilter_specialgenes(adata)
    sc.pp.normalize_per_cell(adata)
    sc.pp.log1p(adata)
    # Find the length scale l so each spot's neighbors contribute fraction p of weight
    l = search_l(p, adj, start=0.01, end=1000, tol=0.01, max_run=100)
    # Search for the Louvain resolution that yields the requested number of clusters
    res = search_res(adata, adj, l, n_clusters, start=0.7, step=0.1, tol=5e-3,
                     lr=0.05, max_epochs=20, r_seed=r_seed, t_seed=t_seed, n_seed=n_seed)
    clf = SpaGCN()
    clf.set_l(l)
    random.seed(r_seed)
    torch.manual_seed(t_seed)
    np.random.seed(n_seed)
    clf.train(adata, adj, init_spa=True, init="louvain", res=res, tol=5e-3,
              lr=0.05, max_epochs=200)
    y_pred, prob = clf.predict()
    return y_pred
```