scverse / muon

muon is a multimodal omics Python framework
https://muon.scverse.org/
BSD 3-Clause "New" or "Revised" License

Core dumped when running 'mu.pp.neighbors(mdata, key_added='wnn')' #81

Open 111kakaluote opened 1 year ago

111kakaluote commented 1 year ago

Describe the bug

When I tested the muon pipeline on a dataset of 6400 cells, the following error occurred while running mu.pp.neighbors(mdata, key_added='wnn'):

Error in `python': malloc(): smallbin double linked list corrupted: 0x0000558a3b39a900

System

gtca commented 1 year ago

Hi @111kakaluote, 6400 cells should not be an issue for mu.pp.neighbors though it depends on the available resources of course. For instance, you can find a tutorial with CITE-seq data of similar size here.

What is the size of the feature space that is being used? In standard workflows, a reduced representation such as PCA is computed before calculating cell neighbourhood graphs. Is that the case here as well?
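One way to answer the feature-space question is to report, for each modality, the number of features and the shape of the stored PCA embedding. The sketch below is only an illustration: `summarize_feature_space` is a hypothetical helper, not part of muon. It assumes each modality exposes `n_obs`, `n_vars` and an `obsm` mapping, as AnnData objects do, and that the PCA result is stored under `X_pca`, the key scanpy uses by default.

```python
from types import SimpleNamespace


def summarize_feature_space(modalities, rep_key="X_pca"):
    """Return one summary row per modality.

    `modalities` is a mapping of name -> AnnData-like object, for example
    `mdata.mod` of a MuData object. Assumption: each object exposes
    `n_obs`, `n_vars` and an `obsm` mapping.
    """
    rows = []
    for name, adata in modalities.items():
        # Shape of the reduced representation, or None if it was never computed
        rep_shape = adata.obsm[rep_key].shape if rep_key in adata.obsm else None
        rows.append(
            {
                "modality": name,
                "n_obs": adata.n_obs,
                "n_vars": adata.n_vars,
                "rep_shape": rep_shape,
            }
        )
    return rows


# Stand-in objects so the sketch runs without muon installed.
# These are placeholders for demonstration, not the reporter's data.
demo = {
    "rna": SimpleNamespace(
        n_obs=100, n_vars=2000, obsm={"X_pca": SimpleNamespace(shape=(100, 50))}
    ),
    "prot": SimpleNamespace(n_obs=100, n_vars=18, obsm={}),
}
for row in summarize_feature_space(demo):
    print(row)
```

With a real MuData object, the call would be `summarize_feature_space(mdata.mod)`. A `rep_shape` of `None` would indicate that no PCA embedding is stored under the assumed key for that modality.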

gabumon0 commented 1 year ago

@gtca Hi, there are 20015 gene features and 18 protein features, and I have already reduced the representation with PCA. My script is:

    # Imports are not shown in the original post; these are assumed from the aliases used below
    import scanpy as sc
    import muon as mu
    from muon import prot as pt

    # CLR-normalize the protein modality, then scale and run PCA
    pt.pp.clr(malldata['prot'])
    sc.pp.scale(malldata['prot'], max_value=10)
    sc.tl.pca(malldata['prot'])

    # RNA analysis: keep the raw counts in a separate layer
    malldata['rna'].layers['counts'] = malldata['rna'].X.copy()

    # Filter cells by mitochondrial fraction (args.mtfilter is defined elsewhere in the script)
    malldata['rna'].var['mt'] = malldata['rna'].var_names.str.contains("^[Mm][Tt]-")  # annotate the group of mitochondrial genes as 'mt'
    sc.pp.calculate_qc_metrics(malldata['rna'], qc_vars=['mt'], percent_top=None, log1p=False, inplace=True)
    mu.pp.filter_obs(malldata['rna'], 'pct_counts_mt', lambda x: x <= args.mtfilter)

    # Normalize RNA, select highly variable genes, scale and run PCA
    sc.pp.normalize_total(malldata['rna'], target_sum=1e4)
    sc.pp.log1p(malldata['rna'])
    sc.pp.highly_variable_genes(malldata['rna'], min_mean=0.02, max_mean=4, min_disp=0.5)
    malldata['rna'].raw = malldata['rna']
    sc.pp.scale(malldata['rna'], max_value=10)
    sc.tl.pca(malldata['rna'], svd_solver='arpack')

    # Subset cells in the protein modality to those kept in RNA, then build per-modality graphs
    mu.pp.intersect_obs(malldata)
    sc.pp.neighbors(malldata['rna'])
    sc.pp.neighbors(malldata['prot'])

    # Calculate weighted nearest neighbors
    mu.pp.neighbors(malldata, key_added='wnn', low_memory=True)
    mu.tl.umap(malldata, neighbors_key='wnn', random_state=10)

I am now passing the low_memory=True parameter, so memory usage may be lower.

gtca commented 1 year ago

Thank you, @gabumon0. Do you encounter the same issue at the line with mu.pp.neighbors()? Is there any log that you might be able to share?
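When a crash happens in native code, as the malloc() message suggests, the process may exit before Python prints anything useful. One possible way to collect a log is the standard-library `faulthandler` module, which dumps the Python traceback when the interpreter receives a fatal signal such as SIGSEGV or SIGABRT. The following is a minimal sketch and not something proposed in this thread: `enable_crash_log` and the file name `crash_traceback.log` are hypothetical, and there is no guarantee that the traceback pinpoints the cause of this particular error.

```python
import faulthandler
import platform


def enable_crash_log(path="crash_traceback.log"):
    """Dump Python tracebacks to `path` if the interpreter hits a fatal signal.

    Returns the open file object; the caller must keep a reference to it,
    because faulthandler needs the file to stay open while it is enabled.
    """
    log_file = open(path, "w")
    # Record basic environment details at the top of the log
    log_file.write(f"python {platform.python_version()} on {platform.platform()}\n")
    log_file.flush()
    # all_threads=True dumps the traceback of every thread, not only the current one
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Usage (at the top of the analysis script, before the failing call):
#     crash_log = enable_crash_log()
#     mu.pp.neighbors(malldata, key_added='wnn')
```

The same handler can also be switched on without editing the script by starting the interpreter with `python -X faulthandler`, in which case the traceback is written to stderr.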

gabumon0 commented 1 year ago

@gtca Sorry, I am @111kakaluote too. @gabumon0 is my other account, and I forgot to switch GitHub IDs.