scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.
https://scanpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License

"sc.pp.neighbors" kills kernel #2359

Closed Brycealong closed 7 months ago

Brycealong commented 1 year ago

At the neighbor-finding stage, my Jupyter notebook kept showing this message:

[Screenshot, 2022-10-22, 2:51:46 PM]

The message:

OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.

It then killed the kernel entirely.

I tried to work around this by running the same code on Linux, but the process got killed again.

[Screenshot, 2022-10-22, 3:13:47 PM]

Below is my basic workflow:

import numpy as np
import scanpy as sc

def pp(adata):
    sc.pp.filter_cells(adata, min_genes=200)  # drop cells with fewer than 200 genes
    sc.pp.filter_genes(adata, min_cells=3)  # drop genes found in fewer than 3 cells
    adata.var['mt'] = adata.var_names.str.startswith('MT-')  # annotate the group of mitochondrial genes as 'mt'
    sc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], percent_top=None, log1p=False, inplace=True)
    upper_lim = np.quantile(adata.obs.n_genes_by_counts.values, .98)
    lower_lim = np.quantile(adata.obs.n_genes_by_counts.values, .02)
    # keep cells between the 2nd and 98th percentile of detected genes
    adata = adata[(adata.obs.n_genes_by_counts < upper_lim) & (adata.obs.n_genes_by_counts > lower_lim)]
    # .copy() turns the view into a real AnnData before it is modified in place
    adata = adata[adata.obs.pct_counts_mt < 25].copy()
    sc.pp.normalize_total(adata, target_sum=1e4)  # normalize every cell to 10,000 UMI
    sc.pp.log1p(adata)  # change to log counts
    sc.pp.highly_variable_genes(adata, min_mean=0.0125, max_mean=3, min_disp=0.5)  # these are default values
    adata.raw = adata  # save the normalized, log-transformed data before further filtering
    adata = adata[:, adata.var.highly_variable].copy()  # keep highly variable genes only
    # regress out effects of total counts per cell and the percentage of mitochondrial genes expressed
    sc.pp.regress_out(adata, ['total_counts', 'pct_counts_mt'])
    sc.pp.scale(adata, max_value=10)  # scale each gene to unit variance and clip values at 10
    sc.tl.pca(adata, svd_solver='arpack')
    sc.pp.neighbors(adata, n_neighbors=10, n_pcs=20)
    sc.tl.umap(adata)
    return adata

adata = sc.read_csv("./myfile.csv", first_column_names=True)
adata = pp(adata)

My computer is a MacBook with an Intel i5.

Thanks!

Versions

-----
anndata     0.8.0
scanpy      1.9.1
-----
OpenSSL                     22.0.0
PIL                         9.2.0
PyObjCTools                 NA
absl                        NA
appnope                     0.1.2
astunparse                  1.6.3
attr                        21.4.0
backcall                    0.2.0
bcrypt                      3.2.0
beta_ufunc                  NA
binom_ufunc                 NA
boto3                       1.24.28
botocore                    1.27.28
bottleneck                  1.3.5
brotli                      NA
certifi                     2022.09.24
cffi                        1.15.1
chardet                     4.0.0
charset_normalizer          2.0.4
chex                        0.1.5
cloudpickle                 2.0.0
colorama                    0.4.5
contextlib2                 NA
cryptography                37.0.1
cycler                      0.10.0
cython_runtime              NA
cytoolz                     0.11.0
dask                        2022.7.0
dateutil                    2.8.2
debugpy                     1.5.1
decorator                   5.1.1
defusedxml                  0.7.1
deprecate                   0.3.2
dill                        0.3.4
docrep                      0.3.2
entrypoints                 0.4
etils                       0.8.0
flax                        0.6.1
fsspec                      2022.7.1
google                      NA
graphviz                    0.20
h5py                        3.7.0
idna                        3.4
igraph                      0.10.2
ipykernel                   6.15.2
ipython_genutils            0.2.0
ipywidgets                  7.6.5
jax                         0.3.23
jaxlib                      0.3.22
jedi                        0.18.1
jinja2                      2.11.3
jmespath                    0.10.0
joblib                      1.1.1
jupyter_server              1.18.1
kiwisolver                  1.4.2
leidenalg                   0.8.10
llvmlite                    0.39.1
louvain                     0.8.0
lz4                         3.1.3
markupsafe                  2.0.1
matplotlib                  3.5.2
matplotlib_inline           0.1.6
ml_collections              NA
mpl_toolkits                NA
msgpack                     1.0.3
mudata                      0.2.0
multipledispatch            0.6.0
natsort                     8.1.0
nbinom_ufunc                NA
numba                       0.56.3
numexpr                     2.8.3
numpy                       1.22.4
numpyro                     0.10.1
opt_einsum                  v3.3.0
optax                       0.1.3
packaging                   21.3
pandas                      1.4.4
parso                       0.8.3
pexpect                     4.8.0
pickleshare                 0.7.5
pkg_resources               NA
plotly                      5.9.0
prompt_toolkit              3.0.20
psutil                      5.9.0
ptyprocess                  0.7.0
pydev_ipython               NA
pydevconsole                NA
pydevd                      2.6.0
pydevd_concurrency_analyser NA
pydevd_file_utils           NA
pydevd_plugins              NA
pydevd_tracing              NA
pygments                    2.11.2
pyparsing                   3.0.9
pyro                        1.8.2
pytorch_lightning           1.7.7
pytz                        2022.1
regex                       2.5.116
requests                    2.28.1
rich                        NA
scipy                       1.7.3
scvi                        0.18.0
session_info                1.0.0
setuptools                  63.4.1
simplejson                  3.17.6
six                         1.16.0
sklearn                     1.1.2
snappy                      NA
socks                       1.7.1
sphinxcontrib               NA
storemagic                  NA
tblib                       1.7.0
tensorboard                 2.9.1
texttable                   1.6.4
threadpoolctl               2.2.0
tlz                         0.11.0
toolz                       0.11.2
torch                       1.12.1
torchmetrics                0.10.0
torchvision                 0.13.1
tornado                     6.1
tqdm                        4.64.1
traitlets                   5.1.1
tree                        0.1.7
typing_extensions           NA
urllib3                     1.26.12
wcwidth                     0.2.5
wrapt                       1.14.1
yaml                        6.0
zipp                        NA
zmq                         23.2.0
zope                        NA
-----
IPython             7.31.1
jupyter_client      7.3.4
jupyter_core        4.11.1
jupyterlab          3.4.4
notebook            6.4.12
-----
Python 3.9.12 (main, Jun 1 2022, 06:36:29) [Clang 12.0.0 ]
macOS-10.16-x86_64-i386-64bit
-----
Session information updated at 2022-10-22 15:12

xinyuejohn commented 1 year ago

Same issue.

Zethson commented 1 year ago

Can both of you ensure that you're not running out of memory, please?
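
One way to check the memory question is to compare a rough estimate of the matrix size with the process's peak memory use. The following is a minimal sketch using only the standard library; `peak_rss_mb` and `dense_matrix_mb` are hypothetical helper names, the float32 default is an assumption about how the data is stored, and the `resource` module is only available on Unix-like systems (Linux, macOS).

```python
import resource
import sys


def peak_rss_mb() -> float:
    """Return this process's peak resident memory in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def dense_matrix_mb(n_obs: int, n_vars: int, bytes_per_value: int = 4) -> float:
    """Rough size of a dense n_obs x n_vars matrix in MiB.

    bytes_per_value=4 assumes float32 values; use 8 for float64.
    """
    return n_obs * n_vars * bytes_per_value / (1024 * 1024)
```

Calling `dense_matrix_mb(adata.n_obs, adata.n_vars)` right after loading, and `peak_rss_mb()` after each preprocessing step, gives a rough idea of whether the run is approaching the machine's RAM. Several steps hold more than one copy of the matrix at a time, so the peak can be a multiple of the estimate.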

xinyuejohn commented 1 year ago

> Can both of you ensure that you're not running out of memory, please?

I can confirm that I have enough memory. But it might be a problem with my environment. I will check it again. Thanks!

Zethson commented 1 year ago

Honestly, I'm a bit lost otherwise. I think that what is shown above is only a Numba warning, not an error. Not sure what kills the kernel...

Does anybody else have an idea, @scverse/scanpy?

xinyuejohn commented 1 year ago

I reinstalled the environment and solved the issue.

Zethson commented 1 year ago

@Brycealong could you also try this in a new isolated environment, please? There might be some dependency that's interfering. Would be glad to know which one, but it's tricky...
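
To narrow down which dependency differs between a crashing and a working environment, the installed versions can be collected in each and compared. This is a minimal sketch with the standard library's `importlib.metadata`; the helper names and the package list are assumptions for illustration, not an authoritative list of what `sc.pp.neighbors` depends on.

```python
from importlib import metadata

# Assumed, non-exhaustive list of packages with compiled code that are
# commonly involved in neighbor search; adjust to your environment.
NATIVE_DEPS = ["numpy", "scipy", "numba", "llvmlite", "pynndescent", "umap-learn", "scikit-learn"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def version_diff(a, b):
    """Return {name: (version_in_a, version_in_b)} for entries that differ."""
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}
```

Running `installed_versions(NATIVE_DEPS)` in both environments and passing the two dictionaries to `version_diff` leaves only the packages worth a closer look.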

mbuttner commented 1 year ago

Hi there, I have seen that sc.pp.neighbors leads to a dead kernel (core dump) on Apple Silicon M1. See tensorflow issue.

Brycealong commented 1 year ago

> @Brycealong could you also try this in a new isolated environment, please? There might be some dependency that's interfering. Would be glad to know which one, but it's tricky...

Of course. I can run the code on Google Colab, so I'm sticking with that. I think something is interfering with the process on my own computer...

mxposed commented 1 year ago

Has anyone found a solution for this? I run into a segfault with the same message when trying to run sc.pp.calculate_qc_metrics on my M2, with the latest versions in a clean installation. I have the core dump as well, but I don't know how to get useful information from it.
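
Short of inspecting the core dump, Python's built-in `faulthandler` module can at least show which Python call was running when the process crashed. A minimal sketch follows; the function name `enable_crash_traceback` and the default log file name are assumptions for illustration.

```python
import faulthandler


def enable_crash_traceback(log_path="crash_traceback.log"):
    """Write Python tracebacks of all threads to log_path on a fatal signal.

    faulthandler covers SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL.
    The returned file object must stay open while the handler is enabled.
    """
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

Calling `handle = enable_crash_traceback()` in the first notebook cell and keeping `handle` around means the log file should contain the Python stack if the kernel dies from a segfault. For a plain script, `python -X faulthandler script.py` enables the same handler and prints to stderr. Note that this does not help when the process is killed with SIGKILL, for example by an out-of-memory killer.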

NeuroRookie commented 1 year ago

I have the same issue on both my M2 and my Intel 12400, and I'm sure I'm not running out of memory. I created a new environment, but that did not help.

mbuttner commented 1 year ago

I recently installed the miniforge3 distribution on my M1 Mac, and both sc.pp.neighbors and sc.pp.calculate_qc_metrics run without any problems. Not sure if that helps with the issue here, but it might be worth a try. My versions:

-----
anndata     0.9.1
scanpy      1.9.3
-----
PIL                         9.5.0
anndata2ri                  1.2.dev11
appnope                     0.1.3
asttokens                   NA
backcall                    0.2.0
backports                   NA
beta_ufunc                  NA
binom_ufunc                 NA
cffi                        1.15.1
colorama                    0.4.6
comm                        0.1.3
cycler                      0.10.0
cython_runtime              NA
dateutil                    2.8.2
debugpy                     1.6.7
decorator                   5.1.1
defusedxml                  0.7.1
executing                   1.2.0
h5py                        3.8.0
hypergeom_ufunc             NA
igraph                      0.10.4
importlib_resources         NA
ipykernel                   6.22.0
ipython_genutils            0.2.0
ipywidgets                  8.0.6
jedi                        0.18.2
jinja2                      3.1.2
joblib                      1.2.0
kiwisolver                  1.4.4
leidenalg                   0.9.1
llvmlite                    0.39.1
markupsafe                  2.1.2
matplotlib                  3.7.1
mpl_toolkits                NA
natsort                     8.3.1
nbinom_ufunc                NA
ncf_ufunc                   NA
numba                       0.56.4
numpy                       1.22.0
packaging                   23.1
pandas                      1.2.5
parso                       0.8.3
pexpect                     4.8.0
pickleshare                 0.7.5
pkg_resources               NA
platformdirs                3.2.0
prompt_toolkit              3.0.38
psutil                      5.9.5
ptyprocess                  0.7.0
pure_eval                   0.2.2
pydev_ipython               NA
pydevconsole                NA
pydevd                      2.9.5
pydevd_file_utils           NA
pydevd_plugins              NA
pydevd_tracing              NA
pygments                    2.15.1
pyparsing                   3.0.9
pytz                        2023.3
pytz_deprecation_shim       NA
rpy2                        3.5.11
scipy                       1.9.1
scrublet                    NA
seaborn                     0.12.2
session_info                1.0.0
six                         1.16.0
sklearn                     1.2.2
stack_data                  0.6.2
statsmodels                 0.13.5
texttable                   1.6.7
threadpoolctl               3.1.0
tornado                     6.3
traitlets                   5.9.0
typing_extensions           NA
tzlocal                     NA
wcwidth                     0.2.6
yaml                        6.0
zipp                        NA
zmq                         25.0.2
-----
IPython             8.12.0
jupyter_client      8.2.0
jupyter_core        5.3.0
notebook            6.5.4
-----
Python 3.8.16 | packaged by conda-forge | (default, Feb  1 2023, 16:01:13) [Clang 14.0.6 ]
macOS-13.2.1-arm64-arm-64bit
NeuroRookie commented 1 year ago

Thanks a lot!

chrissymkcn commented 1 year ago

This happened many times on the CentOS system I am using, and I have been pulling my hair out. What finally solved my issue was reinstalling traitlets at version 5.9.0, which is apparently critical to operations in Jupyter Notebook. Reading the output logs of the crashed sessions really helps.

flying-sheep commented 7 months ago

OK, issues like this are almost always either memory or dependency problems: Something’s miscompiled or compiled for the wrong architecture (e.g. a newer CPU than you have) or simply buggy.

We have no native code in Scanpy, so we don’t cause segfaults. If there’s anything we can mitigate, we will, provided someone demonstrates a reproducible problem with up-to-date dependencies.
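
To tell the two cases apart, it can help to run the workflow as a plain script outside Jupyter and look at how the process ended. The sketch below uses only the standard library and assumes a POSIX system; `describe_exit` and `run_script` are hypothetical helper names.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a short human-readable description."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # On POSIX, subprocess reports death by signal N as return code -N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def run_script(path: str) -> str:
    """Run a Python script in a child process with faulthandler enabled."""
    result = subprocess.run([sys.executable, "-X", "faulthandler", path])
    return describe_exit(result.returncode)
```

As a rule of thumb, "killed by SIGKILL" often points to the operating system stopping the process for using too much memory, while "killed by SIGSEGV" or "killed by SIGILL" suggests a crash inside a compiled dependency. Neither is conclusive on its own, but it narrows down where to look and gives a concrete starting point for a reproducible report.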