scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.
https://scanpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License

LLVM ERROR: Symbol not found: __svml_sqrtf8 #1696

Open. TiongSun opened this issue 3 years ago.

TiongSun commented 3 years ago

Hi all,

When running "sc.pp.neighbors(adata, n_neighbors=4, n_pcs=20)" from this tutorial, the following error occurs:

"LLVM ERROR: Symbol not found: __svml_sqrtf8"

numba 0.51.2, scanpy 1.7.1

Has anyone encountered a similar issue?

Thanks!

giovp commented 3 years ago

Can you try updating to numba 0.52 and see whether it's still an issue?

wflynny commented 3 years ago

FWIW, I stumbled upon a related issue this morning where my kernel just crashes/restarts computing neighbors.

For me it appears to crop up when the number of neighbors is below 15; the metric doesn't appear to matter. I've been upgrading and downgrading various dependencies, and I'm fairly certain this has to do with the call to NNDescent in umap.umap_.py, because if I import that directly, it raises the same errors.

Currently have numba=0.52 llvmlite=0.35.0 scanpy=1.7.1 pynndescent=0.5.2 umap-learn=0.5.1. Rebuilding my environment from scratch and will update with a complete package list.

giovp commented 3 years ago

Also, which Python version are you running? I remember there were issues with numba 0.52 on Python 3.9, so they put out an RC.
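For questions like this one, the relevant versions can be collected without importing scanpy. A minimal sketch, assuming Python 3.8+ (for importlib.metadata); the helper names are illustrative, and as_tuple is a deliberately simplified stand-in for a real version parser that ignores pre-release suffixes. It flags a numba older than 0.52, the version suggested above.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def as_tuple(version_string):
    """Turn '0.51.2' into (0, 51, 2), stopping at the first non-numeric part."""
    parts = []
    for piece in version_string.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def report():
    """Collect the Python version plus the package versions discussed in this thread."""
    info = {"python": sys.version.split()[0]}
    for pkg in ("numba", "llvmlite", "pynndescent", "umap-learn", "scanpy"):
        info[pkg] = installed_version(pkg)
    return info


if __name__ == "__main__":
    info = report()
    print(info)
    numba_version = info["numba"]
    if numba_version is not None and as_tuple(numba_version) < (0, 52):
        print("numba is older than 0.52; consider upgrading")
```

scanpy's own sc.logging.print_versions() gives a fuller picture; this sketch only helps when scanpy itself fails to import or crashes the process.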

wflynny commented 3 years ago

Fresh install in a new env gives me the same error (jupyter kernel crashes):

conda create --name squidpy python=3.8 seaborn scikit-learn statsmodels numba pytables
conda activate squidpy
conda install -c conda-forge leidenalg python-igraph
pip install scanpy squidpy imctools stardist

And here's the output of sc.logging.print_versions():

-----
anndata     0.7.5
scanpy      1.7.1
sinfo       0.3.1
-----
PIL                 8.1.2
anndata             0.7.5
asciitree           NA
backcall            0.2.0
cairo               1.20.0
cffi                1.14.5
cmocean             2.0
constants           NA
cycler              0.10.0
cython_runtime      NA
dask                2021.03.0
dateutil            2.8.1
decorator           4.4.2
docrep              0.3.2
fasteners           NA
get_version         2.1
h5py                2.10.0
highs_wrapper       NA
igraph              0.8.3
imagecodecs         2020.12.24
imageio             2.9.0
ipykernel           5.5.0
ipython_genutils    0.2.0
ipywidgets          7.6.3
jedi                0.18.0
joblib              1.0.1
kiwisolver          1.3.1
legacy_api_wrap     1.2
leidenalg           0.8.3
llvmlite            0.35.0
matplotlib          3.3.4
mpl_toolkits        NA
natsort             7.1.1
networkx            2.5
numba               0.52.0
numcodecs           0.7.3
numexpr             2.7.3
numpy               1.20.1
packaging           20.9
pandas              1.2.3
parso               0.8.1
pexpect             4.8.0
pickleshare         0.7.5
pkg_resources       NA
prompt_toolkit      3.0.17
ptyprocess          0.7.0
pycparser           2.20
pygments            2.8.1
pyparsing           2.4.7
pytz                2021.1
pywt                1.1.1
scanpy              1.7.1
scipy               1.6.0
seaborn             0.11.1
sinfo               0.3.1
six                 1.15.0
skimage             0.18.1
sklearn             0.24.1
squidpy             1.0.0
statsmodels         0.12.2
storemagic          NA
tables              3.6.1
texttable           1.6.3
tifffile            2021.3.5
tornado             6.1
traitlets           5.0.5
typing_extensions   NA
wcwidth             0.2.5
xarray              0.17.0
yaml                5.4.1
zarr                2.6.1
zmq                 22.0.3
-----
IPython             7.21.0
jupyter_client      6.1.11
jupyter_core        4.7.1
notebook            6.2.0
-----
Python 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 16:22:27) [GCC 9.3.0]
Linux-3.10.0-1062.1.2.el7.x86_64-x86_64-with-glibc2.10
72 logical CPU cores, x86_64
-----
Session information updated at 2021-03-12 11:42

giovp commented 3 years ago

Thanks @wflynny! I repeated your snippet exactly in a fresh conda env and can't reproduce the problem:

import scanpy as sc
adata = sc.datasets.paul15()
sc.pp.neighbors(adata, n_neighbors=4, n_pcs=20)
>>> adata
AnnData object with n_obs × n_vars = 2730 × 3451
    obs: 'paul15_clusters'
    uns: 'iroot', 'neighbors'
    obsm: 'X_pca'
    obsp: 'distances', 'connectivities'

I also tried sc.datasets.pbmc3k_processed() and it still runs. Can you both please paste the full traceback? Some more information about the AnnData object would also be useful.

thank you!

giovp commented 3 years ago

> and I'm fairly certain this has to do with the call to NNDescent in umap.umap_.py as if I import that directly, it raises the same errors.

Sorry, just read this. It sounds like it could be data-specific; have you tried playing around with other NNDescent parameters?

wflynny commented 3 years ago

Yeah, I can't reproduce it with a canned dataset either --- I'm doing something a bit weird and transforming imaging mass cytometry data into AnnData objects (hence the imctools dependency). I have an object that looks like:

AnnData object with n_obs × n_vars = 68865 × 29
    obs: 'nuclei_counts', 'n_antibodies_by_intensity', 'log1p_n_antibodies_by_intensity', 'total_intensity', 'log1p_total_intensity', 'n_counts'
    var: 'ab_mass', 'ab_name', 'n_cells_by_intensity', 'mean_intensity', 'log1p_mean_intensity', 'pct_dropout_by_intensity', 'total_intensity', 'log1p_total_intensity', 'highly_variable'
    uns: 'spatial', 'log1p', 'pca',
    obsm: 'X_spatial', 'X_spatial_lowres', 'X_pca'
    varm: 'PCs'
    layers: 'cleaned', 'normed', 'lognormed'

I will probably raise this with pynndescent, then, because:

sc.pp.neighbors(imc42, n_pcs=10, metric="euclidean", n_neighbors=15)  # <-- works
sc.pp.neighbors(imc42, n_pcs=10, metric="correlation", n_neighbors=15)  # <-- works
sc.pp.neighbors(imc42, n_pcs=10, metric="euclidean", n_neighbors=11)  # <-- crashes
sc.pp.neighbors(imc42, n_pcs=10, metric="correlation", n_neighbors=11)  # <-- crashes
sc.pp.neighbors(imc42, n_pcs=10, metric="euclidean", n_neighbors=5)  # <-- crashes
sc.pp.neighbors(imc42, n_pcs=10, metric="correlation", n_neighbors=5)  # <-- crashes
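Because an LLVM error terminates the whole interpreter (hence the crashing Jupyter kernel) instead of raising a Python exception, a sweep like the one above is easier to run with each call in its own process. A minimal sketch, assuming the AnnData object has been saved to an .h5ad file; run_isolated and sweep are hypothetical helpers, not scanpy API, and the file path is a placeholder.

```python
import subprocess
import sys

# Child script for one parameter combination. The path is filled in by the caller.
TEMPLATE = """
import scanpy as sc
adata = sc.read_h5ad({path!r})
sc.pp.neighbors(adata, n_pcs={n_pcs}, metric={metric!r}, n_neighbors={k})
"""


def run_isolated(code, timeout=None):
    """Run `code` in a fresh Python interpreter and return its exit code.

    0 means success. Any other value means the child failed; on POSIX a
    negative value means it was killed by a signal rather than exiting.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode


def sweep(path, n_pcs_values, metrics, k_values):
    """Run every (n_pcs, metric, n_neighbors) combination and record its exit code."""
    results = {}
    for n_pcs in n_pcs_values:
        for metric in metrics:
            for k in k_values:
                code = TEMPLATE.format(path=path, n_pcs=n_pcs, metric=metric, k=k)
                results[(n_pcs, metric, k)] = run_isolated(code)
    return results
```

For example, sweep("imc42.h5ad", [10, 20], ["euclidean", "correlation"], [5, 11, 15]) returns a dict mapping each combination to its exit code, so crashing settings show up as non-zero entries while the notebook kernel keeps running. Varying n_pcs as well covers the later suggestion to try a different number of PCs.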

Sorry for hijacking this issue, @giovp and @TiongSun.

giovp commented 3 years ago

> Yeah, I can't reproduce it with a canned dataset either --- I'm doing something a bit weird and transforming imaging mass cytometry data into AnnData objects (hence the imctools dependency). I have an object that looks like:

Thank you for reporting, this is a very interesting use case! And thanks for the detailed evaluation. I would also try a different number of PCs to see whether that has an impact.

If you open an issue on pynndescent, would you mind referencing this issue or pinging me there? I'd be interested to see the proposed solution or the underlying bug.

@TiongSun let us know about your use case, thanks!

wflynny commented 3 years ago

@giovp Looking more into the crashing I was getting with my strange use case, it turns out that I had both (a) a pair of completely correlated features and (b) very strange count distributions. Once I used a proper variance-stabilizing transform (arcsinh in this case) and removed redundant features, I can't reliably reproduce this issue.
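Perfectly correlated channels like the ones described above can be screened for before building the neighbor graph. A minimal numpy sketch, assuming a dense cells x features matrix; the function name and tolerance are illustrative choices, not part of scanpy or pynndescent.

```python
import numpy as np


def redundant_feature_pairs(X, names, tol=1e-8):
    """List feature pairs whose Pearson correlation is +1 or -1 (within tol).

    X is a dense cells x features array and names labels its columns.
    Constant columns (for example an empty channel) have no defined
    correlation, so they are skipped here.
    """
    X = np.asarray(X, dtype=float)
    keep = np.flatnonzero(X.std(axis=0) > 0)
    if len(keep) < 2:
        return []
    corr = np.corrcoef(X[:, keep], rowvar=False)
    pairs = []
    for a in range(len(keep)):
        for b in range(a + 1, len(keep)):
            r = float(corr[a, b])
            if abs(abs(r) - 1.0) <= tol:
                pairs.append((names[keep[a]], names[keep[b]], r))
    return pairs
```

With an AnnData object this could be called as redundant_feature_pairs(adata.X, list(adata.var_names)), converting a sparse adata.X with .toarray() first. The pairwise loop is fine for a few dozen channels such as the 29 here, but would be slow for thousands of genes.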

giovp commented 3 years ago

> (a) a pair of completely correlated features, and (b) very strange count distributions

Can you elaborate on this? What do you mean by "completely correlated": an identical copy?

> Once I used a proper variance stabilizing transform (arcsinh in this case) and remove redundant features

Interesting, I've never seen this used in scRNA-seq. Is it common in IMC?

wflynny commented 3 years ago

Yes, we had what appears to be an identical copy of one channel in another channel (which should have been empty). This seems to be an issue with acquisition or initial file generation, but honestly I'm still scratching my head over how it happened.

And yeah, this VST is new to me too. In my experience the mean-variance relationship in scRNA-seq is roughly quadratic, whereas IMC data, at least empirically, has a quadratic relationship at high mean expression but is dominated by a noise term at low mean expression, which leads to a transform of the form y ~ asinh(cofactor * x). Others in the field have reached the same conclusion. FWIW, this might be standard knowledge in other fields, but it was new to me (I mostly deal with Poisson or (negative-)binomial data).
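The transform above can be sketched as follows, keeping the thread's form y = asinh(cofactor * x). Cytometry pipelines often write it as asinh(x / c), which is the same transform with cofactor = 1 / c. The cofactor in the example is an arbitrary placeholder, not a recommended value. asinh behaves like the identity near zero, so low-intensity noise is not stretched the way a log stretches it, and it behaves like log(2x) for large values. It is also defined for negative inputs, unlike log1p.

```python
import numpy as np


def asinh_transform(X, cofactor):
    """Apply y = asinh(cofactor * x) elementwise (the form used in this thread)."""
    return np.arcsinh(cofactor * np.asarray(X, dtype=float))


if __name__ == "__main__":
    x = np.array([-10.0, 0.0, 0.5, 10.0, 1000.0])
    # Placeholder cofactor; it should be tuned to the noise level of the data.
    print(asinh_transform(x, cofactor=0.2))
```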

giovp commented 3 years ago

Very interesting, also new to me. I think this boils down to pynndescent not being able to handle such edge cases. I wonder whether this happens with other metrics as well...

@TiongSun can you update us on whether this is a similar issue for you?

LuckyMD commented 3 years ago

Just skimmed this very briefly. Isn't there usually a background signal that is subtracted for CyTOF or IMC, so that you get negative values and therefore need an arcsinh transformation? That also leaves negative values after the transform, I guess. Could this be an issue for pynndescent?

wflynny commented 3 years ago

There's a possibility of negative values depending on how careful you are with compensation and whether or not you clip values, but in my case the counts matrix was always non-negative. Edit: But that shouldn't matter, because NNDescent is routinely called on PCA-embedded data, which is zero-centered, right?
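The point about PCA can be checked directly: with a standard centered PCA, the component scores of a non-negative matrix have mean zero and therefore include negative values. A small sketch on synthetic Poisson counts, assuming zero-centering (scanpy's default); this is plain SVD-based PCA in numpy, not scanpy's own implementation.

```python
import numpy as np


def pca_scores(X, n_components):
    """Scores of X on its top principal components, via SVD of the centered data."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(centered, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    counts = rng.poisson(lam=5.0, size=(200, 10))  # non-negative synthetic counts
    scores = pca_scores(counts, n_components=3)
    print("column means:", scores.mean(axis=0))  # zero up to rounding error
    print("any negative scores:", (scores < 0).any())
```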

If I can find a small subset of the matrix that produces this error reliably, I will share that with the pynndescent repo and link back here. Currently that's challenging given the original size of the matrix (a few million observations).
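One way to look for such a subset is to shuffle the rows once and binary-search for the shortest prefix that still fails, which needs only about log2(n) runs. A sketch under two assumptions: fails is a hypothetical callback the user writes (for example, one that saves the subset and runs the neighbor call in a separate process), and failure is monotone in prefix length, which real data need not satisfy.

```python
import random


def smallest_failing_prefix(n_rows, fails, min_rows=1, seed=0):
    """Return the shortest prefix of a shuffled row order for which fails() is True.

    fails(rows) is a caller-supplied check (hypothetical here) that takes a list
    of row indices and returns True if the neighbor search crashes on that
    subset. Assumes failure is monotone in prefix length. Returns None if even
    the full matrix does not fail.
    """
    order = list(range(n_rows))
    random.Random(seed).shuffle(order)  # fixed shuffle, so runs are repeatable
    if not fails(order):
        return None
    lo, hi = min_rows, n_rows  # invariant: order[:hi] fails
    while lo < hi:
        mid = (lo + hi) // 2
        if fails(order[:mid]):
            hi = mid
        else:
            lo = mid + 1
    return order[:hi]
```

If failure is not monotone (for example, it needs two specific cells together), the result is still a failing subset but not necessarily the smallest one, and rerunning with other seeds can help.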

TiongSun commented 3 years ago

Sorry for the late reply. The issue seems to occur only on Windows 10, not on Linux: the exact same code runs without error on Linux.
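The thread does not establish why the error appears only on Windows. One workaround sometimes suggested for missing __svml_* symbols, not tested in this thread, is to stop numba from using Intel's SVML vector math library, which provides functions such as __svml_sqrtf8. numba documents the NUMBA_DISABLE_INTEL_SVML environment variable for this. The helper below is a hypothetical convenience wrapper; setting the variable before numba is first imported (umap-learn and pynndescent import it, and scanpy pulls them in) is the safe order.

```python
import os
import sys
import warnings


def disable_numba_svml():
    """Set NUMBA_DISABLE_INTEL_SVML=1 so numba avoids Intel SVML math routines.

    Call this at the very top of a script or notebook, before importing
    scanpy, umap or pynndescent. Returns False (and warns) if numba was
    already imported in this process, in which case the setting may not
    take effect.
    """
    os.environ["NUMBA_DISABLE_INTEL_SVML"] = "1"
    if "numba" in sys.modules:
        warnings.warn(
            "numba is already imported; restart the Python process or Jupyter "
            "kernel for NUMBA_DISABLE_INTEL_SVML to take effect"
        )
        return False
    return True


if __name__ == "__main__":
    disable_numba_svml()
    # Import scanpy only after this point, e.g.: import scanpy as sc
```

Setting the variable in the shell before starting Python or Jupyter (for example, set NUMBA_DISABLE_INTEL_SVML=1 in a Windows command prompt) achieves the same thing without code changes.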

likeben commented 2 years ago

This comment solves the problem: https://github.com/lmcinnes/umap/issues/702#issuecomment-1002396093