yahoojapan / NGT

Nearest Neighbor Search with Neighborhood Graph and Tree for High-dimensional Data
Apache License 2.0

Can not import ngtpy with torch (segmentation fault) #37

Closed ilham-bintang closed 4 years ago

ilham-bintang commented 4 years ago

I do not know whether this is a PyTorch issue or an NGT issue, so I have also submitted it to PyTorch: https://github.com/pytorch/pytorch/issues/26405#issue-495238678

🐛 Bug

I do not know what is going wrong. I am using ngt (https://github.com/yahoojapan/NGT) together with torch.

To Reproduce

Steps to reproduce the behavior:

  1. Import torch, then ngtpy -> segmentation fault
  2. Import ngtpy, then torch -> free(): invalid pointer
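For anyone trying to reproduce this, the two import orders can be checked in fresh interpreter processes so that a crash does not take down the calling script. This is only a minimal sketch: the helper names `run_imports` and `describe` are made up for illustration, and it assumes a POSIX system, where a negative return code means the child process was killed by a signal.

```python
import signal
import subprocess
import sys


def run_imports(modules):
    """Import the given modules, in order, in a fresh interpreter; return its exit code."""
    code = "; ".join(f"import {name}" for name in modules)
    result = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


def describe(returncode):
    """Translate a subprocess return code into a short human-readable label."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative return code means the child was killed by that signal.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    # The two orders described in the steps above.
    for order in (["torch", "ngtpy"], ["ngtpy", "torch"]):
        print(" -> ".join(order), ":", describe(run_imports(order)))
```

A segmentation fault would be expected to show up as SIGSEGV and a `free(): invalid pointer` abort as SIGABRT; if a module is simply not installed, the child exits with status 1 instead.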

Expected behavior

Both libraries can be imported in either order without errors.

Environment

PyTorch version: 1.2.0
Is debug build: No
CUDA used to build PyTorch: 10.0.130

OS: Ubuntu 18.04.3 LTS
GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
CMake version: version 3.10.2

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 10.0.130
GPU models and configuration:
GPU 0: Tesla V100-DGXS-32GB
GPU 1: Tesla V100-DGXS-32GB
GPU 2: Tesla V100-DGXS-32GB
GPU 3: Tesla V100-DGXS-32GB

Nvidia driver version: 410.104
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.5.0

Versions of relevant libraries:
[pip3] numpy==1.17.2
[pip3] pytorch-transformers==1.2.0
[pip3] torch==1.2.0
[conda] blas                      1.0                         mkl
[conda] mkl                       2019.3                      199
[conda] mkl-service               1.1.2            py37he904b0f_5
[conda] mkl_fft                   1.0.10           py37ha843d7b_0
[conda] mkl_random                1.0.2            py37hd81dba3_0

Additional context

I inspected the crash with gdb. The backtrace is:

#0  0x00007fff1eaf3e3e in pybind11::detail::make_new_python_type (rec=...) at /opt/python/cp36-cp36m/include/python3.6m/pybind11/detail/class.h:565
#1  0x00007fff1eaf938e in pybind11::detail::generic_type::initialize (this=this@entry=0x7fffffffc900, rec=...) at /opt/python/cp36-cp36m/include/python3.6m/pybind11/pybind11.h:902
#2  0x00007fff1eac1a97 in pybind11::class_<Index>::class_<> (name=0x7fff1eb61378 "Index", scope=..., this=0x7fffffffc900)
    at /opt/python/cp36-cp36m/include/python3.6m/pybind11/pybind11.h:1092
#3  pybind11_init_ngtpy (m=...) at src/ngtpy.cpp:436
#4  0x00007fff1eac34d0 in PyInit_ngtpy () at src/ngtpy.cpp:421
#5  0x00000000005e4268 in _PyImport_LoadDynamicModuleWithSpec ()
#6  0x00000000005e4522 in ?? ()
#7  0x000000000056246e in PyCFunction_Call ()
#8  0x00000000004fed26 in _PyEval_EvalFrameDefault ()
#9  0x00000000004f6128 in ?? ()
#10 0x00000000004f7d60 in ?? ()
#11 0x00000000004f876d in ?? ()
#12 0x00000000004f98c7 in _PyEval_EvalFrameDefault ()
#13 0x00000000004f7a28 in ?? ()
#14 0x00000000004f876d in ?? ()
#15 0x00000000004f98c7 in _PyEval_EvalFrameDefault ()
#16 0x00000000004f7a28 in ?? ()
#17 0x00000000004f876d in ?? ()
#18 0x00000000004f98c7 in _PyEval_EvalFrameDefault ()
#19 0x00000000004f7a28 in ?? ()
#20 0x00000000004f876d in ?? ()
#21 0x00000000004f98c7 in _PyEval_EvalFrameDefault ()
#22 0x00000000004f7a28 in ?? ()
#23 0x00000000004f876d in ?? ()
#24 0x00000000004f98c7 in _PyEval_EvalFrameDefault ()
#25 0x00000000004f4065 in _PyFunction_FastCallDict ()
#26 0x000000000057c8f1 in _PyObject_FastCallDict ()
#27 0x000000000057cc5e in _PyObject_CallMethodIdObjArgs ()
#28 0x00000000004cf5dd in PyImport_ImportModuleLevelObject ()
#29 0x00000000004fb864 in _PyEval_EvalFrameDefault ()
#30 0x00000000004f6128 in ?? ()
#31 0x00000000004f9023 in PyEval_EvalCode ()
#32 0x00000000006415b2 in ?? ()
#33 0x000000000064166a in PyRun_FileExFlags ()
#34 0x0000000000643730 in PyRun_SimpleFileExFlags ()
#35 0x000000000062b26e in Py_Main ()
#36 0x00000000004b4cb0 in main ()
ilham-bintang commented 4 years ago

Fix regarding this issue: #34

masajiro commented 4 years ago

Thank you for the detailed information. As far as I have investigated, this segmentation fault is caused by pybind11 v2.3.0. To avoid it, please build ngtpy from source with pybind11 v2.4.0, or with v2.2.4 or older. The ngtpy packages on PyPI from v1.7.6 to v1.7.9 have the same problem. Another option is to use the ngt Python ctypes bindings.
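The version ranges mentioned in this comment can be written down as a small check. This is a rough sketch, not part of NGT: the function names are hypothetical, the affected ranges are taken only from the comment above (pybind11 v2.3.0, PyPI ngtpy v1.7.6 to v1.7.9), and the parser assumes plain numeric versions such as `1.7.9` or `v2.3.0`.

```python
def parse_version(text):
    """Turn a version string such as 'v1.7.9' into a tuple of ints: (1, 7, 9)."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def pybind11_is_affected(version):
    """True for the pybind11 release named in this thread as causing the crash."""
    return parse_version(version) == (2, 3, 0)


def ngtpy_wheel_is_affected(version):
    """True for the PyPI ngtpy releases reported above to have the same problem."""
    return (1, 7, 6) <= parse_version(version) <= (1, 7, 9)


if __name__ == "__main__":
    print(pybind11_is_affected("2.3.0"))      # the version blamed in this thread
    print(ngtpy_wheel_is_affected("v1.7.8"))  # inside the reported range
```

Tuples of integers are compared instead of raw strings so that a version like `1.7.10` is not mistaken for something between `1.7.6` and `1.7.9`.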

masajiro commented 4 years ago

I have released an ngt Python package that resolves this issue.