microsoft / LQ-Nets

LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks
MIT License

Frequent `Segmentation Fault (Core dumped)` #8

Open zhutmost opened 5 years ago

zhutmost commented 5 years ago

I am trying to run the code from the usage section of the README: `python imagenet.py --gpu 0,1,2,3 --data /home/bcrc/Datasets/imagenet --mode pre .......` However, I frequently hit a segmentation fault (core dumped) during quantization.

I am not familiar with debugging a core dump from a Python process. I can share the core file if someone is able to help (it is too large to upload here). Alternatively, could anyone give me some instructions?
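One low-effort way to get a Python-level traceback for a crash like this, without opening the core file, is the standard-library `faulthandler` module. The following is a minimal sketch, assuming it is called at the very top of the entry script before TensorFlow is imported. The helper name `enable_crash_tracebacks` is hypothetical and not part of LQ-Nets.

```python
import faulthandler
import sys


def enable_crash_tracebacks(stream=None):
    """Ask the interpreter to dump the Python tracebacks of all threads
    when it receives a fatal signal such as SIGSEGV.

    `stream` must be a real file object (it needs a file descriptor);
    it defaults to sys.stderr.
    """
    target = stream if stream is not None else sys.stderr
    faulthandler.enable(file=target, all_threads=True)
    return faulthandler.is_enabled()


# Hypothetical usage at the top of the training script:
#   enable_crash_tracebacks()
#   ... rest of the script ...
```

The same handler can be switched on without editing any code by starting the interpreter with `python -X faulthandler` or by setting the environment variable `PYTHONFAULTHANDLER=1`. The traceback only shows which Python call was active in each thread. The native frames inside TensorFlow still have to be inspected with gdb.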

zhutmost commented 5 years ago

I am using an Anaconda environment. Here is the package list.

# packages in environment at /home/hzzhu/Software/anaconda3/envs/tensorflow:
#
# Name                    Version                   Build  Channel
absl-py                   0.7.0                    pypi_0    pypi
astor                     0.7.1                    pypi_0    pypi
backcall                  0.1.0                    py37_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
blas                      1.0                         mkl    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
bleach                    3.1.0                    py37_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
ca-certificates           2019.1.23                     0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
certifi                   2018.11.29               py37_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
decorator                 4.3.2                    py37_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
entrypoints               0.3                      py37_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
gast                      0.2.2                    pypi_0    pypi
gmp                       6.1.2                h6c8ec71_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
google-pasta              0.1.4                    pypi_0    pypi
grpcio                    1.18.0                   pypi_0    pypi
h5py                      2.9.0                    pypi_0    pypi
intel-openmp              2019.1                      144    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
ipykernel                 5.1.0            py37h39e3cac_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
ipython                   7.2.0            py37h39e3cac_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
ipython_genutils          0.2.0                    py37_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
jedi                      0.13.2                   py37_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
jinja2                    2.10                     py37_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
jsonschema                2.6.0                    py37_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
jupyter_client            5.2.4                    py37_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
jupyter_core              4.4.0                    py37_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
keras-applications        1.0.7                    pypi_0    pypi
keras-preprocessing       1.0.9                    pypi_0    pypi
libedit                   3.1.20181209         hc058e9b_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libffi                    3.2.1                hd88cf55_4    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libgcc-ng                 8.2.0                hdf63c60_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libgfortran-ng            7.3.0                hdf63c60_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libsodium                 1.0.16               h1bed415_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libstdcxx-ng              8.2.0                hdf63c60_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
markdown                  3.0.1                    pypi_0    pypi
markupsafe                1.1.0            py37h7b6447c_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
mistune                   0.8.4            py37h7b6447c_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
mkl                       2019.1                      144    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
mkl_fft                   1.0.10           py37ha843d7b_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
mkl_random                1.0.2            py37hd81dba3_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
msgpack                   0.6.1                    pypi_0    pypi
msgpack-numpy             0.4.4.2                  pypi_0    pypi
nb_conda                  2.2.1                    py37_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
nb_conda_kernels          2.2.0                    py37_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
nbconvert                 5.3.1                    py37_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
nbformat                  4.4.0                    py37_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
ncurses                   6.1                  he6710b0_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
notebook                  5.7.4                    py37_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
numpy                     1.16.1                   pypi_0    pypi
numpy-base                1.15.4           py37hde5b4d6_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
opencv-python             4.0.0.21                 pypi_0    pypi
openssl                   1.1.1a               h7b6447c_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
pandoc                    2.2.3.2                       0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
pandocfilters             1.4.2                    py37_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
parso                     0.3.2                    py37_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
pexpect                   4.6.0                    py37_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
pickleshare               0.7.5                    py37_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
pip                       19.0.1                   py37_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
prometheus_client         0.5.0                    py37_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
prompt_toolkit            2.0.8                      py_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
protobuf                  3.6.1                    pypi_0    pypi
ptyprocess                0.6.0                    py37_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
pygments                  2.3.1                    py37_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
python                    3.7.2                h0371630_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
python-dateutil           2.7.5                    py37_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
python-prctl              1.7                      pypi_0    pypi
pyzmq                     17.1.2           py37he6710b0_2    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
readline                  7.0                  h7b6447c_5    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
send2trash                1.5.0                    py37_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
setuptools                40.8.0                   py37_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
six                       1.12.0                   py37_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
sqlite                    3.26.0               h7b6447c_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
tabulate                  0.8.3                    pypi_0    pypi
tb-nightly                1.13.0a20190224          pypi_0    pypi
tensorpack                0.9.1                    pypi_0    pypi
termcolor                 1.1.0                    pypi_0    pypi
terminado                 0.8.1                    py37_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
testpath                  0.4.2                    py37_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
tf-estimator-nightly      1.14.0.dev2019022501          pypi_0    pypi
tf-nightly-gpu            1.13.0.dev20190224          pypi_0    pypi
tk                        8.6.8                hbc83047_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
tornado                   5.1.1            py37h7b6447c_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
tqdm                      4.31.1                   pypi_0    pypi
traitlets                 4.3.2                    py37_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
wcwidth                   0.1.7                    py37_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
webencodings              0.5.1                    py37_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
werkzeug                  0.14.1                   pypi_0    pypi
wheel                     0.32.3                   py37_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
xz                        5.2.4                h14c3975_4    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
zeromq                    4.3.1                he6710b0_3    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
zlib                      1.2.11               h7b6447c_3    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main

zhutmost commented 5 years ago

I am running the code on Ubuntu 18.04 LTS with CUDA 10. The graphics driver is the default one bundled with the CUDA installation package.
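The package list above mixes conda-channel builds with pypi builds. For example, `numpy` 1.16.1 comes from pypi while `numpy-base` 1.15.4 comes from the conda mirror. Whether this has anything to do with the crash is not established, but it is cheap to check. The sketch below parses saved `conda list` output and reports such version mismatches. The function names and the `SAMPLE` text (three rows copied from the list above) are only for illustration.

```python
def parse_conda_list(text):
    """Parse `conda list` output into (name, version, build, channel) tuples."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment header
        parts = line.split()
        if len(parts) < 3:
            continue  # not a package row
        channel = parts[3] if len(parts) > 3 else ""
        rows.append((parts[0], parts[1], parts[2], channel))
    return rows


def version_mismatches(rows, suffix="-base"):
    """Find pairs such as numpy / numpy-base whose versions differ."""
    versions = {name: version for name, version, _build, _channel in rows}
    mismatches = []
    for name, version in versions.items():
        base_version = versions.get(name + suffix)
        if base_version is not None and base_version != version:
            mismatches.append((name, version, name + suffix, base_version))
    return mismatches


SAMPLE = """\
# Name                    Version                   Build  Channel
numpy                     1.16.1                   pypi_0    pypi
numpy-base                1.15.4           py37hde5b4d6_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
tensorpack                0.9.1                    pypi_0    pypi
"""

if __name__ == "__main__":
    rows = parse_conda_list(SAMPLE)
    print([name for name, _v, _b, channel in rows if channel == "pypi"])
    print(version_mismatches(rows))
```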

zhutmost commented 5 years ago

I tried `gdb python core`. Here is the console output.

$ gdb python core
GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...done.

warning: core file may not match specified executable file.
[New LWP 6635]
[New LWP 6297]
[New LWP 6301]
[New LWP 6300]
[New LWP 6311]
[New LWP 6317]
[New LWP 6351]
[New LWP 6349]
[New LWP 6314]
[New LWP 6305]
[New LWP 6320]
[New LWP 6723]
[New LWP 6322]
[New LWP 6324]
[New LWP 6326]
[New LWP 6381]
[New LWP 6315]
[New LWP 6303]
[New LWP 6323]
[New LWP 6325]
[New LWP 6327]
[New LWP 6318]
[New LWP 6641]
[New LWP 6631]
[New LWP 6316]
[New LWP 6642]
[New LWP 6637]
[New LWP 6750]
[New LWP 6639]
[New LWP 6319]
[New LWP 6654]
[New LWP 6304]
[New LWP 6310]
[New LWP 6309]
[New LWP 6306]
[New LWP 6659]
[New LWP 6727]
[New LWP 6655]
[New LWP 6710]
[New LWP 6652]
[New LWP 6660]
[New LWP 6645]
[New LWP 6658]
[New LWP 6339]
[New LWP 6338]
[New LWP 6656]
[New LWP 6657]
[New LWP 6648]
[New LWP 6632]
[New LWP 6347]
[New LWP 6644]
[New LWP 6640]
[New LWP 6653]
[New LWP 6643]
[New LWP 6651]
[New LWP 6588]
[New LWP 6630]
[New LWP 6650]
[New LWP 6341]
[New LWP 6380]
[New LWP 6628]
[New LWP 6646]
[New LWP 6647]
[New LWP 6350]
[New LWP 6748]
[New LWP 6321]
[New LWP 6629]
[New LWP 6344]
[New LWP 6603]
[New LWP 6332]
[New LWP 6589]
[New LWP 6302]
[New LWP 6308]
[New LWP 6352]
[New LWP 6340]
[New LWP 6342]
[New LWP 6343]
[New LWP 6345]
[New LWP 6649]
[New LWP 6638]
[New LWP 6307]
[New LWP 6382]
[New LWP 6633]
[New LWP 6636]
[New LWP 6751]
[New LWP 6634]
[New LWP 6749]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `python imagenet.py --gpu 0,1,2,3 --data /home/bcrc/Datasets/imagenet --mode pre'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f5e08167ff9 in nsync::nsync_mu_lock(nsync::nsync_mu_s_*) () from /home/hzzhu/Software/anaconda3/envs/tensorflow/lib/python3.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
[Current thread is 1 (Thread 0x7f5c47fff700 (LWP 6635))]
(gdb) 
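The output above stops at frame #0 of the crashing thread, so the callers of `nsync_mu_lock` are not visible yet. A full backtrace of every thread can be captured non-interactively with gdb's `-batch` and `-ex` options. The sketch below wraps that in Python. The helper names are hypothetical, and it assumes gdb is on `PATH`. Because gdb warned that the core file may not match the executable, it is probably worth passing the full path of the interpreter from the conda environment that produced the core, rather than a bare `python`.

```python
import subprocess


def gdb_backtrace_command(executable, core_file):
    """Build a non-interactive gdb command that prints every thread's backtrace."""
    return [
        "gdb", "-batch",               # run the commands below, then exit
        "-ex", "thread apply all bt",  # backtrace of all threads
        executable, core_file,
    ]


def dump_backtraces(executable, core_file, out_path):
    """Run gdb and save its output to out_path. Returns gdb's exit code."""
    cmd = gdb_backtrace_command(executable, core_file)
    with open(out_path, "w") as out:
        result = subprocess.run(cmd, stdout=out, stderr=subprocess.STDOUT)
    return result.returncode


# Hypothetical usage, run from the directory that contains the core file:
#   import sys
#   dump_backtraces(sys.executable, "core", "backtrace.txt")
```

The resulting text file is usually small enough to attach to an issue even when the core file itself is not.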

dhkwon1122 commented 5 years ago

I got the same error with the example command. Did you solve it?

zhutmost commented 5 years ago

> I got the same error with the example command. Did you solve it?

Sorry, not yet. If you have any ideas, I am happy to try them. @dhkwon1122

EowinYe commented 5 years ago

@zhutmost @dhkwon1122, thank you for letting me know. I have uploaded a new version that supports the latest TensorFlow and tensorpack, but it is still under testing. You can try the branch `support-latest-tf-tensorpack`. If your environment is TF 1.13 and the latest tensorpack with CUDA 10.0 and cuDNN 7.5, you may run into an error like this: `UnknownError (see above for traceback): Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.` You can downgrade your TensorFlow version to work around it.
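Before switching branches or downgrading, it may help to confirm which versions are actually importable in the active environment. The following is a minimal sketch. The function name `report_versions` is hypothetical, and it only reads each module's `__version__` attribute, so it says nothing about the CUDA or cuDNN libraries installed on the system.

```python
import importlib


def report_versions(modules=("tensorflow", "tensorpack", "numpy")):
    """Return {module name: version string}, or None for modules that
    cannot be imported in the current environment."""
    found = {}
    for name in modules:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None
            continue
        found[name] = getattr(module, "__version__", "unknown")
    return found


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(name, version)
```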