neuralmagic / deepsparse

Sparsity-aware deep learning inference runtime for CPUs
https://neuralmagic.com/deepsparse/

Kernel restart when running Jupyter notebooks #121

Closed james20141606 closed 3 years ago

james20141606 commented 3 years ago

Hi, I tried to run the following code. It runs smoothly on my Mac and from a terminal, but the kernel always dies when I run it in a Jupyter notebook:

from sparseml.pytorch.models import ModelRegistry
from sparseml.pytorch.datasets import ImagenetteDataset, ImagenetteSize

The error output:

DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.4.0 (bc00bf8b) (release) (optimized)
Date: 06-05-2021 @ 03:09:41 EDT
OS: Linux gv02.nyu.cluster 4.18.0-193.28.1.el8_2.x86_64 #1 SMP Fri Oct 16 13:38:49 EDT 2020
Arch: x86_64
CPU:
Vendor:
Cores/sockets/threads: [0, 0, 0]
Available cores/sockets/threads: [0, 0, 0]
L1 cache size data/instruction: 0k/0k
L2 cache size: 0Mb
L3 cache size: 0Mb
Total memory: 377.337G
Free memory: 334.654G

Assertion at src/lib/core/cpu.cpp:263
Backtrace:
 0# wand::detail::abort_prefix(std::ostream&, char const*, char const*, int, bool, bool, unsigned long) in /ext3/miniconda3/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.6.0
 1# wand::detail::assert_fail(char const*, char const*, int) in /ext3/miniconda3/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.6.0
 2# 0x0000148A66A5E51C in /ext3/miniconda3/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.6.0
 3# 0x0000148A66A5EEDD in /ext3/miniconda3/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.6.0
 4# 0x0000148AF53F8783 in /lib64/ld-linux-x86-64.so.2
 5# 0x0000148AF53FD24F in /lib64/ld-linux-x86-64.so.2
 6# _dl_catch_exception in /lib/x86_64-linux-gnu/libc.so.6
 7# 0x0000148AF53FC81A in /lib64/ld-linux-x86-64.so.2
 8# 0x0000148AF4BD4F96 in /lib/x86_64-linux-gnu/libdl.so.2
 9# _dl_catch_exception in /lib/x86_64-linux-gnu/libc.so.6
10# _dl_catch_error in /lib/x86_64-linux-gnu/libc.so.6
11# 0x0000148AF4BD5745 in /lib/x86_64-linux-gnu/libdl.so.2
12# dlopen in /lib/x86_64-linux-gnu/libdl.so.2
13# _PyImport_FindSharedFuncptr in /ext3/miniconda3/bin/python
14# _PyImport_LoadDynamicModuleWithSpec in /ext3/miniconda3/bin/python
15# 0x000055B382CAAE49 in /ext3/miniconda3/bin/python
16# _PyMethodDef_RawFastCallDict in /ext3/miniconda3/bin/python
17# _PyCFunction_FastCallDict in /ext3/miniconda3/bin/python
18# _PyEval_EvalFrameDefault in /ext3/miniconda3/bin/python
19# _PyEval_EvalCodeWithName in /ext3/miniconda3/bin/python
20# _PyFunction_FastCallKeywords in /ext3/miniconda3/bin/python
21# _PyEval_EvalFrameDefault in /ext3/miniconda3/bin/python
22# _PyFunction_FastCallKeywords in /ext3/miniconda3/bin/python
23# _PyEval_EvalFrameDefault in /ext3/miniconda3/bin/python

Version info: I installed sparseml with pip install sparseml, which pulled in torch 1.8.1+cu102 (which I found strange, since the docs say sparseml requires torch <=1.8.0). I also tried downgrading torch to 1.8.0, but the same error still happens. The error appears on both CPU and GPU.
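
Note for readers hitting the same failure: the banner above shows an empty CPU vendor and [0, 0, 0] for cores/sockets/threads, suggesting the engine's CPU detection failed before the assertion at cpu.cpp:263 fired. A minimal sketch using only the Python standard library (no deepsparse APIs assumed) to compare what a terminal Python process and the Jupyter kernel each see:

import os, platform

# Run this both in a terminal and inside the Jupyter kernel and compare.
print("platform       :", platform.platform())
print("os.cpu_count() :", os.cpu_count())
# sched_getaffinity reflects cgroup/affinity limits (Linux only); an empty
# or tiny set here would be consistent with the [0, 0, 0] reading above.
print("affinity cores :", len(os.sched_getaffinity(0)))
with open("/proc/cpuinfo") as f:
    print("vendor_id     :", {line.strip() for line in f if line.startswith("vendor_id")})

A mismatch between the two runs would point at how the notebook kernel is launched (e.g. CPU affinity or cgroup limits) rather than at the code being imported.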

james20141606 commented 3 years ago

seems that the bug is caused by a wrong torch version. If I don't have pytorch installed first, pip install sparseml will install pytorch with version 1.8.1+cu102 which is not compatible with my cuda version and sparseml's requirement. If I install appropriate pytorch first the bug won't appear
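
For reference, the workaround described here comes down to install order: pin a compatible torch wheel first, then install sparseml on top of it. The version below is only an example matching the documented <=1.8.0 requirement; pick the build that matches your CUDA setup.

pip install torch==1.8.0
pip install sparseml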

james20141606 commented 3 years ago

Edit: it seems the above does not solve the problem entirely. I still get the same error.

markurtz commented 3 years ago

Hi @james20141606, thank you for bringing this up! There is definitely a packaging issue here: PyTorch 1.8.1 should not be installing. We'll be sure to get that fixed ASAP, though it's unrelated to this crash (thank you for finding it!).

This crash looks like an issue with the DeepSparse engine and a hardware dependency. I'm going to move this over to the DeepSparse repository, and we'll have someone on the runtime team reach out for more information on the environment it's being run on.

james20141606 commented 3 years ago

Thanks for the help. It seems the crash is unrelated to the PyTorch version after all. I agree it should be a hardware dependency issue; the DeepSparse engine might not be compatible with some CPUs.

bnellnm commented 3 years ago

Hi @james20141606, can you supply some more details about the system you are running on? For example: the OS version/distribution, the CPU model, and the contents of /proc/cpuinfo (if possible). It would also help to know whether you are running inside a VM and, if so, which VM software is being used.
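
If it helps, most of the details requested above can be gathered from the same Python environment with only the standard library; a rough sketch (paths assume a typical Linux system):

import platform

print(platform.platform())            # kernel / OS version
with open("/etc/os-release") as f:    # distribution
    print(f.read())
with open("/proc/cpuinfo") as f:      # CPU model, vendor, and flags
    print(f.read(4000))               # the first few entries are enough
# Detecting a VM or container reliably usually needs an external tool such
# as systemd-detect-virt; there is no standard-library equivalent.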

jeanniefinks commented 3 years ago

Hello @james20141606, let us know if the issue is still happening and whether you are able to share some more information per the last message. If we don't hear back, we'll close out this issue in the next few weeks, as we'll assume you're all set. Thank you.

Cheers, Jeannie / Neural Magic

jeanniefinks commented 3 years ago

Hi @james20141606, as there has been no response, we'll close this issue, but please re-open it if you're able to provide some more details per our last few comments. Thank you. Regards, Jeannie / Neural Magic