jinserk / pytorch-asr

ASR with PyTorch
GNU General Public License v3.0

importing model error #3

Closed homink closed 5 years ago

homink commented 5 years ago

Hi Jinserk,

I ran into an error when importing a model and added some logging to find the root cause, but I can't work it out. Any suggestions?

...
try:
    print('models: ' + " ".join(models))
    print('input model: ' + model)
    print('trying importlib.import_module(f"asr.models.{model}")')
    m = importlib.import_module(f"asr.models.{model}")
    m.train(argv)
except:
    raise
...
python -V
Python 3.6.3 :: Anaconda custom (64-bit)

ls /home/kwon/EXP/ted_pytorch -alF
total 20940
drwxr-xr-x    5 kwon domain users      202 Jan  7 13:42 ./
drwxr-xr-x    3 kwon domain users       33 Jan  7 13:38 ../
drwxr-xr-x   10 kwon domain users      241 Jan  7 13:42 dev/
-rw-r--r--    1 kwon domain users     2022 Jan  7 13:42 dev_convert.txt
-rw-r--r--    1 kwon domain users   109533 Jan  7 13:42 dev.csv
drwxr-xr-x   13 kwon domain users      320 Jan  7 13:42 test/
-rw-r--r--    1 kwon domain users     2093 Jan  7 13:42 test_convert.txt
-rw-r--r--    1 kwon domain users   246191 Jan  7 13:42 test.csv
drwxr-xr-x 1497 kwon domain users    53248 Jan  7 13:40 train/
-rw-r--r--    1 kwon domain users   372408 Jan  7 13:42 train_convert.txt

ls -alF
total 88
drwxr-xr-x  4 kwon domain users   317 Jan  7 15:20 ./
drwxr-xr-x 18 kwon domain users  4096 Jan  7 10:09 ../
drwxr-xr-x  7 kwon domain users   128 Jan  7 11:47 asr/
-rw-r--r--  1 kwon domain users   455 Jan  7 10:10 batch_train.py
drwxr-xr-x  8 kwon domain users   211 Jan  7 10:10 .git/
-rw-r--r--  1 kwon domain users  1339 Jan  7 10:10 .gitignore
-rw-r--r--  1 kwon domain users 35147 Jan  7 10:10 LICENSE
-rw-r--r--  1 kwon domain users   451 Jan  7 10:10 predict.py
-rw-r--r--  1 kwon domain users   473 Jan  7 10:10 prepare.py
-rw-r--r--  1 kwon domain users  5381 Jan  7 10:10 README.md
-rw-r--r--  1 kwon domain users   547 Jan  7 10:10 requirements.txt
-rw-r--r--  1 kwon domain users   448 Jan  7 10:10 test.py
-rwxr-xr-x  1 kwon domain users   670 Jan  7 10:10 train_deepspeech.sh*
-rwxr-xr-x  1 kwon domain users   737 Jan  7 10:10 train_las.sh*
-rw-r--r--  1 kwon domain users   592 Jan  7 15:20 train.py

python train.py deepspeech_ctc --data-path /home/kwon/EXP/ted_pytorch
models: densenet deepspeech_ce deepspeech_var resnet_ce resnet_ctc resnet_split convnet ssvae capsule1 deepspeech_ctc capsule2 resnet_split_ce las densenet_ctc
input model: deepspeech_ctc
trying importlib.import_module(f"asr.models.{model}")
Segmentation fault (core dumped)

python train.py las --data-path /home/kwon/EXP/ted_pytorch
models: deepspeech_var las ssvae capsule2 convnet densenet densenet_ctc resnet_split resnet_split_ce deepspeech_ctc capsule1 resnet_ctc resnet_ce deepspeech_ce
input model: las
trying importlib.import_module(f"asr.models.{model}")
Segmentation fault (core dumped)

[kwon@ssi-dnn-slave-001 pytorch-asr]$ python train.py deepspeech_var --data-path /home/kwon/EXP/ted_pytorch
models: las convnet resnet_split densenet resnet_split_ce capsule2 ssvae deepspeech_ce deepspeech_ctc densenet_ctc resnet_ctc deepspeech_var resnet_ce capsule1
input model: deepspeech_var
trying importlib.import_module(f"asr.models.{model}")
Segmentation fault (core dumped)

jinserk commented 5 years ago

Hi @homink,

Hmm, that's odd. I guess importlib is having trouble loading the modules. What OS are you using? Could you check where the segfault is generated by following this?
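
The link in this reply is not preserved. The "Fatal Python error: Segmentation fault" traceback posted in the next comment has the format produced by Python's standard `faulthandler` module, so a minimal sketch of that approach is shown here. The helper name `load_model_module` and its `package` default are illustrative assumptions, not code from the repository.

```python
import faulthandler
import importlib
import sys

# Ask the interpreter to dump the Python-level stack of every thread when the
# process receives a fatal signal such as SIGSEGV. The original stderr is used
# so the dump still appears if sys.stderr has been replaced.
faulthandler.enable(file=sys.__stderr__, all_threads=True)


def load_model_module(name, package="asr.models"):
    """Import `<package>.<name>` (hypothetical helper).

    If a C extension crashes during the import, faulthandler prints the chain
    of Python frames that led to it before the process dies.
    """
    return importlib.import_module(f"{package}.{name}")
```

The same effect is available without editing any code by running `python -X faulthandler train.py ...` or by setting the `PYTHONFAULTHANDLER` environment variable.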

homink commented 5 years ago

CentOS 7. I found that importing _torch_sox fails on my system. The issues linked below may show similar symptoms, but reinstalling pytorch/audio, whether with pip or by cloning and installing from source, doesn't help.

https://github.com/pytorch/audio/issues/62 https://github.com/pytorch/audio/issues/68

cat /etc/os-release 
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
python train.py deepspeech_ctc --data-path /home/kwon/EXP/ted_pytorch
models: convnet deepspeech_ce resnet_ce ssvae las capsule1 capsule2 deepspeech_var densenet_ctc resnet_ctc densenet resnet_split resnet_split_ce deepspeech_ctc
input model: deepspeech_ctc
trying importlib.import_module(f"asr.models.{model}")
Fatal Python error: Segmentation fault

Current thread 0x00007f2d90162740 (most recent call first):
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 922 in create_module
  File "<frozen importlib._bootstrap>", line 571 in module_from_spec
  File "<frozen importlib._bootstrap>", line 658 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 955 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 971 in _find_and_load
  File "/home/kwon/anaconda3/lib/python3.6/site-packages/torchaudio/__init__.py", line 5 in <module>
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 678 in exec_module
  File "<frozen importlib._bootstrap>", line 665 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 955 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 971 in _find_and_load
  File "/home/kwon/3rdParty/pytorch-asr/asr/utils/dataset.py", line 15 in <module>
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 678 in exec_module
  File "<frozen importlib._bootstrap>", line 665 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 955 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 971 in _find_and_load
  File "/home/kwon/3rdParty/pytorch-asr/asr/models/deepspeech_ctc/train.py", line 10 in <module>
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 678 in exec_module
  File "<frozen importlib._bootstrap>", line 665 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 955 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 971 in _find_and_load
  File "/home/kwon/3rdParty/pytorch-asr/asr/models/deepspeech_ctc/__init__.py", line 1 in <module>
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 678 in exec_module
  File "<frozen importlib._bootstrap>", line 665 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 955 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 971 in _find_and_load
  File "<frozen importlib._bootstrap>", line 994 in _gcd_import
  File "/home/kwon/anaconda3/lib/python3.6/importlib/__init__.py", line 126 in import_module
  File "train.py", line 25 in <module>
Segmentation fault (core dumped)
echo $CPLUS_INCLUDE_PATH
/usr/include/sox:
which sox
/usr/bin/sox
which th
/usr/local/torch/install/bin/th
ls /home/kwon/anaconda3/lib/python3.6/site-packages/*.so -hal
-rwxr-xr-x 2 kwon domain users 185K Sep 17  2017 /home/kwon/anaconda3/lib/python3.6/site-packages/_cffi_backend.cpython-36m-x86_64-linux-gnu.so
-rwxr-xr-x 2 kwon domain users 539K Sep 18  2017 /home/kwon/anaconda3/lib/python3.6/site-packages/gmpy2.cpython-36m-x86_64-linux-gnu.so
-rwxr-xr-x 2 kwon domain users  36K Sep 18  2017 /home/kwon/anaconda3/lib/python3.6/site-packages/greenlet.cpython-36m-x86_64-linux-gnu.so
-rwxrwxr-x 2 kwon domain users  93K Jul  5  2018 /home/kwon/anaconda3/lib/python3.6/site-packages/pycosat.cpython-36m-x86_64-linux-gnu.so
-rwxr-xr-x 2 kwon domain users 137K Sep 18  2017 /home/kwon/anaconda3/lib/python3.6/site-packages/pycurl.cpython-36m-x86_64-linux-gnu.so
-rwxr-xr-x 2 kwon domain users 154K Sep 18  2017 /home/kwon/anaconda3/lib/python3.6/site-packages/pyodbc.cpython-36m-x86_64-linux-gnu.so
-rwxr-xr-x 2 kwon domain users 121K Sep 18  2017 /home/kwon/anaconda3/lib/python3.6/site-packages/sip.so
-rwxr-xr-x 1 kwon domain users 6.0M Jan  7 11:07 /home/kwon/anaconda3/lib/python3.6/site-packages/_torch_sox.cpython-36m-x86_64-linux-gnu.so
-rwxr-xr-x 2 kwon domain users 228K Sep 18  2017 /home/kwon/anaconda3/lib/python3.6/site-packages/_yaml.cpython-36m-x86_64-linux-gnu.so
python
Python 3.6.3 |Anaconda custom (64-bit)| (default, Oct 13 2017, 12:02:49) 
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> import _torch_sox
Segmentation fault (core dumped)
python
Python 3.6.3 |Anaconda custom (64-bit)| (default, Oct 13 2017, 12:02:49) 
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import _torch_sox
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: /home/kwon/anaconda3/lib/python3.6/site-packages/_torch_sox.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN2at19UndefinedTensorImpl10_singletonE
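
The undefined symbol in the `ImportError` above is a mangled C++ name. As a rough illustration, the sketch below decodes simple nested names of this form; `demangle_nested_name` is a hypothetical helper that handles only plain length-prefixed identifiers, and a tool such as `c++filt` would be the more complete choice.

```python
def demangle_nested_name(symbol):
    """Decode a simple Itanium-ABI nested name, e.g. _ZN2at3FooE -> at::Foo.

    Only plain length-prefixed identifiers are handled; templates,
    qualifiers, and function signatures are out of scope for this sketch.
    """
    prefix = "_ZN"
    if not symbol.startswith(prefix) or not symbol.endswith("E"):
        raise ValueError(f"not a simple nested name: {symbol!r}")
    body = symbol[len(prefix):-1]  # drop the "_ZN" prefix and the closing "E"
    parts = []
    pos = 0
    while pos < len(body):
        # Each component is written as <decimal length><identifier>.
        start = pos
        while pos < len(body) and body[pos].isdigit():
            pos += 1
        if pos == start:
            raise ValueError(f"expected a length prefix at offset {start}")
        length = int(body[start:pos])
        name = body[pos:pos + length]
        if len(name) != length:
            raise ValueError("identifier is shorter than its length prefix")
        parts.append(name)
        pos += length
    return "::".join(parts)


print(demangle_nested_name("_ZN2at19UndefinedTensorImpl10_singletonE"))
# prints: at::UndefinedTensorImpl::_singleton
```

The `at` namespace belongs to ATen, PyTorch's tensor library, so the message suggests that the `_torch_sox` extension expects a symbol from the PyTorch libraries that is not available when it is loaded on its own. This would be consistent with a version or ABI mismatch between the extension and the installed PyTorch, though the log alone does not prove it.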

jinserk commented 5 years ago

This could be a silly question, but did you install sox-devel using yum, or the sox package in the Anaconda env? Please try reinstalling after manually deleting _torch_sox. If that doesn't help, consider installing sox from source. I don't use an Anaconda env myself, but I suspect the sox linked on your system is the system package rather than the Anaconda package, which could cause an inconsistency. In my experience, torch 1.0 has some ABI-related issues.
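
When retrying after each reinstall, it can help to test the import in a throwaway interpreter so that a segfault does not take down the current session. The following is a minimal sketch under that assumption; `probe_import` is a hypothetical helper, and it is written to run on Python 3.6 as used in this thread.

```python
import subprocess
import sys


def probe_import(statements):
    """Run `statements` in a fresh interpreter and summarize how it ended."""
    proc = subprocess.run(
        [sys.executable, "-c", statements],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
    )
    if proc.returncode == 0:
        return "ok"
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child process was
        # terminated by the signal with that number (11 is SIGSEGV on Linux).
        return f"killed by signal {-proc.returncode}"
    # Otherwise report the last line of the traceback, e.g. the ImportError.
    lines = proc.stderr.strip().splitlines()
    return "error: " + (lines[-1] if lines else "")
```

Calling `probe_import("import _torch_sox")` and `probe_import("import torch; import _torch_sox")` reproduces the two interpreter sessions shown earlier in one step, which makes it easier to see whether a reinstall changed anything.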

homink commented 5 years ago

Anaconda doesn't seem to work fully with PyTorch. With pyenv it works perfectly.

jmlemercier commented 4 years ago

Hello there, I would like to reopen this issue, as I hit the same error when trying to use torchaudio. My OS is CentOS 7, and I am using a Conda environment defined by the following .yml file:

name: audinet_env
channels:

The installed versions of the packages of interest are then:

When using torchaudio in my main function, I get the error Undefined symbol: ... when the __init__.py script of torchaudio tries to import _torch_sox.

I had the same problem on Ubuntu 18.04, which I solved by downgrading torchaudio to 0.3.1, but the same manoeuvre does not work here.

I tried:

I haven't yet tried (because it is a real drag to get out of conda and switch everything to pipenv):

No success so far. I would appreciate a little help.