jiwidi opened this issue 3 years ago
@jiwidi I'll fix the Makefile. Please retry it after the next PR.
@hirofumi0810 Hi again,
So I tried to run the same steps as in the original post, and now I'm stuck at the warprnnt make step. My output is:
git clone https://github.com/HawkAaron/warp-transducer.git /mnt/kingston/github/neural_sp/tools/neural_sp/warp-transducer
Cloning into '/mnt/kingston/github/neural_sp/tools/neural_sp/warp-transducer'...
remote: Enumerating objects: 11, done.
remote: Counting objects: 100% (11/11), done.
remote: Compressing objects: 100% (10/10), done.
remote: Total 905 (delta 1), reused 5 (delta 1), pack-reused 894
Receiving objects: 100% (905/905), 248.13 KiB | 622.00 KiB/s, done.
Resolving deltas: 100% (462/462), done.
# Note: Requires gcc>=5.0 to build extensions with pytorch>=1.0
if . /mnt/kingston/github/neural_sp/tools/neural_sp/miniconda/bin/activate && python -c 'import torch as t;assert t.__version__[0] == "1"' &> /dev/null; then \
. /mnt/kingston/github/neural_sp/tools/neural_sp/miniconda/bin/activate && python -c "from distutils.version import LooseVersion as V;assert V('10.2.0') >= V('5.0'), 'Requires gcc>=5.0'"; \
fi
. /mnt/kingston/github/neural_sp/tools/neural_sp/miniconda/bin/activate; cd /mnt/kingston/github/neural_sp/tools/neural_sp/warp-transducer && mkdir build && cd build && cmake .. && make; true
-- The C compiler identification is GNU 10.2.0
-- The CXX compiler identification is GNU 10.2.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found CUDA: /usr/local/cuda (found version "11.1")
-- cuda found TRUE
-- Building shared library with GPU support
-- Configuring done
-- Generating done
-- Build files have been written to: /mnt/kingston/github/neural_sp/tools/neural_sp/warp-transducer/build
make[1]: Entering directory '/mnt/kingston/github/neural_sp/tools/neural_sp/warp-transducer/build'
make[2]: Entering directory '/mnt/kingston/github/neural_sp/tools/neural_sp/warp-transducer/build'
make[3]: Entering directory '/mnt/kingston/github/neural_sp/tools/neural_sp/warp-transducer/build'
[ 7%] Building NVCC (Device) object CMakeFiles/warprnnt.dir/src/warprnnt_generated_rnnt_entrypoint.cu.o
nvcc fatal : Unsupported gpu architecture 'compute_30'
CMake Error at warprnnt_generated_rnnt_entrypoint.cu.o.cmake:220 (message):
Error generating
/mnt/kingston/github/neural_sp/tools/neural_sp/warp-transducer/build/CMakeFiles/warprnnt.dir/src/./warprnnt_generated_rnnt_entrypoint.cu.o
make[3]: *** [CMakeFiles/warprnnt.dir/build.make:65: CMakeFiles/warprnnt.dir/src/warprnnt_generated_rnnt_entrypoint.cu.o] Error 1
make[3]: Leaving directory '/mnt/kingston/github/neural_sp/tools/neural_sp/warp-transducer/build'
make[2]: *** [CMakeFiles/Makefile2:191: CMakeFiles/warprnnt.dir/all] Error 2
make[2]: Leaving directory '/mnt/kingston/github/neural_sp/tools/neural_sp/warp-transducer/build'
make[1]: *** [Makefile:130: all] Error 2
make[1]: Leaving directory '/mnt/kingston/github/neural_sp/tools/neural_sp/warp-transducer/build'
. /mnt/kingston/github/neural_sp/tools/neural_sp/miniconda/bin/activate; cd /mnt/kingston/github/neural_sp/tools/neural_sp/warp-transducer/pytorch_binding && python setup.py install
Could not find libwarprnnt.so in ../build.
Build warp-rnnt and set WARP_RNNT_PATH to the location of libwarprnnt.so (default is '../build')
make: *** [Makefile:93: warp-transducer.done] Error 1
It seems the error is
nvcc fatal : Unsupported gpu architecture 'compute_30'
I have an RTX 3090 from the latest NVIDIA generation; do you know if this repo is updated to compile for it? Also, since I only want to test the LAS and Transformer architectures on the LibriSpeech recipe, I think I won't need the transducer, right? Is there any way to skip this step?
Thanks
I found this PR on the repo addressing the compute_30 issue: https://github.com/HawkAaron/warp-transducer/pull/76. I will give it a try and report back.
EDIT: I managed to compile it with the branch at https://github.com/ncilfone/warp-transducer/tree/3691b3fa5483e911645738a7894c48fe1f116c9b.
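A minimal sketch of that workaround, reusing the clone/build/install sequence from the log above (fork and commit taken from the link; paths assumed relative to the repo root):
rm -rf tools/neural_sp/warp-transducer   # drop the failed checkout first
git clone https://github.com/ncilfone/warp-transducer.git tools/neural_sp/warp-transducer
cd tools/neural_sp/warp-transducer
git checkout 3691b3fa5483e911645738a7894c48fe1f116c9b
mkdir -p build && cd build && cmake .. && make
cd ../pytorch_binding && python setup.py install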
I also discovered that I couldn't run the run.sh script with sh run.sh, since it fails with this error:
============================================================================
LibriSpeech
============================================================================
run.sh: 14: ./path.sh: source: not found
run.sh: 34: utils/parse_options.sh: Syntax error: Bad for loop variable
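This happens because sh resolves to dash on Debian/Ubuntu, which lacks the bash builtins the recipe relies on. An illustration (assuming run.sh declares a bash shebang, as Kaldi-style recipes do):
sh run.sh          # dash: no source builtin, no bash-style for loops
./run.sh --gpu 1   # executes under the script's own bash shebang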
So it has to be run with ./run.sh --gpu 1 instead. This downloads all the data and does some preprocessing, but it stops during the data prep: the script just stops, with no error.
It fails on data_prep.sh:
for part in dev-clean test-clean dev-other test-other train-clean-100 train-clean-360 train-other-500; do
# use underscore-separated names in data directories.
local/data_prep.sh ${data_download_path}/LibriSpeech/${part} ${data}/$(echo ${part} | sed s/-/_/g) || exit 1;
done
Specifically, it fails on utils/validate_data_dir.sh --no-feats $dst || exit 1;
but it doesn't give any specific output or complaint. The full run.sh output:
============================================================================
LibriSpeech
============================================================================
============================================================================
Data Preparation (stage:0)
============================================================================
local/download_and_untar.sh: data part dev-clean was already successfully extracted, nothing to do.
local/download_and_untar.sh: data part test-clean was already successfully extracted, nothing to do.
local/download_and_untar.sh: data part dev-other was already successfully extracted, nothing to do.
local/download_and_untar.sh: data part test-other was already successfully extracted, nothing to do.
local/download_and_untar.sh: data part train-clean-100 was already successfully extracted, nothing to do.
local/download_and_untar.sh: data part train-clean-360 was already successfully extracted, nothing to do.
local/download_and_untar.sh: data part train-other-500 was already successfully extracted, nothing to do.
Downloading file '3-gram.arpa.gz' into '/mnt/kingston/asr-datasets/neural-sp//local/lm'...
'3-gram.arpa.gz' already exists and appears to be complete
Downloading file '3-gram.pruned.1e-7.arpa.gz' into '/mnt/kingston/asr-datasets/neural-sp//local/lm'...
'3-gram.pruned.1e-7.arpa.gz' already exists and appears to be complete
Downloading file '3-gram.pruned.3e-7.arpa.gz' into '/mnt/kingston/asr-datasets/neural-sp//local/lm'...
'3-gram.pruned.3e-7.arpa.gz' already exists and appears to be complete
Downloading file '4-gram.arpa.gz' into '/mnt/kingston/asr-datasets/neural-sp//local/lm'...
'4-gram.arpa.gz' already exists and appears to be complete
Downloading file 'g2p-model-5' into '/mnt/kingston/asr-datasets/neural-sp//local/lm'...
'g2p-model-5' already exists and appears to be complete
Downloading file 'librispeech-lm-corpus.tgz' into '/mnt/kingston/asr-datasets/neural-sp//local/lm'...
'librispeech-lm-corpus.tgz' already exists and appears to be complete
Downloading file 'librispeech-vocab.txt' into '/mnt/kingston/asr-datasets/neural-sp//local/lm'...
'librispeech-vocab.txt' already exists and appears to be complete
Downloading file 'librispeech-lexicon.txt' into '/mnt/kingston/asr-datasets/neural-sp//local/lm'...
'librispeech-lexicon.txt' already exists and appears to be complete
utils/data/get_utt2dur.sh: segments file does not exist so getting durations from wave files
utils/data/get_utt2dur.sh: could not get utterance lengths from sphere-file headers, using wav-to-duration
utils/data/get_utt2dur.sh: computed /mnt/kingston/asr-datasets/neural-sp//dev_clean/utt2dur
Usage: utils/validate_data_dir.sh [--no-feats] [--no-text] [--non-print] [--no-wav] [--no-spk-sort] <data-dir>
The --no-xxx options mean that the script does not require
xxx.scp to be present, but it will check it if it is present.
--no-spk-sort means that the script does not require the utt2spk to be
sorted by the speaker-id in addition to being sorted by utterance-id.
--non-print ignore the presence of non-printable characters.
By default, utt2spk is expected to be sorted by both, which can be
achieved by making the speaker-id prefixes of the utterance-ids
e.g.: utils/validate_data_dir.sh data/train
@hirofumi0810 I managed to get past the last problem by skipping the data validation step (assuming all the processing went right; a sketch of the change is below), and now I'm stuck at the LM training: it fails with a cuDNN error. I think it's related to the combination of my CUDA installation / RTX 3090 and the code; this has already happened to me with other frameworks. I have run pytest at the neural_sp root and all 501 tests passed, so I don't know how to debug it.
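A minimal sketch of the skip, assuming the validation call sits in local/data_prep.sh exactly as quoted earlier in this thread:
# comment out the failing validation line, leaving the rest of the prep untouched
sed -i 's|^\( *utils/validate_data_dir.sh --no-feats .*\)$|# \1|' local/data_prep.sh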
Running:
../../../neural_sp/bin/lm/train.py --corpus librispeech --config conf/lm/rnnlm.yaml --n_gpus 1 --cudnn_benchmark true --train_set /n/work2/inaguma/corpus/librispeech/dataset_lm/train_100_vocab100_wpbpe10000_external.tsv --dev_set /n/work2/inaguma/corpus/librispeech/dataset_lm/dev_clean_100_vocab100_wpbpe10000.tsv --eval_sets /n/work2/inaguma/corpus/librispeech/dataset_lm/dev_other_100_vocab100_wpbpe10000.tsv /n/work2/inaguma/corpus/librispeech/dataset_lm/test_clean_100_vocab100_wpbpe10000.tsv /n/work2/inaguma/corpus/librispeech/dataset_lm/test_other_100_vocab100_wpbpe10000.tsv --unit wp --dict /n/work2/inaguma/corpus/librispeech/dict/train_100_wpbpe10000.txt --wp_model /n/work2/inaguma/corpus/librispeech/dict/train_100_bpe10000.model --model_save_dir /n/work2/inaguma/results/librispeech/lm --stdout true --resume
This generates the following error:
2021-01-03 20:36:39,060 neural_sp.models.base line:108 INFO: torch.backends.cudnn.enabled: True
Traceback (most recent call last):
File "../../../neural_sp/bin/lm/train.py", line 347, in <module>
save_path = pr.runcall(main)
File "/mnt/kingston/github/neural_sp/tools/neural_sp/miniconda/lib/python3.7/cProfile.py", line 121, in runcall
return func(*args, **kw)
File "../../../neural_sp/bin/lm/train.py", line 178, in main
model.cuda()
File "/mnt/kingston/github/neural_sp/tools/neural_sp/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 260, in cuda
return self._apply(lambda t: t.cuda(device))
File "/mnt/kingston/github/neural_sp/tools/neural_sp/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 187, in _apply
module._apply(fn)
File "/mnt/kingston/github/neural_sp/tools/neural_sp/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 187, in _apply
module._apply(fn)
File "/mnt/kingston/github/neural_sp/tools/neural_sp/miniconda/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 117, in _apply
self.flatten_parameters()
File "/mnt/kingston/github/neural_sp/tools/neural_sp/miniconda/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 113, in flatten_parameters
self.batch_first, bool(self.bidirectional))
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
[1] 172006 bus error (core dumped) ../../../neural_sp/bin/lm/train.py --corpus librispeech --config --n_gpus 1
Have you encountered this error before? Any tips on how to solve or debug it?
@jiwidi Setting --benchmark false in run.sh will fix this.
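Since run.sh sources utils/parse_options.sh (visible in the sh error earlier), the flag should also be accepted as a command-line override instead of an edit to the script (an assumption about the recipe's option handling):
./run.sh --gpu 1 --benchmark false   # assumed to feed train.py's --cudnn_benchmark flag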
@hirofumi0810 Hi! Thanks for the help.
I tried that, and now it fails at another step. It does start the first minibatch, though.
0%| | 0/982390016 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/mnt/kingston/github/neural_sp/examples/librispeech/s5/../../../neural_sp/bin/lm/train.py", line 353, in <module>
save_path = pr.runcall(main)
File "/mnt/kingston/github/neural_sp/tools/neural_sp/miniconda/lib/python3.7/cProfile.py", line 121, in runcall
return func(*args, **kw)
File "/mnt/kingston/github/neural_sp/examples/librispeech/s5/../../../neural_sp/bin/lm/train.py", line 227, in main
loss, hidden, observation = model(ys_train, state=hidden)
File "/mnt/kingston/github/neural_sp/tools/neural_sp/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/mnt/kingston/github/neural_sp/tools/neural_sp/miniconda/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 141, in forward
return self.module(*inputs[0], **kwargs[0])
File "/mnt/kingston/github/neural_sp/tools/neural_sp/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/mnt/kingston/github/neural_sp/neural_sp/models/lm/lm_base.py", line 55, in forward
loss, state, observation = self._forward(ys, state)
File "/mnt/kingston/github/neural_sp/neural_sp/models/lm/lm_base.py", line 63, in _forward
logits, out, new_state = self.decode(ys_in, state=state, mems=state)
File "/mnt/kingston/github/neural_sp/neural_sp/models/lm/rnnlm.py", line 220, in decode
ys_emb = self.glu(ys_emb)
File "/mnt/kingston/github/neural_sp/tools/neural_sp/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/mnt/kingston/github/neural_sp/neural_sp/models/modules/glu.py", line 26, in forward
return F.glu(self.fc(xs), dim=-1)
File "/mnt/kingston/github/neural_sp/tools/neural_sp/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/mnt/kingston/github/neural_sp/tools/neural_sp/miniconda/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 67, in forward
return F.linear(input, self.weight, self.bias)
File "/mnt/kingston/github/neural_sp/tools/neural_sp/miniconda/lib/python3.7/site-packages/torch/nn/functional.py", line 1354, in linear
output = input.matmul(weight.t())
RuntimeError: cublas runtime error : the GPU program failed to execute at /opt/conda/conda-bld/pytorch_1544176307774/work/aten/src/THC/THCBlas.cu:258
Do you know of anyone who has successfully run this code on RTX 3000-series cards?
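In case it helps, a quick diagnostic (hypothetical, not from this thread; torch.cuda.get_arch_list needs a reasonably recent PyTorch) to check whether the installed wheel ships kernels for this GPU:
python -c 'import torch; print(torch.__version__, torch.version.cuda)'
python -c 'import torch; print(torch.cuda.get_device_capability(0))'   # (8, 6) on an RTX 3090
python -c 'import torch; print(torch.cuda.get_arch_list())'            # sm_* targets compiled into the wheel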
@jiwidi Are you able to train ASR models in stage-4? (by skipping stage-3)
@hirofumi0810 Hi,
Sorry, I've been out for the last few weeks, and this one is busy for me, but I will try it over the weekend. Thanks.
Facing the same error during installation: nvcc fatal : Unsupported gpu architecture 'compute_30'.
@jiwidi Hi, my colleague and I have run the model with the aishell2 recipe on an RTX 3090. We had the same compute_30 problem and resolved it by commenting out one or two lines in the relevant CMake file; see the sketch below.
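A sketch of that approach (line content assumed; check the CMakeLists.txt in your checkout):
cd tools/neural_sp/warp-transducer
# comment out the Kepler-era gencode flags that nvcc 11 no longer accepts
sed -i 's|^\(.*arch=compute_30.*\)$|# \1|' CMakeLists.txt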
Hi Hiro!
First, thank you for the repo. I've been following it for a while, and I've seen you implement a large number of DL architectures.
So far I was only watching the repo from time to time, but now I would like to see if I can reproduce some results and eventually use it with custom datasets. I tried to reproduce the LibriSpeech experiment without success and need some help with it.
I went ahead and followed the installation instructions.
Kaldi complained about a few libraries, but after installing them manually the make command ran successfully. After this, a conda environment was created under my path:
/mnt/kingston/github/neural_sp/tools/miniconda
I activated it with conda activate /mnt/kingston/github/neural_sp/tools/miniconda and proceeded to run the next step, but got the following output:
Have I missed an important part of the installation process? Do you have a more detailed list of steps I should follow in order to reproduce? Any help would be very much appreciated. Thanks.