Closed othrif closed 3 years ago
Can you clear the Docker system (with docker system prune) and clear the caches, then try again? I had a similar error (torch is an especially heavy file, so if you tried this multiple times, the disk might be full).
Also, which Dockerfile did you run? There are two, and I had success building from wav2letter.Dockerfile
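If you want to see how much space a cache directory is actually taking before pruning, here is a minimal Python sketch (the path argument is a placeholder; point it at whatever directory you pass to pip --cache-dir):

```python
import os
import sys

def dir_size_bytes(root):
    """Sum the sizes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):  # skip broken symlinks
                total += os.path.getsize(path)
    return total

if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    print(f"{dir_size_bytes(root) / 1e9:.2f} GB under {root}")
```

Run it against the cache directory; if it reports several gigabytes, pruning it before the rebuild should help.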
Thanks @raja1196 for the tip. But clearing the cache and switching to wav2letter.Dockerfile by running:
docker build -t wav2vec -f wav2letter.Dockerfile .
did not help.
=> [8/9] WORKDIR /root/fairseq 0.0s
=> ERROR [9/9] RUN TMPDIR=/data/mydir/ pip install --cache-dir=/data/mydir/ --editable ./ && python examples/speech_recognition/infer.py --help && p 28.9s
------
> [9/9] RUN TMPDIR=/data/mydir/ pip install --cache-dir=/data/mydir/ --editable ./ && python examples/speech_recognition/infer.py --help && python examples/wav2vec/recognize.py --help:
#14 0.611 Obtaining file:///root/fairseq
#14 0.614 Installing build dependencies: started
#14 3.306 Installing build dependencies: finished with status 'done'
#14 3.306 Getting requirements to build wheel: started
#14 3.501 Getting requirements to build wheel: finished with status 'done'
#14 3.505 Installing backend dependencies: started
#14 6.264 Installing backend dependencies: finished with status 'done'
#14 6.265 Preparing wheel metadata: started
#14 6.589 Preparing wheel metadata: finished with status 'done'
#14 6.792 Requirement already satisfied: numpy<1.20.0 in /usr/local/lib/python3.6/dist-packages (from fairseq==1.0.0a0+c8a0659) (1.18.2)
#14 6.823 Requirement already satisfied: cffi in /usr/local/lib/python3.6/dist-packages (from fairseq==1.0.0a0+c8a0659) (1.14.4)
#14 6.967 Requirement already satisfied: tqdm in /usr/local/lib/python3.6/dist-packages (from fairseq==1.0.0a0+c8a0659) (4.44.1)
#14 8.065 Collecting hydra-core<1.1
#14 8.155 Downloading hydra_core-1.0.4-py3-none-any.whl (122 kB)
#14 8.283 Collecting antlr4-python3-runtime==4.8
#14 8.299 Downloading antlr4-python3-runtime-4.8.tar.gz (112 kB)
#14 8.540 Collecting omegaconf<2.1
#14 8.558 Downloading omegaconf-2.0.5-py3-none-any.whl (36 kB)
#14 8.695 Collecting PyYAML>=5.1.*
#14 8.710 Downloading PyYAML-5.3.1.tar.gz (269 kB)
#14 9.085 Collecting sacrebleu>=1.4.12
#14 9.109 Downloading sacrebleu-1.4.14-py3-none-any.whl (64 kB)
#14 9.182 Requirement already satisfied: pycparser in /usr/local/lib/python3.6/dist-packages (from cffi->fairseq==1.0.0a0+c8a0659) (2.20)
#14 9.184 Collecting cython
#14 9.200 Downloading Cython-0.29.21-cp36-cp36m-manylinux1_x86_64.whl (2.0 MB)
#14 9.280 Collecting dataclasses
#14 9.297 Downloading dataclasses-0.8-py3-none-any.whl (19 kB)
#14 9.304 Collecting importlib-resources
#14 9.320 Downloading importlib_resources-3.3.0-py2.py3-none-any.whl (26 kB)
#14 9.401 Collecting zipp>=0.4
#14 9.416 Downloading zipp-3.4.0-py3-none-any.whl (5.2 kB)
#14 9.434 Collecting portalocker
#14 9.452 Downloading portalocker-2.0.0-py2.py3-none-any.whl (11 kB)
#14 9.466 Collecting regex
#14 9.485 Downloading regex-2020.11.13-cp36-cp36m-manylinux2014_x86_64.whl (723 kB)
#14 9.520 Collecting torch
#14 9.535 Downloading torch-1.7.1-cp36-cp36m-manylinux1_x86_64.whl (776.8 MB)
#14 28.31 Killed
------
executor failed running [/bin/sh -c TMPDIR=/data/mydir/ pip install --cache-dir=/data/mydir/ --editable ./ && python examples/speech_recognition/infer.py --help && python examples/wav2vec/recognize.py --help]: exit code: 137
I also tried the version you have in your repo, since I saw you made a few modifications, but still no luck.
Anything else I might try?
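One note on the log above: exit code 137 means pip was killed by a signal rather than failing on its own. Docker and the shell report such deaths as 128 plus the signal number, and 137 - 128 = 9 is SIGKILL, which is what the kernel's OOM killer sends; running out of memory while downloading the ~777 MB torch wheel fits that. A small sketch of the arithmetic:

```python
import signal

# Shell/Docker convention: a process killed by signal N exits with 128 + N.
SIGNAL_EXIT_BASE = 128

def killing_signal(exit_code):
    """Return the signal behind a 128+N exit code, or None for a normal exit code."""
    if exit_code > SIGNAL_EXIT_BASE:
        return signal.Signals(exit_code - SIGNAL_EXIT_BASE)
    return None

print(killing_signal(137).name)  # SIGKILL, the classic out-of-memory kill
```

So the fix is likely more memory (or disk-backed temp space) for the Docker daemon, not a change to the Dockerfile itself.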
Can you share more information, like where you are running this, the Docker version, and the memory profile? docker stats will show whether you already have containers running that may be occupying storage. I had a similar issue, and I resolved it by checking whether Docker had run out of free space.
Also, if you have NVIDIA GPU support, try running
nvidia-docker build -t wav2vec2 -f wav2letter.Dockerfile .
and in wav2letter.Dockerfile, change the first two lines to
FROM wav2letter/wav2letter:cuda-latest
ENV USE_CUDA=1
Sure, here are the specifications of my system, which does not have a GPU:
docker system prune -f
Thanks for the help!
I have also tested on another system that has a GPU and applied the modifications you outlined. This time
python examples/speech_recognition/infer.py --help
runs, but
python examples/wav2vec/recognize.py --help
doesn't. The error:
/usr/local/lib/python3.6/dist-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at /pytorch/c10/cuda/CUDAFunctions.cpp:100.)
return torch._C._cuda_getDeviceCount() > 0
Traceback (most recent call last):
File "examples/wav2vec/recognize.py", line 10, in <module>
from fairseq.models.wav2vec.wav2vec2_asr import base_architecture, Wav2VecEncoder
ImportError: cannot import name 'base_architecture'
From the command line, I can run
python -c "import torch; print(torch._C._cuda_getDeviceCount())"
and get 1.
Sorry about that, can you remove base_architecture from line 10 of recognize.py?
It is no longer available in fairseq, so it cannot be imported. I defined the function locally but forgot to update the import.
Line 10 should read:
from fairseq.models.wav2vec.wav2vec2_asr import Wav2VecEncoder
About the GPU, you might have to match the CUDA version with the NVIDIA driver's. Can you check nvidia-smi
for CUDA Version: x.xx and make sure it matches the package version?
Thanks @raja1196, this solved my problem! And the macOS issue was related to the memory resources in the Docker kernel.
Now that I have the setup working, I am having a problem interpreting the output: it is not transcribing my sample test properly.
For instance, saying "hello world" returns the following: ASD WHELON FPPTH
Any idea how to get this working?
Make sure you are running this command to generate the result:
python examples/wav2vec/recognize.py --wav_path /app/data/test_audio_16.wav --w2v_path /app/data/wav2vec2_vox_960h.pt --target_dict_path /app/data/dict.ltr.txt
and that your audio file is a 16 kHz wav file. If you have an 8 kHz file, you can convert it with:
sox "your_audio_file.wav" -r 16000 -c 1 -b 16 "test_16k.wav"
This is a command-line tool, so run it in the terminal. If you get a sox package not found error, install it with brew install or apt-get install and run it again. If the problem still exists, let me know.
The model can be changed (that is up to your requirements), but I have found the best results with the combination of 16 kHz audio and that model file (.pt)
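To double-check the audio format before running recognize.py, here is a small stdlib sketch (it assumes an uncompressed PCM wav; the target layout of 16 kHz, mono, 16-bit matches the sox command above):

```python
import wave

def wav_format(path):
    """Return (sample_rate_hz, channels, sample_width_bytes) of a PCM wav file."""
    with wave.open(path, "rb") as w:
        return w.getframerate(), w.getnchannels(), w.getsampwidth()

def needs_resampling(path, target_hz=16000):
    """True if the file is not already 16 kHz mono 16-bit."""
    rate, channels, width = wav_format(path)
    return (rate, channels, width) != (target_hz, 1, 2)
```

If needs_resampling returns True for your file, run it through the sox command first.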
Yeah, that is what I was running, but indeed changing the model gives different performance. What made the real difference was uttering longer sentences than "hello world", which worked much better.
Things are working on my side, thanks for your help @raja1196. I am closing this ticket for now; I will open different ones later ;)
Hi there,
I am trying to build this docker image and get the following:
Any idea what I am missing?