loretoparisi / wave2vec-recognize-docker

Wave2vec 2.0 Recognize pipeline
MIT License

Cannot build image `executor failed running [/bin/sh -c pip3 install soundfile torchaudio sentencepiece]: exit code: 137` #6

Closed · othrif closed this issue 3 years ago

othrif commented 3 years ago

Hi there,

I am trying to build this Docker image and get the following error (screenshot of the failed build): executor failed running [/bin/sh -c pip3 install soundfile torchaudio sentencepiece]: exit code: 137

Any idea what I am missing?

raja1196 commented 3 years ago

Can you clear the Docker system (with docker system prune), clear the caches, and try again? I had a similar error; torch is a heavy package, so if you have tried this multiple times, your Docker storage might be full.
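For reference, the cleanup commands I mean are along these lines (the -a flag is more aggressive and also removes unused images, so only use it if you do not need them):

docker system prune      # remove stopped containers, dangling images and unused networks
docker builder prune     # clear the Docker build cache
docker system prune -a   # optional: also remove all unused images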

raja1196 commented 3 years ago

Also, which Dockerfile did you run? There are two, and I had success building from wav2letter.Dockerfile.

othrif commented 3 years ago

Thanks @raja1196 for the tip. But clearing the cache and switching to wav2letter.Dockerfile by running docker build -t wav2vec -f wav2letter.Dockerfile . did not help.

 => [8/9] WORKDIR /root/fairseq                                                                                                                         0.0s
 => ERROR [9/9] RUN TMPDIR=/data/mydir/ pip install --cache-dir=/data/mydir/ --editable ./ && python examples/speech_recognition/infer.py --help && p  28.9s
------
 > [9/9] RUN TMPDIR=/data/mydir/ pip install --cache-dir=/data/mydir/ --editable ./ && python examples/speech_recognition/infer.py --help && python examples/wav2vec/recognize.py --help:
#14 0.611 Obtaining file:///root/fairseq
#14 0.614   Installing build dependencies: started
#14 3.306   Installing build dependencies: finished with status 'done'
#14 3.306   Getting requirements to build wheel: started
#14 3.501   Getting requirements to build wheel: finished with status 'done'
#14 3.505   Installing backend dependencies: started
#14 6.264   Installing backend dependencies: finished with status 'done'
#14 6.265     Preparing wheel metadata: started
#14 6.589     Preparing wheel metadata: finished with status 'done'
#14 6.792 Requirement already satisfied: numpy<1.20.0 in /usr/local/lib/python3.6/dist-packages (from fairseq==1.0.0a0+c8a0659) (1.18.2)
#14 6.823 Requirement already satisfied: cffi in /usr/local/lib/python3.6/dist-packages (from fairseq==1.0.0a0+c8a0659) (1.14.4)
#14 6.967 Requirement already satisfied: tqdm in /usr/local/lib/python3.6/dist-packages (from fairseq==1.0.0a0+c8a0659) (4.44.1)
#14 8.065 Collecting hydra-core<1.1
#14 8.155   Downloading hydra_core-1.0.4-py3-none-any.whl (122 kB)
#14 8.283 Collecting antlr4-python3-runtime==4.8
#14 8.299   Downloading antlr4-python3-runtime-4.8.tar.gz (112 kB)
#14 8.540 Collecting omegaconf<2.1
#14 8.558   Downloading omegaconf-2.0.5-py3-none-any.whl (36 kB)
#14 8.695 Collecting PyYAML>=5.1.*
#14 8.710   Downloading PyYAML-5.3.1.tar.gz (269 kB)
#14 9.085 Collecting sacrebleu>=1.4.12
#14 9.109   Downloading sacrebleu-1.4.14-py3-none-any.whl (64 kB)
#14 9.182 Requirement already satisfied: pycparser in /usr/local/lib/python3.6/dist-packages (from cffi->fairseq==1.0.0a0+c8a0659) (2.20)
#14 9.184 Collecting cython
#14 9.200   Downloading Cython-0.29.21-cp36-cp36m-manylinux1_x86_64.whl (2.0 MB)
#14 9.280 Collecting dataclasses
#14 9.297   Downloading dataclasses-0.8-py3-none-any.whl (19 kB)
#14 9.304 Collecting importlib-resources
#14 9.320   Downloading importlib_resources-3.3.0-py2.py3-none-any.whl (26 kB)
#14 9.401 Collecting zipp>=0.4
#14 9.416   Downloading zipp-3.4.0-py3-none-any.whl (5.2 kB)
#14 9.434 Collecting portalocker
#14 9.452   Downloading portalocker-2.0.0-py2.py3-none-any.whl (11 kB)
#14 9.466 Collecting regex
#14 9.485   Downloading regex-2020.11.13-cp36-cp36m-manylinux2014_x86_64.whl (723 kB)
#14 9.520 Collecting torch
#14 9.535   Downloading torch-1.7.1-cp36-cp36m-manylinux1_x86_64.whl (776.8 MB)
#14 28.31 Killed
------
executor failed running [/bin/sh -c TMPDIR=/data/mydir/ pip install --cache-dir=/data/mydir/ --editable ./ && python examples/speech_recognition/infer.py --help && python examples/wav2vec/recognize.py --help]: exit code: 137

I also tried the version you have in your repo, since I saw you made a few modifications, but still no luck.

Anything else I might try?

raja1196 commented 3 years ago

Can you share more information, like where you are running this, the Docker version, and the memory profile? docker stats will show whether you already have containers running that may be occupying storage. I had a similar issue and resolved it by confirming that Docker had run out of free space.
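For example, something like this should capture most of it (all standard Docker CLI commands):

docker version             # client and daemon versions
docker system df           # disk space used by images, containers and the build cache
docker stats --no-stream   # one-shot snapshot of running containers and their memory usage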

raja1196 commented 3 years ago

Also, if you have Nvidia GPU support, try running nvidia-docker build -t wav2vec2 -f wav2letter.Dockerfile .

and in wav2letter.Dockerfile, change the first two lines to:

FROM wav2letter/wav2letter:cuda-latest
ENV USE_CUDA=1

othrif commented 3 years ago

Sure, here are the specifications of my system, which does not have a GPU: [screenshot of system specifications]

Thanks for the help!

othrif commented 3 years ago

I have also tested on another system that has a GPU and applied the modifications you outlined. This time python examples/speech_recognition/infer.py --help runs, but python examples/wav2vec/recognize.py --help does not.

The error:

/usr/local/lib/python3.6/dist-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at  /pytorch/c10/cuda/CUDAFunctions.cpp:100.)
  return torch._C._cuda_getDeviceCount() > 0
Traceback (most recent call last):
  File "examples/wav2vec/recognize.py", line 10, in <module>
    from fairseq.models.wav2vec.wav2vec2_asr import base_architecture, Wav2VecEncoder
ImportError: cannot import name 'base_architecture'

From the command line, I can run python -c "import torch; print(torch._C._cuda_getDeviceCount())" and get 1.

raja1196 commented 3 years ago

Sorry about that. Can you remove base_architecture from line 10 of recognize.py? It is no longer importable from fairseq; I defined the function locally but forgot to update that line. Line 10 should read: from fairseq.models.wav2vec.wav2vec2_asr import Wav2VecEncoder
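If it helps, a one-liner along these lines should apply the fix (it assumes line 10 still matches the import shown in your traceback; editing the file by hand works just as well):

sed -i 's/import base_architecture, Wav2VecEncoder/import Wav2VecEncoder/' examples/wav2vec/recognize.py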

About the GPU: you might have to match the CUDA version with the one your NVIDIA driver supports. Can you check nvidia-smi for CUDA Version: x.xx and make sure it matches the version of the installed torch package?
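A quick way to compare the two, assuming torch is already installed inside the container:

nvidia-smi | grep "CUDA Version"                                                  # CUDA version supported by the driver
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"   # CUDA version torch was built with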

othrif commented 3 years ago

Thanks @raja1196, this solved my problem! As for the macOS issue, it was related to the memory allocated to the Docker VM.
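For anyone hitting the same exit code 137 on macOS: after raising the memory limit in the Docker Desktop settings, a generic check of how much memory the Docker VM actually sees is:

docker info --format '{{.MemTotal}}'   # total memory available to the Docker daemon, in bytes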

Now that I have the setup working, I am having a problem interpreting the output: it is not transcribing my sample test properly. For instance, saying "hello world" returns the following: ASD WHELON FPPTH

Any idea how to get this working?

raja1196 commented 3 years ago

Make sure you are running this command to generate the result: python examples/wav2vec/recognize.py --wav_path /app/data/test_audio_16.wav --w2v_path /app/data/wav2vec2_vox_960h.pt --target_dict_path /app/data/dict.ltr.txt

and that the audio file is a 16 kHz WAV file. If you have an 8 kHz file, you can convert it in the terminal with: sox "your_audio_file.wav" -r 16000 -c 1 -b 16 "test_16K.wav" If sox is not installed, install it with brew install sox or apt-get install sox and run it again. If the problem still exists, let me know.
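To double-check a file before running recognize.py, soxi (installed alongside sox) can print the format; the file name below is just an example:

soxi -r test_16K.wav   # sample rate, should print 16000
soxi -c test_16K.wav   # channel count, should print 1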

raja1196 commented 3 years ago

The model can be changed (that is up to your requirements), but I have found the best results with the combination of 16 kHz audio and that model file (wav2vec2_vox_960h.pt).

othrif commented 3 years ago

Yeah, that is what I was running, and indeed changing the model gives different performance. But what made the real difference was uttering longer sentences than "hello world", which worked much better.

Things are working on my side, thanks for your help @raja1196. I am closing this ticket for now; I will open different ones later ;)