synesthesiam / voice2json

Command-line tools for speech and intent recognition on Linux
MIT License

Kaldi is being located inconsistently between train-profile and transcribe-wav #65

Open mrwerdo opened 2 years ago

mrwerdo commented 2 years ago

Hello! First off, thank you for the effort put into maintaining this project. It's pretty awesome.

I've been installing voice2json on Arch Linux. My experience has been that it works, provided that you can get everything installed. I'm hoping that this issue will help improve its portability and reduce the number of steps needed to install the package. If not, well then at least the steps can be documented for someone else who's attempting to install it on another distro.

I apologise in advance for the lack of conciseness. I'm not really sure how to start communicating what I'm thinking. I'll start with the most concrete problem:

If I run python -m voice2json -p en train-profile with KALDI_DIR=/opt/kaldi, the command succeeds. However, if I run python -m voice2json -p en transcribe-wav with KALDI_DIR=/opt/kaldi, the command fails. Below is a copy-paste of my terminal output. The Docker container can be found here. The relevant part of the Dockerfile starts here.

[nubots@e0f4cf2d69bd NUbots]$ espeak -w output.wav "hello world"
[nubots@e0f4cf2d69bd NUbots]$ KALDI_DIR= python -m voice2json -p en transcribe-wav output.wav
{"text": "follow william", "likelihood": 1, "transcribe_seconds": 0.4412445730122272, "wav_seconds": 1.028375, "tokens": null, "wav_name": "output.wav"}
[nubots@e0f4cf2d69bd NUbots]$ python -m voice2json -p en transcribe-wav output.wav
Traceback (most recent call last):
  File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.9/site-packages/voice2json/__main__.py", line 1088, in <module>
    asyncio.run(main())
  File "/usr/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.9/site-packages/voice2json/__main__.py", line 87, in main
    await args.func(args, core)
  File "/usr/local/lib/python3.9/site-packages/voice2json/transcribe.py", line 57, in transcribe_wav
    transcriber.transcribe_wav(wav_data) or Transcription.empty()
  File "/usr/local/lib/python3.9/site-packages/rhasspyasr_kaldi/transcribe.py", line 79, in transcribe_wav
    text = self._transcribe_wav_nnet3(wav_file.name)
  File "/usr/local/lib/python3.9/site-packages/rhasspyasr_kaldi/transcribe.py", line 127, in _transcribe_wav_nnet3
    lines = subprocess.check_output(
  File "/usr/lib/python3.9/subprocess.py", line 424, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.9/subprocess.py", line 505, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/lib/python3.9/subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.9/subprocess.py", line 1821, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '/opt/kaldi/online2-wav-nnet3-latgen-faster'
[nubots@e0f4cf2d69bd NUbots]$ python -m voice2json -p en train-profile
WARNING:voice2json.pronounce:Skipping /usr/local/share/voice2json/en-us_kaldi-zamia/custom_words.txt (does not exist)
WARNING:rhasspynlu.g2p:Missing word 'chatbot'
WARNING:rhasspynlu.g2p:Missing word 'self-driving'
WARNING:rhasspynlu.g2p:Missing word 'neaten'
WARNING:rhasspynlu.g2p:Missing word 'moravec's'
WARNING:rhasspynlu.g2p:Missing word 'nanobot'
WARNING:rhasspynlu.g2p:Missing word 'ai's'
WARNING:rhasspynlu.g2p:Missing word 'yonge'
WARNING:rhasspynlu.g2p:Missing word 'skyler'
WARNING:rhasspynlu.g2p:Missing word '\'
/opt/kaldi/egs/wsj/s5/utils/prepare_lang.sh /usr/local/share/voice2json/en-us_kaldi-zamia/acoustic_model/data/local/dict <unk> /usr/local/share/voice2json/en-us_kaldi-zamia/acoustic_model/data/local/lang /usr/local/share/voice2json/en-us_kaldi-zamia/acoustic_model/data/lang
Checking /usr/local/share/voice2json/en-us_kaldi-zamia/acoustic_model/data/local/dict/silence_phones.txt ...
--> reading /usr/local/share/voice2json/en-us_kaldi-zamia/acoustic_model/data/local/dict/silence_phones.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> /usr/local/share/voice2json/en-us_kaldi-zamia/acoustic_model/data/local/dict/silence_phones.txt is OK

In the container, Kaldi is installed under /opt/kaldi. Voice2json is located under /home/nubots/voice2json, and installed under /usr/lib/python3.9/site-packages. I'm currently moving it to be installed under /usr/local/lib/python3.9/site-packages, which is why the output may be slightly different in the container I have linked.

So, how should voice2json be searching for Kaldi programs?
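
To make the question concrete, here is a rough sketch of one lookup order that would satisfy both commands in the output above. It is not how voice2json or rhasspyasr_kaldi currently resolve paths; the function name find_kaldi_program and the search order are hypothetical. It assumes that KALDI_DIR may be either a flat directory of binaries (which is what the traceback suggests transcribe-wav expects) or a Kaldi source tree, where, as far as I can tell, a build places executables under src/<name>bin/ (for example src/online2bin/) and train-profile finds egs/wsj/s5/utils/prepare_lang.sh.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def find_kaldi_program(name: str, kaldi_dir: Optional[str] = None) -> Optional[Path]:
    """Return the path to a Kaldi executable, or None if it cannot be found.

    Hypothetical helper for illustration only; not part of voice2json.
    """
    # Use the KALDI_DIR environment variable unless a directory is passed in.
    if kaldi_dir is None:
        kaldi_dir = os.environ.get("KALDI_DIR", "")

    if kaldi_dir:
        root = Path(kaldi_dir)
        # Flat layout: binaries sit directly inside KALDI_DIR.
        candidates = [root / name]
        # Source-tree layout (assumed): binaries built under src/*bin/.
        candidates.extend(sorted(root.glob(f"src/*bin/{name}")))
        for candidate in candidates:
            if candidate.is_file() and os.access(candidate, os.X_OK):
                return candidate

    # An empty or unhelpful KALDI_DIR falls through to a normal PATH lookup.
    found = shutil.which(name)
    return Path(found) if found else None


if __name__ == "__main__":
    print(find_kaldi_program("online2-wav-nnet3-latgen-faster"))
```

With a lookup like this, the same KALDI_DIR value could serve both train-profile and transcribe-wav, and an empty KALDI_DIR would fall back to whatever is on PATH, which may be what makes the KALDI_DIR= invocation above succeed.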