Adapt for DeepSpeech 0.6.0

BoneGoat commented 4 years ago

I'm trying to adapt this code for the new release of DeepSpeech. After some minor modifications in align/wavTranscriber.py, mostly CreateModel and enableDecoderWithLM, I'm running into the following error:

bin/align.sh --output-max-cer 15 --loglevel 10 --audio data/test1/audio.wav --script data/test1/transcript.txt --aligned data/test1/aligned.json --tlog data/test1/log.txt --stt-model-dir models/en/deepspeech-0.6.0-models --alphabet models/en/alphabet.txt
DEBUG:root:Start
DEBUG:root:Loading alphabet from "models/en/alphabet.txt"...
DEBUG:root:Looking for model files in "models/en/deepspeech-0.6.0-models"...
DEBUG:root:Loading acoustic model from "models/en/deepspeech-0.6.0-models/output_graph.pb", alphabet from "models/en/alphabet.txt" and language model from "data/test1/transcript.txt.lm"...
DEBUG:root:Transcribing VAD segments...
VAD splitting: 3464it [00:01, 2274.26it/s]
Transcribing:   0%|          | 0/3464 [00:00<?, ?it/s]
DEBUG:root:Process 28330: Loaded models
TensorFlow: v1.14.0-21-ge77504ac6b
DeepSpeech: v0.6.0-0-g6d43e21
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2019-12-19 12:58:48.519864: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Error: Trie file version mismatch (4 instead of expected 5). Update your trie file.
libc++abi.dylib: terminating with uncaught exception of type int

pip list
Package      Version
------------ -------
deepspeech   0.6.0
numpy        1.17.4
pip          19.0.3
pydub        0.23.1
setuptools   40.8.0
six          1.13.0
sox          1.3.7
textdistance 4.1.5
tqdm         4.40.2
webrtcvad    2.0.10

I have replaced generate_trie with the 0.6.0 version from native client. I can get a bit further by not generating specific LMs but then I run into this problem:

bin/align.sh --output-max-cer 15 --loglevel 10 --audio data/test1/audio.wav --script data/test1/transcript.txt --aligned data/test1/aligned.json --tlog data/test1/log.txt --stt-model-dir models/en/deepspeech-0.6.0-models --alphabet models/en/alphabet.txt --stt-no-own-lm
DEBUG:root:Start
DEBUG:root:Loading alphabet from "models/en/alphabet.txt"...
DEBUG:root:Looking for model files in "models/en/deepspeech-0.6.0-models"...
DEBUG:root:Loading acoustic model from "models/en/deepspeech-0.6.0-models/output_graph.pb", alphabet from "models/en/alphabet.txt", trie from "models/en/deepspeech-0.6.0-models/trie" and language model from "models/en/deepspeech-0.6.0-models/lm.binary"...
DEBUG:root:Transcribing VAD segments...
VAD splitting: 3464it [00:01, 2264.24it/s]
Transcribing:   0%|          | 0/3464 [00:00<?, ?it/s]
DEBUG:root:Process 31650: Loaded models
TensorFlow: v1.14.0-21-ge77504ac6b
DeepSpeech: v0.6.0-0-g6d43e21
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2019-12-19 14:19:34.495576: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
DEBUG:root:Process 31650: Transcribing...
DEBUG:root:Process 31650: Transcribing...
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.7.4_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/Users/tobias/dev/git/DSAlign/align/align.py", line 74, in stt
    transcript = wavTranscriber.stt(model, audio, sample_rate)
  File "/Users/tobias/dev/git/DSAlign/align/wavTranscriber.py", line 40, in stt
    output = ds.stt(audio, fs)
  File "/Users/tobias/dev/git/DSAlign/venv/lib/python3.7/site-packages/deepspeech/__init__.py", line 93, in stt
    return deepspeech.impl.SpeechToText(self._impl, *args, **kwargs)
TypeError: SpeechToText() takes at most 2 arguments (3 given)
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/tobias/dev/git/DSAlign/align/align.py", line 672, in <module>
DEBUG:root:Process 31650: Transcribing...
    main()
  File "/Users/tobias/dev/git/DSAlign/align/align.py", line 631, in main
    for time_start, time_end, segment_transcript in transcripts:
  File "/Users/tobias/dev/git/DSAlign/venv/lib/python3.7/site-packages/tqdm/std.py", line 1102, in __iter__
    for obj in iterable:
  File "/usr/local/Cellar/python/3.7.4_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 748, in next
    raise value
DEBUG:root:Process 31650: Transcribing...
TypeError: SpeechToText() takes at most 2 arguments (3 given)
DEBUG:root:Process 31650: Transcribing...
Transcribing:   0%|

I'm sure these are just minor problems and adapting DSAlign to DeepSpeech 0.6.0 won't be that difficult.

BoneGoat commented 4 years ago

Fixed the "SpeechToText() takes at most 2 arguments (3 given)" problem by removing the sample rate from the ds.stt() call. The sample rate is now inferred from the model.
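For anyone making the same changes, here is a rough sketch of what the 0.6.0-style calls could look like. It is not the actual DSAlign code: the helper names `load_model` and `stt` and the `model_factory` parameter are made up for illustration (in real use you would pass `deepspeech.Model`), and the default beam width and LM weights are assumptions taken from the 0.6.0 example client as I remember it, so check them against your setup.

```python
def load_model(model_factory, model_path, lm_path=None, trie_path=None,
               beam_width=500, lm_alpha=0.75, lm_beta=1.85):
    """Create a model the 0.6.0 way.

    model_factory is a hypothetical injection point; in real use it would be
    deepspeech.Model. The numeric defaults are assumptions, not DSAlign values.
    """
    # In 0.6.0 the constructor takes the graph path and the beam width.
    model = model_factory(model_path, beam_width)
    # The language model and trie are attached in a separate call.
    if lm_path and trie_path:
        model.enableDecoderWithLM(lm_path, trie_path, lm_alpha, lm_beta)
    return model


def stt(model, audio, sample_rate):
    """Transcribe audio without passing the sample rate to the model."""
    # The model reports the rate it expects, so the caller's rate is only
    # used for a sanity check here.
    expected = model.sampleRate()
    if sample_rate != expected:
        raise ValueError(
            "audio is {} Hz but the model expects {} Hz".format(sample_rate, expected)
        )
    # 0.6.0: stt() takes just the audio buffer.
    return model.stt(audio)
```

With the real package this would be called roughly as `load_model(deepspeech.Model, "models/en/deepspeech-0.6.0-models/output_graph.pb", lm_path, trie_path)`. The explicit sample-rate check is my own addition; the fix in this thread was only to drop the second argument.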

tilmankamp commented 4 years ago

Thanks for your help! Could you add/replace deepspeech==0.6.0 in the requirements.txt file and put these fixes up as a PR?

BoneGoat commented 4 years ago

Got past the "Trie file version mismatch (4 instead of expected 5). Update your trie file." error by using v0.6.0 of the native client from GitHub, extracted into dependencies/deepspeech. taskcluster.py is supposed to handle this, but taskcluster.net seems to be deprecated, so taskcluster.py needs refactoring to pull files from GitHub.
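Until taskcluster.py is refactored, the manual step above could be scripted along these lines. This is only a sketch: the function names are hypothetical, and the release asset naming (`native_client.<arch>.<build>.<platform>.tar.xz`) is an assumption on my part, so please check it against the DeepSpeech releases page before relying on it.

```python
import tarfile
import urllib.request
from pathlib import Path

# Assumed URL layout of DeepSpeech release assets on GitHub (not verified here).
RELEASE_URL = "https://github.com/mozilla/DeepSpeech/releases/download/v{version}/{asset}"


def native_client_url(version, platform="osx", arch="amd64", build="cpu"):
    """Build the download URL for a native client archive (assumed naming)."""
    asset = "native_client.{}.{}.{}.tar.xz".format(arch, build, platform)
    return RELEASE_URL.format(version=version, asset=asset)


def fetch_native_client(version, target_dir="dependencies/deepspeech", **kwargs):
    """Download the archive and unpack it into target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    archive = target / "native_client.tar.xz"
    urllib.request.urlretrieve(native_client_url(version, **kwargs), str(archive))
    # Only unpack archives from a source you trust; extractall() writes
    # whatever paths the archive contains.
    with tarfile.open(str(archive), "r:xz") as tar:
        tar.extractall(str(target))
    return target
```

Calling `fetch_native_client("0.6.0")` would then place the files under dependencies/deepspeech, which is where I extracted them by hand.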

BoneGoat commented 4 years ago

Thanks for merging and for this wonderful lib.

tilmankamp commented 4 years ago

https://github.com/mozilla/DSAlign/commit/6dcfd4dd4a8146d23c0c784a960c1557b6e803bb should have fixed the DeepSpeech dependency in case of individual language models.

Adapt for DeepSpeech 0.6.0 #14