synesthesiam / voice2json

Command-line tools for speech and intent recognition on Linux
MIT License

v2.1 - transcribe-wav appears to be fairly slow #59

Closed by OSS542 3 years ago

OSS542 commented 3 years ago

I am testing this using en-us_pocketsphinx-cmu-1.2. I patched /usr/lib/python3.9/site-packages/voice2json/main.py so that the "en" alias references en-us_pocketsphinx-cmu.

I get the following timings:

bash-3.2$ time cat jfh1.wav | voice2json --profile en transcribe-wav | voice2json --profile en recognize-intent
...
real    0m3.662s
user    0m4.241s
sys     0m0.453s

bash-3.2$ time cat jfh1.wav | voice2json --profile en transcribe-wav > /dev/null
real    0m3.494s
user    0m3.382s
sys     0m0.397s

bash-3.2$ time cat jfh1.tsc | voice2json --profile en recognize-intent
...
real    0m0.852s
user    0m0.905s
sys     0m0.232s
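
For anyone reproducing these numbers, a single `time` run can be noisy, so it may help to repeat each stage a few times and compare the spread. The following is only a minimal sketch: `time_command` is a hypothetical helper (not part of voice2json), and the demo uses the Python interpreter as a stand-in command so that it runs on a machine without voice2json installed.

```python
import subprocess
import sys
import time


def time_command(argv, stdin_bytes=b"", runs=3):
    """Run argv `runs` times, feeding stdin_bytes on stdin.

    Returns a list with the wall-clock duration of each run in seconds.
    Output is discarded, mirroring the `> /dev/null` redirect above.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            argv,
            input=stdin_bytes,
            stdout=subprocess.DEVNULL,
            check=True,  # raise if the command exits with a non-zero status
        )
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Stand-in command (assumption): an interpreter that starts and exits.
    # To time a real stage, pass the command from the transcript instead,
    # e.g. ["voice2json", "--profile", "en", "transcribe-wav"], together
    # with the bytes of the WAV file as stdin_bytes.
    baseline = time_command([sys.executable, "-c", "pass"])
    print("min %.3fs, max %.3fs" % (min(baseline), max(baseline)))
```

Comparing the minimum over several runs for each stage gives a steadier picture than one measurement, though it does not by itself explain where the time goes.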

Pocketsphinx is normally very fast when working with a restricted vocabulary having a small corpus. In many pocketsphinx applications, the corpus is uploaded to the CMU lmtool facility to obtain the dictionary "dic" and language model "lm" files derived from it, after which the application operates offline unless the vocabulary is updated. I am wondering if voice2json is working with a very large fixed dictionary and/or language model.
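
One rough way to check the dictionary-size question is to count the entries in the pronunciation dictionary the profile loads. The sketch below assumes a CMU-style dictionary, where each line is a word followed by its phonemes and alternate pronunciations are marked with a suffix such as `(2)`. `count_dictionary_words` is a hypothetical helper, and the location of the dictionary file inside a voice2json profile is not stated in this thread, so the demo uses a small temporary file instead.

```python
import re
import tempfile
from pathlib import Path

# Matches an alternate-pronunciation suffix such as "(2)" at the end of a word.
_VARIANT = re.compile(r"\(\d+\)$")


def count_dictionary_words(dic_path):
    """Return (unique_words, pronunciation_lines) for a CMU-style dictionary."""
    words = set()
    entries = 0
    with open(dic_path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            parts = line.split()
            # Skip blank lines and ";;;" comment lines.
            if not parts or parts[0].startswith(";;;"):
                continue
            entries += 1
            words.add(_VARIANT.sub("", parts[0]).lower())
    return len(words), entries


if __name__ == "__main__":
    # Tiny made-up sample; a real run would point at the profile's dictionary.
    sample = "hello HH AH L OW\nhello(2) HH EH L OW\nworld W ER L D\n"
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "sample.dic"
        path.write_text(sample, encoding="utf-8")
        print(count_dictionary_words(path))  # (2, 3)
```

A count in the tens or hundreds would suggest a restricted vocabulary, while a count in the tens of thousands or more would point to a large general-purpose dictionary.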

OSS542 commented 3 years ago

Using the following works very quickly: voice2json -p en transcribe-stream | voice2json -p en recognize-intent | jq

Closing this issue as resolved for me.