I am testing this using en-us_pocketsphinx-cmu-1.2. I patched /usr/lib/python3.9/site-packages/voice2json/main.py so that the "en" alias references en-us_pocketsphinx-cmu.
I get the following timings:
bash-3.2$ time cat jfh1.wav | voice2json --profile en transcribe-wav | voice2json --profile en recognize-intent
...
real 0m3.662s
user 0m4.241s
sys 0m0.453s
bash-3.2$ time cat jfh1.wav | voice2json --profile en transcribe-wav > /dev/null
real 0m3.494s
user 0m3.382s
sys 0m0.397s
bash-3.2$ time cat jfh1.tsc | voice2json --profile en recognize-intent
...
real 0m0.852s
user 0m0.905s
sys 0m0.232s
Pocketsphinx is normally very fast when working with a restricted vocabulary derived from a small corpus. In many pocketsphinx applications, the corpus is uploaded to the CMU lmtool web service to obtain the pronunciation dictionary (".dic") and language model (".lm") files derived from it; after that, the application operates offline unless the vocabulary is updated. I am wondering whether voice2json is working with a very large fixed dictionary and/or language model.