syl22-00 / pocketsphinx.js

Speech recognition in JavaScript and WebAssembly

default config for memory-mapping causes problems when lazy-loading pocketsphinx-provided en-us acoustic model #70

Open jcmoore opened 8 years ago

jcmoore commented 8 years ago

tl;dr: to (lazily) load the acoustic model from here (https://github.com/cmusphinx/pocketsphinx/tree/a60982363101704eca342e7e0920754090cd49b1/model/en-us/en-us) without warnings or errors, I have to pass a ["-mmap", "no"] configuration setting to the recognizer.

...

When loading the en-us model, I experienced numerous errors -- many survivable (ERROR: "dict.c", line 195: Line 134722: Phone 'Z' is mising in the acoustic model; word 'zyuganov(2)' ignored) and some fatal (Uncaught Assertion failed: (ci >= 0) && (ci < m->n_ciphone), at: /home/sylvain/dev/projects/pocketsphinx.js/pocketsphinx/src/libpocketsphinx/bin_mdef.c,758,bin_mdef_phone_id at). A comparable native (non-js/emscripten) build of pocketsphinx (on OSX) experienced no such problems for the same configuration (just setting an -hmm argument).

After narrowing down differences between the logs of the native and emscripten builds, I noticed the following warning occurred a number of times for me in pocketsphinx.js:

WARN: "bin_mdef.c", line 499: Senone 0 is shared between multiple base phones

It seems the mdef was not loading properly: many (but seemingly not all) of my m->phone[i].ssid values were 0 as a result. If I understand correctly, the offending code is here (https://github.com/cmusphinx/pocketsphinx/blob/a60982363101704eca342e7e0920754090cd49b1/src/libpocketsphinx/bin_mdef.c#L403-L430). Memory mapping is on by default. I don't know Emscripten that well, but I suspected it might not support memory mapping, and when I explicitly disabled it with ["-mmap", "no"], the errors and warnings went away.
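For anyone wanting to apply the same workaround, here is a minimal sketch of building the recognizer arguments with memory mapping forced off. The helper name `withMmapDisabled` and the `/en-us` model path are hypothetical, made up for illustration; only the `["-mmap", "no"]` pair comes from this report.

```javascript
// Hypothetical helper (not part of pocketsphinx.js): returns a copy of the
// [flag, value] pairs with "-mmap" forced to "no", dropping any earlier
// "-mmap" entry so the setting appears exactly once.
function withMmapDisabled(pairs) {
  const result = (pairs || []).filter(([flag]) => flag !== "-mmap");
  result.push(["-mmap", "no"]);
  return result;
}

// "/en-us" is a placeholder for wherever the acoustic model was loaded
// in the Emscripten virtual file system.
const args = withMmapDisabled([["-hmm", "/en-us"]]);
console.log(JSON.stringify(args));
// prints [["-hmm","/en-us"],["-mmap","no"]]
```

The resulting pairs would then be handed to the recognizer in whatever way your setup accepts configuration (I pass them as the recognizer's configuration settings, as in the tl;dr above).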

Strangely, this did not seem to happen for the acoustic model cmusphinx-en-us-5.2.tar.gz here (https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English/). Go figure.

dhdaines commented 4 years ago

I ran into this problem too. It is an issue with Emscripten that I don't fully understand: it looks like mmap() should work in the JavaScript runtime, but it actually just returns zero-filled memory. The good news is that there is no real advantage to using mmap() to load files, so you can just turn it off.