Open CobraCalle opened 4 years ago
Sure, see if this works for you. The API surface is still really rough and it has a lot of helper structs (Pointer<>, etc.) to make it behave like C. This looks really awkward in C# but it also means that you can use the regular C documentation and the functions/parameters should be analogous.
// Initialize sphinx config, including paths to model + dictionary files
Pointer<cmd_ln_t> config = PointerHelpers.Malloc<cmd_ln_t>(1);
config.Deref = new cmd_ln_t();
config = cmd_ln.cmd_ln_init(config, pocketsphinx.ps_args(), 1,
    "-hmm", @"C:\Users\Administrator\Desktop\pocketsphinx-kws-master\CSharp\en-us-semi",
    "-dict", @"C:\Users\Administrator\Desktop\pocketsphinx-kws-master\CSharp\cmudict_SPHINX_40.txt",
    "-verbose", "y");
// I altered the original Sphinx code to allow for programmatically passing multiple keywords at once
// (where the original code only accepted either a single keyword, or a path to an external file containing multiple keywords).
// This is the format of the keyword string: one "PHRASE/threshold" entry per line
string keywordFile =
    "GO UP/3.1e-6\n" +
    "GO LEFT/3.1e-6\n" +
    "GO RIGHT/3.1e-6\n" +
    "GO DOWN/3.1e-6\n";
// Initialize decoder struct
Pointer<ps_decoder_t> ps = pocketsphinx.ps_init(config);
// Register the keyword list under the search name "keyword_search"
if (pocketsphinx.ps_set_kws(ps, cstring.ToCString("keyword_search"), cstring.ToCString(keywordFile)) != 0)
{
    throw new Exception("Failed to set keyword search mode");
}
// Make that search the active one
if (pocketsphinx.ps_set_search(ps, cstring.ToCString("keyword_search")) != 0)
{
    throw new Exception("Failed to activate keyword search");
}
// Tell decoder to prepare for some input audio
if (pocketsphinx.ps_start_utt(ps) != 0)
{
    throw new Exception("Failed to start utterance");
}
bool user_is_speaking = false;
Pointer<byte> last_hyp = PointerHelpers.Malloc<byte>(512); // scratch buffer to hold the detected keyword
last_hyp[0] = 0;
// Send 10ms of audio at a time to decoder
const int SLICE_SIZE = 160;
Pointer<short> pointerToSamples = new Pointer<short>(inputAudioSamples);
for (int startOffset = 0; startOffset <= inputAudioSamples.Length - SLICE_SIZE; startOffset += SLICE_SIZE)
{
    Console.WriteLine("Processing " + startOffset);
    pocketsphinx.ps_process_raw(ps, pointerToSamples.Point(startOffset), SLICE_SIZE, 0, 0);
    // And extract output
    bool speech_detected = pocketsphinx.ps_get_in_speech(ps) != 0;
    BoxedValueInt score = new BoxedValueInt();
    Pointer<byte> hyp = pocketsphinx.ps_get_hyp(ps, score);
    if (hyp.IsNonNull)
    {
        // A search hypothesis was found in the input audio. Extract it as a cstring and store it elsewhere
        uint hypsize = cstring.strlen(hyp);
        cstring.strncpy(last_hyp, hyp, hypsize);
        last_hyp[hypsize] = 0;
        Console.WriteLine("Keyword detected: " + cstring.FromCString(last_hyp));
    }
    if (!speech_detected && user_is_speaking)
    {
        Console.WriteLine("Speech -> silence transition");
        // speech->silence transition, time to start new utterance
        pocketsphinx.ps_end_utt(ps);
        // get final hypothesis, if any
        hyp = pocketsphinx.ps_get_hyp(ps, score);
        if (hyp.IsNonNull)
        {
            uint hypsize = cstring.strlen(hyp);
            cstring.strncpy(last_hyp, hyp, hypsize);
            last_hyp[hypsize] = 0;
            Console.WriteLine("Keyword detected: " + cstring.FromCString(last_hyp));
        }
        // and restart utterance
        if (pocketsphinx.ps_start_utt(ps) != 0)
        {
            throw new Exception("Failed to start utterance");
        }
        user_is_speaking = false;
    }
    else
    {
        user_is_speaking = speech_detected;
    }
}
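If the keyword list is built at runtime rather than hard-coded, a small helper along these lines may be useful. This is only a sketch: `KeywordListBuilder` and `Build` are hypothetical names that are not part of the port, and it assumes the keyword parser accepts the `E-06` exponent spelling that .NET produces. The thresholds are formatted with the invariant culture so that the decimal separator is always `.`, whatever the system culture is.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical helper, not part of the port.
public static class KeywordListBuilder
{
    // Builds one "PHRASE/threshold\n" entry per keyword, the format used in the sample above.
    public static string Build(IEnumerable<KeyValuePair<string, double>> keywords)
    {
        var sb = new StringBuilder();
        foreach (var kw in keywords)
        {
            sb.Append(kw.Key)
              .Append('/')
              // Invariant culture: "." as decimal separator on every system.
              .Append(kw.Value.ToString("R", CultureInfo.InvariantCulture))
              .Append('\n');
        }
        return sb.ToString();
    }

    public static void Main()
    {
        var keywords = new List<KeyValuePair<string, double>>
        {
            new KeyValuePair<string, double>("GO UP", 3.1e-6),
            new KeyValuePair<string, double>("GO LEFT", 3.1e-6),
        };
        Console.Write(Build(keywords));
    }
}
```

The resulting string can be passed to `cstring.ToCString(...)` in place of the hard-coded `keywordFile`.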
Perfect, thank you very much. I will try it as soon as possible.
Thank you very much for your sample. I've ported my solution from native pocketsphinx to managed, and now I can switch between the native and the managed version.
When I try to use the managed version, the result of ps_init is null and I get the following output on the console: ERROR: FFT: Number of points must be greater or equal to frame size (410000000 samples)
Using the exact same config parameters, the native version works fine.
Here are the different variants of config I've tried so far:
Exact same values I use with the native version: config = cmd_ln.cmd_ln_init(config, pocketsphinx.ps_args(), 1, "-hmm", modelPath.Value, "-dict", dictFilePath.Value, "-samprate", "16000", "-nfft", "2048");
Same version as in your sample: config = cmd_ln.cmd_ln_init(config, pocketsphinx.ps_args(), 1, "-hmm", modelPath.Value, "-dict", dictFilePath.Value, "-verbose", "y");
Mix of both: config = cmd_ln.cmd_ln_init(config, pocketsphinx.ps_args(), 1, "-hmm", modelPath.Value, "-dict", dictFilePath.Value, "-verbose", "y", "-samprate", "16000", "-nfft", "2048");
At the moment I have no clue what I am doing wrong.
I think I've found the problem: it is culture-specific. The value for the window length is much too big, and the code only works when I hard-code the culture to en-US. But my system uses de-DE (because I'm living in Germany :-) ).
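The number in the FFT error is consistent with this explanation. The following sketch assumes the window length is at its usual default of 0.025625 seconds and is parsed with the current culture; the class and method names are made up for illustration. With German separators, `.` is a digit-grouping character, so "0.025625" is read as 25625, and 25625 × 16000 samples/s gives exactly the 410000000 samples reported.

```csharp
using System;
using System.Globalization;

// Illustration only; these names do not exist in the port.
public static class CultureParsingDemo
{
    // Same separators as de-DE: "," for decimals, "." for digit grouping.
    public static readonly NumberFormatInfo GermanStyle = new NumberFormatInfo
    {
        NumberDecimalSeparator = ",",
        NumberGroupSeparator = "."
    };

    public static double ParseWith(string text, IFormatProvider provider)
    {
        // These are the default styles used by double.Parse(string).
        return double.Parse(text, NumberStyles.Float | NumberStyles.AllowThousands, provider);
    }

    // Frame size in samples = window length (s) * sample rate (Hz).
    public static long FrameSizeInSamples(double windowLengthSeconds, int sampleRate)
    {
        return (long)Math.Round(windowLengthSeconds * sampleRate);
    }

    public static void Main()
    {
        const string wlen = "0.025625"; // assumed default window length
        const int sampleRate = 16000;

        double invariant = ParseWith(wlen, CultureInfo.InvariantCulture);
        double german = ParseWith(wlen, GermanStyle);

        Console.WriteLine(FrameSizeInSamples(invariant, sampleRate)); // 410
        Console.WriteLine(FrameSizeInSamples(german, sampleRate));    // 410000000
    }
}
```

A frame of 410 samples fits into an FFT of 512 or 2048 points, whereas 410000000 does not, which is why the check fails. Parsing configuration values with `CultureInfo.InvariantCulture` avoids the dependency on the system culture.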
But switching to en-US gives me another Error: ERROR_SYSTEM: S:\Oxidium\Oxidium.BoardComputer\Binaries\Debug\Oxidium.BoardComputer.CommunicationDevice.Blazor\AnyCPU\Windows\netcoreapp3.0\PocketSphinxModels\de-de\feature_transform: bio_fread_3d(lda) failed
To be 100% sure, I also switched to the original model and dict from the CMU repo. With culture "en-US" I now get an arithmetic overflow at:
at SphinxPortManaged.bin_mdef.bin_mdef_read(Pointer`1 config, Pointer`1 filename) in S:\Oxidium\Oxidium.PocketSphinx\pocketsphinx-kws-master\CSharp\SphinxPortManaged\bin_mdef.cs:line 553
at SphinxPortManaged.acmod.acmod_init_am(Pointer`1 acmod) in S:\Oxidium\Oxidium.PocketSphinx\pocketsphinx-kws-master\CSharp\SphinxPortManaged\acmod.cs:line 30
at SphinxPortManaged.acmod.acmod_init(Pointer`1 config, Pointer`1 lmath, Pointer`1 fe, Pointer`1 fcb) in S:\Oxidium\Oxidium.PocketSphinx\pocketsphinx-kws-master\CSharp\SphinxPortManaged\acmod.cs:line 233
at SphinxPortManaged.pocketsphinx.ps_reinit(Pointer`1 ps, Pointer`1 config) in S:\Oxidium\Oxidium.PocketSphinx\pocketsphinx-kws-master\CSharp\SphinxPortManaged\pocketsphinx.cs:line 179
at SphinxPortManaged.pocketsphinx.ps_init(Pointer`1 config) in S:\Oxidium\Oxidium.PocketSphinx\pocketsphinx-kws-master\CSharp\SphinxPortManaged\pocketsphinx.cs:line 307
Here is the output from the console:
INFO: Parsed model-specific feature parameters from S:\Oxidium\Oxidium.BoardComputer\Binaries\Debug\Oxidium.BoardComputer.CommunicationDevice.Blazor\AnyCPU\Windows\netcoreapp3.0\PocketSphinxModels\en-us\feat.params
INFO: Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='batch', VARNORM='no', AGC='none'
INFO: Using subvector specification 0-12/13-25/26-38
INFO: Reading model definition: S:\Oxidium\Oxidium.BoardComputer\Binaries\Debug\Oxidium.BoardComputer.CommunicationDevice.Blazor\AnyCPU\Windows\netcoreapp3.0\PocketSphinxModels\en-us\mdef
INFO: Found byte-order mark SphinxPortManaged.CPlusPlus.Pointer`1[System.Byte], assuming this is a binary mdef file
INFO: Reading binary model definition: S:\Oxidium\Oxidium.BoardComputer\Binaries\Debug\Oxidium.BoardComputer.CommunicationDevice.Blazor\AnyCPU\Windows\netcoreapp3.0\PocketSphinxModels\en-us\mdef
I think the assumption that this is a "binary mdef" file is not correct...
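One way to test that assumption independently of the port is to look at the first four bytes of the mdef file. The sketch below assumes, based on my reading of the C sources, that a binary mdef starts with the ASCII magic "BMDF" (or "FDMB" when written on a machine with the opposite byte order); `MdefHeaderCheck` is a hypothetical helper, not something the port provides.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical diagnostic helper; the magic values are an assumption.
public static class MdefHeaderCheck
{
    public static string Classify(byte[] firstFourBytes)
    {
        if (firstFourBytes == null || firstFourBytes.Length < 4)
        {
            return "too short";
        }
        string magic = Encoding.ASCII.GetString(firstFourBytes, 0, 4);
        if (magic == "BMDF") return "binary";
        if (magic == "FDMB") return "binary (byte-swapped)";
        return "probably text";
    }

    public static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.WriteLine("usage: MdefHeaderCheck <path-to-mdef>");
            return;
        }
        var header = new byte[4];
        int read;
        using (var stream = File.OpenRead(args[0]))
        {
            read = stream.Read(header, 0, header.Length);
        }
        Console.WriteLine(read < 4 ? "too short" : Classify(header));
    }
}
```

If the file does turn out to be binary, the overflow at bin_mdef.cs:line 553 would more likely come from how a header field is read than from the format detection itself.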
Hello,
I'm looking for a way to get pocketsphinx running on dotnet core on Windows IoT (and Linux), and now I've found your C# port, which sounds like a good solution to my problem. Would it be possible to share a small sample on how to initialize the keyword spotter and process PCM data to spot a keyword?
That would really help... thank you very much for your great work
Carl