lostromb / pocketsphinx-kws

Pure C# port of the Pocketsphinx keyword spotter
BSD 3-Clause "New" or "Revised" License

Could you provide a small sample? #1

Open CobraCalle opened 4 years ago

CobraCalle commented 4 years ago

Hello,

I'm looking for a way to get pocketsphinx running on .NET Core on Windows IoT (and Linux), and now I've found your C# port, which sounds like a good solution to my problem.

Would it be possible to share a small sample on how to initialize the keyword spotter and process pcm data to spot a keyword?

That would really help... thank you very much for your great work

Carl

lostromb commented 4 years ago

Sure, see if this works for you. The API surface is still really rough and it has a lot of helper structs (Pointer<>, etc.) to make it behave like C. This looks really awkward in C# but it also means that you can use the regular C documentation and the functions/parameters should be analogous.

// Initialize sphinx config, including paths to model + dictionary files
Pointer<cmd_ln_t> config = PointerHelpers.Malloc<cmd_ln_t>(1);
config.Deref = new cmd_ln_t();
config = cmd_ln.cmd_ln_init(config, pocketsphinx.ps_args(), 1,
    "-hmm", @"C:\Users\Administrator\Desktop\pocketsphinx-kws-master\CSharp\en-us-semi",
    "-dict", @"C:\Users\Administrator\Desktop\pocketsphinx-kws-master\CSharp\cmudict_SPHINX_40.txt",
    "-verbose", "y");

// I altered the original Sphinx code to allow for programmatically passing multiple keywords at once
// (where the original code only accepted either a single keyword, or a path to an external file containing multiple keywords).
// This is the format of the keyword string
string keywordFile = 
    "GO UP/3.1e-6\n" +
    "GO LEFT/3.1e-6\n" +
    "GO RIGHT/3.1e-6\n" +
    "GO DOWN/3.1e-6\n";

// Initialize decoder struct
Pointer<ps_decoder_t> ps = pocketsphinx.ps_init(config);

// Set search mode
if (pocketsphinx.ps_set_kws(ps, cstring.ToCString("keyword_search"), cstring.ToCString(keywordFile)) != 0)
{
    throw new Exception("Failed to set keyword search mode");
}

// Activate the keyword search that was just registered
if (pocketsphinx.ps_set_search(ps, cstring.ToCString("keyword_search")) != 0)
{
    throw new Exception("Failed to activate keyword search");
}

// Tell decoder to prepare for some input audio
if (pocketsphinx.ps_start_utt(ps) != 0)
{
    throw new Exception("Failed to start utterance");
}

bool user_is_speaking = false;
Pointer<byte> last_hyp = PointerHelpers.Malloc<byte>(512); // scratch buffer to hold the detected keyword
last_hyp[0] = 0;

// Send 10ms of audio at a time to decoder
const int SLICE_SIZE = 160;
Pointer<short> pointerToSamples = new Pointer<short>(inputAudioSamples);
for (int startOffset = 0; startOffset <= inputAudioSamples.Length - SLICE_SIZE; startOffset += SLICE_SIZE)
{
    Console.WriteLine("Processing " + startOffset);
    pocketsphinx.ps_process_raw(ps, pointerToSamples.Point(startOffset), SLICE_SIZE, 0, 0);

    // And extract output
    bool speech_detected = pocketsphinx.ps_get_in_speech(ps) != 0;
    BoxedValueInt score = new BoxedValueInt();
    Pointer<byte> hyp = pocketsphinx.ps_get_hyp(ps, score);
    if (hyp.IsNonNull)
    {
        // A search hypothesis was found in the input audio. Extract it as a cstring and store it elsewhere
        uint hypsize = cstring.strlen(hyp);
        cstring.strncpy(last_hyp, hyp, hypsize);
        last_hyp[hypsize] = 0;
        Console.WriteLine("Keyword detected: " + cstring.FromCString(last_hyp));
    }

    if (!speech_detected && user_is_speaking)
    {
        Console.WriteLine("Speech -> silence transition");
        // speech->silence transition, time to start new utterance
        pocketsphinx.ps_end_utt(ps);

        // get final hypothesis, if any
        hyp = pocketsphinx.ps_get_hyp(ps, score);
        if (hyp.IsNonNull)
        {
            uint hypsize = cstring.strlen(hyp);
            cstring.strncpy(last_hyp, hyp, hypsize);
            last_hyp[hypsize] = 0;
            Console.WriteLine("Keyword detected: " + cstring.FromCString(last_hyp));
        }

        // and restart utterance
        if (pocketsphinx.ps_start_utt(ps) != 0)
        {
            throw new Exception("Failed to start utterance");
        }

        user_is_speaking = false;
    }
    else
    {
        user_is_speaking = speech_detected;
    }
}
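
The sample above leaves `inputAudioSamples` undefined. As a minimal sketch, one way to get it from raw audio could look like the following. It assumes headerless 16-bit little-endian mono PCM at 16 kHz (which is what 160 samples per 10 ms implies); `PcmBytesToSamples` is a hypothetical helper, not part of the port.

```csharp
using System;

// Hypothetical helper (not part of the port): converts raw 16-bit
// little-endian mono PCM bytes into the short[] that the loop slices.
short[] PcmBytesToSamples(byte[] pcm)
{
    // Every sample needs two bytes; a trailing odd byte is ignored.
    short[] samples = new short[pcm.Length / 2];
    for (int i = 0; i < samples.Length; i++)
    {
        // Low byte first, then high byte (little-endian).
        // unchecked keeps the cast safe even if the project enables overflow checks.
        samples[i] = unchecked((short)(pcm[2 * i] | (pcm[2 * i + 1] << 8)));
    }
    return samples;
}

// Three samples: 1, -1 and the most negative 16-bit value.
byte[] raw = { 0x01, 0x00, 0xFF, 0xFF, 0x00, 0x80 };
short[] inputAudioSamples = PcmBytesToSamples(raw);
Console.WriteLine(string.Join(", ", inputAudioSamples)); // 1, -1, -32768
```

In a real application the bytes would come from a microphone buffer or from a file via `File.ReadAllBytes`; a WAV file would additionally need its header skipped.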

CobraCalle commented 4 years ago

perfect... thank you very very much... will try as soon as possible

CobraCalle commented 4 years ago

Thank you very much for your sample. I've ported my solution from native pocketsphinx to managed, and now I can switch between the native and the managed version.

When I try to use the managed version, the result of ps_init is null and I get the following output on the console:

ERROR: FFT: Number of points must be greater or equal to frame size (410000000 samples)

Using the exact same config parameters, the native version works fine.

Here are the different variants of config I've tried so far:

Exact same values I use with the native version:

config = cmd_ln.cmd_ln_init(config, pocketsphinx.ps_args(), 1, "-hmm", modelPath.Value, "-dict", dictFilePath.Value, "-samprate", "16000", "-nfft", "2048");

Same version as in your sample:

config = cmd_ln.cmd_ln_init(config, pocketsphinx.ps_args(), 1, "-hmm", modelPath.Value, "-dict", dictFilePath.Value, "-verbose", "y");

Mix of both:

config = cmd_ln.cmd_ln_init(config, pocketsphinx.ps_args(), 1, "-hmm", modelPath.Value, "-dict", dictFilePath.Value, "-verbose", "y", "-samprate", "16000", "-nfft", "2048");

At the moment I have no clue what I am doing wrong.

CobraCalle commented 4 years ago

I think I've found the problem: it is culture-specific. The value for the window length comes out much too big, and the code only works when I hard-code the culture to en-US. My system uses de-DE (because I'm living in Germany :-) ).
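
The number in the error message fits that explanation. As a small sketch: assuming the defaults are "-wlen" 0.025625 and "-samprate" 16000 (my understanding of the Sphinx defaults), and assuming the port parses config values with the current culture, a de-DE style parser treats the "." as a group separator and reads 25625 instead of 0.025625. The `german` format below is a hand-built stand-in for de-DE so the snippet does not depend on installed culture data.

```csharp
using System;
using System.Globalization;

// Stand-in for de-DE number formatting (assumption for illustration):
// ',' is the decimal separator and '.' is the group separator.
NumberFormatInfo german = new NumberFormatInfo
{
    NumberDecimalSeparator = ",",
    NumberGroupSeparator = "."
};

const string wlen = "0.025625"; // assumed default window length in seconds
const double samprate = 16000;  // assumed default sample rate in Hz

// Under the German format the '.' is accepted as a group separator and dropped.
double wrong = double.Parse(wlen, german);
// The invariant culture always treats '.' as the decimal point.
double right = double.Parse(wlen, CultureInfo.InvariantCulture);

// Frame size in samples, rounded to the nearest integer.
int wrongFrameSize = (int)(wrong * samprate + 0.5);
int rightFrameSize = (int)(right * samprate + 0.5);

Console.WriteLine(wrongFrameSize); // 410000000
Console.WriteLine(rightFrameSize); // 410
```

If that is what happens, setting `CultureInfo.CurrentCulture = CultureInfo.InvariantCulture;` before calling `cmd_ln.cmd_ln_init` should have the same effect as switching to en-US, without tying the application to one particular locale.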

But switching to en-US gives me another error:

ERROR_SYSTEM: S:\Oxidium\Oxidium.BoardComputer\Binaries\Debug\Oxidium.BoardComputer.CommunicationDevice.Blazor\AnyCPU\Windows\netcoreapp3.0\PocketSphinxModels\de-de\feature_transform: bio_fread_3d(lda) failed

CobraCalle commented 4 years ago

To be 100% sure, I also switched to the original model and dict from the CMU repo. With culture "en-US" I now get an arithmetic overflow:

at SphinxPortManaged.bin_mdef.bin_mdef_read(Pointer`1 config, Pointer`1 filename) in S:\Oxidium\Oxidium.PocketSphinx\pocketsphinx-kws-master\CSharp\SphinxPortManaged\bin_mdef.cs:line 553
at SphinxPortManaged.acmod.acmod_init_am(Pointer`1 acmod) in S:\Oxidium\Oxidium.PocketSphinx\pocketsphinx-kws-master\CSharp\SphinxPortManaged\acmod.cs:line 30
at SphinxPortManaged.acmod.acmod_init(Pointer`1 config, Pointer`1 lmath, Pointer`1 fe, Pointer`1 fcb) in S:\Oxidium\Oxidium.PocketSphinx\pocketsphinx-kws-master\CSharp\SphinxPortManaged\acmod.cs:line 233
at SphinxPortManaged.pocketsphinx.ps_reinit(Pointer`1 ps, Pointer`1 config) in S:\Oxidium\Oxidium.PocketSphinx\pocketsphinx-kws-master\CSharp\SphinxPortManaged\pocketsphinx.cs:line 179
at SphinxPortManaged.pocketsphinx.ps_init(Pointer`1 config) in S:\Oxidium\Oxidium.PocketSphinx\pocketsphinx-kws-master\CSharp\SphinxPortManaged\pocketsphinx.cs:line 307

Here is the output from the console:

INFO: Parsed model-specific feature parameters from S:\Oxidium\Oxidium.BoardComputer\Binaries\Debug\Oxidium.BoardComputer.CommunicationDevice.Blazor\AnyCPU\Windows\netcoreapp3.0\PocketSphinxModels\en-us\feat.params
INFO: Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='batch', VARNORM='no', AGC='none'
INFO: Using subvector specification 0-12/13-25/26-38
INFO: Reading model definition: S:\Oxidium\Oxidium.BoardComputer\Binaries\Debug\Oxidium.BoardComputer.CommunicationDevice.Blazor\AnyCPU\Windows\netcoreapp3.0\PocketSphinxModels\en-us\mdef
INFO: Found byte-order mark SphinxPortManaged.CPlusPlus.Pointer`1[System.Byte], assuming this is a binary mdef file
INFO: Reading binary model definition: S:\Oxidium\Oxidium.BoardComputer\Binaries\Debug\Oxidium.BoardComputer.CommunicationDevice.Blazor\AnyCPU\Windows\netcoreapp3.0\PocketSphinxModels\en-us\mdef

I think the assumption that this is a "binary mdef" file is not correct...
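
One way to check that assumption independently of the decoder is to look at the first four bytes of the file. This is only a sketch: it assumes, from my reading of the C header bin_mdef.h, that a binary mdef starts with the tag "BMDF" (or "FDMB" when written with the opposite byte order). `LooksLikeBinaryMdef` is a hypothetical diagnostic, not a function of the port, and the two temp files only stand in for real model files.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical diagnostic (not part of the port): reports whether a file
// starts with the assumed binary mdef tag in either byte order.
bool LooksLikeBinaryMdef(string path)
{
    byte[] magic;
    using (BinaryReader reader = new BinaryReader(File.OpenRead(path)))
    {
        // ReadBytes returns fewer than 4 bytes only if the file is shorter.
        magic = reader.ReadBytes(4);
    }
    if (magic.Length < 4) return false;
    string tag = Encoding.ASCII.GetString(magic);
    return tag == "BMDF" || tag == "FDMB";
}

// Demo with two throwaway files standing in for real mdef files.
string binaryPath = Path.GetTempFileName();
File.WriteAllBytes(binaryPath, new byte[] { 0x42, 0x4D, 0x44, 0x46, 1, 0, 0, 0 });
string textPath = Path.GetTempFileName();
File.WriteAllText(textPath, "plain text, not a binary mdef\n");

bool binaryResult = LooksLikeBinaryMdef(binaryPath);
bool textResult = LooksLikeBinaryMdef(textPath);
Console.WriteLine(binaryResult); // True
Console.WriteLine(textResult);   // False

File.Delete(binaryPath);
File.Delete(textPath);
```

If the real mdef passes this check, the binary assumption in the log is probably fine and the overflow would have to come from somewhere later in bin_mdef_read.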