synesthesiam / rhasspy

Rhasspy voice assistant for offline home automation
https://rhasspy.readthedocs.io
MIT License

recognition with wake worse than "hold to record" #199

Open vabene1111 opened 4 years ago

vabene1111 commented 4 years ago

I don't know if this is a bug or just some kind of usage error, but I have tried both pocketsphinx and porcupine as wake word providers. Both work and have good recognition rates, but the recognition after the wake word is very bad compared to the recognition when clicking "hold to record".

When using "hold to record", I feel like the recognition rate is somewhere around 98%. With the wake word, even after a lot of trial and error for the right pronunciation and timing between waking and speaking, it's somewhere around 20-30%. I have been trying "play music" and "stop music" as commands.

I am using v2.4.19 with the Docker install and a ReSpeaker 2-Mic board as a microphone (but I don't think that's the cause, since the manual record works great).

The settings are all default (except, of course, that the wake word detection was turned on). I have trained multiple times, restarted, and cleared the training cache and retrained.

Any ideas why this difference could occur?

Here are the settings from the advanced tab (should be only the non-default ones, if I understand correctly):

{
    "language": "en",
    "name": "en",
    "locale": "en_US",
    "speech_to_text": {
        "system": "pocketsphinx",
        "dictionary_casing": "lower",
        "kaldi": {
            "base_dictionary": "kaldi/base_dictionary.txt",
            "base_language_model": "kaldi/base_language_model.txt",
            "base_language_model_fst": "kaldi/base_language_model.fst",
            "compatible": true,
            "custom_words": "kaldi_custom_words.txt",
            "dictionary": "kaldi/dictionary.txt",
            "graph": "graph",
            "language_model": "kaldi/language_model.txt",
            "model_dir": "kaldi/model",
            "unknown_words": "kaldi/unknown_words.txt",
            "mix_fst": "kaldi/mixed.fst",
            "g2p_model": "kaldi/g2p.fst",
            "phoneme_examples": "kaldi/phoneme_examples.txt",
            "phoneme_map": "kaldi/espeak_phonemes.txt"
        }
    },
    "intent": {
        "flair": {
            "embeddings": [
                "news-forward-0.4.1.pt",
                "news-backward-0.4.1.pt"
            ]
        }
    },
    "text_to_speech": {
        "wavenet": {
            "language_code": "en-US"
        },
        "marytts": {
            "locale": "en-US"
        }
    },
    "download": {
        "conditions": {
            "speech_to_text.system": {
                "pocketsphinx": {
                    "acoustic_model": "cmusphinx-en-us-5.2.tar.gz:cmusphinx-en-us-5.2",
                    "base_dictionary.txt": "en-g2p.tar.gz:base_dictionary.txt",
                    "g2p.fst": "en-g2p.tar.gz:g2p.fst"
                },
                "kaldi": {
                    "kaldi": "en_kaldi-zamia.tar.gz:kaldi"
                }
            },
            "speech_to_text.kaldi.open_transcription": {
                "True": {
                    "kaldi/model/base_graph": "en_kaldi-zamia-base_graph.tar.gz:base_graph"
                }
            },
            "speech_to_text.pocketsphinx.mix_weight": {
                ">0": {
                    "base_language_model.txt": "en-70k-0.2-pruned.lm.gz:en-70k-0.2-pruned.lm"
                }
            },
            "intent.system": {
                "flair": {
                    "flair/cache/embeddings/news-forward-0.4.1.pt": "news-forward-0.4.1.pt",
                    "flair/cache/embeddings/news-backward-0.4.1.pt": "news-backward-0.4.1.pt"
                }
            }
        },
        "files": {
            "cmusphinx-en-us-5.2.tar.gz": {
                "url": "https://github.com/synesthesiam/rhasspy-profiles/releases/download/v1.0-en/cmusphinx-en-us-5.2.tar.gz"
            },
            "en-70k-0.2-pruned.lm.gz": {
                "url": "https://github.com/synesthesiam/rhasspy-profiles/releases/download/v1.0-en/en-70k-0.2-pruned.lm.gz"
            },
            "en-g2p.tar.gz": {
                "url": "https://github.com/synesthesiam/rhasspy-profiles/releases/download/v1.0-en/en-g2p.tar.gz"
            },
            "news-forward-0.4.1.pt": {
                "url": "https://github.com/synesthesiam/rhasspy-profiles/releases/download/v1.0-en/news-forward-0.4.1.pt",
                "cache": false
            },
            "news-backward-0.4.1.pt": {
                "url": "https://github.com/synesthesiam/rhasspy-profiles/releases/download/v1.0-en/news-backward-0.4.1.pt",
                "cache": false
            },
            "en_kaldi-zamia.tar.gz": {
                "url": "https://github.com/synesthesiam/rhasspy-profiles/releases/download/v1.0-en/en_kaldi-zamia.tar.gz"
            },
            "en_kaldi-zamia-base_graph.tar.gz": {
                "url": "https://github.com/synesthesiam/rhasspy-profiles/releases/download/v1.0-en/en_kaldi-zamia-base_graph.tar.gz"
            }
        }
    }
}

synesthesiam commented 4 years ago

There should be almost no difference unless the feedback sounds are bleeding over into the recorded voice command. If you record a command and click the play back button in the web UI, do you hear the beeps?

vabene1111 commented 4 years ago

Until now I did not have an output device configured. I attached a headset for testing.

When recording, stopping recording, or playing back the voice command, no sounds play. When saying the wake word it does make two sounds, one when starting recording and one when ending (at least that's what I think, since the icon on the top left changes as well between the beeps).

Still the detection is basically useless when using the wake word and almost perfect when triggering the recording manually.

markusappel commented 4 years ago

Started playing today and had the same problem. When listening to the last command in the web UI after the wake word (thanks for the hint @synesthesiam), I realized that the first split second of the command was cut off and the first word could not be recognized. Seems like webrtcvad needed some tuning ... adding this to the profile fixed it for me:

    "command": {
        "webrtcvad": {
            "speech_buffers": 0,
            "throwaway_buffers": 3
        }
    }

(Although something was weird about the speech_buffers setting: somewhere around 2-3 the behaviour jumped from "cutting the first command word" to "keep all the silence between wake word and first command word")
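
For anyone applying this by hand, here is a minimal sketch of merging such an override into an existing profile without touching the other keys. It assumes the profile is plain JSON like the settings posted at the top of this issue; `deep_merge` is a hypothetical helper written for illustration, not part of Rhasspy, and the `profile` dict is a small stand-in rather than a full profile.

```python
import json


def deep_merge(base, override):
    """Return a new dict with `override` merged recursively into `base`."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Otherwise the override value wins.
            merged[key] = value
    return merged


# Stand-in for an existing profile (only a few keys from this issue are shown).
profile = {
    "language": "en",
    "speech_to_text": {"system": "pocketsphinx"},
}

# The webrtcvad override from the comment above.
vad_override = {
    "command": {
        "webrtcvad": {
            "speech_buffers": 0,
            "throwaway_buffers": 3,
        }
    }
}

merged = deep_merge(profile, vad_override)
print(json.dumps(merged, indent=4))
```

The original `profile` dict is left unmodified, so the merged result can be compared against it before anything is written back to disk.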

vabene1111 commented 4 years ago

OK, that is definitely a huge improvement over how it was before! It feels like everything is a little slower now, but that might be something else.

Mic92 commented 4 years ago

I can confirm that https://github.com/synesthesiam/rhasspy/issues/199#issuecomment-614336096 is an improvement:

Before, when I would say "What time is it?", it would only recognize "time is it". Now it recognizes the whole sentence.
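
A toy model of why the first word can get lost. This is not Rhasspy's actual code. It assumes that `throwaway_buffers` chunks are dropped unconditionally after the wake word, and that `speech_buffers` chunks of detected speech are consumed (and not kept) before recording starts. Both semantics are assumptions made for illustration, `record_command` is a hypothetical function, and end-of-command detection is left out.

```python
def record_command(chunks, throwaway_buffers, speech_buffers):
    """Return the indices of the chunks kept for the voice command.

    chunks: list of booleans, True where a (hypothetical) VAD flags speech.
    """
    kept = []
    speech_left = speech_buffers
    in_command = False
    for index, is_speech in enumerate(chunks):
        if index < throwaway_buffers:
            continue  # dropped unconditionally right after the wake word
        if in_command:
            kept.append(index)
        elif is_speech:
            if speech_left > 0:
                speech_left -= 1  # counted towards the start threshold, not kept
            else:
                in_command = True
                kept.append(index)
        else:
            speech_left = speech_buffers  # silence resets the threshold
    return kept


# Three chunks of silence, then six chunks of speech.
chunks = [False] * 3 + [True] * 6

# Speech starts at index 3 and is kept from there.
print(record_command(chunks, throwaway_buffers=3, speech_buffers=0))
# prints [3, 4, 5, 6, 7, 8]

# The first two speech chunks are used up by the start threshold.
print(record_command(chunks, throwaway_buffers=3, speech_buffers=2))
# prints [5, 6, 7, 8]
```

Under these assumptions, every speech chunk spent on the start threshold is missing from the recording, which would match a clipped first word.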

Mic92 commented 4 years ago

Should I make a PR to change the defaults?

Mic92 commented 4 years ago

I don't see this problem anymore with rhasspy 2.5 from here: https://github.com/rhasspy/rhasspy