rhasspy / wyoming-openwakeword

Wyoming protocol server for openWakeWord wake word detection system
MIT License

Support for openWakeWord's Custom Verifier #13

Open eckley opened 5 months ago

eckley commented 5 months ago

It would be great if we could use more of the openWakeWord project's feature set, such as Custom Verifiers (see Custom Verifier Models). Unless I'm missing something obvious and it's already supported.

ser commented 4 months ago

I am browsing the code and I am a bit surprised that wyoming-openwakeword seems NOT to use the openwakeword library, which handles custom verifier models. I would love to understand why.

I have many false triggers and without a custom verifier it's impossible to use wyoming-openwakeword to be honest :(

ser commented 4 months ago

https://community.home-assistant.io/t/poll-whats-your-biggest-struggle-with-voice-control-right-now/658018/17

Michael writes: @synesthesiam

I’ve started my implementation, but haven’t been able to test it with multiple people yet.

but unfortunately there is no link to any code.

synesthesiam commented 4 months ago

To answer some questions: I don't use the official openWakeWord library because I wanted to implement batching. My implementation is optimized around having multiple audio streams all trying to detect the same wake word at the same time.

I started down the path of implementing custom verifier models (logistic regressions), but I've been wondering lately if using dynamic time warping (like my old Raven system did) might be better.
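
For readers unfamiliar with dynamic time warping, here is a minimal sketch of the textbook DTW distance between two sequences. It is not the Raven code and not anything from this repository; the name `dtw_distance` and the use of absolute difference as the local cost are assumptions for illustration. A real wake word matcher would compare frames of audio features rather than plain numbers.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b's element is reused
                cost[i][j - 1],      # b advances, a's element is reused
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

A template-based detector would typically compare incoming audio against a few recorded examples and accept when the distance falls below some tuned threshold.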

ser commented 4 months ago

OK, now it's clear, thanks for explanation :)

If you need any testers, I am ready to help anytime. I can also read code, so I suppose I could be valuable as a tester. I had to switch off the assistant because I simply can't stand hearing "sorry but i did not understand that" three times an hour when I play the radio in the background. I have a clear motivation to work on this!

Greylinux commented 4 months ago

I just opened an issue on dscripka's repo for this exact problem, not realising that this is the HA add-on version 😳 Whoops! Well, good to see that an issue has been created already. Thanks Mike for your excellent work on all the voice elements.

jhbruhn commented 3 months ago

Would you mind sharing your implementation of the logistic regression based custom verifiers, perhaps as a separate branch? Was it already in a usable state?

I'm trying to get reliable wyoming-satellites with local wake words running. I am currently considering implementing a second wyoming-openwakeword-standalone server which uses the original oWW library directly. That implementation doesn't need batching as it will only have one client and inherits VAD (which I've implemented in a PR already) and CVs from oWW. But if this implementation here provided Custom Verifiers, I would not need to implement the separate handler. What is your opinion on this?

BTW thanks for all the work on the Assist feature, it's already great as is, and the fact that everything is open source enables people like me to a) understand what is going on in the completely local VA and b) somehow give back by contributing (hopefully usable) code! :)

ser commented 3 months ago

@jhbruhn I have just rewritten wyoming-openwakeword to use the original libraries, and it indeed works much, much better with a custom verifier.

jhbruhn commented 3 months ago

@ser would you mind sharing your implementation? 😍

jhbruhn commented 3 months ago

I have just implemented Custom Verifiers on my fork: https://github.com/jhbruhn/wyoming-openwakeword/commit/1c84f07eaa6b14acf96e276a3bf614ac2a1f2c55

But I don't feel it is in a state to make a PR and bring it into wyoming-openwakeword yet:

  1. I have not really tested it yet
  2. It probably makes sense to have different custom verifiers per wyoming client? Because different satellites have different sound characteristics/environments. Perhaps even some form of support in the wyoming protocol makes sense? I don't want to intervene if synesthesiam has a general bigger picture for this in mind 🙂
  3. Using openwakeword pickles is a bit weird because it pulls in openwakeword as a dependency. But training on startup would take a couple of seconds depending on the amount of sample data.

Edit: About 2.: an ensemble of multiple custom verifiers simply run in parallel would be fine, I think, as they should be very lightweight and might additionally aid in separating activations for multiple devices at once.

synesthesiam commented 3 months ago

I do want to rewrite wyoming-openwakeword to use the original library and include custom verifiers. I've added a new speaker field to the wake detection message so it will be possible to link a custom verifier with a speaker name. In the future, then, HA would be able to use this speaker name.

ser commented 3 months ago

@jhbruhn your implementation looks more interesting than mine. I completely replaced the @synesthesiam code with openwakeword, which makes it hard to publish because it's a mess. I will test whether yours gives the same good results as mine.

jhbruhn commented 3 months ago

I want to try adding a custom verifier manager to my implementation that also manages training based on voice recordings. This makes for a very hands-off approach: the manager is fed a directory of voice samples (positive and negative), potentially from different speakers, and builds an ensemble of custom verifiers that also differentiates speakers. The results can then be used for the wyoming speaker attribute.

This way, the current batching implementation can be (somehow) kept. Perhaps the custom verifiers could run in parallel to the wakeword inference as the input features are the same, but I don't want to focus on that for now. Maybe this way @synesthesiam would not have to reimplement this wyoming service to use the original library?

This Custom Verifier manager could then also implement alternative models, perhaps through some kind of hyperparameter optimization during training, or include the aforementioned dynamic time warping algorithm from raven, which might perform even better.

The preliminary results of my implementation mentioned above seem very promising. I didn't notice any false activations, and because I could lower the thresholds in general, I'm also getting fewer false negatives. But I've also noticed that, due to the sample data, it is now better at detecting my voice than other people's voices, which makes the speaker identification capabilities even more promising.
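
As a rough illustration of why a verifier allows lower thresholds (this is a sketch, not code from the fork): the wake word model can use a permissive threshold because the verifier makes the final call. The function name `is_detection` and both default threshold values are hypothetical.

```python
def is_detection(wake_score, verifier_score,
                 wake_threshold=0.3, verifier_threshold=0.5):
    """Accept only if both stages agree (threshold values are illustrative)."""
    if wake_score < wake_threshold:
        return False  # first stage rejects; the verifier is not consulted
    return verifier_score >= verifier_threshold
```

With this arrangement a quiet but genuine wake word can pass the relaxed first stage, while background audio that happens to pass it is still rejected by the second.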

jhbruhn commented 2 months ago

I have added a very basic implementation of automatic Custom Verifier training based on a directory of samples categorized by speaker: https://github.com/jhbruhn/wyoming-openwakeword/commit/5ea6254a30a15b7a6c8a9ef6777b500bab53e470

Whenever a wakeword model is first loaded, it checks the folder for positive (per speaker) and negative samples, and if no cached model is found (either in memory or a pickle file), it trains a new verifier before the wakeword thread starts.

It trains a Logistic Regression custom verifier based on the approach demonstrated by dscripka in the openwakeword repository, and still pulls in openwakeword as a dependency. The difference from the original implementation, though, is that the logistic regression is trained on N+1 labels, where N is the number of speakers: one label per speaker, plus a negative label. Internally, sklearn should use a one-vs-all ensemble of regressors. Whether that is a good approach still needs to be verified.
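
To make the N+1 labelling concrete, here is a small sketch of how the labels and the final decision might be arranged. It is not taken from the commit: `build_labels`, `pick_speaker`, the `NEGATIVE` constant, and the threshold are hypothetical, and the probabilities stand in for whatever per-class output the classifier produces. Whether scikit-learn fits the classes one-vs-rest or as a single multinomial model depends on its version and solver settings; the sketch does not rely on either.

```python
NEGATIVE = "negative"


def build_labels(speakers):
    """Map the negative class to 0 and each speaker to 1..N."""
    labels = {NEGATIVE: 0}
    for class_id, speaker in enumerate(sorted(speakers), start=1):
        labels[speaker] = class_id
    return labels


def pick_speaker(probabilities, labels, threshold=0.5):
    """Return the most likely speaker, or None if the activation is rejected.

    `probabilities` is a list indexed by class id.
    """
    by_id = {class_id: name for name, class_id in labels.items()}
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    if by_id[best] == NEGATIVE or probabilities[best] < threshold:
        return None
    return by_id[best]
```

For example, with speakers `alice` and `bob`, probabilities of `[0.1, 0.7, 0.2]` would select `alice`, while `[0.8, 0.1, 0.1]` would be rejected as a false activation.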

The structure is modular, so it would be possible to integrate different custom verifier approaches.

What might be lacking currently is a differentiation between different clients, which might be in different sound environments. But as there is, afaik, no stable Client ID, I skipped this for now.

When a new version of the wyoming library is released, this approach can also include the speaker name with the Detection-event.

The expected sample directory structure is as follows:

<custom-verifier-samples-dir>/
  - positive/
    - speaker_1/
      - sample_1.wav
      - sample_2.wav
      - ...
    - speaker_2/
      - sample_1.wav
      - sample_2.wav
      - ...
  - negative/
    - sample_1.wav
    - sample_2.wav
    - ... 
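
A directory laid out like this could be scanned along the following lines. This is only a sketch using the standard library; the function name `collect_samples` is hypothetical and the commit may organize this differently.

```python
from pathlib import Path


def collect_samples(samples_dir):
    """Return (positives_by_speaker, negatives) as sorted lists of WAV paths."""
    root = Path(samples_dir)

    positives = {}
    positive_root = root / "positive"
    if positive_root.is_dir():
        for speaker_dir in sorted(positive_root.iterdir()):
            if speaker_dir.is_dir():
                # The sub-directory name doubles as the speaker name.
                positives[speaker_dir.name] = sorted(speaker_dir.glob("*.wav"))

    negatives = sorted((root / "negative").glob("*.wav"))
    return positives, negatives
```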

Edit: For better functionality, it might be a good idea to programmatically limit the number of samples used for training, because a) training on a Raspberry Pi can take a long time if a lot of samples are used and b) speaker identification can become biased if the number of samples per speaker is unbalanced. A general open question is how the sample collection pipeline can be improved. Ideally, the samples could be collected, or at least selected, via the Home Assistant interface. This would require additions to the wyoming protocol.
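
One possible way to address both points is to cap every speaker at the same number of randomly chosen samples. The sketch below is an assumption about how that could work, not something implemented in the fork; `balance_samples` and the default cap of 50 are hypothetical.

```python
import random


def balance_samples(samples_by_speaker, max_per_speaker=50, seed=0):
    """Give every speaker the same sample count, capped at max_per_speaker."""
    if not samples_by_speaker:
        return {}
    smallest = min(len(samples) for samples in samples_by_speaker.values())
    limit = min(max_per_speaker, smallest)
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    return {
        speaker: rng.sample(list(samples), limit)
        for speaker, samples in samples_by_speaker.items()
    }
```

The downside of balancing to the smallest speaker is that one speaker with very few recordings shrinks the training set for everyone, so a minimum sample count per speaker may also be worth enforcing.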

codemunkie15 commented 1 month ago

Is there any update on when this might be implemented please? Voice matching is the only thing left now stopping me from converting to Assist. You guys rock!

jhbruhn commented 1 month ago

I have tried doing voice matching with the same LogisticRegression classifier the custom verifiers are using, with very little success. Even training a separate LogisticRegression classifier did not yield usable results. Unfortunately, I don't have the time to do further work on this right now, as the custom verifier functionality is enough for me. Perhaps the dynamic time warping approach synesthesiam suggested above is worth evaluating further for voice matching? If that could also do the custom-verifier part, the performance should be okay.

Unfortunately, my custom verifier architecture, which you can find on the branch I linked above, would need some rework for that, as the custom verifiers can currently only work on the extracted features, not on the audio stream directly. But that should be easy to implement, as a buffer of the last 2 seconds is already stored for debug purposes, IIRC.
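
For reference, a rolling buffer of the most recent audio can be kept with a bounded deque. This is a generic sketch rather than the buffer in the branch; the class name `AudioRingBuffer` is hypothetical and the 16 kHz sample rate is an assumption.

```python
from collections import deque

SAMPLE_RATE = 16000  # assumed: 16 kHz mono audio
BUFFER_SECONDS = 2


class AudioRingBuffer:
    """Keeps only the most recent `seconds` worth of samples."""

    def __init__(self, sample_rate=SAMPLE_RATE, seconds=BUFFER_SECONDS):
        self._samples = deque(maxlen=sample_rate * seconds)

    def extend(self, chunk):
        # Once the deque is full, the oldest samples are discarded automatically.
        self._samples.extend(chunk)

    def snapshot(self):
        """Copy of the buffered samples, oldest first."""
        return list(self._samples)
```

A verifier working on raw audio could call `snapshot()` at the moment the wake word model fires and analyze those samples.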

ser commented 4 days ago

I would also love to see it implemented in the official openwakeword HA add-on, as it's the way to go.

synesthesiam commented 4 days ago

I have started the process of moving to the official openWakeWord library here: https://github.com/rhasspy/wyoming-openwakeword/pull/27