sandrohanea / whisper.net

Whisper.net. Speech to text made simple using Whisper Models
MIT License

Is it possible to limit the vocabulary? #173

Open adscott1982 opened 4 months ago

adscott1982 commented 4 months ago

My use case is very specific: it only needs to listen for an activation word and maybe 12 or 13 different commands.

To improve performance, I would like to limit the vocabulary (or tokens?) to just those words.

I read the ticket here and it seems it is possible:

https://github.com/openai/whisper/discussions/843

I am wondering if it is possible to do it from your .NET wrapper?

I won't be able to install Python on the target machine, so I would like to implement this in a way that lets me just drop in the required DLLs and run the executable to start listening for commands from the user.


adscott1982 commented 4 months ago

I see that a 'tokenize' function is available in NativeMethods.cs:

https://github.com/sandrohanea/whisper.net/blob/28839bf0aa80c723a68d2876f9adbb0a0f21d0d2/Whisper.net/Internals/Native/NativeMethods.cs#L70C1-L71C1


Is there a way to call 'get_tokenizer', as in the Python example, through the C# wrapper?

adscott1982 commented 4 months ago

Wait - I just saw you have a custom GPT - I will give it a shot.

adscott1982 commented 4 months ago

Unfortunately, the custom GPT wasn't able to help. I would share the link, but the conversation contained images, so sharing it is not currently allowed.

It recommended implementing a custom LogitFilter, but I don't know if that is possible with whisper.net.
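
For reference, the general idea behind a logit filter can be sketched without any speech library: before each decoding step picks a token, every token outside an allowed set has its score forced to negative infinity, so it can never be selected. The snippet below is only a toy illustration in plain Python. The vocabulary, the scores, and the names `mask_logits` and `greedy_pick` are made up for the example, and it does not call whisper.net or Whisper. Whether whisper.net exposes a hook where such a filter could be plugged in is exactly the open question here.

```python
import math

def mask_logits(logits, allowed_ids):
    """Return a copy of logits with every token outside allowed_ids set to -inf."""
    allowed = set(allowed_ids)
    return [score if i in allowed else -math.inf for i, score in enumerate(logits)]

def greedy_pick(logits):
    """Return the index of the highest-scoring token (greedy decoding)."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Toy vocabulary and scores (hypothetical; real models have tens of thousands of tokens).
vocab = ["lights", "on", "off", "banana", "volume", "up"]
logits = [1.2, 0.3, -0.5, 2.9, 0.1, 0.0]

# Only the command words are allowed; "banana" (index 3) is excluded.
allowed_ids = [0, 1, 2, 4, 5]

unconstrained = vocab[greedy_pick(logits)]
constrained = vocab[greedy_pick(mask_logits(logits, allowed_ids))]

print("unconstrained:", unconstrained)
print("constrained:", constrained)
```

In a real setup the allowed ids would come from tokenizing the activation word and each command, which is why access to the tokenizer matters. Note also that a single command word may span several tokens, so the allowed set has to cover every token of every command.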