yakovw closed this issue 7 months ago.
@adamnova added something like this to the demo in https://github.com/sandrohanea/whisper.net/pull/9
However, I didn't want to add the NAudio dependency to the full demo. Since then, each example has moved into its own project, where depending on NAudio is not a problem.
I think it makes sense to move it into a standalone example.
Also, for the best microphone support, continuous recognition is a must (https://github.com/sandrohanea/whisper.net/issues/25). Otherwise, the transcript can be poor near the boundaries where segments are merged.
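A minimal sketch of one way to soften those boundary errors, assuming simple overlapping windows rather than whatever the linked issue or the ContinuousRecognition example actually implements: keep the tail of each audio window and prepend it to the next chunk, so a word cut at a chunk boundary is heard whole in at least one window. `NextWindow` is a hypothetical helper, and the 0.5 s overlap is an arbitrary choice.

```csharp
using System;

// Hypothetical helper: builds the next window to transcribe by prepending the tail
// of the previous window to the newly captured chunk, and returns the new tail.
static (float[] Window, float[] Tail) NextWindow(float[] previousTail, float[] chunk, int overlapSamples)
{
    var window = new float[previousTail.Length + chunk.Length];
    previousTail.CopyTo(window, 0);
    chunk.CopyTo(window, previousTail.Length);

    // Keep the last `overlapSamples` samples (or fewer, if the window is shorter).
    int tailLength = Math.Min(overlapSamples, window.Length);
    return (window, window[^tailLength..]);
}

const int SampleRate = 16000;   // Whisper expects 16 kHz mono audio
int overlap = SampleRate / 2;   // 0.5 s overlap (arbitrary)
var tail = Array.Empty<float>();

// Two simulated one-second chunks standing in for microphone captures.
foreach (var chunk in new[] { new float[SampleRate], new float[SampleRate] })
{
    var (window, newTail) = NextWindow(tail, chunk, overlap);
    tail = newTail;
    // In a mic loop, `window` would be passed to processor.ProcessAsync(...).
    Console.WriteLine($"window: {window.Length} samples"); // prints 16000, then 24000
}
```

The trade-off is that the overlapped audio is transcribed twice, so the resulting text has to be de-duplicated when it is joined.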
I would also add a microphone example for Blazor, as that would be pretty cool.
My demo was basically a proof of concept and is not very usable in practice. Without continuous recognition, all you get is lines of text that partly repeat each other.
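Repeated text of this kind can be partly cleaned up when chunks are joined. Below is a naive sketch using a hypothetical `MergeTranscripts` helper (not part of Whisper.net): it drops the longest run of words at the start of the new text that repeats the end of the previous text, ignoring case and trailing punctuation. It cannot fix words that Whisper transcribes differently in the two chunks.

```csharp
using System;
using System.Linq;

// Hypothetical helper: joins two transcript chunks, dropping the longest run of words at
// the start of `next` that repeats the end of `previous` (case and trailing punctuation ignored).
static string MergeTranscripts(string previous, string next)
{
    var prevWords = previous.Split(' ', StringSplitOptions.RemoveEmptyEntries);
    var nextWords = next.Split(' ', StringSplitOptions.RemoveEmptyEntries);

    static string Normalize(string word) => word.TrimEnd('.', ',', '!', '?').ToLowerInvariant();

    int overlap = 0;
    for (int k = Math.Min(prevWords.Length, nextWords.Length); k > 0; k--)
    {
        // Compare the last k words of `previous` with the first k words of `next`.
        if (prevWords[^k..].Select(Normalize).SequenceEqual(nextWords[..k].Select(Normalize)))
        {
            overlap = k;
            break;
        }
    }
    return string.Join(" ", prevWords.Concat(nextWords[overlap..]));
}

Console.WriteLine(MergeTranscripts("the quick brown fox", "brown fox jumps over"));
// prints "the quick brown fox jumps over"
```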
It appears there is now continuous recognition here:
https://github.com/sandrohanea/whisper.net/tree/main/examples/ContinuousRecognition
However, that seems to be an example rather than part of the core library. Is there a chance of getting a microphone sample now?
I was able to get real-time transcription from the mic working on my M1 Mac with the code below, which uses OpenTK.OpenAL. It is stitched together from various Stack Overflow posts and could be improved, but it may help others trying to do something similar. I had to download the Core ML model manually, unzip it, and put it in the current folder. Ideally, IMO, Whisper.net would "just work" and download this model on Apple silicon, the same way it downloads the base .bin model.
The other "gotcha" I ran into was that I needed to use a float[] buffer and capture with ALFormat.MonoFloat32Ext.
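If a capture device does not offer `ALFormat.MonoFloat32Ext`, one possible fallback (an assumption, not something tested with the M1 setup described here) is to capture with `ALFormat.Mono16` into a `short[]` and convert the samples to the 32-bit floats in [-1, 1) that Whisper processes. `Pcm16ToFloat` is a hypothetical helper:

```csharp
using System;

// Hypothetical helper: converts signed 16-bit PCM samples (as captured with ALFormat.Mono16)
// into 32-bit float samples in [-1, 1), the form Whisper consumes.
static float[] Pcm16ToFloat(ReadOnlySpan<short> samples)
{
    var result = new float[samples.Length];
    for (int i = 0; i < samples.Length; i++)
        result[i] = samples[i] / 32768f; // -32768 -> -1.0, 32767 -> just under 1.0
    return result;
}

short[] captured = { 0, 16384, -32768, 32767 }; // stand-in for captured Mono16 samples
float[] floats = Pcm16ToFloat(captured);
Console.WriteLine(string.Join(", ", floats));
```

Dividing by 32768 rather than 32767 keeps every value inside [-1, 1), so the most negative sample maps exactly to -1.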
```csharp
using System;
using System.IO;
using System.Threading;
using OpenTK.Audio.OpenAL;
using Whisper.net;
using Whisper.net.Ggml;

var modelName = "ggml-base.bin";
//TODO: also https://huggingface.co/ggerganov/whisper.cpp/blob/main/ggml-base-encoder.mlmodelc.zip
if (!File.Exists(modelName))
{
    Console.WriteLine("Downloading whisper model...");
    using var modelStream = await WhisperGgmlDownloader.GetGgmlModelAsync(GgmlType.Base);
    using var fileWriter = File.OpenWrite(modelName);
    await modelStream.CopyToAsync(fileWriter);
}

using var whisperFactory = WhisperFactory.FromPath(modelName);
using var processor = whisperFactory.CreateBuilder()
    .WithLanguage("en")
    .Build();

// Whisper expects 16 kHz mono audio; the capture buffer holds up to 10 s of samples.
const int sampleRate = 16000;
int bufferLength = 10 * sampleRate;
// MonoFloat32Ext captures float samples directly, matching the float[] passed to Whisper.
var mic = ALC.CaptureOpenDevice(null, sampleRate, ALFormat.MonoFloat32Ext, bufferLength);
Console.WriteLine("Using:");
Console.WriteLine(ALC.GetString(new ALDevice(mic.Handle), AlcGetString.DeviceSpecifier));

ALC.CaptureStart(mic);
var buffer = new float[bufferLength];
try
{
    // Poll once per second for about 100 s. Each chunk is transcribed on its own
    // (no continuous recognition), so words at chunk boundaries may be cut or repeated.
    for (int i = 0; i < 100; ++i)
    {
        Thread.Sleep(1000);
        int samplesAvailable = Math.Min(ALC.GetAvailableSamples(mic), buffer.Length);
        if (samplesAvailable > 0)
        {
            ALC.CaptureSamples(mic, buffer, samplesAvailable);
            await foreach (var resultData in processor.ProcessAsync(buffer[..samplesAvailable]))
            {
                Console.WriteLine("RAW:" + resultData.Text);
            }
        }
    }
}
finally
{
    // Release the capture device even if transcription throws.
    ALC.CaptureStop(mic);
    ALC.CaptureCloseDevice(mic);
}
```
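For the TODO about the Core ML encoder, here is a sketch of automating the manual download-and-unzip step. It assumes Whisper.net's Core ML runtime follows whisper.cpp's naming convention (`ggml-base.bin` → `ggml-base-encoder.mlmodelc` in the same folder), ignores quantized model names, and assumes Hugging Face's `/resolve/main/` form of the `blob` link in the TODO serves the raw zip. `CoreMlEncoderPath` and `EnsureCoreMlEncoderAsync` are hypothetical helpers, not Whisper.net APIs.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Net.Http;
using System.Threading.Tasks;

// Assumed naming convention (from whisper.cpp): "<dir>/ggml-base.bin" -> "<dir>/ggml-base-encoder.mlmodelc".
// Quantized names such as "ggml-base-q5_0.bin" are not handled here.
static string CoreMlEncoderPath(string ggmlModelPath)
{
    var directory = Path.GetDirectoryName(ggmlModelPath) ?? "";
    var baseName = Path.GetFileNameWithoutExtension(ggmlModelPath);
    return Path.Combine(directory, baseName + "-encoder.mlmodelc");
}

// Hypothetical helper: downloads and unzips the Core ML encoder unless it already exists.
static async Task EnsureCoreMlEncoderAsync(string ggmlModelPath, string zipUrl)
{
    if (Directory.Exists(CoreMlEncoderPath(ggmlModelPath)))
        return; // already downloaded and unzipped

    var zipPath = Path.GetTempFileName();
    try
    {
        using (var http = new HttpClient())
        {
            using var download = await http.GetStreamAsync(zipUrl);
            using var file = File.Create(zipPath);
            await download.CopyToAsync(file);
        }
        // Assumes the archive's top-level entry is the "...-encoder.mlmodelc/" folder.
        var targetDir = Path.GetDirectoryName(Path.GetFullPath(ggmlModelPath))!;
        ZipFile.ExtractToDirectory(zipPath, targetDir);
    }
    finally
    {
        File.Delete(zipPath);
    }
}

Console.WriteLine(CoreMlEncoderPath("ggml-base.bin"));

// Example call (needs network access, so it is commented out here):
// await EnsureCoreMlEncoderAsync("ggml-base.bin",
//     "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base-encoder.mlmodelc.zip");
```

Calling something like `EnsureCoreMlEncoderAsync` before `WhisperFactory.FromPath` would approximate the "just work" behavior on Apple silicon, at the cost of a one-time download.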
Closing all issues related to streaming processing in favor of https://github.com/sandrohanea/whisper.net/issues/25
Is there a way in the library to use the microphone rather than only transcribing an existing recording? The original whisper.cpp library supports this.