Closed by eugeneYz 5 months ago
I use NAudio to record audio and save it locally as a WAV file. When I run Whisper on the file immediately after recording, it returns irrelevant text. However, if I run Whisper on the same local file later, without recording again, it recognizes the audio content correctly. Is there anything I should be aware of that I might have overlooked? Thank you.
private void AudioRecStartHandle(object obj)
{
    //if (File.Exists(tempWavFileName)) { File.Delete(tempWavFileName); }
    nAudioHelper.StartRec();
}
private async void AudioRecStopHandle(object obj)
{
    try
    {
        nAudioHelper.StopRec();
        Thread.Sleep(500);
        using FileStream fileStream = new FileStream(tempWavFileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
        await foreach (var result in processor.ProcessAsync(fileStream))
        {
            string recognizedText = result.Text;
            if (string.IsNullOrEmpty(recognizedText))
            {
                fileStream.Dispose();
                break;
            }
..... }
public void StartRec()
{
    WaveSource = new WaveIn();
    var filePath = "D:\\2_WPF画面\\WPFSamples-main\\WpfControlsX\\WpfControlsX\\Resource\\temp.wav";
    WaveSource.WaveFormat = new WaveFormat(16000, 16, 1); // recording format: 16 kHz, 16-bit, mono
    writer = new WaveFileWriter(filePath, WaveSource.WaveFormat);
    WaveSource.BufferMilliseconds = 3000;
    WaveSource.DataAvailable += Recording;
    WaveSource.RecordingStopped += RecordingStopped;
    WaveSource.StartRecording();
}
public void StopRec()
{
    try
    {
        WaveSource?.StopRecording();
        // Close the wave source (not needed in the synchronous case)
        WaveSource?.Dispose();
        WaveSource = null;
    }
    catch (Exception e)
    {
        DialogHelper.Error(e.ToString());
    }
}
private void Recording(object sender, WaveInEventArgs e)
{
    writer?.Write(e.Buffer, 0, e.BytesRecorded);
}
private void RecordingStopped(object sender, StoppedEventArgs e)
{
    writer?.Close();
    writer?.Dispose();
    writer = null;
}
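One thing that may be worth ruling out in the code above (this is an assumption, not something confirmed in this thread): the writer is only closed in the RecordingStopped handler, and WaveFileWriter fills in the final lengths in the WAV header when it is disposed. If that handler has not run yet when the file is opened, Whisper may read an incomplete file, which could explain why the same file works later. A minimal sketch of waiting for RecordingStopped instead of sleeping; the WavRecorder class and its StopAsync method are hypothetical names, not part of NAudio:

```csharp
using System;
using System.Threading.Tasks;
using NAudio.Wave;

// Hypothetical helper; class and method names are illustrative only.
public sealed class WavRecorder
{
    private WaveIn source;
    private WaveFileWriter writer;
    private TaskCompletionSource<bool> stopped;

    public void Start(string filePath)
    {
        var tcs = new TaskCompletionSource<bool>(
            TaskCreationOptions.RunContinuationsAsynchronously);
        stopped = tcs;

        // Same format as in the question: 16 kHz, 16-bit, mono.
        source = new WaveIn { WaveFormat = new WaveFormat(16000, 16, 1) };
        writer = new WaveFileWriter(filePath, source.WaveFormat);

        source.DataAvailable += (s, e) => writer?.Write(e.Buffer, 0, e.BytesRecorded);
        source.RecordingStopped += (s, e) =>
        {
            // Disposing the writer finalizes the WAV header.
            writer?.Dispose();
            writer = null;
            source?.Dispose();
            source = null;

            // Signal completion only after the file has been closed.
            if (e.Exception != null) tcs.TrySetException(e.Exception);
            else tcs.TrySetResult(true);
        };
        source.StartRecording();
    }

    // Completes once RecordingStopped has run and the file is closed.
    public Task StopAsync()
    {
        if (source == null) return Task.CompletedTask;
        source.StopRecording();
        return stopped.Task;
    }
}
```

With something like this, the stop handler could call `await recorder.StopAsync();` in place of `Thread.Sleep(500)` and open the file only afterwards.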
Real-time processing is not fully supported, as described here: https://github.com/sandrohanea/whisper.net/issues/25
When you send only partial audio, it might contain half-words, so no token can be understood (especially for Chinese, where a token is usually a lot longer in duration).
Recognition of the WAV file works well, but real-time speech recognition produces some irrelevant results. I use NAudio with WaveSource.BufferMilliseconds = 2000 and run recognition once for every 2 seconds of recorded audio.
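If each 2-second buffer is recognized on its own, a word that straddles a buffer boundary gets split, which matches the half-word problem mentioned above. One possible mitigation is to collect the buffers into longer windows that overlap slightly, so a word cut at the end of one window appears whole in the next. This is only a sketch and does not make Whisper streaming-capable. PcmWindowBuffer is a hypothetical helper, not part of NAudio or Whisper.net, and it assumes mono PCM input:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: groups small PCM buffers into longer, overlapping windows.
public sealed class PcmWindowBuffer
{
    private readonly int windowBytes;
    private readonly int overlapBytes;
    private readonly List<byte> pending = new List<byte>();

    public PcmWindowBuffer(int sampleRate, int bytesPerSample,
                           double windowSeconds, double overlapSeconds)
    {
        if (overlapSeconds < 0 || overlapSeconds >= windowSeconds)
            throw new ArgumentException("Overlap must be shorter than the window.");
        windowBytes = (int)(sampleRate * windowSeconds) * bytesPerSample;
        overlapBytes = (int)(sampleRate * overlapSeconds) * bytesPerSample;
    }

    // Appends one recorded buffer and returns every full window now available.
    public List<byte[]> Append(byte[] buffer, int count)
    {
        for (int i = 0; i < count; i++) pending.Add(buffer[i]);

        var windows = new List<byte[]>();
        while (pending.Count >= windowBytes)
        {
            windows.Add(pending.GetRange(0, windowBytes).ToArray());
            // Keep the tail of this window as the start of the next one.
            pending.RemoveRange(0, windowBytes - overlapBytes);
        }
        return windows;
    }
}
```

In a DataAvailable handler this could be fed with `Append(e.Buffer, e.BytesRecorded)`. Each returned window is raw PCM, so it would still need a WAV header, or conversion to whatever input the recognizer expects, before being processed.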