sandrohanea / whisper.net

Whisper.net. Speech to text made simple using Whisper Models
MIT License

Invalid Wave file RIFF header #130

Open JordanBrits opened 8 months ago

JordanBrits commented 8 months ago

I am currently developing a Blazor web app that uploads an audio file, converts it to 16-bit using NAudio, and then processes it. However, I am running into the same exception each time.

Whisper.net.Wave.CorruptedWaveException: Invalid wave file RIFF header.
   at Whisper.net.Wave.WaveParser.InternalInitialize(Boolean useAsync, CancellationToken cancellationToken)
   at Whisper.net.Wave.WaveParser.GetAvgSamplesAsync(CancellationToken cancellationToken)
   at Whisper.net.WhisperProcessor.ProcessAsync(Stream waveStream, CancellationToken cancellationToken)+MoveNext()
   at Whisper.net.WhisperProcessor.ProcessAsync(Stream waveStream, CancellationToken cancellationToken)+System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult()

using (var stream = new MemoryStream(File.ReadAllBytes(name)))
{
    using (var reader = new WaveFileReader(name))
    {
        var newFormat = new WaveFormat(8000, 16, 1);
        using (var conversionStream = new WaveFormatConversionStream(newFormat, reader))
        {
            var stringBuilder = new StringBuilder();

            await foreach (var segment in processor.ProcessAsync(conversionStream, CancellationToken.None))
            {
                stringBuilder.Append($"{segment.Start} ==> {segment.End} : {segment.Text}" + Environment.NewLine);
            }

            return stringBuilder.ToString();
        }
    }
}

sandrohanea commented 8 months ago

It seems that WaveFormatConversionStream only parses the input and converts it to a raw PCM stream; it does not write the header of a WAVE file. It does expose a WaveFormat property, which can be used to create the header.

In this case, I suggest that you either:

  1. create a wrapper on top of it that provides the WAVE header from the WaveFormat property, or
  2. extract the frames directly and, instead of calling ProcessAsync(stream), call ProcessAsync(ReadOnlyMemory).

Option 2 should be preferred here: it doesn't make sense to write a new stream just so that WaveParser can understand your format when you already know the format and can provide the frames directly.
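A minimal sketch of option 2, under stated assumptions: the converted stream yields raw 16-bit little-endian mono PCM at 16 kHz (the rate Whisper models expect), and the sample overload takes floats in the range [-1, 1). The `PcmSamples` class and its method names are hypothetical helpers written for this illustration; they are not part of Whisper.net or NAudio.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of Whisper.net or NAudio).
static class PcmSamples
{
    // Converts raw 16-bit little-endian PCM bytes to floats in [-1, 1).
    // A trailing odd byte, if any, is ignored.
    public static float[] FromPcm16(ReadOnlySpan<byte> pcm)
    {
        var samples = new float[pcm.Length / 2];
        for (int i = 0; i < samples.Length; i++)
        {
            // Low byte first, then high byte; the cast restores the sign.
            short value = (short)(pcm[2 * i] | (pcm[2 * i + 1] << 8));
            samples[i] = value / 32768f;
        }
        return samples;
    }

    // Reads a header-less PCM stream to the end and converts it to samples.
    public static float[] ReadAll(Stream pcmStream)
    {
        using var buffer = new MemoryStream();
        pcmStream.CopyTo(buffer);
        return FromPcm16(buffer.ToArray());
    }
}

// Assumed usage with the objects from the original post
// (conversionStream must already be 16 kHz, 16-bit, mono):
//
//   float[] samples = PcmSamples.ReadAll(conversionStream);
//   await foreach (var segment in processor.ProcessAsync(new ReadOnlyMemory<float>(samples)))
//   {
//       Console.WriteLine($"{segment.Start} ==> {segment.End} : {segment.Text}");
//   }
```

Because the samples are passed directly, no RIFF header is parsed at all, so the exception above cannot occur on this path. Note that this buffers the whole file in memory, which is fine for short uploads but may not suit long recordings.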

LSXAxeller commented 6 months ago

I got the same error when I tried to parse an MP3. Try this to convert the MP3 to WAV before parsing:

            if (Path.GetExtension(name).Equals(".mp3", StringComparison.OrdinalIgnoreCase))
            {
                using var mp3FileReader = new Mp3FileReader(SelectedTranscribeAudio);
                // Create a WaveFormatConversionStream to convert MP3 to WAV
                var pcmStream = WaveFormatConversionStream.CreatePcmStream(mp3FileReader);

                // Resample the PCM stream to 16kHz
                var resampler = new WdlResamplingSampleProvider(pcmStream.ToSampleProvider(), 16000);

                // Write the resampled WAV data to a memory stream
                WaveFileWriter.WriteWavFileToStream(stream, resampler.ToWaveProvider16());
                stream.Seek(0, SeekOrigin.Begin);
            }

            var results = TranscribeProcessor?.ProcessAsync(stream);
            await foreach (var segment in results)
            {
                stringBuilder.Append($"{segment.Start} ==> {segment.End} : {segment.Text}" + Environment.NewLine);
            }

eugeneYz commented 5 months ago

How do you declare "stream" in the call WaveFileWriter.WriteWavFileToStream(stream, resampler.ToWaveProvider16())?

LSXAxeller commented 4 months ago

using var stream = new MemoryStream();