sandrohanea / whisper.net

Whisper.net. Speech to text made simple using Whisper Models
MIT License

Error Message of "Invalid wave file RIFF header" with various valid wav files #33

Closed GFNiko closed 1 year ago

GFNiko commented 1 year ago

I copy/pasted the demo code into the file whisper.cs and installed the packages via NuGet. The only changes I made were switching the model (base to large) and setting the file name in Default="" of the 'f' option. Apart from that, the code is identical to the demo code in this repo. The wav files are in the project folder and registered by whisper.net.

But unfortunately I get the following error message every time, no matter which wav file I try:

   at Whisper.net.Wave.WaveParser.InitializeAsync()
   at Whisper.net.Wave.WaveParser.GetAvgSamplesAsync(CancellationToken cancellationToken)
   at Whisper.net.WhisperProcessor.ProcessAsync(Stream waveStream, CancellationToken cancellationToken)+MoveNext()
   at Whisper.net.WhisperProcessor.ProcessAsync(Stream waveStream, CancellationToken cancellationToken)+System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult()
   at Program.<<Main>$>g__FullDetection|0_2(Options opt) in C:\Users\huddeij\RiderProjects\whisperTest\whisper2.cs:line 80
   at Program.<<Main>$>g__FullDetection|0_2(Options opt) in C:\Users\huddeij\RiderProjects\whisperTest\whisper2.cs:line 80
   at Program.<<Main>$>g__Demo|0_0(Options opt) in C:\Users\huddeij\RiderProjects\whisperTest\whisper2.cs:line 33
   at CommandLine.ParserResultExtensions.WithParsedAsync[T](ParserResult`1 result, Func`2 action)
   at Program.<Main>$(String[] args) in C:\Users\huddeij\RiderProjects\whisperTest\whisper2.cs:line 13
   at Program.<Main>(String[] args)

The output before the error message:

whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head  = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1280
whisper_model_load: n_text_head = 20
whisper_model_load: n_text_layer = 32
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: type = 5
whisper_model_load: mem required = 3557.00 MB (+ 71.00 MB per decoder)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: model ctx = 2950.97 MB
whisper_model_load: model size = 2950.66 MB

I tried the sample wav files from this repo as well as audio recordings converted to wav via cloudconvert and ffmpeg.

Environment: MS Windows 11 Pro 22H2 .Net v7.0.203 Jetbrains Rider 2023.1.1

What am I doing wrong here?

sandrohanea commented 1 year ago

Hello @GFNiko, as you can see, that exception is only thrown if the provided stream doesn't contain a RIFF header (as all wave files should): https://github.com/sandrohanea/whisper.net/blob/b397baa30ae11ede6110dd764c5e2b44a5793bcc/Whisper.net/Wave/WaveParser.cs#LL150C23-L150C45

I suspect the file is at a different path than the one you set, or you passed a stream which has already been consumed.
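Both suspicions can be checked with a few lines before the stream is handed to the processor. The following is only a minimal sketch, not code from the demo: the fallback file name `sample.wav` and the helper `RewindIfConsumed` are hypothetical, and it assumes a console project using top-level statements. It prints where a relative file name actually resolves to (when started from an IDE, the working directory is often the build output folder rather than the project folder) and rewinds a seekable stream that has already been read from.

```csharp
using System;
using System.IO;

// Hypothetical file name; substitute the value passed to the demo's file option.
var fileName = args.Length > 0 ? args[0] : "sample.wav";

// A relative path is resolved against the current working directory,
// which is not necessarily the project folder.
Console.WriteLine($"Working directory: {Directory.GetCurrentDirectory()}");
Console.WriteLine($"Resolved path:     {Path.GetFullPath(fileName)}");

if (File.Exists(fileName))
{
    using var fileStream = File.OpenRead(fileName);
    Console.WriteLine($"Length: {fileStream.Length} bytes, position: {fileStream.Position}");

    // If anything has read from the stream before transcription,
    // rewind it so the parser sees the header again.
    RewindIfConsumed(fileStream);
}
else
{
    Console.WriteLine("File not found at the resolved path.");
}

// Hypothetical helper: resets a seekable stream to its start if it was read from.
static void RewindIfConsumed(Stream stream)
{
    if (stream.CanSeek && stream.Position != 0)
    {
        stream.Position = 0;
    }
}
```

If the resolved path is not the folder that holds the wav files, either pass an absolute path or copy the files to the printed working directory.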

In order to debug this further, you can display the first 4 chars of the header (at line 78 in the demo):

        using var fileStream = File.OpenRead(opt.FileName);

        // Read the first 4 bytes; a valid wave file starts with "RIFF".
        var buffer = new byte[4];
        fileStream.Read(buffer, 0, 4);
        Console.WriteLine(System.Text.Encoding.UTF8.GetString(buffer));
        // Rewind so the processor reads the stream from the start.
        fileStream.Position = 0;

GFNiko commented 1 year ago

Changing the file location helped. Strange behaviour if you ask me. Anyways, thank you very much for your help :)