sandrohanea / whisper.net

Whisper.net. Speech to text made simple using Whisper Models
MIT License

Working with MP3s #27

Closed 7702244 closed 1 year ago

7702244 commented 1 year ago

How can we use the library with MP3 files? At the moment, passing an MP3 throws the error "Invalid wave file RIFF header". The original Whisper supports MP3 files.

Alumniminium commented 1 year ago

The original Whisper does not support MP3; some example scripts just include an ffmpeg decoding step.

sandrohanea commented 1 year ago

Hello @7702244, @Alumniminium is right: Whisper as a model only supports WAV. But you can use ffmpeg or NAudio to convert other formats.

I didn't want to introduce any additional dependency in the base package, but a separate package with NAudio integration (which would depend on both NAudio and Whisper.net) would be great.
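
The "Invalid wave file RIFF header" error from the question can be anticipated by looking at the first 12 bytes of a file before handing it to the library. The following is a minimal sketch that assumes only the standard RIFF/WAVE layout (bytes 0-3 are "RIFF", bytes 8-11 are "WAVE"). The `WavSniffer` class and `LooksLikeWav` method are hypothetical names for illustration and are not part of Whisper.net.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for illustration; not part of Whisper.net.
public static class WavSniffer
{
    // Returns true when the stream starts with a RIFF/WAVE header.
    public static bool LooksLikeWav(Stream stream)
    {
        var header = new byte[12];
        var read = 0;

        // Stream.Read may return fewer bytes than requested, so loop.
        while (read < header.Length)
        {
            var n = stream.Read(header, read, header.Length - read);
            if (n == 0)
            {
                return false; // shorter than a WAV header
            }
            read += n;
        }

        return Encoding.ASCII.GetString(header, 0, 4) == "RIFF"
            && Encoding.ASCII.GetString(header, 8, 4) == "WAVE";
    }
}
```

An MP3 usually begins with an ID3 tag or an MPEG frame sync rather than "RIFF", so the check returns false and the file needs to be converted first.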

GFNiko commented 1 year ago

For example, I wrote an adapter with FFMpegCore that converts any given file to .wav with a 16 kHz sampling rate. It saves the converted file under the original file name, with the extension changed to .wav. I only needed to install FFMpegCore via NuGet.

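// Requires the FFMpegCore NuGet package and an ffmpeg binary on the machine.
using FFMpegCore;

await FFMpegArguments
   .FromFileInput(filePath)
   .OutputToFile(outputFilePath, true,     // true = overwrite an existing file
      options => options
         .ForceFormat("wav")
         .WithAudioSamplingRate(16000))    // 16 kHz
   .ProcessAsynchronously();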

Works like a charm for me. If your code is not async, use .ProcessSynchronously() on the last line instead.

sandrohanea commented 1 year ago

I also added an example of how to use NAudio to convert the MP3 to 16 kHz WAV and pass it to Whisper.net: https://github.com/sandrohanea/whisper.net/blob/main/examples/NAudioMp3/Program.cs

Of course, FFmpeg can be used instead, as @GFNiko showed above.
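
The linked example is not reproduced in this thread. As a rough sketch of the same idea, and not necessarily identical to the linked Program.cs, the conversion could look like the following. It assumes the NAudio NuGet package is installed; the `Mp3ToWav` class and `ConvertTo16KhzWav` method are hypothetical names chosen for illustration.

```csharp
using System.IO;
using NAudio.Wave;
using NAudio.Wave.SampleProviders;

// Hypothetical helper for illustration; assumes the NAudio package.
public static class Mp3ToWav
{
    // Decodes an MP3 file and returns a 16 kHz, 16-bit PCM WAV in memory.
    public static MemoryStream ConvertTo16KhzWav(string mp3Path)
    {
        using var reader = new Mp3FileReader(mp3Path);

        // Resample the decoded audio to 16 kHz.
        var resampler = new WdlResamplingSampleProvider(reader.ToSampleProvider(), 16000);

        // Write a complete WAV file (header included) into the memory stream.
        var wavStream = new MemoryStream();
        WaveFileWriter.WriteWavFileToStream(wavStream, resampler.ToWaveProvider16());

        // Rewind so the consumer reads from the RIFF header onward.
        wavStream.Seek(0, SeekOrigin.Begin);
        return wavStream;
    }
}
```

The returned stream can then be passed to Whisper.net wherever a WAV file stream would otherwise be used.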

dfengpo commented 7 months ago

However, NAudio does not support Linux, as it relies on many Windows-specific APIs.

Huddeij commented 7 months ago

That's right. I also had to switch for a Kubernetes integration. It works perfectly with ffmpeg, though.
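
For a Linux container where only the ffmpeg binary is available, one option is to call the ffmpeg command line directly instead of going through a wrapper package. The sketch below assumes `ffmpeg` is on the PATH and mirrors the settings from the FFMpegCore snippet above (WAV output, 16 kHz, overwrite existing file). The `FfmpegCli` class and its methods are hypothetical names, not part of any library mentioned in this thread.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for illustration; assumes an ffmpeg binary on the PATH.
public static class FfmpegCli
{
    // Builds the process description without starting ffmpeg.
    public static ProcessStartInfo BuildStartInfo(string inputPath, string outputPath)
    {
        var startInfo = new ProcessStartInfo("ffmpeg") { UseShellExecute = false };

        // -y: overwrite output, -ar 16000: 16 kHz sampling rate, -f wav: force WAV.
        foreach (var arg in new[] { "-y", "-i", inputPath, "-ar", "16000", "-f", "wav", outputPath })
        {
            startInfo.ArgumentList.Add(arg);
        }

        return startInfo;
    }

    // Runs ffmpeg and throws if it cannot be started or reports a failure.
    public static void Convert(string inputPath, string outputPath)
    {
        using var process = Process.Start(BuildStartInfo(inputPath, outputPath))
            ?? throw new InvalidOperationException("Could not start ffmpeg.");

        process.WaitForExit();
        if (process.ExitCode != 0)
        {
            throw new InvalidOperationException($"ffmpeg exited with code {process.ExitCode}.");
        }
    }
}
```

Using `ArgumentList` rather than a single argument string leaves quoting to the runtime, so paths with spaces do not need manual escaping.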