Whisper CoreML

A port of OpenAI's Whisper Speech Transcription model to CoreML

The goal of this project is to natively port and optimize Whisper for Apple Silicon, including optimization for the Apple Neural Engine, and to match the incredible WhisperCPP project on features.


Please note this repo is currently under development, so there will be bumps in the road.

Community input is welcome!


You can:

Create a Whisper instance: whisper = try Whisper()

And run transcription on a QuickTime-compatible asset via: await whisper.transcribe(assetURL:URL, options:WhisperOptions)

You can choose options via the WhisperOptions struct.

Whisper CoreML will load the asset using AVFoundation and convert the audio to the appropriate format for transcription.
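
For reference, here is a minimal sketch of the file-based flow described above. It assumes the Whisper, WhisperOptions, and transcribe(assetURL:options:) names as documented here; the exact option fields, error behavior, and return type may differ in the current source.

```swift
import Foundation

// Sketch only: assumes the API surface described in this README.
func transcribeFile(at url: URL) async {
    do {
        // Create a Whisper instance (throws if the model cannot be loaded).
        let whisper = try Whisper()

        // Options are chosen via the WhisperOptions struct; defaults assumed here.
        let options = WhisperOptions()

        // Transcribe a QuickTime-compatible asset; the asset is loaded via
        // AVFoundation and converted to the format Whisper expects.
        let transcription = await whisper.transcribe(assetURL: url, options: options)
        print(transcription)
    } catch {
        print("Failed to create Whisper instance: \(error)")
    }
}
```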

Alternatively, for realtime usage, you can start a Whisper session via startWhisperSession(options:WhisperOptions) and then send sample buffers to accrueSamplesFromSampleBuffer(sampleBuffer:CMSampleBuffer) from, say, an AVCaptureSession, an AVAudioSession, or any other source.

Note that we currently accrue 30 seconds of audio before transcribing, as that is the input length Whisper expects.
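
A rough sketch of the realtime flow, wiring the documented startWhisperSession(options:) and accrueSamplesFromSampleBuffer(sampleBuffer:) calls into a standard AVCaptureSession audio delegate. The Whisper calls are as described above and may differ in the current source.

```swift
import AVFoundation

// Sketch only: forwards captured audio buffers into a Whisper session.
class RealtimeTranscriber: NSObject, AVCaptureAudioDataOutputSampleBufferDelegate {
    let whisper: Whisper

    init(whisper: Whisper) {
        self.whisper = whisper
        super.init()
        // Begin a streaming session; Whisper CoreML accrues ~30 seconds of
        // audio internally before running transcription.
        whisper.startWhisperSession(options: WhisperOptions())
    }

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        // Send raw sample buffers to Whisper as they arrive from the capture session.
        whisper.accrueSamplesFromSampleBuffer(sampleBuffer: sampleBuffer)
    }
}
```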

Status

Performance

Getting Models:

For ease of use, you can use this Google Colab to convert models. Note that if you convert Medium or larger models you may run into memory issues on Google Colab.

This repository assumes you're converting multilingual models. If you need 'en' (English-only) models, you'll need to adjust the special token values by -1.
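
As a hypothetical illustration of that offset (the IDs below are the reference Whisper tokenizer values for the multilingual vocabulary; verify them against the model you convert):

```swift
// Multilingual Whisper models use a vocabulary one token larger than the
// English-only ('en') models, so every special token ID shifts down by 1.
let multilingualStartOfTranscript = 50258
let englishOnlyStartOfTranscript  = multilingualStartOfTranscript - 1 // 50257
```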