Closed MiscellaneousStuff closed 1 year ago
Could this be done by swapping the whisper packages underneath?
-- pip install openai-whisper
++ pip install git+https://github.com/MiscellaneousStuff/whisper.git
Yep. That fork is identical to the original except that it swaps the custom Linear() layer for nn.Linear(). However, this also means anyone wanting to run the model at half precision on GPU won't be able to, so that fork should only be used for dynamic quantisation on CPU.
Great! In that case, I'll add a note to the Readme to swap out whisper for your fork if users intend to run it on a CPU-only machine. Thanks!
Updated Readme here: 0431dee2eedac62c6ddae96c2145d801ffee3c15
Doing what is recommended in the Readme does not work:
Note: If you're using a CPU-only machine, your runtime can be sped up by using quantization implemented by @MiscellaneousStuff: swap out `pip install openai-whisper` from requirements.txt and replace it with their fork, `pip install git+https://github.com/MiscellaneousStuff/whisper.git` (see related discussion here - https://github.com/hayabhay/whisper-ui/issues/20)
What exactly has to be put in requirements.txt?
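For what it's worth, pip requirements files accept git URLs directly, so the swap described in the Readme note would presumably look something like this in requirements.txt (a sketch; check the fork's repo for its current install instructions):

```
-- openai-whisper
++ git+https://github.com/MiscellaneousStuff/whisper.git
```

That is, the `pip install` prefix from the Readme note is a shell command; only the package specifier itself goes into requirements.txt.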
Would it be possible for you guys to add an option to enable dynamic quantization of the model when it's being run on a CPU? This would greatly improve the run-time performance of the OpenAI Whisper model (CPU-only) with minimal to no loss in accuracy.
The benchmarks for this are available here.
The implementation only requires adding a few lines of code using features which are already built into PyTorch.
Implementation
Quantization of the Whisper model requires changing the `Linear()` layers within the model to `nn.Linear()`. This is because you need to specify which layer types to dynamically quantize (e.g. `{nn.Linear}`), and only exact type matches are quantized. The Whisper model is designed to be adaptable, i.e. it can run at different precisions, so its `Linear()` layer contains custom code to account for this; that custom code is not required for the quantized model. You can either change the `Linear()` layers in "/whisper/whisper/model.py" yourself (i.e. create a fork of openai-whisper which would be compatible with future merges), or you can use mine from here.
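To illustrate the technique, here is a minimal sketch of PyTorch dynamic quantization. `TinyModel` is a stand-in, not Whisper's actual architecture; the point is that `quantize_dynamic` matches layers by exact type, which is why Whisper's custom `Linear` subclass has to be replaced with plain `nn.Linear` first:

```python
import torch
import torch.nn as nn

class TinyModel(nn.Module):
    """Stand-in for the real model; uses plain nn.Linear layers."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(16, 16)
        self.fc2 = nn.Linear(16, 4)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

model = TinyModel().eval()

# Dynamically quantize only the nn.Linear layers to int8 (CPU-only).
# A subclass of nn.Linear would NOT be matched here, since the mapping
# is keyed on the exact module type.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

out = quantized(torch.randn(2, 16))
print(type(quantized.fc1).__module__)
```

The same call applied to a Whisper model whose layers are plain `nn.Linear` is essentially the "few lines of code" mentioned above.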