I went through the Habana code inside this repo and saw how Intel uses a sort of "cheat code" approach to adapt standard transformers modules to run on Intel Gaudi/Gaudi2 hardware. I haven't read the underlying code written by the Habana devs, but I'm curious: can it, in theory, transform any transformer-based model, including ones written with CUDA in mind?
Forgive my ignorance, but I don't think Intel offers a native framework for developing architectures or models that leverage and are fine-tuned for its hardware. Almost all of its hardware is designed with inference in mind rather than both training and inference. That's why I was wondering: can it run Insanely Fast Whisper or faster-whisper-style models at a speed faster than or equal to an Nvidia A100/H100/H200?
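For context on what I mean by the adaptation approach, here is a minimal, purely illustrative sketch of the general module-swapping technique such adapters rely on: patching the forward implementation on existing model classes instead of rewriting the model. All class and function names below are hypothetical stand-ins, not the actual Habana API.

```python
class Attention:
    """Stand-in for a stock transformers attention module."""
    def forward(self, x):
        return f"generic-attention({x})"

class TransformerModel:
    """Stand-in for a full model that composes the attention module."""
    def __init__(self):
        self.attn = Attention()

    def forward(self, x):
        return self.attn.forward(x)

def gaudi_attention_forward(self, x):
    # Device-optimized replacement; a real adapter would dispatch to
    # hardware-specific fused kernels here instead of CUDA-only ops.
    return f"gaudi-attention({x})"

def adapt_to_gaudi():
    # Patch the class once; every existing and future instance then
    # picks up the hardware-specific implementation automatically.
    Attention.forward = gaudi_attention_forward

model = TransformerModel()
print(model.forward("tokens"))  # generic-attention(tokens)
adapt_to_gaudi()
print(model.forward("tokens"))  # gaudi-attention(tokens)
```

If the real adapter works this way, then in principle any model built from the patched transformers classes would be covered, which is what prompted my question about CUDA-oriented models.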
https://huggingface.co/Systran/faster-whisper-medium