Start by looking at the WhisperModel instantiation in transcribe.py. I'm using the faster_whisper library for inference. They may or may not have a way to load these checkpoints.
Most likely we would to fetch them from hugging face, then point WhisperModel at that file path.
I can also take a look at this, this is intriguing.