xenova / whisper-web

ML-powered speech recognition directly in your browser
https://hf.co/spaces/Xenova/whisper-web
MIT License

[experimental-webgpu] - Configuring Encoder/Decoder Precision with dtype for Local Models #50

Open kostia-ilani opened 1 day ago

kostia-ilani commented 1 day ago

Hello,

I’m using whisper-web (experimental-webgpu branch) with local models (env.allowLocalModels = true and env.localModelPath = "./models"), and I’m having trouble setting distinct dtype values for encoder_model and decoder_model_merged with the small model.
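
For reference, my environment setup looks roughly like this (the directory layout in the comments is my assumption about where the pipeline looks for the ONNX files, mirroring the Hub repo structure):

```javascript
import { env } from "@huggingface/transformers";

// Serve models from a local folder instead of the Hugging Face Hub.
env.allowLocalModels = true;
env.localModelPath = "./models";

// Assumed layout on disk (mirroring the Hub repo structure):
// ./models/my-whisper-model/
//   config.json
//   tokenizer.json
//   ...
//   onnx/
//     encoder_model.onnx
//     decoder_model_merged_q4.onnx
```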

The error I see:

Uncaught (in promise) Error: Can't create a session. ERROR_CODE: 7, ERROR_MESSAGE: Failed to load model because protobuf parsing failed.

Is there a specific convention for the key names or values when setting dtype for encoder/decoder precision levels (matching the model's ONNX files)?

const transcriber = await pipeline(
  "automatic-speech-recognition",
  "my-whisper-model",
  {
    dtype: {
      encoder_model: "fp32",
      decoder_model_merged: "q4"
    },
    device: "webgpu"
  }
);
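
My understanding (an assumption on my part, not something I found in the docs) is that each dtype entry maps to a filename suffix, so the dtype map above would make the pipeline look for files roughly like this:

```javascript
// Hypothetical sketch of the dtype → ONNX filename mapping as I understand it
// (the suffix table is my assumption; fp32 appears to map to no suffix).
const DTYPE_SUFFIX = {
  fp32: "",
  fp16: "_fp16",
  q8: "_quantized",
  q4: "_q4",
};

// Resolve the ONNX file a given module/dtype pair would load.
function onnxFileFor(moduleName, dtype) {
  return `onnx/${moduleName}${DTYPE_SUFFIX[dtype]}.onnx`;
}

console.log(onnxFileFor("encoder_model", "fp32"));      // onnx/encoder_model.onnx
console.log(onnxFileFor("decoder_model_merged", "q4")); // onnx/decoder_model_merged_q4.onnx
```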
xenova commented 1 day ago

Might be related to https://github.com/huggingface/transformers.js/issues/1025#issuecomment-2474198860 (Have you pulled the git lfs files into your local folder?)

kostia-ilani commented 11 hours ago

Thanks for the fast reply, @xenova. I'm using local files stored under ./models, so it's not a git issue.

The files were taken from https://huggingface.co/Xenova/whisper-small/tree/main/onnx

Do you have any idea what might be causing the issue?