nyadla-sys / whisper.tflite

Optimized OpenAI's Whisper TFLite Port for Efficient Offline Inference on Edge Devices
MIT License

Issue with Implementing Custom Whisper TFLite Model - Seeking Guidance #8

Open JusticeEli opened 11 months ago

JusticeEli commented 11 months ago

Hi,

I'm currently working on implementing a custom Whisper TFLite model, and I'm facing some challenges, particularly with FlatBuffers and understanding how the Tensor-related components work within a TFLite model. I've tried to research and troubleshoot on my own, but I could use some guidance from the community or the repository maintainers.

If there's any specific information on how the repository maintainers created the Whisper TFLite model or if there's a related repository with resources and code samples, I would love to explore it to better understand the implementation.

nyadla-sys commented 11 months ago

Please refer to the Google Colab below to understand how to generate a TFLite model from the TFWhisper model: https://colab.research.google.com/github/nyadla-sys/whisper.tflite/blob/main/models/generate_tflite_from_whisper.ipynb
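The linked notebook drives `tf.lite.TFLiteConverter` over the TFWhisper model. The core conversion flow can be sketched on a tiny stand-in `tf.Module` (the class, shapes, and file name here are illustrative, not the notebook's exact code):

```python
import tensorflow as tf

# Tiny stand-in model; the linked notebook applies the same flow to TFWhisper.
class Doubler(tf.Module):
    @tf.function(input_signature=[tf.TensorSpec([1, 4], tf.float32)])
    def serving(self, x):
        return {"y": x * 2.0}

module = Doubler()

# Convert the traced function to a TFLite FlatBuffer.
converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [module.serving.get_concrete_function()], module
)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # dynamic-range quantization
tflite_model = converter.convert()  # bytes, ready to write to disk

with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```

The resulting bytes are a FlatBuffer (file identifier `TFL3`), which is why inspecting the model with the FlatBuffers tooling works on it directly.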

JusticeEli commented 11 months ago

Thank you

JusticeEli commented 10 months ago

Question about Android Example

In the android_example package, inside the assets folder, I noticed the presence of the file filter_vocab_gen.bin. This file appears to contain mel filters and vocabularies. However, I would like to inquire if there is any documentation available that explains the source and content of this particular file. I'm interested in understanding its purpose and origin.

Additionally, while reviewing the code, I noticed that there are some hard-coded values used for clamping and normalization of the input audio. Could you kindly provide some insights or references on how these specific values were chosen? Understanding the rationale behind these values would be greatly appreciated.
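The exact clamp and scale constants live in the Android example's source, but the usual convention for 16-bit PCM input is to divide by 32768 and clamp to [-1, 1]; a minimal numpy sketch of that convention (the constant and function names are mine, not the repo's):

```python
import numpy as np

# Assumption: common 16-bit PCM convention; check the repo's code for its exact constants.
INT16_SCALE = 32768.0

def pcm16_to_float(samples: np.ndarray) -> np.ndarray:
    """Convert int16 PCM samples to float32, clamped to [-1.0, 1.0]."""
    audio = samples.astype(np.float32) / INT16_SCALE
    return np.clip(audio, -1.0, 1.0)

pcm = np.array([-32768, -16384, 0, 16384, 32767], dtype=np.int16)
floats = pcm16_to_float(pcm)  # -1.0, -0.5, 0.0, 0.5, and a value just below 1.0
```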

Thank you for your assistance.

vilassn commented 10 months ago

For multilingual inputs, Whisper uses a custom tokenizer; for English-only inputs, it uses the standard GPT-2 tokenizer. Both are accessible through the open-source Whisper Python package.

- Multilingual vocab data is derived from multilingual.tiktoken
- English-only vocab data is derived from gpt2.tiktoken
- Mel filter bank data is derived from mel_filters.npz

For the log-mel spectrogram calculation, refer to the Whisper Python code: https://github.com/openai/whisper/blob/main/whisper/audio.py#L92-L156
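The referenced audio.py uses a 400-point FFT with hop 160 at 16 kHz and 80 mel bins, then clamps the log spectrum to 8 decades of dynamic range and rescales. A numpy sketch of that pipeline follows; note the triangular HTK-style filterbank here is a stand-in for the librosa-generated filters stored in mel_filters.npz, so its values will differ slightly:

```python
import numpy as np

SR, N_FFT, HOP, N_MELS = 16000, 400, 160, 80  # constants from whisper/audio.py

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)  # HTK mel scale (approximation)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_mels=N_MELS, n_fft=N_FFT, sr=SR):
    # Triangular filters spaced evenly on the mel scale (stand-in for mel_filters.npz).
    pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_spectrogram(audio):
    window = np.hanning(N_FFT)
    n_frames = 1 + (len(audio) - N_FFT) // HOP
    stft = np.stack([np.fft.rfft(window * audio[i * HOP : i * HOP + N_FFT])
                     for i in range(n_frames)], axis=1)
    magnitudes = np.abs(stft) ** 2
    mel_spec = mel_filterbank() @ magnitudes
    log_spec = np.log10(np.maximum(mel_spec, 1e-10))
    log_spec = np.maximum(log_spec, log_spec.max() - 8.0)  # dynamic-range clamp, as in audio.py
    return (log_spec + 4.0) / 4.0                          # Whisper's normalization

# 1 second of a 440 Hz tone as a test signal
audio = np.sin(2 * np.pi * 440 * np.arange(SR) / SR).astype(np.float32)
mel = log_mel_spectrogram(audio)  # shape: (80 mel bins, 98 frames)
```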

JusticeEli commented 10 months ago

Thank you

ITHealer commented 9 months ago

How do I fix this issue when I build filters_vocab_gen_util.ipynb? https://github.com/usefulsensors/openai-whisper/blob/main/notebooks/filters_vocab_gen_util.ipynb

I am running it on Google Colab.

HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/content/whisper/whisper/assets/gpt2'. Use repo_type argument if needed.

I did not change anything in your code!

Please help me. Thanks!

ITHealer commented 9 months ago

@vilassn @nyadla-sys Can you help me?

Thank you very much!!!

nyadla-sys commented 9 months ago

> How do I fix this issue when I build filters_vocab_gen_util.ipynb? https://github.com/usefulsensors/openai-whisper/blob/main/notebooks/filters_vocab_gen_util.ipynb
>
> I am running it on Google Colab.
>
> HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/content/whisper/whisper/assets/gpt2'. Use repo_type argument if needed.

Use the notebook below to generate the vocab and mel filter data: https://colab.research.google.com/github/nyadla-sys/whisper.tflite/blob/main/models/tflt_vocab_mel.ipynb