nyadla-sys / whisper.tflite

Optimized OpenAI's Whisper TFLite Port for Efficient Offline Inference on Edge Devices
MIT License

How do I fix an error when building the vocab with the filters_vocab_gen_util.ipynb notebook? #16

Open ITHealer opened 9 months ago

ITHealer commented 9 months ago

https://github.com/usefulsensors/openai-whisper/blob/main/notebooks/filters_vocab_gen_util.ipynb

HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/content/whisper/whisper/assets/gpt2'. Use repo_type argument if needed.

I did not change anything in your code!

Please help me. Thanks!

nyadla-sys commented 9 months ago

Use the Colab notebook below to generate vocab.bin. I have changed the magic number, so please follow it: https://colab.research.google.com/github/nyadla-sys/whisper.tflite/blob/main/models/tflt_vocab_mel.ipynb

Please also refer to:
https://github.com/nyadla-sys/whisper.tflite/tree/main/models
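For anyone wondering what such a vocab.bin might look like on disk, here is a minimal C sketch of a writer. The layout (a 4-byte magic, a token count, then length-prefixed token strings) and the name write_vocab_bin are assumptions for illustration only; the Colab notebook linked above is the real generator, and its name and later comments suggest it also stores mel filter data, so its actual layout may differ.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TFLT_VOCAB_MAGIC 0x74666C74u

/* Hypothetical layout (an assumption, not the notebook's documented format):
 *   uint32 magic
 *   int32  n_vocab
 *   n_vocab x { uint32 len; char bytes[len]; }
 * Integers are written in the host's native byte order.
 * Returns 1 on success, 0 on any open/write/close failure. */
int write_vocab_bin(const char *path, const char *const *tokens, int32_t n_vocab) {
    FILE *f = fopen(path, "wb");
    if (f == NULL) {
        return 0;
    }
    uint32_t magic = TFLT_VOCAB_MAGIC;
    int ok = fwrite(&magic, sizeof magic, 1, f) == 1 &&
             fwrite(&n_vocab, sizeof n_vocab, 1, f) == 1;
    for (int32_t i = 0; ok && i < n_vocab; i++) {
        uint32_t len = (uint32_t)strlen(tokens[i]);
        /* Length prefix first, then the raw token bytes (no terminator). */
        ok = fwrite(&len, sizeof len, 1, f) == 1 &&
             (len == 0 || fwrite(tokens[i], 1, len, f) == len);
    }
    /* fclose is always called; its result is combined with ok. */
    return fclose(f) == 0 && ok;
}
```

Length-prefixing keeps tokens that contain spaces or arbitrary bytes unambiguous, which matters for byte-level BPE vocabularies like Whisper's.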

ITHealer commented 9 months ago

I was able to run it. Thank you very much!

nyadla-sys commented 9 months ago

// tflt change in minimal
if (magic != 0x74666C74) {
    printf("Invalid vocab file (bad magic)\n");
    return 0;
}
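To show what this check does in isolation, here is a hedged C sketch that opens a vocab file and compares its first four bytes against the expected magic. The helper name check_vocab_magic is hypothetical, and it assumes the magic is stored as a native-endian 32-bit integer at the very start of the file.

```c
#include <stdint.h>
#include <stdio.h>

/* Expected magic: the ASCII bytes 't', 'f', 'l', 't' packed into one integer. */
#define TFLT_VOCAB_MAGIC 0x74666C74u

/* Hypothetical helper: returns 1 if the file at `path` starts with
 * TFLT_VOCAB_MAGIC, 0 otherwise. Assumes the magic was written as a
 * 4-byte integer in the host's native byte order. */
int check_vocab_magic(const char *path) {
    FILE *f = fopen(path, "rb");
    if (f == NULL) {
        printf("Could not open vocab file: %s\n", path);
        return 0;
    }
    uint32_t magic = 0;
    size_t n = fread(&magic, sizeof magic, 1, f);
    fclose(f);
    /* A short read (file smaller than 4 bytes) is treated as a bad magic too. */
    if (n != 1 || magic != TFLT_VOCAB_MAGIC) {
        printf("Invalid vocab file (bad magic)\n");
        return 0;
    }
    return 1;
}
```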

ITHealer commented 9 months ago

Yes, I noticed that and fixed it. Thanks!

ITHealer commented 9 months ago

One thing I don't know: if I use another model, will the value 0x74666C74 have to change? What is it, and how do I identify it?

nyadla-sys commented 9 months ago

You can comment out this check. It is just a kind of authentication step to make sure you are using our vocab file.
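On the question of what 0x74666C74 is: it is four ASCII bytes packed into one integer. The small C helper below (magic_to_ascii is a hypothetical name) unpacks them most significant byte first, which gives "tflt". For comparison, whisper.cpp's ggml files use 0x67676d6c, which decodes to "ggml". Since the tag identifies the vocab file format rather than anything in the model weights, it should not need to change when you swap models, as long as the generator and the check agree.

```c
#include <stdint.h>

/* Unpack the four bytes of a 32-bit magic number into ASCII characters,
 * most significant byte first. `out` must hold at least 5 chars. */
void magic_to_ascii(uint32_t magic, char out[5]) {
    for (int i = 0; i < 4; i++) {
        out[i] = (char)((magic >> (24 - 8 * i)) & 0xFFu);
    }
    out[4] = '\0';
}
```

Calling magic_to_ascii(0x74666C74u, buf) fills buf with "tflt".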

ITHealer commented 9 months ago

Excuse me: if I want to build for another language, for example Vietnamese, do I need to provide the vocabulary and mel spectrogram from the dataset I used for training? Or can I use a vocab set and mel data for that language that do not come from my dataset?

Is it true that the vocabulary is mapped differently for each different voice and frequency?

I'm new to AI, so I'm not sure I'm stating everything correctly.

ITHealer commented 9 months ago

Currently I only have a model that has been fine-tuned on Vietnamese, and I need to create a .bin file containing the vocab and mel data like you did. Can you tell me what to keep in mind when creating it?

Thanks!

nyadla-sys commented 9 months ago

You need to generate a multilingual vocab file based on the fine-tuned PyTorch model.
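As a quick sanity check before generating the file, the vocabulary size tells you which tokenizer a checkpoint expects: OpenAI's English-only Whisper models use 51864 tokens, the multilingual ones 51865, and large-v3 uses 51866. Assuming the Vietnamese model was fine-tuned from a multilingual checkpoint, it should report at least 51865. The helper below is a hypothetical sketch of that check.

```c
#include <stdbool.h>
#include <stdint.h>

/* English-only Whisper checkpoints have 51864 tokens; multilingual ones
 * have 51865 (51866 for large-v3). */
bool is_multilingual_vocab(int32_t n_vocab) {
    return n_vocab >= 51865;
}
```

If this returns false for the fine-tuned model, an English-only vocab file would be the wrong match for it.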