usefulsensors / openai-whisper

Robust Speech Recognition via Large-Scale Weak Supervision
MIT License

tflite models - segmentation fault #12

Open cogmeta opened 1 year ago

cogmeta commented 1 year ago

I tried using tflite models built using the notebook https://github.com/usefulsensors/openai-whisper/blob/main/notebooks/generate_tflite_from_whisper.ipynb.

But I am getting a segmentation fault when I try them with the stream example. Any ideas why that might be happening?

cogmeta commented 1 year ago

n_vocab:50257
audio_sdl_init: found 1 capture devices:
audio_sdl_init: - Capture device #0: 'MacBook Pro Microphone'
audio_sdl_init: attempt to open capture device 0 : 'MacBook Pro Microphone' ...
audio_sdl_init: obtained spec for input device (SDL Id = 2):
audio_sdl_init: - sample rate: 16000
audio_sdl_init: - format: 33056 (required: 33056)
audio_sdl_init: - channels: 1 (required: 1)
audio_sdl_init: - samples per frame: 1024
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Segmentation fault: 11

nyadla-sys commented 1 year ago

Could you please follow the steps outlined here? https://github.com/usefulsensors/openai-whisper/tree/main/stream

Sorry, I don't have a MacBook Pro to verify this on.

cogmeta commented 1 year ago

Oh, it works perfectly well with ../models/whisper.tflite, but not with the models created with the notebook. I wanted to try creating a tflite model from the medium model.
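One way to narrow this down is to compare the input and output tensors of the working model against the notebook-generated one before running stream. The following is a minimal sketch, assuming TensorFlow is installed; `tf.lite.Interpreter`, `get_input_details()` and `get_output_details()` are real TensorFlow Lite calls, while `tensor_signature`, `load_signature` and `diff_signatures` are hypothetical helper names introduced here for illustration. A mismatch would not prove the cause of the crash, only point at a difference worth checking.

```python
def tensor_signature(details):
    """Reduce interpreter tensor details to comparable (name, shape, dtype) tuples."""
    return [
        (
            d["name"],
            tuple(int(x) for x in d["shape"]),
            str(getattr(d["dtype"], "__name__", d["dtype"])),
        )
        for d in details
    ]


def load_signature(model_path):
    """Read input/output tensor details from a .tflite file."""
    import tensorflow as tf  # imported lazily so the pure helpers work without TensorFlow

    interpreter = tf.lite.Interpreter(model_path=model_path)
    return {
        "inputs": tensor_signature(interpreter.get_input_details()),
        "outputs": tensor_signature(interpreter.get_output_details()),
    }


def diff_signatures(reference, candidate):
    """Return human-readable differences in tensor count, shape, or dtype."""
    problems = []
    for kind in ("inputs", "outputs"):
        ref, cand = reference[kind], candidate[kind]
        if len(ref) != len(cand):
            problems.append(f"{kind}: expected {len(ref)} tensors, found {len(cand)}")
            continue
        for i, (r, c) in enumerate(zip(ref, cand)):
            # Tensor names may legitimately differ between exports; compare shape and dtype only.
            if r[1:] != c[1:]:
                problems.append(f"{kind}[{i}]: expected {r[1:]}, found {c[1:]}")
    return problems
```

Usage would look like `diff_signatures(load_signature("../models/whisper.tflite"), load_signature("whisper-medium.tflite"))`, where the second path stands in for wherever the generated model was saved. An empty list means the tensor interfaces match.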

nyadla-sys commented 1 year ago

Use the following colab, but swap out openai/whisper-tiny for openai/whisper-medium. https://colab.research.google.com/github/usefulsensors/openai-whisper/blob/main/notebooks/generate_tflite_from_whisper.ipynb

cogmeta commented 1 year ago

I followed the exact same steps. Trying to use the model results in segmentation faults even on an Ubuntu machine, so it does not look like a Mac issue.

nyadla-sys commented 1 year ago

I may need to construct a new multilingual vocab bin, but it may be worthwhile to test the model below with the existing vocab bin. Use the following GitHub repository, but replace openai/whisper-tiny with openai/whisper-medium.en.

nyadla-sys commented 1 year ago

Before building the stream example, replace ~/openai-whisper/stream/filters_vocab_gen.h with filters_vocab_multilingual.h:

cp ~/openai-whisper/models/filters_vocab_multilingual.h ~/openai-whisper/stream/filters_vocab_gen.h

Then follow the build steps and run the stream example with the whisper-medium.tflite model using the command below:

./stream ../models/whisper-medium.tflite
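The header swap can also be scripted so the original filters_vocab_gen.h is kept as a backup. This is only a sketch: it assumes the repository is cloned at ~/openai-whisper with the multilingual header under models/ and the stream example under stream/, and `install_multilingual_vocab` is a hypothetical helper name, not part of the repository.

```python
import shutil
from pathlib import Path


def install_multilingual_vocab(
    repo_root="~/openai-whisper",
    src_rel="models/filters_vocab_multilingual.h",  # assumed location of the multilingual header
    dst_rel="stream/filters_vocab_gen.h",           # header the stream example is built against
):
    """Copy the multilingual header over filters_vocab_gen.h, backing up the old file first."""
    repo = Path(repo_root).expanduser()
    src = repo / src_rel
    dst = repo / dst_rel
    if not src.is_file():
        raise FileNotFoundError(f"multilingual header not found: {src}")

    backup = None
    if dst.exists():
        # Keep the original next to it as filters_vocab_gen.h.bak
        backup = dst.with_name(dst.name + ".bak")
        shutil.copy2(dst, backup)

    shutil.copy2(src, dst)
    return dst, backup
```

Restoring the English-only vocabulary is then a matter of copying the .bak file back before rebuilding.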

nyadla-sys commented 1 year ago

Use the whisper-medium.tflite that you generated.

nyadla-sys commented 1 year ago

I am not able to upload the model, as it is around 700 MB in size.

nyadla-sys commented 1 year ago

I was able to add whisper-medium.tflite.

cogmeta commented 1 year ago

Thanks! Will try it out.

cogmeta commented 1 year ago

./stream ../models/whisper-medium.tflite

n_vocab:50257
audio_sdl_init: found 1 capture devices:
audio_sdl_init: - Capture device #0: 'MacBook Pro Microphone'
audio_sdl_init: attempt to open capture device 0 : 'MacBook Pro Microphone' ...
audio_sdl_init: obtained spec for input device (SDL Id = 2):
audio_sdl_init: - sample rate: 16000
audio_sdl_init: - format: 33056 (required: 33056)
audio_sdl_init: - channels: 1 (required: 1)
audio_sdl_init: - samples per frame: 1024
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
ERROR: gather index out of bounds
ERROR: Node number 35 (GATHER) failed to invoke.
ERROR: Node number 8319 (WHILE) failed to invoke.
Error at /Users/prashantsasatte/openai-whisper/tensorflow_src/tensorflow/lite/examples/stream/stream.cc:366

nyadla-sys commented 1 year ago

Can you try with the minimal build? Replace filters_vocab_gen.bin with the multilingual file (filters_vocab_multilingual), renamed to filters_vocab_gen.bin.

nyadla-sys commented 1 year ago

I think it is producing more tokens than the 223 I restricted it to during model generation. Please change max_tokens to 384 and generate the model again.
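To illustrate the hypothesis: if the decoder loop indexes a table with one row per allowed token position, any step past the limit fails in the same way the GATHER node does. The sketch below is a plain-Python illustration, not the actual TFLite graph; `gather` and `decode_positions` are hypothetical names, and whether node 35 is really a per-position lookup is an assumption.

```python
def gather(table, indices):
    """Bounds-checked row lookup, similar in spirit to a GATHER op."""
    rows = []
    for i in indices:
        if not 0 <= i < len(table):
            raise IndexError(
                f"gather index out of bounds: {i} (table has {len(table)} rows)"
            )
        rows.append(table[i])
    return rows


def decode_positions(max_positions, steps):
    """Look up one position row per decode step, as a decoder loop would."""
    # Stand-in for a table fixed at export time with max_positions rows.
    table = [[float(p)] for p in range(max_positions)]
    for step in range(steps):
        gather(table, [step])
    return steps


# With the limit of 223 mentioned above, step index 223 is the first to fail:
# decode_positions(223, 223) completes, decode_positions(223, 224) raises IndexError.
```

Under this reading, regenerating the model with a larger limit only helps if the decoder never produces more tokens than the new limit.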

cogmeta commented 1 year ago

Tried with the minimal build. Same error. I have not yet looked into the code, but will do that. Thanks.

I am actually looking to implement a streaming gRPC server.

nyadla-sys commented 1 year ago

I have raised the token limit to 448 in the whisper-medium model in an attempt to solve your issue.

nyadla-sys commented 1 year ago

Please try the latest whisper-medium model.