jianfch / stable-ts

Transcription, forced alignment, and audio indexing with OpenAI's Whisper
MIT License

Make Whisper dependency optional? #348

Closed: vytskalt closed this issue 7 months ago

vytskalt commented 7 months ago

Hi, I want to use stable-ts in my project for forced alignment; however, when I add it to my Docker image, the size jumps from 9MB to over 2.5GB.

I noticed that the Whisper dependency takes up a lot of space because it depends on Numba, which uses LLVM at runtime. I'm using the Hugging Face Transformers option, so I would assume the dependency on the original Whisper implementation could be removed (?). Looking at the code of stable-ts, the usage of Whisper seems to be mostly just importing constants.

jianfch commented 7 months ago

To install the whisperless version of stable-ts:

pip install -U stable-ts-whisperless

or

pip install -U git+https://github.com/jianfch/stable-ts.git@whisperless
vytskalt commented 7 months ago

I am getting the following error when running stable-ts audio.wav --align text.txt --language en -fw --model /tmp/tiny --transcribe_option "ignore_compatibility=1":

Traceback (most recent call last):
  File "/nix/store/c1j23xdk5hxsqddimq1n2ym79s1cgy1q-stable-ts-whisperless-d44caabb937599f5dbdae637b1c77c3fb81e2f8d/bin/.stable-ts-wrapped", line 9, in <module>
    sys.exit(cli())
             ^^^^^
  File "/nix/store/c1j23xdk5hxsqddimq1n2ym79s1cgy1q-stable-ts-whisperless-d44caabb937599f5dbdae637b1c77c3fb81e2f8d/lib/python3.11/site-packages/stable_whisper/whisper_word_level/cli.py", line 735, in cli
    _cli(cmd=cmd, _cache=cache)
  File "/nix/store/c1j23xdk5hxsqddimq1n2ym79s1cgy1q-stable-ts-whisperless-d44caabb937599f5dbdae637b1c77c3fb81e2f8d/lib/python3.11/site-packages/stable_whisper/whisper_word_level/cli.py", line 702, in _cli
    result: WhisperResult = call_method_with_options(transcribe_method, transcribe_options)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/c1j23xdk5hxsqddimq1n2ym79s1cgy1q-stable-ts-whisperless-d44caabb937599f5dbdae637b1c77c3fb81e2f8d/lib/python3.11/site-packages/stable_whisper/whisper_word_level/cli.py", line 536, in call_method_with_options
    return method(**options)
           ^^^^^^^^^^^^^^^^^
  File "/nix/store/c1j23xdk5hxsqddimq1n2ym79s1cgy1q-stable-ts-whisperless-d44caabb937599f5dbdae637b1c77c3fb81e2f8d/lib/python3.11/site-packages/stable_whisper/alignment.py", line 446, in align
    nonspeech_preds = nonspeech_predictor.predict(audio=audio_segment, offset=time_offset)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/c1j23xdk5hxsqddimq1n2ym79s1cgy1q-stable-ts-whisperless-d44caabb937599f5dbdae637b1c77c3fb81e2f8d/lib/python3.11/site-packages/stable_whisper/stabilization/__init__.py", line 238, in predict_with_nonvad
    mask = self.pad_mask(mask)
           ^^^^^^^^^^^^^^^^^^^
  File "/nix/store/c1j23xdk5hxsqddimq1n2ym79s1cgy1q-stable-ts-whisperless-d44caabb937599f5dbdae637b1c77c3fb81e2f8d/lib/python3.11/site-packages/stable_whisper/stabilization/__init__.py", line 129, in pad_mask
    return self.mask_pad_func(mask, 1501)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/c1j23xdk5hxsqddimq1n2ym79s1cgy1q-stable-ts-whisperless-d44caabb937599f5dbdae637b1c77c3fb81e2f8d/lib/python3.11/site-packages/stable_whisper/whisper_compatibility.py", line 27, in whisper_not_available
    raise ModuleNotFoundError("Please install Whisper: "
ModuleNotFoundError: Please install Whisper: 'pip install openai-whisper==20231117'. Official Whisper repo: https://github.com/openai/whisper

It appears to be related to detecting non-speech parts. In my case, the audio consists only of speech, so would it be possible to disable that?
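For reference, the stub in `whisper_compatibility.py` that raises this error follows the standard optional-dependency pattern: import the real helper when `openai-whisper` is installed, otherwise bind a placeholder that only fails when it is actually called. Here is a minimal sketch of that pattern (simplified and assumed, not the actual stable-ts source):

```python
# Optional-dependency pattern: the import succeeds when openai-whisper is
# installed; otherwise the helper is replaced by a stub that raises only
# at call time, so whisperless code paths keep working.
try:
    from whisper.audio import pad_or_trim  # real helper from openai-whisper
    IS_WHISPER_AVAILABLE = True
except ImportError:
    IS_WHISPER_AVAILABLE = False

    def pad_or_trim(*args, **kwargs):
        # Deferred failure: only triggered when a code path actually
        # needs the Whisper-provided helper (as in the traceback above).
        raise ModuleNotFoundError(
            "Please install Whisper: 'pip install openai-whisper==20231117'. "
            "Official Whisper repo: https://github.com/openai/whisper"
        )
```

The benefit of deferring the error is that purely whisperless features work without the dependency; the drawback, as seen here, is that a code path that still reaches a stub fails at runtime rather than at import time.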

(I switched to using faster-whisper, since the Hugging Face Transformers option did not appear to support alignment.)

jianfch commented 7 months ago

This error and compatibility warning should be fixed in af01d5bd8b319fa62ca00f90479c7b186b45f5d6.

vytskalt commented 7 months ago

It works perfectly now, thanks!