jianfch / stable-ts

Transcription, forced alignment, and audio indexing with OpenAI's Whisper
MIT License

Incompatible with latest faster-whisper #403

Open vytskalt opened 1 month ago

vytskalt commented 1 month ago

Looks like recent changes to faster-whisper broke compatibility with stable-ts, giving errors like this:

Traceback (most recent call last):
  File "/nix/store/qcr3a5k910x6ywvkhinzqjiwv50mpvn1-stable-ts-aligner/bin/stable-ts-aligner", line 41, in <module>
    results = list(executor.map(lambda req: align(req, model), requests))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/zs1xky7izkfmc8wxm8bhhdff5a605hfj-python3-minimal-3.11.9/lib/python3.11/concurrent/futures/_base.py", line 619, in result_iterator
    yield _result_or_cancel(fs.pop())
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/zs1xky7izkfmc8wxm8bhhdff5a605hfj-python3-minimal-3.11.9/lib/python3.11/concurrent/futures/_base.py", line 317, in _result_or_cancel
    return fut.result(timeout)
           ^^^^^^^^^^^^^^^^^^^
  File "/nix/store/zs1xky7izkfmc8wxm8bhhdff5a605hfj-python3-minimal-3.11.9/lib/python3.11/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/nix/store/zs1xky7izkfmc8wxm8bhhdff5a605hfj-python3-minimal-3.11.9/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/nix/store/zs1xky7izkfmc8wxm8bhhdff5a605hfj-python3-minimal-3.11.9/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/qcr3a5k910x6ywvkhinzqjiwv50mpvn1-stable-ts-aligner/bin/stable-ts-aligner", line 41, in <lambda>
    results = list(executor.map(lambda req: align(req, model), requests))
                                            ^^^^^^^^^^^^^^^^^
  File "/nix/store/qcr3a5k910x6ywvkhinzqjiwv50mpvn1-stable-ts-aligner/bin/stable-ts-aligner", line 10, in align
    result = model.align(request['audio_file'], request['text'], language=request['language'], nonspeech_skip=None, fast_mode=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/il0mrip5xma96339x272izlv1mq1g5lq-python3-minimal-3.11.9-env/lib/python3.11/site-packages/stable_whisper/alignment.py", line 583, in align
    segment = timestamp_words()
              ^^^^^^^^^^^^^^^^^
  File "/nix/store/il0mrip5xma96339x272izlv1mq1g5lq-python3-minimal-3.11.9-env/lib/python3.11/site-packages/stable_whisper/alignment.py", line 309, in timestamp_words
    features = model.feature_extractor(audio_segment.cpu().numpy())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/il0mrip5xma96339x272izlv1mq1g5lq-python3-minimal-3.11.9-env/lib/python3.11/site-packages/faster_whisper/feature_extractor.py", line 88, in __call__
    waveform = waveform.to(torch.float32)
               ^^^^^^^^^^^
AttributeError: 'numpy.ndarray' object has no attribute 'to'

jianfch commented 1 month ago

It looks like a lot more broke with those changes, given the sudden switch from NumPy to PyTorch.

vytskalt commented 1 month ago

Yes, very weird they would do that in a minor release.

HeidelParreno commented 1 month ago

I tried running stable-ts[fw] in a Jupyter notebook, and it crashed at model.transcribe. It works from the command prompt though, so I don't know what causes it.

jianfch commented 1 month ago

stable-ts[fw] installs the latest Faster-Whisper version on PyPI (1.0.3), so the aforementioned changes (which occurred after 1.0.3) do not affect it. For Faster-Whisper models, the transcribe() method is the original Faster-Whisper transcription method. To use Stable-ts, call model.transcribe_stable() instead. But if transcribe() itself is crashing, then it's likely a Faster-Whisper issue. A similar issue seems to be on their repo already: https://github.com/SYSTRAN/faster-whisper/issues/820.
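
A minimal sketch of a version guard, assuming (per the comment above) that 1.0.3 is the last Faster-Whisper release that predates the NumPy to PyTorch change. The helper names `parse_release`, `is_unaffected`, and `installed_faster_whisper_is_unaffected` are hypothetical and not part of stable-ts or Faster-Whisper; only `importlib.metadata` is a real standard-library API. A build installed straight from the repository may report the same version number as the PyPI release, so treat this as a check on PyPI installs only.

```python
import re
from importlib import metadata

# Assumption taken from the thread: 1.0.3 is the last release that
# predates the numpy -> PyTorch change in faster-whisper.
LAST_UNAFFECTED = (1, 0, 3)


def parse_release(version: str) -> tuple:
    """Return the leading numeric components of a version string as ints."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_unaffected(version: str) -> bool:
    """True if the version is at or below the last unaffected release."""
    return parse_release(version) <= LAST_UNAFFECTED


def installed_faster_whisper_is_unaffected() -> bool:
    """Check the installed faster-whisper distribution; False if it is missing."""
    try:
        return is_unaffected(metadata.version("faster-whisper"))
    except metadata.PackageNotFoundError:
        return False
```

Calling `installed_faster_whisper_is_unaffected()` before loading a model gives an early, readable failure point instead of the AttributeError deep inside the feature extractor.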

HeidelParreno commented 1 month ago

Thanks for this! I solved the crashing by downgrading faster-whisper to 1.0.0, switching my CUDA to 12.1, and switching my PyTorch to the cu121 build.
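
For anyone comparing their setup against the combination described above, here is a rough sketch that reports the installed faster-whisper and torch versions and the CUDA build tag. It assumes the PyTorch wheel carries a local version suffix such as `+cu121`, which is not guaranteed for every install. The function names `cuda_tag` and `describe_environment` are hypothetical helpers, not part of any library.

```python
import re
from importlib import metadata
from typing import Optional


def cuda_tag(torch_version: str) -> Optional[str]:
    """Extract a CUDA build tag such as 'cu121' from a PyTorch version string."""
    match = re.search(r"\+(cu\d+)", torch_version)
    return match.group(1) if match else None


def describe_environment() -> dict:
    """Collect installed versions relevant to the fix described in the thread."""
    info = {}
    for dist in ("faster-whisper", "torch"):
        try:
            info[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            info[dist] = None  # distribution is not installed
    # Assumption: the CUDA tag, if any, appears after '+' in the torch version.
    info["cuda_tag"] = cuda_tag(info["torch"] or "")
    return info


if __name__ == "__main__":
    print(describe_environment())
```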