jianfch / stable-ts

Transcription, forced alignment, and audio indexing with OpenAI's Whisper
MIT License

Installation instructions incomplete #354

Closed: nns2009 closed this 6 months ago

nns2009 commented 6 months ago

At first I got:

PS C:\Users\nns2009> stable-ts
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\nns2009\AppData\Roaming\Python\Python312\Scripts\stable-ts.exe\__main__.py", line 4, in <module>
  File "C:\Users\nns2009\AppData\Roaming\Python\Python312\site-packages\stable_whisper\__init__.py", line 1, in <module>
    from .whisper_word_level import *
  File "C:\Users\nns2009\AppData\Roaming\Python\Python312\site-packages\stable_whisper\whisper_word_level\__init__.py", line 2, in <module>
    from .cli import cli
  File "C:\Users\nns2009\AppData\Roaming\Python\Python312\site-packages\stable_whisper\whisper_word_level\cli.py", line 9, in <module>
    import torch
  File "C:\Users\nns2009\AppData\Roaming\Python\Python312\site-packages\torch\__init__.py", line 141, in <module>
    raise err
OSError: [WinError 126] The specified module could not be found. Error loading "C:\Users\nns2009\AppData\Roaming\Python\Python312\site-packages\torch\lib\shm.dll" or one of its dependencies.

It turns out I needed to run pip install -U stable-ts from an elevated (admin) PowerShell.
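The first traceback points into the per-user site-packages (AppData\Roaming\Python\Python312). The later runs use C:\Program Files\Python312, so the admin install appears to have created a second, system-wide copy. The sketch below reports which location each package would be imported from. It is not part of stable-ts, and the helper names are hypothetical. It relies on importlib.util.find_spec, which locates a top-level package without executing it, so it still works when `import torch` itself fails:

```python
import importlib.util
import site
import sys
from pathlib import Path


def module_location(name):
    """Return the directory `name` would be imported from, or None if not found.

    find_spec locates a top-level package without running its __init__.py,
    so this works even when importing the package raises an OSError.
    """
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).parent


def _is_under(path, parent):
    """True if `path` lies inside the directory `parent`."""
    try:
        path.resolve().relative_to(Path(parent).resolve())
        return True
    except ValueError:
        return False


def describe_install(name):
    """Say whether `name` resolves to the per-user or another site-packages."""
    location = module_location(name)
    if location is None:
        return f"{name}: not installed for {sys.executable}"
    user_site = site.getusersitepackages()
    scope = "per-user" if _is_under(location, user_site) else "system/other"
    return f"{name}: {location} ({scope})"


if __name__ == "__main__":
    for package in ("stable_whisper", "torch"):
        print(describe_install(package))
```

Run it with the interpreter that stable-ts.exe belongs to. The output shows which copy of each package that interpreter picks up.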

Then I got:

PS E:\> stable-ts "E:\TestTS.m4a"
Loaded Whisper base model
C:\Program Files\Python312\Lib\site-packages\stable_whisper\whisper_word_level\original_whisper.py:236: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Program Files\Python312\Scripts\stable-ts.exe\__main__.py", line 7, in <module>
  File "C:\Program Files\Python312\Lib\site-packages\stable_whisper\whisper_word_level\cli.py", line 735, in cli
    _cli(cmd=cmd, _cache=cache)
  File "C:\Program Files\Python312\Lib\site-packages\stable_whisper\whisper_word_level\cli.py", line 702, in _cli
    result: WhisperResult = call_method_with_options(transcribe_method, transcribe_options)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python312\Lib\site-packages\stable_whisper\whisper_word_level\cli.py", line 536, in call_method_with_options
    return method(**options)
           ^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python312\Lib\site-packages\stable_whisper\whisper_word_level\original_whisper.py", line 271, in transcribe_stable
    audio = AudioLoader(
            ^^^^^^^^^^^^
  File "C:\Program Files\Python312\Lib\site-packages\stable_whisper\audio\__init__.py", line 197, in __init__
    metadata = get_metadata(source)
               ^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python312\Lib\site-packages\stable_whisper\audio\utils.py", line 159, in get_metadata
    metadata = subprocess.run(
               ^^^^^^^^^^^^^^^
  File "C:\Program Files\Python312\Lib\subprocess.py", line 548, in run
    with Popen(*popenargs, **kwargs) as process:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python312\Lib\subprocess.py", line 1026, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Program Files\Python312\Lib\subprocess.py", line 1538, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [WinError 2] The system cannot find the file specified

It turns out you also need ffmpeg installed and available on the PATH (I installed it through Chocolatey).
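The FileNotFoundError comes from subprocess, raised when the external program that stable-ts tries to launch cannot be found. The sketch below checks for this before running stable-ts. It looks for ffmpeg and, as an assumption, ffprobe. Both ship in the ffmpeg package, and the traceback does not show which one get_metadata actually calls. find_ffmpeg_tools and missing_tools are hypothetical helpers built on the standard shutil.which:

```python
import shutil


def find_ffmpeg_tools(tools=("ffmpeg", "ffprobe")):
    """Map each tool name to its resolved path on PATH, or None if missing."""
    return {tool: shutil.which(tool) for tool in tools}


def missing_tools(tools=("ffmpeg", "ffprobe")):
    """Return the tools that could not be found on PATH."""
    return [tool for tool, path in find_ffmpeg_tools(tools).items() if path is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
        # PATH changes only reach newly started processes.
        print("Install ffmpeg (e.g. `choco install ffmpeg`) and open a new terminal.")
    else:
        print("ffmpeg tools found:", find_ffmpeg_tools())
```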

Finally it worked, but only on the CPU:

C:\Program Files\Python312\Lib\site-packages\stable_whisper\whisper_word_level\original_whisper.py:236: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")

It turns out I had to reinstall PyTorch separately, selecting a CUDA build: https://pytorch.org/get-started/locally/#with-cuda-1
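After reinstalling, a small check along these lines can confirm which PyTorch build is active. This is a sketch, not part of stable-ts, and cuda_status is a hypothetical helper. torch.version.cuda, torch.cuda.is_available() and torch.cuda.get_device_name() are standard PyTorch calls. A CPU-only build reports torch.version.cuda as None, which is consistent with the FP16 warning above:

```python
import importlib.util


def cuda_status():
    """Summarize whether the installed PyTorch build can use CUDA."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}
    import torch  # imported lazily so this check runs even without PyTorch

    status = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # None for a CPU-only build, a CUDA version string for a CUDA build
        "cuda_build": torch.version.cuda,
        # True only if the build has CUDA support and a usable GPU/driver exists
        "cuda_available": torch.cuda.is_available(),
    }
    if status["cuda_available"]:
        status["device_name"] = torch.cuda.get_device_name(0)
    return status


if __name__ == "__main__":
    info = cuda_status()
    print(info)
    if info["torch_installed"] and not info["cuda_available"]:
        print("PyTorch will run on CPU; reinstall a CUDA build from pytorch.org.")
```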

I suggest including this information in the "Installation" section, since pip install -U stable-ts alone is not enough.

It would also be nice to have some example command lines for the basic use cases. Here is the one I currently use:

stable-ts --language en --model large-v3 --max_chars 100 --output_format srt <filename>
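For scripted or repeated runs, the same command line can be wrapped from Python. This is a minimal sketch. build_stable_ts_command and run_stable_ts are hypothetical helpers, and only the flags from the command above are used, with its values as defaults:

```python
import subprocess


def build_stable_ts_command(audio_path, language="en", model="large-v3",
                            max_chars=100, output_format="srt"):
    """Build the argument list for the stable-ts command shown above."""
    return [
        "stable-ts",
        "--language", language,
        "--model", model,
        "--max_chars", str(max_chars),
        "--output_format", output_format,
        str(audio_path),
    ]


def run_stable_ts(audio_path, **options):
    """Run stable-ts on one file; raises CalledProcessError if it fails."""
    subprocess.run(build_stable_ts_command(audio_path, **options), check=True)


if __name__ == "__main__":
    run_stable_ts(r"E:\TestTS.m4a")
```

Passing the arguments as a list, rather than one shell string, avoids quoting problems with paths that contain spaces.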