peterstavrou closed this issue 10 months ago.
def translate(audio_file):
    options = dict(beam_size=5, best_of=5)
    translate_options = dict(task="translate", **options)  # ** unpacks options as keyword arguments
    result = model.transcribe(audio_file, **translate_options, demucs=True, vad=True)
    return result.to_dict()
What is beam_size=5, best_of=5? It doesn't seem to work for me; I get an AssertionError. What exactly is wrong with what I'm doing?
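For reference, ** unpacks a dict into keyword arguments, so dict(task="translate", **options) merges the two option sets, while passing options positionally after a keyword argument is rejected by Python before anything runs. (In Whisper's decoding options, beam_size is the number of beams used for beam search at temperature 0, and best_of is the number of candidates sampled at non-zero temperature.) The sketch below uses a hypothetical stand-in, fake_transcribe, instead of model.transcribe so it runs without Whisper or a model installed.

```python
# Hypothetical stand-in for model.transcribe(): it just returns the arguments
# it received, so the merged options can be inspected without loading a model.
def fake_transcribe(audio, **kwargs):
    return {"audio": audio, **kwargs}


options = dict(beam_size=5, best_of=5)
# ** unpacks `options` into keyword arguments, merged with task="translate".
translate_options = dict(task="translate", **options)

received = fake_transcribe("episode.mp4", **translate_options, demucs=True, vad=True)
print(received)


def is_valid_call(source):
    """Return True if `source` compiles as a Python expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


# A positional argument after a keyword argument fails at compile time.
print(is_valid_call('dict(task="translate", options)'))    # False
print(is_valid_call('dict(task="translate", **options)'))  # True
```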
The recent update should generally prevent text from appearing too early. Which version are you using?
The latest. I ran the commands below before opening this issue.
pip install -U git+https://github.com/jianfch/stable-ts.git
pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git
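Since "the latest" can mean different things after installing from git, the exact installed versions can be printed from inside the same venv with the standard library's importlib.metadata. A minimal sketch; it assumes the distributions are registered under their PyPI names, stable-ts and openai-whisper, which should also hold for installs from the git URLs above, though that is an assumption.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("stable-ts", "openai-whisper", "torch")):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```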
Avoid using ts_num.
What was the error message?
I commented out ts_num=16, but it didn't make a difference.
Error:
Traceback (most recent call last):
File "c:\Z\Programming\Python\OpenAI_Whisper\Video_Translated_Subtitles.py", line 13, in <module>
result = model.transcribe(
^^^^^^^^^^^^^^^^^
File "C:\Z\Programming\Python\OpenAI_Whisper\venv\Lib\site-packages\stable_whisper\whisper_word_level.py", line 458, in transcribe_stable
result: DecodingResult = decode_with_fallback(mel_segment, ts_token_mask=ts_token_mask)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Z\Programming\Python\OpenAI_Whisper\venv\Lib\site-packages\stable_whisper\whisper_word_level.py", line 335, in decode_with_fallback
decode_result, audio_features = model.decode(seg,
^^^^^^^^^^^^^^^^^
File "C:\Z\Programming\Python\OpenAI_Whisper\venv\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Z\Programming\Python\OpenAI_Whisper\venv\Lib\site-packages\stable_whisper\decode.py", line 112, in decode_stable
result = task.run(mel)
^^^^^^^^^^^^^
File "C:\Z\Programming\Python\OpenAI_Whisper\venv\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Z\Programming\Python\OpenAI_Whisper\venv\Lib\site-packages\whisper\decoding.py", line 732, in run
tokens, sum_logprobs, no_speech_probs = self._main_loop(audio_features, tokens)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Z\Programming\Python\OpenAI_Whisper\venv\Lib\site-packages\stable_whisper\decode.py", line 36, in _main_loop
assert audio_features.shape[0] == tokens.shape[0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
You might have installed Whisper from its repo, which is not compatible with Stable-ts. Try:
pip install --upgrade --no-deps --force-reinstall openai-whisper==20230314
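One plausible reading of the traceback, offered as an assumption rather than a confirmed diagnosis: Whisper sizes its decoding group as beam_size or best_of or 1 and repeats the token tensor that many times before calling _main_loop, while the stable-ts copy of _main_loop asserts that audio_features has the same batch size. If the Whisper build from the repo no longer repeats audio_features as well, the assertion would only fail once beam_size or best_of is set, which would fit this report. The toy model below imitates that with plain lists; repeat_audio is a hypothetical switch standing in for the two behaviours, not a real option.

```python
def repeat_interleave(rows, n):
    """Repeat every row n times in order, like torch.repeat_interleave on dim 0."""
    return [row for row in rows for _ in range(n)]


def group_size(beam_size=None, best_of=None):
    # Whisper uses beam_size, else best_of, else 1 as the decoding group size.
    return beam_size or best_of or 1


def run(audio_features, tokens, beam_size=None, best_of=None, repeat_audio=True):
    """Toy decode step; repeat_audio is a hypothetical switch for the assumed
    difference between Whisper releases."""
    n_group = group_size(beam_size, best_of)
    tokens = repeat_interleave(tokens, n_group)
    if repeat_audio:
        audio_features = repeat_interleave(audio_features, n_group)
    # Same check as stable_whisper/decode.py line 36 in the traceback.
    assert len(audio_features) == len(tokens)
    return len(tokens)


audio = ["segment-0 features"]       # one audio segment in the batch
prompt = [["<|startoftranscript|>"]]  # one (illustrative) token sequence for it

print(run(audio, prompt, repeat_audio=False))  # 1: group of 1, shapes match
print(run(audio, prompt, beam_size=5))         # 5: both repeated, shapes match
try:
    run(audio, prompt, beam_size=5, repeat_audio=False)
except AssertionError:
    print("AssertionError: 1 audio row vs 5 token rows")
```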
Unfortunately, the issue still happens.
Try installing Stable-ts in a new environment without installing Whisper separately; just install Stable-ts directly.
I deleted my venv completely and then installed Stable-ts from the latest commit. The AssertionError is gone, but the issue of some subtitles appearing well before any speech still happens when translating a video file (a TV show).
result = model.transcribe(
input_file,
language="nl",
task="translate",
fp16=False,
suppress_silence=True,
no_speech_threshold=0.6,
beam_size=5,
best_of=5,
)
What type of audio are you passing? If it also contains music, you can add parameters like: model.transcribe(audio_file, demucs=True, vad=True)
I deleted my venv completely and then installed Stable-ts from the latest commit, but it's still happening.
What error are you getting? Can you post that as well?
Sorry, I have updated my reply. The original issue, subtitles appearing well before any speech, still happens.
If it fails to detect the non-speech with vad=True and demucs=True, then try including min_word_dur=0 as well. You can also use a lower value of medium_factor or even set a max_dur value for clamp_max().
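Putting these suggestions together, a sketch might look like the following. It assumes a stable-ts 2.x install where transcribe() accepts demucs, vad and min_word_dur, and where the returned result has clamp_max(medium_factor=..., max_dur=...); the values 2.0 and 8.0 are illustrative, not recommendations, and demucs=True may require the separate demucs package. The model is passed in as an argument, so the function itself does not load anything.

```python
def transcribe_for_subtitles(model, audio_path, medium_factor=2.0, max_dur=8.0):
    """Translate Dutch audio to English and tighten timing around non-speech.

    `model` is assumed to be a stable-ts model, e.g. from
    stable_whisper.load_model(...). The medium_factor and max_dur defaults
    here are illustrative values only.
    """
    result = model.transcribe(
        audio_path,
        language="nl",
        task="translate",
        demucs=True,     # isolate vocals first (may need the demucs package)
        vad=True,        # use voice activity detection to find non-speech
        min_word_dur=0,  # suggested when non-speech is still not detected
    )
    # Clamp word durations that are too long relative to the segment's
    # median word duration (medium_factor) or in absolute seconds (max_dur).
    result.clamp_max(medium_factor=medium_factor, max_dur=max_dur)
    return result
```

Usage would be something like result = transcribe_for_subtitles(stable_whisper.load_model("base"), "episode.mp4") followed by result.to_srt_vtt("episode.srt"); the model size and file names are placeholders.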
@jianfch can you explain in detail when to use each parameter of the transcribe function and what range of values it covers? There are many parameters, and each one behaves differently.
@Hemangpandey detailed documentation is on the roadmap, but for now there is only the docstring.
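Until then, the parameter list and defaults can be read straight from the installed package with the standard inspect module, and the full docstring with help() or inspect.getdoc(). A small generic helper, assuming only that the callable can be introspected with inspect.signature (true for ordinary Python functions and bound methods); for stable-ts one would pass model.transcribe. The function fake_transcribe below is a hypothetical stand-in used only for the demonstration.

```python
import inspect


def describe_parameters(func):
    """Return "name" / "name=default" / "*name" / "**name" strings for func."""
    described = []
    for name, param in inspect.signature(func).parameters.items():
        if param.kind is inspect.Parameter.VAR_KEYWORD:
            described.append(f"**{name}")
        elif param.kind is inspect.Parameter.VAR_POSITIONAL:
            described.append(f"*{name}")
        elif param.default is inspect.Parameter.empty:
            described.append(name)  # required parameter, no default
        else:
            described.append(f"{name}={param.default!r}")
    return described


def fake_transcribe(audio, vad=False, min_word_dur=0.1, **decode_options):
    """Hypothetical stand-in; not the real stable-ts signature."""


for entry in describe_parameters(fake_transcribe):
    print(entry)
```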
Some subtitles appear and stay on the screen 10-15 seconds before anyone even talks. It's not all like this but it happens frequently. Some subtitles disappear way too fast (not sure if it's related).
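To see how often this happens, the generated .srt file can be scanned for cues that stay on screen unusually long or vanish unusually fast. A minimal sketch; the 8-second and 0.5-second thresholds are arbitrary assumptions to tune, and the parser only reads standard SRT timing lines of the form HH:MM:SS,mmm --> HH:MM:SS,mmm.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:14,500".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def to_seconds(h, m, s, ms):
    """Convert SRT timestamp fields (strings) to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def flag_cues(srt_text, max_dur=8.0, min_dur=0.5):
    """Return (start, end, reason) for cues that are suspiciously long or short."""
    flagged = []
    for match in TIMING.finditer(srt_text):
        fields = match.groups()
        start = to_seconds(*fields[:4])
        end = to_seconds(*fields[4:])
        duration = end - start
        if duration > max_dur:
            flagged.append((start, end, "too long"))
        elif duration < min_dur:
            flagged.append((start, end, "too short"))
    return flagged


# Usage (file name is a placeholder):
# with open("episode.srt", encoding="utf-8") as fh:
#     for start, end, reason in flag_cues(fh.read()):
#         print(f"{start:.1f}s-{end:.1f}s {reason}")
```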