Closed aliscie closed 1 year ago
What do you mean by "sample_video"?
Do you have a command line or python code to show the problem you have?
The description in the issue #32 you opened is a bit more clear.
Maybe you have to lower the --no_speech_threshold (try 0.0 instead of the default 0.6...)?
Can you try whisper alone to see if you have the same problem (launch whisper instead of whisper_timestamped)?
If whisper is fine (if it solves your problem), then you can try whisper_timestamped with option --accurate.
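For reference, the same comparison could probably be done from Python instead of the command line. This is only a rough sketch: the helper names count_words and compare_backends are made up for illustration, and it assumes that both the whisper and whisper_timestamped packages are installed and that both results expose a "text" field.

```python
def count_words(result):
    """Count whitespace-separated words in a transcription result's "text" field."""
    return len(result.get("text", "").split())


def compare_backends(audio_path, model_name="base", language="en"):
    """Transcribe one file with plain whisper and with whisper_timestamped."""
    # Imported lazily so count_words can be used without the packages installed.
    import whisper
    import whisper_timestamped

    # Plain whisper: the model object has its own transcribe() method.
    plain_model = whisper.load_model(model_name, device="cpu")
    plain = plain_model.transcribe(audio_path, language=language)

    # whisper_timestamped: transcribe() is a module-level function taking the model.
    audio = whisper_timestamped.load_audio(audio_path)
    ts_model = whisper_timestamped.load_model(model_name, device="cpu")
    timestamped = whisper_timestamped.transcribe(ts_model, audio, language=language)

    return {
        "whisper": count_words(plain),
        "whisper_timestamped": count_words(timestamped),
    }
```

If the two word counts differ a lot for the same file, the missing text is more likely related to whisper_timestamped than to the model itself.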
Oh man, sorry, I meant the whisper_timestamped package. That is my project name 😂😂
My code is:
import whisper_timestamped as whisper
audio = whisper.load_audio("my_audio_file.wav")
model = whisper.load_model("base", device="cpu")
result = whisper.transcribe(model, audio, language="en")
import json
print(json.dumps(result, indent = 2, ensure_ascii = False))
OK, your code is fine, and I cannot investigate much without having the audio to reproduce.
Is there something particular about the portion of speech that is not transcribed? (low volume...)
Have you tried with import whisper instead of import whisper_timestamped as whisper?
Have you tried whisper_timestamped with option no_speech_threshold = 0?
You can also play with options compression_ratio_threshold and logprob_threshold (lowering those thresholds).
And if the previous did not work, also try temperature = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0), best_of = 5.
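To give some intuition for how these options interact, here is a simplified sketch of the temperature fallback as I understand it from Whisper's transcription code: a segment is decoded again at the next temperature when its text compresses too well (a sign of repetition) or when its average log-probability is too low. Decoded, decode_with_fallback and fake_decode are illustrative stand-ins, not the library's actual code, and the no-speech check is left out on purpose. The values 2.4 and -1.0 are Whisper's default thresholds.

```python
from dataclasses import dataclass


@dataclass
class Decoded:
    text: str
    avg_logprob: float
    compression_ratio: float
    temperature: float


def decode_with_fallback(
    decode_once,
    temperatures=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    compression_ratio_threshold=2.4,
    logprob_threshold=-1.0,
):
    """Try each temperature in turn and keep the first result that passes both checks."""
    result = None
    for temperature in temperatures:
        result = decode_once(temperature)
        too_repetitive = (
            compression_ratio_threshold is not None
            and result.compression_ratio > compression_ratio_threshold
        )
        too_unlikely = (
            logprob_threshold is not None
            and result.avg_logprob < logprob_threshold
        )
        if not (too_repetitive or too_unlikely):
            break  # good enough, no need to try a higher temperature
    return result  # the last attempt is returned if every temperature failed


def fake_decode(temperature):
    """Stand-in decoder: repetitive output at temperature 0, plausible output otherwise."""
    if temperature == 0.0:
        return Decoded("la la la la la la", -0.3, 3.1, temperature)
    return Decoded("a plausible sentence", -0.4, 1.3, temperature)


best = decode_with_fallback(fake_decode)
print(best.temperature, best.text)
```

With a single temperature there is nothing to fall back to, which is why passing a tuple of temperatures can help when the first decoding attempt fails.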
@Jeronymous Where should I put these options in whisper_timestamped? Is it like this:
options = {
"no_speech_threshold":0,
}
model = whisper.load_model("base", device="cpu", no_speech_threshold=0,options)
OK let me clarify. You can add options here:
result = whisper.transcribe(model, audio, language="en")
If whisper works fine, then this should work fine:
result = whisper.transcribe(model, audio, language="en",
beam_size=5, best_of=5, temperature=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)
)
And if you are concerned about processing time (you want things to run fast), you can try:
result = whisper.transcribe(model, audio, language="en",
no_speech_threshold = 0
)
Note: I see you are using the "base" model, but the performance of this model is not good (about twice as many transcription errors as the "small" model). So if you can afford more computation time / memory usage, I recommend that you use the "small" or even the "medium" model. It of course depends on the accuracy you are expecting.
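Putting the pieces of this comment together, a small script could look like the sketch below. The names build_transcribe_options, transcribe_file and prefer_speed are hypothetical and only there for illustration; the whisper_timestamped calls and option values are the ones already shown in this thread.

```python
import json


def build_transcribe_options(prefer_speed=False, language="en"):
    """Return keyword arguments for whisper.transcribe() for one of the two setups above."""
    if prefer_speed:
        # Cheaper setup: only change the no-speech threshold.
        return {"language": language, "no_speech_threshold": 0}
    # More accurate but slower setup: beam search plus temperature fallback.
    return {
        "language": language,
        "beam_size": 5,
        "best_of": 5,
        "temperature": (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    }


def transcribe_file(audio_path, model_name="small", prefer_speed=False):
    """Load the audio and the model, then transcribe with the chosen options."""
    # Imported lazily so the options helper works without the package installed.
    import whisper_timestamped as whisper

    audio = whisper.load_audio(audio_path)
    model = whisper.load_model(model_name, device="cpu")
    return whisper.transcribe(model, audio, **build_transcribe_options(prefer_speed))


if __name__ == "__main__":
    result = transcribe_file("my_audio_file.wav", model_name="small")
    print(json.dumps(result, indent=2, ensure_ascii=False))
```

Changing model_name is the only thing needed to compare "base", "small" and "medium" on the same file.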
@Jeronymous What about using "large" in model = whisper.load_model("large", device="cpu") instead of using "base"? Would that help?
The transcription will certainly be better, and the computation time will also be higher... You're the best judge depending on your application. Just give it a try and see.
whisper_timestamped transcribes only the first 10 words and ignores the rest?