faxotherapy closed this issue 3 months ago.
I tried again with the medium model (ggml-medium.bin) this time, and it seems to work. Is there any reason why the large model would not work? I have 24 GB of RAM. Thank you.
It seems that the issue might be with the model itself, as discussed in this GitHub issue.
A potential solution has been suggested in this comment. Currently, we don't have an option to pass max tokens to Whisper, but I can add it if needed.
Did you download the large model from the same link that opens from Settings?
I released a new version with an option to set the maximum context length (in tokens).
You can install and run it with:
cd /tmp
wget -q --show-progress https://github.com/thewh1teagle/vibe/releases/download/v2.0.1/vibe_2.0.1_amd64.deb
sudo apt install ./vibe_2.0.1*.deb --reinstall
RUST_LOG=debug vibe
Then select the large model again and, before transcribing, set the maximum context to 64 (or 32) in the advanced options of the main window.
I downloaded the large model from https://huggingface.co/ggerganov/whisper.cpp/tree/main
I'm going to try your latest release with the large model today. But wouldn't the medium model be enough, rather than the large one? Thanks.
I believe the medium model is sufficient in most cases; that's why I made it the default model in the app. Transcribing with the large model also takes more time.
Hi, thanks for your reply. In the end, I'm going to drop the large model: whether used with a max context of 32 or 64, it produced very unsatisfactory results (repetitions), although a max context of 32 produced far fewer repetitions than 64.
Sticking with the medium model is best, with or without a reduced max context (e.g., 32).
What happened?
I'm attempting to transcribe an MP4 video about 1 h 30 min long. After 20 minutes of transcription, the app keeps repeating the same sentence exactly 686 times. After 46 minutes, it repeats another sentence exactly 1401 times. The app seems to like those two sentences very much. Unfortunately, the log file does not reflect this issue at all.
Is it OK to feed an MP4 video directly into the app, or should I extract the audio track and feed that in instead?
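To confirm repetition counts like these from an exported transcript rather than by hand, a minimal sketch using standard `uniq` and `awk` is shown here. It assumes the transcript was saved as plain text with one segment per line; the function name `find_repeats` and any file names are hypothetical. Timestamped formats such as SRT make every line unique, so this only works on a plain-text export.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report runs of consecutive identical lines in a
# plain-text transcript (assumed to contain one segment per line).
# Usage: find_repeats FILE [MIN_RUN]
find_repeats() {
  local file="$1"
  local min_run="${2:-3}"   # only report runs at least this long (default 3)
  # uniq -c collapses consecutive duplicate lines and prefixes each with its count;
  # awk keeps runs of min_run or more and prints "count<TAB>line".
  uniq -c -- "$file" | awk -v min="$min_run" '
    {
      count = $1
      # strip the leading whitespace, count and separator space added by uniq -c
      sub(/^[[:space:]]*[0-9]+[[:space:]]/, "")
      if (count >= min) printf "%d\t%s\n", count, $0
    }'
}
```

For example, `find_repeats transcript.txt 100` would list only runs of 100 or more identical consecutive lines, which is where the 686- and 1401-fold repetitions described in this report would show up.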
Context:
Should I first convert the audio to regular AAC?
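If you want to try extracting the audio track first, a minimal sketch with `ffmpeg` is shown here. It assumes `ffmpeg` is installed; the function name `extract_audio` and the file names are hypothetical. Instead of AAC, it writes 16 kHz, mono, 16-bit PCM WAV, the input format the whisper.cpp examples use. Whether this avoids the repetitions reported here is not established.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract a video's audio track into a 16 kHz,
# mono, 16-bit PCM WAV file (the format whisper.cpp's examples expect).
# Usage: extract_audio INPUT.mp4 OUTPUT.wav
extract_audio() {
  local input="$1"
  local output="$2"
  # -vn drops the video stream; -ar and -ac set sample rate and channel count;
  # -c:a pcm_s16le writes signed 16-bit little-endian PCM; -y overwrites the output.
  ffmpeg -hide_banner -loglevel error -y -i "$input" \
    -vn -ar 16000 -ac 1 -c:a pcm_s16le "$output"
}
```

For example, `extract_audio video.mp4 video.wav` produces a WAV file that can then be selected in the app instead of the MP4.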
Example:
This one kept repeating 1401 times:
This other one kept repeating 686 times:
I verified this: it is not the case at all in the video; the speaker keeps talking normally, using other sentences.
Thanks in advance for any suggestions.
Steps to reproduce
What OS are you seeing the problem on?
Linux
Relevant log output
When selecting the video, this is what happens:
I guess I should run the transcription again with debug logging enabled, using:
RUST_LOG=debug vibe