Closed Coolykoen closed 5 months ago
Thanks again for listening to my suggestions on Reddit.
I am trying to get speech-to-text working for Dutch voice notes. However, here you can see that it simply tries to interpret the audio as English:
It's not making sense because I was speaking Dutch, haha. This was with the large-v3 model, but before your updates it did the exact same thing without the model env, so I suppose that was with base?
Current compose:
version: "3"
services:
  shhhbot:
    container_name: shhhbot
    hostname: shhhbot
    restart: always
    image: ghcr.io/tonym128/shhh-bot
    environment:
      - model=large-v3-q5_0
      - SHHH_API_KEY=API_KEY
The model is currently built into the image. Please try changing the image line to
image: ghcr.io/tonym128/shhh-bot-small
And you can remove the model environment line... It might be an idea to support downloading a different model and to build tiny into the base image... More features 😄.
Same with Italian. What's weird is that I sent 2 voice messages spoken in Italian:
-The first came out like this: (speaking in foreign language) (speaking in foreign language) (speaking in foreign language)
-The second one came out translated perfectly to Italian 😂
I think that whisper.cpp needs auto language detection, because it defaults to English
EDIT: OK, it's probably not that easy
If you're able to, could you please supply a test audio sample?
I should be able to add additional settings for running whisper via the docker env. Will look into it.
https://file.io/wI74oMcRG5pS thank you :)
EDIT: This is the bot's response to that audio :D
Try changing your image to image: ghcr.io/tonym128/shhh-bot-small
I had to convert your audio file because on the first run it told me it couldn't process the .ogg file.
But after I converted it to an mp3 and uploaded it, I got this response. I should have asked you for the text too 😆
Ah, apologies, I misread. I'm also getting English conversions. I will look to add some more environment variables to set up how you want whisper to run.
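The manual conversion step described here could be scripted. The sketch below is only an illustration: the thread does not say which tool was used, so ffmpeg is an assumption, and the function names are hypothetical. It targets 16 kHz mono 16-bit WAV, the input format the whisper.cpp README asks for, rather than mp3.

```python
import subprocess
from pathlib import Path


def ffmpeg_to_wav_command(src):
    """Build an ffmpeg command that converts src to 16 kHz mono 16-bit WAV."""
    dst = str(Path(src).with_suffix(".wav"))
    return [
        "ffmpeg", "-y",        # overwrite the output file if it exists
        "-i", str(src),        # input file, e.g. a Telegram .ogg voice note
        "-ar", "16000",        # resample to 16 kHz
        "-ac", "1",            # downmix to mono
        "-c:a", "pcm_s16le",   # 16-bit PCM
        dst,
    ]


def convert_to_wav(src):
    """Run the conversion (assumes ffmpeg is on PATH) and return the output path."""
    cmd = ffmpeg_to_wav_command(src)
    subprocess.run(cmd, check=True)
    return cmd[-1]
```

Calling `convert_to_wav("voice.ogg")` would write `voice.wav` next to the input, assuming ffmpeg is installed.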
So, I have now tried both -small and -medium. They certainly seem to translate now instead of interpreting it as English, which is an improvement, but it really doesn't work well: wrong words, skipped words, etc. English works great though; it even manages to understand (most of) the lyrics in songs, which is pretty cool. But yeah, I think Dutch isn't popular enough, so maybe it hasn't learned it that well? Or can I still improve something?
EDIT: Perhaps it can just respond in Dutch, skipping the translation? That would be the ideal scenario in my opinion.
Had a day to spend hacking! :)
image: ghcr.io/tonym128/shhh-bot
It now has the tiny model built in, but you can download a different model at startup by specifying the model in the environment, e.g. SHHH_WHISPER_MODEL=medium
To get persistence across runs and not have to constantly redownload it on new versions, you can mount a persistent volume for the models at /models.
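To illustrate why the /models volume avoids repeated downloads, here is a minimal sketch of a "download only if missing" check. This is not the bot's actual code: `ensure_model` and the `download` callback are hypothetical, and the `ggml-<model>.bin` file name is assumed from whisper.cpp's usual model naming.

```python
from pathlib import Path


def ensure_model(model, models_dir, download):
    """Return the model file path, calling download only if the file is missing."""
    target = Path(models_dir) / f"ggml-{model}.bin"  # assumed naming convention
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        download(model, target)  # hypothetical callback that writes the file
    return target
```

With /models on a persistent volume, the file survives container restarts and image upgrades, so the `download` callback would only run the first time a given model is requested.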
You can also supply whisper.cpp options via SHHH_WHISPER_OPTIONS:
-l nl means it should assume the language is Dutch
-l auto should try to autodetect the language; the default is English
You can search for Options on https://github.com/ggerganov/whisper.cpp to see what is available.
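As a rough sketch of how the two environment variables might end up on a whisper.cpp command line (the bot's real implementation is not shown in this thread, so the function, the `./main` binary path and the model file layout are all assumptions), the options string can be split shell-style and appended after the standard `-m` and `-f` flags:

```python
import os
import shlex


def build_whisper_command(audio_path, env=None, binary="./main", models_dir="/models"):
    """Assemble a whisper.cpp command from SHHH_WHISPER_* variables (hypothetical helper)."""
    env = os.environ if env is None else env
    model = env.get("SHHH_WHISPER_MODEL", "tiny")               # tiny is the built-in model
    options = shlex.split(env.get("SHHH_WHISPER_OPTIONS", ""))  # "-l nl" -> ["-l", "nl"]
    model_path = f"{models_dir}/ggml-{model}.bin"               # assumed file naming
    return [binary, "-m", model_path, "-f", audio_path, *options]
```

For example, with SHHH_WHISPER_MODEL=medium and SHHH_WHISPER_OPTIONS=-l nl, the helper would produce a command ending in `-f note.wav -l nl`, so the language flag reaches whisper.cpp unchanged.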
My current docker-compose script
version: '3'
services:
  shhhbot:
    container_name: shhhbot
    hostname: shhhbot
    restart: unless-stopped
    image: ghcr.io/tonym128/shhh-bot
    environment:
      - SHHH_API_KEY={API_KEY}
      - SHHH_MY_CHAT_ID={CHAT_ID}
      - SHHH_ALLOWED_CHAT_IDS={ALLOWED_CHAT_IDS}
      - SHHH_WHISPER_MODEL=medium
      - SHHH_WHISPER_OPTIONS=-l nl
    volumes:
      - models:/models
volumes:
  models:
It can take a while to download the new image at startup
Here's an example of me trying to speak some Dutch 😆
thanks! it does work really well now, amazing.
Can confirm it works wonderfully 💪🏻💪🏻 thanks a lot!