tonym128 / shhh-bot

A Telegram Bot to convert speech to text from small videos and audio files.
MIT License

Speech to text for dutch not working #12

Closed Coolykoen closed 5 months ago

Coolykoen commented 5 months ago

Thanks again for listening to my suggestions on Reddit.

I am trying to get speech to text working for Dutch voice notes. However, here you can see that it simply tries to interpret them as English:

[image]

It's not making sense because I was speaking Dutch, haha. This was with the large v3 model, but before your updates it did the exact same thing without the model env, so I suppose that was with base?

Current compose:

version: "3"
services:
  shhhbot:
    container_name: shhhbot
    hostname: shhhbot
    restart: always
    image: ghcr.io/tonym128/shhh-bot
    environment:
      - model=large-v3-q5_0
      - SHHH_API_KEY=API_KEY

cchrkk commented 5 months ago

Same with Italian. What's weird is that I sent two voice messages spoken in Italian:

- The first came out like this: (speaking in foreign language) (speaking in foreign language) (speaking in foreign language)

- The second one came out perfectly in Italian 😂

I think whisper.cpp needs automatic language detection, because it defaults to English.

[image]

EDIT: OK, it's probably not that easy

tonym128 commented 5 months ago

The model is currently built into the image. Please try changing the image line to:

image: ghcr.io/tonym128/shhh-bot-small

And you can remove the model environment line. It might be an idea to support that variable for downloading a different model, and to build tiny into the base image... More features 😄.
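
As a sketch of that change, the compose file from the first comment would presumably look like this with the image swapped and the model line dropped. Everything else is copied unchanged from that comment, and API_KEY is still a placeholder:

```yaml
version: "3"
services:
  shhhbot:
    container_name: shhhbot
    hostname: shhhbot
    restart: always
    # Image variant with the small model built in
    image: ghcr.io/tonym128/shhh-bot-small
    environment:
      # The model= line is gone; the model comes from the image itself
      - SHHH_API_KEY=API_KEY
```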

tonym128 commented 5 months ago

If you're able to, could you please supply a test audio sample.

I should be able to add additional settings for running whisper via the Docker environment. Will look into it.

cchrkk commented 5 months ago

https://file.io/wI74oMcRG5pS thank you :)

EDIT: This is the bot's response to that audio :D

[image]

tonym128 commented 5 months ago

Try changing your image line to:

image: ghcr.io/tonym128/shhh-bot-small

I had to convert your audio file because, on the first run, the bot told me it couldn't process the .ogg file.

But after I converted it to an mp3 and uploaded it, I got this response. I should have asked you for the text too 😆

[image: converted_text]

tonym128 commented 5 months ago

Ah, apologies, I misread. I'm also getting English conversions. I will look into adding some more environment variables to set up how you want whisper to run.

Coolykoen commented 5 months ago

So, I have now tried both -small and -medium. They certainly seem to translate now instead of interpreting it as English, and that's an improvement, but it really doesn't work well: wrong words, skipped words, etc. English works great though; it even manages to understand (most of) the lyrics in songs, which is pretty cool. But yeah, I think Dutch isn't popular enough, so maybe it hasn't learned it that well? Or can I still improve something?

EDIT: Perhaps it can just respond in Dutch, skipping the translation? That would be the ideal scenario in my opinion.

tonym128 commented 5 months ago

Had a day to spend hacking! :)

image: ghcr.io/tonym128/shhh-bot

It now has the tiny model built in, but you can download a different model at startup by specifying the model in the environment, e.g. SHHH_WHISPER_MODEL=medium

To get persistence across runs and not have to constantly redownload it on new versions, you can mount a persistent volume for the models at /models.

You can also supply whisper.cpp options as SHHH_WHISPER_OPTIONS:

-l nl - means it should assume the language is Dutch
-l auto - should try to autodetect; the default is English

You can search for Options on https://github.com/ggerganov/whisper.cpp to see what is available.

My current docker-compose script

version: '3'

services:
  shhhbot:
    container_name: shhhbot
    hostname: shhhbot
    restart: unless-stopped
    image: ghcr.io/tonym128/shhh-bot
    environment:
      - SHHH_API_KEY={API_KEY}
      - SHHH_MY_CHAT_ID={CHAT_ID}
      - SHHH_ALLOWED_CHAT_IDS={ALLOWED_CHAT_IDS}
      - SHHH_WHISPER_MODEL=medium
      - SHHH_WHISPER_OPTIONS=-l nl
    volumes:
      - models:/models
volumes:
  models:
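
For anyone who would rather not pin a language, a variant of the same file with autodetection might look like the sketch below. Only the options line differs from the script above; the chat ID variables are left out here for brevity, and this -l auto variant is not something that was tried in this thread. For Italian, -l it should work the same way as -l nl, since whisper takes two-letter language codes.

```yaml
version: '3'

services:
  shhhbot:
    container_name: shhhbot
    hostname: shhhbot
    restart: unless-stopped
    image: ghcr.io/tonym128/shhh-bot
    environment:
      - SHHH_API_KEY={API_KEY}
      - SHHH_WHISPER_MODEL=medium
      # -l auto asks whisper.cpp to detect the spoken language
      # instead of assuming English (the default) or a fixed code
      - SHHH_WHISPER_OPTIONS=-l auto
    volumes:
      # Keeps the downloaded model across restarts and image updates
      - models:/models
volumes:
  models:
```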

It can take a while to download the new model at startup.

Here's an example of me trying to speak some Dutch 😆

[image: Dutch Example]

Coolykoen commented 5 months ago

thanks! it does work really well now, amazing.

cchrkk commented 5 months ago

Can confirm it works wonderfully 💪🏻💪🏻 thanks a lot!