ValueError: This model doesn't have language tokens so it can't perform lang id

trappedinspacetime commented 1 year ago

Hi, first of all, thank you for making this tool. I am running Ubuntu MATE 22.04. I cloned this repo and followed your guide, but I get the following error:

        🚀 voice control ready ... listening every 3 seconds
        alex waiting for order ...
        listening ...
        saving audio ...
        transcribing audio data ...
        Traceback (most recent call last):
          File "/home/****/Desktop/2022-10/linux-voice-control/main.py", line 102, in <module>
            main()
          File "/home****/.local/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
            return self.main(*args, **kwargs)
          File "/home/****/.local/lib/python3.9/site-packages/click/core.py", line 1053, in main
            rv = self.invoke(ctx)
          File "/home/****/.local/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
            return ctx.invoke(self.callback, **ctx.params)
          File "/home/****/.local/lib/python3.9/site-packages/click/core.py", line 754, in invoke
            return __callback(*args, **kwargs)
          File "/home/****/Desktop/2022-10/linux-voice-control/main.py", line 79, in main
            result = audio_model.transcribe(WAVE_OUTPUT_FILENAME, fp16=False)
          File "/home/****/.local/lib/python3.9/site-packages/whisper/transcribe.py", line 90, in transcribe
            _, probs = model.detect_language(segment)
          File "/home/****/.local/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
            return func(*args, **kwargs)
          File "/home/****/.local/lib/python3.9/site-packages/whisper/decoding.py", line 35, in detect_language
            raise ValueError(f"This model doesn't have language tokens so it can't perform lang id")
        ValueError: This model doesn't have language tokens so it can't perform lang id

omegaui commented 1 year ago

Thanks for the kind words and for reporting this. I have been away for a while, but I am back now and will figure this out.

omegaui commented 1 year ago

@trappedinspacetime

Hey, I just reinstalled it from the repo, and it still seems to run as designed, so the problem may be on your end. To start debugging, I need to know the following from you:

Your Python version
Your system's audio service (PulseAudio, PipeWire, or any other)

omegaui commented 1 year ago

According to the exception, the problem seems to come from the language model. The default language model is set to base.

You can try changing it to one of the other available models and then rerun the install.sh script to see if that works.
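For readers hitting the same error, here is a rough sketch of what switching models could look like. As far as can be told from the traceback, the exception is raised when Whisper runs language detection on a model that has no language tokens, which is the case for English-only checkpoints such as `base.en`. The helper `decode_options_for` below is hypothetical (it is not part of this repo or of Whisper); it simply names the language up front for `.en` models so that detection is not needed. The commented usage assumes the `openai-whisper` package and the same `transcribe` call that appears in the traceback.

```python
def decode_options_for(model_name: str) -> dict:
    """Build keyword arguments for transcribe() (hypothetical helper)."""
    # fp16=False is the same flag main.py passes in the traceback.
    options = {"fp16": False}
    if model_name.endswith(".en"):
        # English-only checkpoints have no language tokens, so state the
        # language explicitly instead of relying on automatic detection.
        options["language"] = "en"
    return options


if __name__ == "__main__":
    for name in ("base", "base.en", "small"):
        print(name, decode_options_for(name))

    # Assumed usage, with openai-whisper installed:
    #   import whisper
    #   audio_model = whisper.load_model("small")
    #   result = audio_model.transcribe(
    #       WAVE_OUTPUT_FILENAME, **decode_options_for("small")
    #   )
```

Passing `language` explicitly is only one possible workaround; whether it matches what the project ended up doing is not confirmed by this thread.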

trappedinspacetime commented 1 year ago

@omegaui Thank you for your help. I've just tried it from scratch. I got the same errors.

      python --version
      Python 3.9.1

The audio service is PulseAudio.

omegaui commented 1 year ago

@trappedinspacetime

Try updating your Python installation to the latest version to see if that works. I don't actually know how Whisper is written internally; if this also fails to help, I will look further into fixing it.

trappedinspacetime commented 1 year ago

      pyenv install 3.10.7
      Downloading Python-3.10.7.tar.xz...
      -> https://www.python.org/ftp/python/3.10.7/Python-3.10.7.tar.xz
      Installing Python-3.10.7...
      WARNING: The Python tkinter extension was not compiled and GUI subsystem has been detected. Missing the Tk toolkit?
      Installed Python-3.10.7 to /home/****/.pyenv/versions/3.10.7

      pyenv global 3.10.7

      python3 -m pip install pip -U

      python --version
      Python 3.10.7

      pip install -r requirements.txt

      ./install.sh

      🚀 voice control ready ... listening every 3 seconds
      alex waiting for order ...
      listening ...
      saving audio ...
      transcribing audio data ...
      Traceback (most recent call last):
        File "/home/***/lvc-bin/main.py", line 102, in <module>
          main()
        File "/home/***/.pyenv/versions/3.10.7/lib/python3.10/site-packages/click/core.py", line 1128, in __call__
          return self.main(*args, **kwargs)
        File "/home/***/.pyenv/versions/3.10.7/lib/python3.10/site-packages/click/core.py", line 1053, in main
          rv = self.invoke(ctx)
        File "/home/***/.pyenv/versions/3.10.7/lib/python3.10/site-packages/click/core.py", line 1395, in invoke
          return ctx.invoke(self.callback, **ctx.params)
        File "/home/***/.pyenv/versions/3.10.7/lib/python3.10/site-packages/click/core.py", line 754, in invoke
          return __callback(*args, **kwargs)
        File "/home/***/lvc-bin/main.py", line 79, in main
          result = audio_model.transcribe(WAVE_OUTPUT_FILENAME, fp16=False)
        File "/home/***/.local/lib/python3.10/site-packages/whisper/transcribe.py", line 90, in transcribe
          _, probs = model.detect_language(segment)
        File "/home/***/.local/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
          return func(*args, **kwargs)
        File "/home/***/.local/lib/python3.10/site-packages/whisper/decoding.py", line 35, in detect_language
          raise ValueError(f"This model doesn't have language tokens so it can't perform lang id")
      ValueError: This model doesn't have language tokens so it can't perform lang id

omegaui commented 1 year ago

@trappedinspacetime Well, this doesn't look good. I will work on a fix.

omegaui commented 1 year ago

@trappedinspacetime Can you check whether this commit has fixed the exception on your side?

trappedinspacetime commented 1 year ago

@omegaui I'm sorry for taking up your time. I tried it in a clean virtualenv and it worked. I had thought the problem stemmed from a difference between distros. Fedora seems to be more stable. By the way, whisper.cpp, a C/C++ port of Whisper, is faster and more memory efficient.

Thank you for your kindness. All the best.
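A note for readers on why a clean virtualenv may have helped: in the second traceback, `click` is loaded from the pyenv `site-packages` directory, while `whisper` and `torch` are loaded from `~/.local/lib/python3.10/site-packages`, so packages from two separate installs were being mixed. The sketch below is one way to check where a module would be imported from. The helper name `module_origin` is made up for illustration, and the script uses only the standard library.

```python
import importlib.util
import site
import sys


def module_origin(name: str):
    """Return the file a top-level module would be imported from, or None."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    # The user site (~/.local/...) is searched in addition to the
    # interpreter's own site-packages when this flag is true.
    print("user site enabled:", site.ENABLE_USER_SITE)
    print("user site dir:", site.getusersitepackages())
    for name in ("whisper", "torch", "click"):
        print(f"{name}: {module_origin(name) or 'not installed'}")
```

Inside a virtualenv created without access to system site packages, the user site is normally disabled, so all three modules should resolve to paths inside the environment.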

omegaui commented 1 year ago

@trappedinspacetime No worries ... your report can help others recover. Happy Hacking!
