yum-food / TaSTT

A free self-hosted STT for VRChat
MIT License

ct2 models don't load #7

Open Anthonyg5005 opened 5 months ago

Anthonyg5005 commented 5 months ago

I can't get the ct2 models from Hugging Face to load with any settings. I tried installing the correct version of PyTorch and upgrading all the Python packages in the venv to their latest versions, which stopped the warning, but the models still wouldn't load. Here's an example of the logs I got:

Launching transcription engine
py app valid: true
DEBUG::operator ():: config_path: Resources/app_config.yml
Input Device id  0  -  Microsoft Sound Mapper - Input
Input Device id  1  -  Microphone (Razer Seiren Mini)
Input Device id  2  -  Headset Microphone (Oculus Virt
Input Device id  3  -  Microphone (2- VR P10 Dongle)
Input Device id  4  -  Microphone (WO Mic Device)
Input Device id  5  -  Microphone (Virtual Desktop Aud
Input Device id  6  -  Microphone (Steam Streaming Mic
Input Device id  7  -  CABLE Output (VB-Audio Virtual 
Finding mic 1
Mic 1 requested, treating it as a numerical device ID
Found mic 1: Microphone (Razer Seiren Mini)
Mic sample rate: 44100
Model yumfood/whisper_distil_medium_en_ct2 will be saved to C:\Users\*****\Downloads\TaSTT\Resources\Models\yumfood/whisper_distil_medium_en_ct2

Here are all the settings: [image]

Anthonyg5005 commented 5 months ago

Actually, checking the folders, it seems the models weren't stored where they're supposed to be. Instead, they were only stored in the Hugging Face cache folder.

I found a fix, but only for the download. It requires a newer huggingface_hub release with the updated download workflow introduced in version 0.23.0. The only problem left is that the model still doesn't seem to load.
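For anyone who wants to check which huggingface_hub they have before upgrading, here is a minimal sketch. It only uses the standard library and should be run with the Python bundled in TaSTT\Resources\Python. The helper names (`version_tuple`, `meets_minimum`, `installed_version`) are made up for this example, and the version parsing is deliberately simplistic.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum version mentioned in the comment above.
MINIMUM_HF_HUB = "0.23.0"


def version_tuple(text):
    """Turn '0.23.0' into (0, 23, 0). Suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum=MINIMUM_HF_HUB):
    """Return True if the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def installed_version(package):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("huggingface_hub")
    if found is None:
        print("huggingface_hub is not installed in this environment")
    elif meets_minimum(found):
        print(f"huggingface_hub {found} is new enough")
    else:
        print(f"huggingface_hub {found} is older than {MINIMUM_HF_HUB}")
```

For anything beyond a quick check, the third-party `packaging` library compares versions more robustly than this hand-rolled tuple comparison.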

Python 3.10.9

Launching transcription engine
py app valid: true
DEBUG::operator ():: config_path: Resources/app_config.yml
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Input Device id  0  -  Microsoft Sound Mapper - Input
Input Device id  1  -  Microphone (Razer Seiren Mini)
Input Device id  2  -  Headset Microphone (Oculus Virt
Input Device id  3  -  Microphone (2- VR P10 Dongle)
Input Device id  4  -  Microphone (WO Mic Device)
Input Device id  5  -  Microphone (Virtual Desktop Aud
Input Device id  6  -  Microphone (Steam Streaming Mic
Input Device id  7  -  CABLE Output (VB-Audio Virtual 
Finding mic 1
Mic 1 requested, treating it as a numerical device ID
Found mic 1: Microphone (Razer Seiren Mini)
Mic sample rate: 44100
Model yumfood/whisper_distil_medium_en_ct2 will be saved to C:\Users\*****\Downloads\TaSTT\Resources\Models\yumfood/whisper_distil_medium_en_ct2
Traceback (most recent call last):
  File "C:\Users\*****\Downloads\TaSTT\Resources\Scripts\transcribe_v2.py", line 1270, in <module>
    run(cfg)
  File "C:\Users\*****\Downloads\TaSTT\Resources\Scripts\transcribe_v2.py", line 1185, in run
    whisper = Whisper(collector, cfg)
  File "C:\Users\*****\Downloads\TaSTT\Resources\Scripts\transcribe_v2.py", line 435, in __init__
    self.model = WhisperModel(model_str,
  File "C:\Users\*****\Downloads\TaSTT\Resources\Python\lib\site-packages\faster_whisper\transcribe.py", line 114, in __init__
    model_path = download_model(
  File "C:\Users\*****\Downloads\TaSTT\Resources\Python\lib\site-packages\faster_whisper\utils.py", line 58, in download_model
    raise ValueError(
ValueError: Invalid model size 'yumfood/whisper_distil_medium_en_ct2', expected one of: tiny.en, tiny, base.en, base, small.en, small, medium.en, medium, large-v1, large-v2
Command exited with code 1: 0: The operation completed successfully.
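My reading of the traceback, as a rough sketch: the older faster-whisper seems to use the model string directly only if it is an existing local folder, and otherwise treats it as one of the built-in size names. Since the files only landed in the cache folder, the expected folder didn't exist, so the repo ID went through the size check and was rejected. The code below is a simplified stand-in and not faster-whisper's actual code; `resolve_model` is a hypothetical name, and the size list is copied from the error message.

```python
import os
import tempfile

# Size names copied from the ValueError message in the log above.
KNOWN_SIZES = (
    "tiny.en", "tiny", "base.en", "base", "small.en", "small",
    "medium.en", "medium", "large-v1", "large-v2",
)


def resolve_model(size_or_path):
    """Simplified stand-in for the lookup (an assumption, not library code)."""
    # An existing directory is used as-is.
    if os.path.isdir(size_or_path):
        return size_or_path
    # Anything else must be one of the known size names.
    if size_or_path not in KNOWN_SIZES:
        raise ValueError(
            f"Invalid model size '{size_or_path}', "
            f"expected one of: {', '.join(KNOWN_SIZES)}"
        )
    return f"<would download {size_or_path}>"


if __name__ == "__main__":
    repo_id = "yumfood/whisper_distil_medium_en_ct2"

    # Case 1: no local folder, so the repo ID is rejected as a size name.
    try:
        resolve_model(repo_id)
    except ValueError as err:
        print("rejected:", err)

    # Case 2: once the folder exists locally, the same lookup succeeds.
    with tempfile.TemporaryDirectory() as root:
        local_dir = os.path.join(root, "yumfood", "whisper_distil_medium_en_ct2")
        os.makedirs(local_dir)
        print("accepted:", resolve_model(local_dir))
```

If that reading is right, it would explain why fixing only the download was not enough until faster-whisper itself was upgraded.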

Anthonyg5005 commented 5 months ago

The complete fix was to upgrade huggingface_hub and faster-whisper to their latest versions. I did this manually by going into TaSTT\Resources\Python and running the following command:

python -m pip install huggingface_hub faster-whisper --upgrade
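The same upgrade can be scripted. This is only a sketch: `build_upgrade_command` and `run_upgrade` are hypothetical helper names, not part of TaSTT. It uses `sys.executable`, so it upgrades whichever Python runs it, and it therefore needs to be started with the Python in TaSTT\Resources\Python.

```python
import subprocess
import sys

# The two packages named in the command above.
PACKAGES = ["huggingface_hub", "faster-whisper"]


def build_upgrade_command(python_exe=sys.executable, packages=PACKAGES):
    """Build the pip command for a specific interpreter without running it."""
    return [python_exe, "-m", "pip", "install", *packages, "--upgrade"]


def run_upgrade(python_exe=sys.executable, packages=PACKAGES):
    """Run the upgrade; raises CalledProcessError if pip fails."""
    subprocess.run(build_upgrade_command(python_exe, packages), check=True)


if __name__ == "__main__":
    # Print the command first so it can be reviewed before running anything.
    print(" ".join(build_upgrade_command()))
```

Calling `run_upgrade()` performs the actual install. Going through `python -m pip` rather than a bare `pip` keeps the upgrade tied to that interpreter's site-packages.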

The only remaining problem is that it doesn't show download progress.