The App crashed once start audio transcripting with audio input stream

fuddyduddy commented 1 week ago

I am very interested in your project. I am looking for a streaming audio transcription method for language learning.

Long story short, I set large-v3 as the model and chose my audio_device: Headset (2- OpenRun by Shokz) MME (1 in). I pressed start and, after the model was downloaded, I stopped the transcription. It crashed. Then I restarted run.bat and checked that the previous settings were loaded. I got two results, as below:

Press start transcription and speak a few words: it crashed, as shown in cmd.
Press start transcription and, without speaking, stop the transcription: it loaded and stopped without crashing.

I am using an NVIDIA GeForce RTX 4090 with 24 GB VRAM, 64 GB DDR5 RAM, and an Intel i7-13700F CPU.

{
    "app_settings": {
        "audio_device": 1,
        "create_audio_file": true,
        "include_non_speech": false,
        "noise_threshold": 5,
        "non_speech_threshold": 0.1,
        "silence_limit": 8,
        "use_openai_api": false,
        "use_websocket_server": false
    },
    "model_settings": {
        "compute_type": "default",
        "cpu_threads": 0,
        "device": "cuda",
        "device_index": 0,
        "local_files_only": true,
        "model_size_or_path": "large-v2",
        "num_workers": 1
    },
    "transcribe_settings": {
        "append_punctuations": "\\\"'.。,，!！?？:：”)]}、",
        "beam_size": 5,
        "best_of": 5,
        "compression_ratio_threshold": 2.4,
        "condition_on_previous_text": true,
        "length_penalty": 1,
        "log_prob_threshold": -1,
        "max_initial_timestamp": 1,
        "no_repeat_ngram_size": 0,
        "no_speech_threshold": 0.6,
        "patience": 1,
        "prepend_punctuations": "\\\"'“¿([{-",
        "repetition_penalty": 1,
        "suppress_blank": true,
        "suppress_tokens": [
            -1
        ],
        "task": "transcribe",
        "temperature": [
            0,
            0.2,
            0.4,
            0.6,
            0.8,
            1
        ],
        "vad_filter": false,
        "vad_parameters": {
            "max_speech_duration_s": 0,
            "min_silence_duration_ms": 2000,
            "min_speech_duration_ms": 250,
            "speech_pad_ms": 400,
            "threshold": 0.5
        },
        "without_timestamps": false,
        "word_timestamps": false
    }
}

reriiasu commented 1 week ago

@fuddyduddy Hello fuddyduddy

If it works when you change the device in model_settings to cpu, then CUDA is not available. In that case, please check the steps at the link below. https://github.com/SYSTRAN/faster-whisper?tab=readme-ov-file#gpu
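
The CPU test suggested above amounts to changing one field in the settings JSON. A minimal sketch, assuming the settings live in a JSON file shaped like the one posted earlier; the file name `settings.json` and both helper names are hypothetical and not part of the app:

```python
import json
from pathlib import Path


def set_model_device(settings: dict, device: str) -> dict:
    """Return a copy of the settings with model_settings.device replaced."""
    updated = json.loads(json.dumps(settings))  # deep copy via JSON round-trip
    updated["model_settings"]["device"] = device
    return updated


def switch_settings_file(path: Path, device: str) -> None:
    """Rewrite the settings file in place with the new device value."""
    settings = json.loads(path.read_text(encoding="utf-8"))
    updated = set_model_device(settings, device)
    path.write_text(
        json.dumps(updated, indent=4, ensure_ascii=False), encoding="utf-8"
    )


# Hypothetical usage; adjust the path to wherever the app stores its settings:
# switch_settings_file(Path("settings.json"), "cpu")
```

Switching back to `"cuda"` afterwards uses the same call, which makes it easy to compare the two runs.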

fuddyduddy commented 4 days ago

Thanks for the reply. I ran the program with CPU successfully.

Then I read about the NVIDIA GPU libraries and checked that I have the following CUDA and library versions:

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:41:10_Pacific_Daylight_Time_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

For cuBLAS:

#define CUBLAS_VER_MAJOR 11
#define CUBLAS_VER_MINOR 11
#define CUBLAS_VER_PATCH 3
#define CUBLAS_VER_BUILD 6
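
The four defines above can be read programmatically rather than by eye. A small sketch that extracts them from header text; the function name is hypothetical, and which header file contains the defines (likely `cublas_api.h` in the toolkit's include directory) is an assumption:

```python
import re


def parse_cublas_version(header_text: str):
    """Return (major, minor, patch, build) from CUBLAS_VER_* defines, or None."""
    pairs = re.findall(
        r"#define\s+CUBLAS_VER_(MAJOR|MINOR|PATCH|BUILD)\s+(\d+)", header_text
    )
    fields = dict(pairs)
    keys = ("MAJOR", "MINOR", "PATCH", "BUILD")
    if any(key not in fields for key in keys):
        return None  # at least one define is missing
    return tuple(int(fields[key]) for key in keys)


sample = """
#define CUBLAS_VER_MAJOR 11
#define CUBLAS_VER_MINOR 11
#define CUBLAS_VER_PATCH 3
#define CUBLAS_VER_BUILD 6
"""
print(parse_cublas_version(sample))  # (11, 11, 3, 6)
```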

I also downloaded and installed cuDNN version 9.4 through the .exe installer. Although I can't find cudnn_version.h at C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\include\cudnn_version.h, I believe it is installed, because the installer said so.
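
Whether the header sits under the CUDA include directory matters less than whether the runtime DLLs can be found on PATH. A rough sketch of such a check follows. The DLL file names are assumptions based on the usual naming of cuBLAS and cuDNN on Windows, and the helper name is hypothetical. As far as the linked README indicates, ctranslate2 3.24.0 is meant to be paired with CUDA 11 and cuDNN 8, so a cuDNN 9.x install may not be what it looks for.

```python
import os
from pathlib import Path

# Assumed DLL names; verify against the actual install before relying on them.
CANDIDATE_DLLS = [
    "cublas64_11.dll",  # cuBLAS from a CUDA 11.x toolkit (assumed name)
    "cublas64_12.dll",  # cuBLAS from a CUDA 12.x toolkit (assumed name)
    "cudnn64_8.dll",    # cuDNN 8.x (assumed name)
    "cudnn64_9.dll",    # cuDNN 9.x (assumed name)
]


def find_on_path(filenames, path_value=None):
    """Map each file name to the first PATH directory containing it, or None."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    found = {}
    for name in filenames:
        found[name] = None
        for entry in path_value.split(os.pathsep):
            if entry and (Path(entry) / name).is_file():
                found[name] = entry
                break
    return found


if __name__ == "__main__":
    for dll, location in find_on_path(CANDIDATE_DLLS).items():
        print(f"{dll}: {location or 'not found on PATH'}")
```

A DLL reported as not found suggests that its directory is missing from PATH or that the library version installed differs from the one expected.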

Then I downgraded ctranslate2 to 3.24.0, started myenv without run.bat, and ran the command "python -m speech_to_text".

But transcription with CUDA still crashed. I checked with findstr ctranslate and confirmed that the version is 3.24.0.
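
The same version check can be done from inside the virtual environment's Python, which confirms that the interpreter actually running the app sees the downgraded package. A minimal sketch using only the standard library; the helper name is hypothetical:

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for name in ("ctranslate2", "faster-whisper"):
    print(name, installed_version(name))
```

Running this with the environment's own `python` avoids reading the package list of a different interpreter by mistake.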

That's all for today; I'll call it a day. I will try the Docker method later, since I haven't touched CUDA version stuff in a long time.

More information: I found that my computer has C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2 as well. But I am not sure whether a computer can have multiple versions of CUDA installed, as nvcc --version only reports v11.8.89 for me.
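
Several CUDA toolkits can sit side by side on Windows, and nvcc simply reports whichever toolkit is found first on PATH. A short sketch that lists the toolkit directories and shows which one the shell resolves; the helper name is hypothetical, and the root path is taken from the directories mentioned above:

```python
import os
import shutil
from pathlib import Path


def installed_cuda_versions(root: Path) -> list[str]:
    """List version directories (v11.8, v12.2, ...) under the toolkit root."""
    if not root.is_dir():
        return []
    return sorted(
        p.name for p in root.iterdir() if p.is_dir() and p.name.startswith("v")
    )


if __name__ == "__main__":
    root = Path(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA")
    print("Installed toolkits:", installed_cuda_versions(root))
    print("CUDA_PATH:", os.environ.get("CUDA_PATH"))  # usually set by the installer
    print("nvcc resolved to:", shutil.which("nvcc"))  # first match on PATH
```

If both v11.8 and v12.2 are listed, the order of their `bin` directories on PATH decides which DLLs a program loads.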

reriiasu / speech-to-text

The App crashed once start audio transcripting with audio input stream #19