mkiol / dsnote

Speech Note Linux app. Note taking, reading and translating with offline Speech to Text, Text to Speech and Machine translation.
Mozilla Public License 2.0

Other -> use AMD ROCm crashes app #93

Closed Kentoseth closed 6 months ago

Kentoseth commented 7 months ago

In Settings > Other:

Choosing "Use AMD ROCm" crashes the app. So I chose "override GPU version" which then allows me to choose the right version under STT, but under TTS I get:

[Screenshot: TTS-speechnote-bug]

So the override fixes STT but TTS breaks.

mkiol commented 7 months ago

Thanks for the report.

What video card is in your system?

Could you please start the app with the --verbose option and paste the output here?

flatpak run net.mkiol.SpeechNote --verbose

Thanks
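Since the app writes its verbose log to the terminal (it reports "logging to stderr enabled" at startup), capturing the output in a file can be easier than copying it by hand. The following is only a sketch, assuming Python 3 and flatpak are installed; the helper names and the file name speechnote.log are hypothetical and not part of Speech Note.

```python
from __future__ import annotations

import shutil
import subprocess

APP_ID = "net.mkiol.SpeechNote"


def build_command(app_id: str = APP_ID) -> list[str]:
    """Return the verbose-run command from this comment as an argument list."""
    return ["flatpak", "run", app_id, "--verbose"]


def run_and_capture(log_path: str = "speechnote.log") -> int:
    """Run the app and write stdout and stderr to log_path (hypothetical name)."""
    if shutil.which("flatpak") is None:
        raise RuntimeError("flatpak not found on PATH")
    with open(log_path, "w", encoding="utf-8") as log_file:
        # stderr is merged into stdout so both streams land in the same file.
        completed = subprocess.run(
            build_command(), stdout=log_file, stderr=subprocess.STDOUT
        )
    return completed.returncode
```

Calling run_and_capture() blocks until the app window is closed; the resulting file can then be attached to the issue.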

Kentoseth commented 7 months ago

What video card is in your system?

AMD Radeon RX 5500M

Related ticket: https://github.com/mkiol/dsnote/issues/85

Here is the relevant output:

0x7fd104f60d00 () - app service state: unknown => busy
logger error: invalid format string
qrc:/qml/main.qml:340:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. Use this syntax instead: function onFoo() { ... }
logger error: invalid format string
qrc:/qml/main.qml:331:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. Use this syntax instead: function onFoo() { ... }
logger error: invalid format string
qrc:/qml/Notepad.qml:24:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. Use this syntax instead: function onFoo() { ... }
logger error: invalid format string
qrc:/qml/Translator.qml:29:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. Use this syntax instead: function onFoo() { ... }
logger error: invalid format string
qrc:/qml/MainToolBar.qml:282:13: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. Use this syntax instead: function onFoo() { ... }

.

[D] 13:34:09.571 0x7fd104f60d00 () - choosing model for id: "ar_whisper_medium" "fa"
[D] 13:34:09.571 0x7fd104f60d00 () - gpu device str: ("ROCm", " 0", " AMD Radeon RX 5500M")
[D] 13:34:09.571 0x7fd104f60d00 () - restart stt engine config: "lang=ar, lang_code=, model-files=[model-file=/home/client/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote/speech-models/multilang_whisper_medium.ggml, scorer-file=, ttt-model-file=], speech-mode=single-sentence, vad-mode=aggressiveness-3, speech-started=0, text-format=raw, options=t, use-gpu=1, gpu-device=[id=0, api=rocm, name=AMD Radeon RX 5500M, platform-name=], sub-config=[min-segment-dur=4, min-line-length=30, max-line-length=60]"
[D] 13:34:09.571 0x7fd104f60d00 () - new stt engine required
[D] 13:34:09.571 0x7fd104f60d00 open_whisper_lib:122 - using whisper-hipblas
[D] 13:34:09.573 0x7fd104f60d00 make_wparams:429 - cpu info: arch=x86_64, cores=16
[D] 13:34:09.573 0x7fd104f60d00 make_wparams:431 - using threads: 5/16
[D] 13:34:09.573 0x7fd104f60d00 make_wparams:433 - system info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 1 | COREML = 0 | OPENVINO = 0 |
[D] 13:34:09.573 0x7fd104f60d00 start:225 - starting engine
[D] 13:34:09.573 0x7fd104f60d00 start:234 - engine started
[D] 13:34:09.573 0x7fd104f60d00 () - creating audio source
[D] 13:34:09.573 0x7fd104f60d00 () - mic source created
[D] 13:34:09.573 0x7fcd203fe600 start_processing:271 - processing started
[D] 13:34:09.573 0x7fcd203fe600 set_processing_state:457 - processing state: idle => initializing
[D] 13:34:09.573 0x7fcd203fe600 set_processing_state:464 - speech detection status: no-speech => initializing (no-speech)
[D] 13:34:09.573 0x7fcd203fe600 () - service refresh status, new state: idle
[D] 13:34:09.574 0x7fcd203fe600 () - task state changed: 0 => 3
[D] 13:34:09.574 0x7fcd203fe600 create_whisper_model:239 - creating whisper model
whisper_init_from_file_with_params_no_state: loading model from '/home/client/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote/speech-models/multilang_whisper_medium.ggml'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1024
whisper_model_load: n_text_head = 16
whisper_model_load: n_text_layer = 24
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 9
whisper_model_load: qntvr = 2
whisper_model_load: type = 4 (medium)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs = 99
rocBLAS error: Cannot read /app/extensions/amd/rocm/lib/rocblas/library/TensileLibrary.dat: Illegal seek

mkiol commented 7 months ago

Could you please paste the log from app startup here as well?

What value is set in "override GPU version"?

Thanks

Kentoseth commented 7 months ago

Could you please paste the log from app startup here as well?

.

flatpak run net.mkiol.SpeechNote --verbose
Qt: Session management error: Could not open network socket [I] 13:33:40.951 0x7fd104f60d00 init:49 - logging to stderr enabled [D] 13:33:40.951 0x7fd104f60d00 () - version: 4.4.0 [D] 13:33:40.952 0x7fd104f60d00 parse_cpuinfo:117 - cpu flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm debug_swap [D] 13:33:40.952 0x7fd104f60d00 parse_cpuinfo:125 - cpuinfo: processor-count=16, flags=[avx, avx2, fma, f16c, ] [D] 13:33:40.952 0x7fd104f60d00 () - translation: "en_ZA" [W] 13:33:40.952 0x7fd104f60d00 () - failed to install translation [D] 13:33:40.952 0x7fd104f60d00 () - starting standalone app [D] 13:33:40.952 0x7fd104f60d00 () - app: net.mkiol dsnote [D] 13:33:40.952 0x7fd104f60d00 () - config location: "/home/client/.var/app/net.mkiol.SpeechNote/config" [D] 13:33:40.952 0x7fd104f60d00 () - data location: "/home/client/.var/app/net.mkiol.SpeechNote/data/net.mkiol/dsnote" [D] 13:33:40.952 0x7fd104f60d00 () - cache location: "/home/client/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote" [D] 13:33:40.952 0x7fd104f60d00 () - settings file: 
"/home/client/.var/app/net.mkiol.SpeechNote/config/net.mkiol/dsnote/settings.conf" [D] 13:33:40.952 0x7fd104f60d00 () - platform: "xcb" [D] 13:33:40.952 0x7fd104f60d00 () - amd addon exists [D] 13:33:40.952 0x7fd104f60d00 () - enforcing num threads: 0 [D] 13:33:41.120 0x7fd104f60d00 () - supported audio input devices: ALSA lib ../../oss/pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp [D] 13:33:41.391 0x7fd104f60d00 () - "pulse" [D] 13:33:41.393 0x7fd104f60d00 () - "default" ALSA lib ../../../src/pcm/pcm_direct.c:2045:(snd1_pcm_direct_parse_open_conf) The field ipc_gid must be a valid group (create group audio) [D] 13:33:41.397 0x7fd104f60d00 () - "sysdefault:CARD=acp" [D] 13:33:41.397 0x7fd104f60d00 () - "alsa_input.pci-0000_07_00.6.HiFihw_Generic_1source" [D] 13:33:41.397 0x7fd104f60d00 () - "alsa_output.pci-0000_07_00.6.HiFihw_Generic_1sink.monitor" [D] 13:33:41.397 0x7fd104f60d00 () - "alsa_input.pci-0000_07_00.6.HiFihw_acp__source" [D] 13:33:41.397 0x7fd104f60d00 () - "alsa_output.pci-0000_03_00.1.HiFihw_HDMI_3__sink.monitor" [D] 13:33:41.425 0x7fd104f60d00 () - starting service: app-standalone [D] 13:33:41.426 0x7fd104f60d00 () - mbrola dir: "/app/bin" [D] 13:33:41.426 0x7fd104f60d00 () - espeak dir: "/app/bin" [D] 13:33:41.426 0x7fd0c3691600 loop:75 - py executor loop started [D] 13:33:41.429 0x7fd104f60d00 () - module already unpacked: "rhvoicedata" [D] 13:33:41.429 0x7fd104f60d00 () - module already unpacked: "rhvoiceconfig" [D] 13:33:41.432 0x7fd104f60d00 () - module already unpacked: "espeakdata" [D] 13:33:41.432 0x7fd104f60d00 () - default stt model not found: "ar_whisper_medium" [D] 13:33:41.432 0x7fd104f60d00 () - default tts model not found: "ar_piper_jo_kareem_medium" [D] 13:33:41.432 0x7fd104f60d00 () - default mnt lang not found: "en" [D] 13:33:41.432 0x7fd104f60d00 () - new default mnt lang: "en" [D] 13:33:41.433 0x7fd104f60d00 () - service refresh status, new state: busy [D] 13:33:41.433 0x7fd104f60d00 () - service state changed: 
unknown => busy [D] 13:33:41.433 0x7fd104f60d00 () - delaying features availability [D] 13:33:41.434 0x7fd104f60d00 () - runtime prefix: "/app" [D] 13:33:41.435 0x7fd104f60d00 () - available styles: ("Default", "Fusion", "Imagine", "Material", "org.kde.breeze", "org.kde.desktop", "Plasma", "Universal") [D] 13:33:41.435 0x7fd104f60d00 () - style paths: ("/usr/lib/qml/QtQuick/Controls.2") [D] 13:33:41.435 0x7fd104f60d00 () - import paths: ("/usr/lib/qml", "/app/bin", "qrc:/qt-project.org/imports") [D] 13:33:41.435 0x7fd104f60d00 () - library paths: ("/usr/share/runtime/lib/plugins", "/usr/lib/plugins", "/app/bin") [D] 13:33:41.435 0x7fd104f60d00 () - using auto qt style [D] 13:33:41.435 0x7fd104f60d00 () - XDG_CURRENT_DESKTOP: KDE [D] 13:33:41.435 0x7fd104f60d00 () - switching to style: "org.kde.desktop" [D] 13:33:41.435 0x7fd0ec9fa600 () - config version: 65 65 [D] 13:33:41.437 0x7fd0c3691600 libs_availability:61 - checking: torch cuda [D] 13:33:41.463 0x7fd0ec9fa600 () - models changed [D] 13:33:42.57 0x7fd104f60d00 () - starting app: app-standalone [D] 13:33:42.57 0x7fd104f60d00 () - app service state: unknown => busy

.

What value is set in "override GPU version"?

10.1.0

(it works when using this setting)
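For context on what the override value means: the logs in this thread show the setting being passed on as ROCm's HSA_OVERRIDE_GFX_VERSION, and the hip device scan reports the card as gfx1012. The sketch below converts a gfx target name into the dotted version form. It assumes the usual naming convention in which the last two characters of the name are the minor and stepping numbers as single hexadecimal digits; the function name is hypothetical and this is not code from Speech Note.

```python
import re


def gfx_to_version(arch_name: str) -> str:
    """Convert a gfx target name such as 'gfx1012:xnack-' to '10.1.2'.

    Assumption: the name is 'gfx' + decimal major + one hex digit for the
    minor version + one hex digit for the stepping.
    """
    base = arch_name.split(":")[0]  # drop feature suffixes like ':xnack-'
    match = re.fullmatch(r"gfx([0-9]+)([0-9a-f])([0-9a-f])", base)
    if match is None:
        raise ValueError(f"unrecognised gfx target name: {arch_name!r}")
    major, minor, stepping = match.groups()
    return f"{int(major)}.{int(minor, 16)}.{int(stepping, 16)}"
```

Under this assumption, the override value 10.1.0 corresponds to a gfx1010 target, while the card's own name gfx1012 would map to 10.1.2.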

mkiol commented 7 months ago

Sorry, I wasn't very specific. Please paste the log until the application is no longer busy. Wait a few seconds after starting. Some important logs are actually generated in this "busy" state after start.

Thank you for your patience 😄
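A quick way to check whether a pasted log covers the whole "busy" phase is to look for the lines the app prints when it becomes idle. This is a rough sketch: the marker strings are copied from the startup logs posted in this thread, and treating them as stable across Speech Note versions is an assumption. The function name is hypothetical.

```python
# Markers copied from the startup logs in this thread; treating them as stable
# across Speech Note versions is an assumption.
IDLE_MARKERS = (
    "app service state: busy => idle",
    "app busy: true => false",
)


def startup_finished(log_text: str) -> bool:
    """Return True once the pasted log shows the app leaving the busy state."""
    return any(marker in log_text for marker in IDLE_MARKERS)
```

A log that stops at "app service state: unknown => busy" makes the function return False, which signals that more output is still needed.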

Kentoseth commented 7 months ago

I hope this is correct. I'm pasting the entire log until the 'busy' icon stopped spinning:

Qt: Session management error: Could not open network socket [I] 13:19:08.861 0x7f254f65cd00 init:49 - logging to stderr enabled [D] 13:19:08.861 0x7f254f65cd00 () - version: 4.4.0 [D] 13:19:08.861 0x7f254f65cd00 parse_cpuinfo:117 - cpu flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm debug_swap [D] 13:19:08.861 0x7f254f65cd00 parse_cpuinfo:125 - cpuinfo: processor-count=16, flags=[avx, avx2, fma, f16c, ] [D] 13:19:08.861 0x7f254f65cd00 () - translation: "en_UK" [W] 13:19:08.862 0x7f254f65cd00 () - failed to install translation [D] 13:19:08.862 0x7f254f65cd00 () - starting standalone app [D] 13:19:08.862 0x7f254f65cd00 () - app: net.mkiol dsnote [D] 13:19:08.862 0x7f254f65cd00 () - config location: "/home/client/.var/app/net.mkiol.SpeechNote/config" [D] 13:19:08.862 0x7f254f65cd00 () - data location: "/home/client/.var/app/net.mkiol.SpeechNote/data/net.mkiol/dsnote" [D] 13:19:08.862 0x7f254f65cd00 () - cache location: "/home/client/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote" [D] 13:19:08.862 0x7f254f65cd00 () - settings file: 
"/home/client/.var/app/net.mkiol.SpeechNote/config/net.mkiol/dsnote/settings.conf" [D] 13:19:08.862 0x7f254f65cd00 () - platform: "xcb" [D] 13:19:08.862 0x7f254f65cd00 () - amd addon exists [D] 13:19:08.862 0x7f254f65cd00 () - enforcing num threads: 0 [D] 13:19:09.22 0x7f254f65cd00 () - supported audio input devices: ALSA lib ../../oss/pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp [D] 13:19:09.286 0x7f254f65cd00 () - "pulse" [D] 13:19:09.289 0x7f254f65cd00 () - "default" ALSA lib ../../../src/pcm/pcm_direct.c:2045:(snd1_pcm_direct_parse_open_conf) The field ipc_gid must be a valid group (create group audio) [D] 13:19:09.292 0x7f254f65cd00 () - "sysdefault:CARD=acp" [D] 13:19:09.292 0x7f254f65cd00 () - "alsa_input.pci-0000_07_00.6.HiFihw_Generic_1source" [D] 13:19:09.292 0x7f254f65cd00 () - "alsa_output.pci-0000_07_00.6.HiFihw_Generic_1sink.monitor" [D] 13:19:09.292 0x7f254f65cd00 () - "alsa_input.pci-0000_07_00.6.HiFihw_acp__source" [D] 13:19:09.292 0x7f254f65cd00 () - "alsa_output.pci-0000_03_00.1.HiFihw_HDMI_3__sink.monitor" [D] 13:19:09.313 0x7f254f65cd00 () - starting service: app-standalone [D] 13:19:09.314 0x7f254f65cd00 () - mbrola dir: "/app/bin" [D] 13:19:09.314 0x7f254f65cd00 () - espeak dir: "/app/bin" [D] 13:19:09.314 0x7f2535876600 loop:75 - py executor loop started [D] 13:19:09.317 0x7f254f65cd00 () - module already unpacked: "rhvoicedata" [D] 13:19:09.317 0x7f254f65cd00 () - module already unpacked: "rhvoiceconfig" [D] 13:19:09.320 0x7f254f65cd00 () - module already unpacked: "espeakdata" [D] 13:19:09.321 0x7f254f65cd00 () - default stt model not found: "ar_whisper_medium" [D] 13:19:09.321 0x7f254f65cd00 () - default tts model not found: "ar_piper_jo_kareem_medium" [D] 13:19:09.321 0x7f254f65cd00 () - default mnt lang not found: "en" [D] 13:19:09.321 0x7f254f65cd00 () - new default mnt lang: "en" [D] 13:19:09.321 0x7f254f65cd00 () - service refresh status, new state: busy [D] 13:19:09.321 0x7f254f65cd00 () - service state changed: 
unknown => busy [D] 13:19:09.321 0x7f254f65cd00 () - delaying features availability [D] 13:19:09.322 0x7f254f65cd00 () - runtime prefix: "/app" [D] 13:19:09.322 0x7f254f65cd00 () - available styles: ("Default", "Fusion", "Imagine", "Material", "org.kde.breeze", "org.kde.desktop", "Plasma", "Universal") [D] 13:19:09.322 0x7f254f65cd00 () - style paths: ("/usr/lib/qml/QtQuick/Controls.2") [D] 13:19:09.322 0x7f254f65cd00 () - import paths: ("/usr/lib/qml", "/app/bin", "qrc:/qt-project.org/imports") [D] 13:19:09.322 0x7f254f65cd00 () - library paths: ("/usr/share/runtime/lib/plugins", "/usr/lib/plugins", "/app/bin") [D] 13:19:09.322 0x7f254f65cd00 () - using auto qt style [D] 13:19:09.322 0x7f254f65cd00 () - XDG_CURRENT_DESKTOP: KDE [D] 13:19:09.322 0x7f254f65cd00 () - switching to style: "org.kde.desktop" [D] 13:19:09.323 0x7f25363fd600 () - config version: 65 65 [D] 13:19:09.324 0x7f2535876600 libs_availability:61 - checking: torch cuda [D] 13:19:09.348 0x7f25363fd600 () - models changed [D] 13:19:09.934 0x7f254f65cd00 () - starting app: app-standalone [D] 13:19:09.935 0x7f254f65cd00 () - app service state: unknown => busy logger error: invalid format string qrc:/qml/main.qml:340:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. Use this syntax instead: function onFoo() { ... } logger error: invalid format string qrc:/qml/main.qml:331:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. Use this syntax instead: function onFoo() { ... } logger error: invalid format string qrc:/qml/Notepad.qml:24:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. Use this syntax instead: function onFoo() { ... } logger error: invalid format string qrc:/qml/Translator.qml:29:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. Use this syntax instead: function onFoo() { ... 
} logger error: invalid format string qrc:/qml/MainToolBar.qml:282:13: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. Use this syntax instead: function onFoo() { ... } [D] 13:19:10.90 0x7f254f65cd00 onCompleted:180 - default font pixel size: 16 [D] 13:19:10.100 0x7f254f65cd00 () - service refresh status, new state: busy [D] 13:19:10.100 0x7f254f65cd00 () - service refresh status, new state: busy [D] 13:19:10.153 0x7f254f65cd00 () - stt models changed [D] 13:19:10.155 0x7f254f65cd00 () - update listen [D] 13:19:10.155 0x7f254f65cd00 () - app stt configured: false => true [D] 13:19:10.156 0x7f254f65cd00 () - app active stt model: "" => "ar_whisper_medium" [D] 13:19:10.157 0x7f254f65cd00 () - update listen [D] 13:19:10.157 0x7f254f65cd00 () - tts models changed [D] 13:19:10.157 0x7f254f65cd00 () - update listen [D] 13:19:10.157 0x7f254f65cd00 () - app tts configured: false => true [D] 13:19:10.157 0x7f254f65cd00 () - app active tts model: "" => "ar_piper_jo_kareem_medium" [D] 13:19:10.158 0x7f254f65cd00 () - update listen [W] 13:19:10.158 0x7f254f65cd00 () - no available tts models for in mnt [W] 13:19:10.158 0x7f254f65cd00 () - no available tts models for out mnt [D] 13:19:10.158 0x7f254f65cd00 () - ttt models changed [D] 13:19:10.161 0x7f254f65cd00 () - mnt langs changed [D] 13:19:10.162 0x7f254f65cd00 () - update listen [D] 13:19:10.162 0x7f254f65cd00 () - app mnt configured: false => true [D] 13:19:10.162 0x7f254f65cd00 () - app active mnt lang: "" => "en" [D] 13:19:10.162 0x7f254f65cd00 () - app mnt available out langs: 0 => 2 [W] 13:19:10.163 0x7f254f65cd00 () - no available tts models for in mnt [D] 13:19:10.163 0x7f254f65cd00 () - app active mnt out lang: "" => "fa" [D] 13:19:10.163 0x7f254f65cd00 () - app tts available models for out mnt: 0 => 1 [D] 13:19:10.164 0x7f254f65cd00 () - app active tts model for out mnt: "" => "fa_espeak_mb_ir1" [D] 13:19:10.692 0x7f2535876600 libs_availability:69 - checking: coqui tts [D] 
13:19:10.692 0x7f2535876600 libs_availability:77 - checking: faster-whisper [D] 13:19:10.795 0x7f254f65cd00 () - trying features availability update: false [D] 13:19:11.68 0x7f2535876600 libs_availability:85 - checking: transformers [D] 13:19:11.68 0x7f2535876600 libs_availability:87 - checking: accelerate [D] 13:19:11.461 0x7f2535876600 libs_availability:95 - checking: unikud [D] 13:19:11.461 0x7f2535876600 libs_availability:106 - checking: mimic3 tts [D] 13:19:11.795 0x7f254f65cd00 () - trying features availability update: false [D] 13:19:12.44 0x7f2535876600 libs_availability:114 - checking: gruut [D] 13:19:12.44 0x7f2535876600 libs_availability:118 - checking: gruut-de [D] 13:19:12.44 0x7f2535876600 libs_availability:126 - checking: gruut-es [D] 13:19:12.44 0x7f2535876600 libs_availability:134 - checking: gruut-fr [D] 13:19:12.44 0x7f2535876600 libs_availability:142 - checking: gruut-it [D] 13:19:12.45 0x7f2535876600 libs_availability:150 - checking: gruut-ru [D] 13:19:12.45 0x7f2535876600 libs_availability:158 - checking: gruut-fa [D] 13:19:12.45 0x7f2535876600 libs_availability:166 - checking: gruut-sw [D] 13:19:12.45 0x7f2535876600 libs_availability:174 - checking: gruut-nl [D] 13:19:12.45 0x7f2535876600 libs_availability:185 - checking: mecab [D] 13:19:12.47 0x7f2535876600 libs_availability:187 - checking: unidic-lite [D] 13:19:12.48 0x7f2535876600 libs_availability:194 - py libs availability: [coqui-tts=true, faster-whisper=true, mimic3-tts=true, transformers=true, unikud=true, gruut_de=true, gruut_es=true, gruut_fa=true, gruut_fr=true, gruut_nl=true, gruut_it=true, gruut_ru=true, gruut_sw=true, mecab=true, torch-cuda=true] [D] 13:19:12.795 0x7f254f65cd00 () - trying features availability update: true [D] 13:19:12.795 0x7f254f65cd00 () - features availability ready [W] 13:19:12.796 0x7f254f65cd00 has_lib:477 - failed to open libcudart.so: libcudart.so: cannot open shared object file: No such file or directory [W] 13:19:12.796 0x7f254f65cd00 has_lib:477 - 
failed to open libcudnn.so: libcudnn.so: cannot open shared object file: No such file or directory [W] 13:19:12.796 0x7f254f65cd00 has_lib:477 - failed to open libcudnn.so.8: libcudnn.so.8: cannot open shared object file: No such file or directory [W] 13:19:12.802 0x7f254f65cd00 has_cuda:56 - failed to open whisper-cublas lib: libwhisper-cublas.so: cannot open shared object file: No such file or directory [D] 13:19:12.811 0x7f254f65cd00 () - updating model using availability [D] 13:19:12.811 0x7f254f65cd00 () - updating model using availability internal [D] 13:19:12.813 0x7f254f65cd00 () - service refresh status, new state: idle [D] 13:19:12.813 0x7f254f65cd00 () - service state changed: busy => idle [D] 13:19:12.813 0x7f254f65cd00 () - scan cuda: false [D] 13:19:12.813 0x7f254f65cd00 () - scan hip: true [D] 13:19:12.813 0x7f254f65cd00 () - scan opencl: true false [D] 13:19:12.813 0x7f254f65cd00 add_hip_devices:318 - scanning for hip devices [D] 13:19:12.817 0x7f254f65cd00 add_hip_devices:327 - hip version: driver=50631062, runtime=50631062 [D] 13:19:12.817 0x7f254f65cd00 add_hip_devices:336 - hip number of devices: 2 [D] 13:19:12.817 0x7f254f65cd00 add_hip_devices:345 - hip device: 0, name=AMD Radeon RX 5500M, gcn-arch=1012, gcn-arch-name=gfx1012:xnack- [D] 13:19:12.817 0x7f254f65cd00 add_hip_devices:345 - hip device: 1, name=AMD Radeon Graphics, gcn-arch=912, gcn-arch-name=gfx90c:xnack- [D] 13:19:12.817 0x7f254f65cd00 () - service refresh status, new state: idle [D] 13:19:12.817 0x7f254f65cd00 () - app service state: busy => idle [W] 13:19:12.820 0x7f254f65cd00 () - no available tts models for in mnt [W] 13:19:12.820 0x7f254f65cd00 () - invalid task, reseting task state [D] 13:19:12.820 0x7f254f65cd00 () - app busy: true => false [D] 13:19:12.821 0x7f254f65cd00 () - stt models changed [D] 13:19:12.821 0x7f254f65cd00 () - update listen [D] 13:19:12.821 0x7f254f65cd00 () - tts models changed [D] 13:19:12.821 0x7f254f65cd00 () - update listen [W] 13:19:12.821 
0x7f254f65cd00 () - no available tts models for in mnt [D] 13:19:12.821 0x7f254f65cd00 () - ttt models changed [D] 13:19:12.823 0x7f254f65cd00 () - mnt langs changed [D] 13:19:12.823 0x7f254f65cd00 () - update listen
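The GPU details that matter for this issue are in the "hip device" lines of the log above. A small sketch for pulling them out of a pasted log follows; the regular expression is written against the line format seen in this thread only, and the function name is hypothetical.

```python
import re

# Pattern based on the 'hip device' lines in this thread's logs (an assumption
# about the format, not a documented interface).
HIP_DEVICE_RE = re.compile(
    r"hip device: (?P<index>\d+), name=(?P<name>.+?), "
    r"gcn-arch=(?P<arch>\d+), gcn-arch-name=(?P<arch_name>\S+)"
)


def parse_hip_devices(log_text: str) -> list:
    """Return one dict per 'hip device' entry found in the log text."""
    return [
        {
            "index": int(m["index"]),
            "name": m["name"],
            "arch_name": m["arch_name"],
        }
        for m in HIP_DEVICE_RE.finditer(log_text)
    ]
```

Applied to the log above, this yields two devices: the discrete AMD Radeon RX 5500M (gfx1012:xnack-) and the integrated AMD Radeon Graphics (gfx90c:xnack-).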

and when it crashes:

[D] 13:21:32.584 0x7f254f65cd00 () - stt start listen
[D] 13:21:32.585 0x7f254f65cd00 () - choosing model for id: "ar_whisper_medium" "fa"
[D] 13:21:32.585 0x7f254f65cd00 () - gpu device str: ("ROCm", " 0", " AMD Radeon RX 5500M")
[D] 13:21:32.585 0x7f254f65cd00 () - restart stt engine config: "lang=ar, lang_code=, model-files=[model-file=/home/client/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote/speech-models/multilang_whisper_medium.ggml, scorer-file=, ttt-model-file=], speech-mode=single-sentence, vad-mode=aggressiveness-3, speech-started=0, text-format=raw, options=t, use-gpu=1, gpu-device=[id=0, api=rocm, name=AMD Radeon RX 5500M, platform-name=], sub-config=[min-segment-dur=4, min-line-length=30, max-line-length=60]"
[D] 13:21:32.585 0x7f254f65cd00 () - new stt engine required
[D] 13:21:32.585 0x7f254f65cd00 open_whisper_lib:122 - using whisper-hipblas
[D] 13:21:32.587 0x7f254f65cd00 make_wparams:429 - cpu info: arch=x86_64, cores=16
[D] 13:21:32.587 0x7f254f65cd00 make_wparams:431 - using threads: 5/16
[D] 13:21:32.587 0x7f254f65cd00 make_wparams:433 - system info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 1 | COREML = 0 | OPENVINO = 0 |
[D] 13:21:32.587 0x7f254f65cd00 start:225 - starting engine
[D] 13:21:32.587 0x7f254f65cd00 start:234 - engine started
[D] 13:21:32.587 0x7f254f65cd00 () - creating audio source
[D] 13:21:32.587 0x7f254f65cd00 () - mic source created
[D] 13:21:32.587 0x7f2165bfe600 start_processing:271 - processing started
[D] 13:21:32.587 0x7f2165bfe600 set_processing_state:457 - processing state: idle => initializing
[D] 13:21:32.587 0x7f2165bfe600 set_processing_state:464 - speech detection status: no-speech => initializing (no-speech)
[D] 13:21:32.587 0x7f2165bfe600 () - service refresh status, new state: idle
[D] 13:21:32.587 0x7f2165bfe600 () - task state changed: 0 => 3
[D] 13:21:32.587 0x7f2165bfe600 create_whisper_model:239 - creating whisper model
whisper_init_from_file_with_params_no_state: loading model from '/home/client/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote/speech-models/multilang_whisper_medium.ggml'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1024
whisper_model_load: n_text_head = 16
whisper_model_load: n_text_layer = 24
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 9
whisper_model_load: qntvr = 2
whisper_model_load: type = 4 (medium)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs = 99
rocBLAS error: Cannot read /app/extensions/amd/rocm/lib/rocblas/library/TensileLibrary.dat: No such file or directory
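Most of the logs in this thread were pasted as one continuous run of text, which makes the final rocBLAS error easy to miss. The sketch below splits such text back into entries and keeps only warnings and anything mentioning an error. It assumes each entry starts with a level tag and timestamp like "[D] 13:21:32.587 "; the [E] level is a guess, since only [D], [I] and [W] appear here. Function names are hypothetical.

```python
import re

# Zero-width split point in front of every '[D] 13:21:32.587 ' style prefix.
# [E] is an assumed level; only [D], [I] and [W] appear in this thread.
ENTRY_START = re.compile(r"(?=\[[DIWE]\] \d{1,2}:\d{2}:\d{2}\.\d+ )")


def split_entries(flat_log: str) -> list:
    """Split a flattened log into one string per log entry."""
    return [part.strip() for part in ENTRY_START.split(flat_log) if part.strip()]


def find_problems(flat_log: str) -> list:
    """Return entries that are warnings or that mention 'error'."""
    return [
        entry
        for entry in split_entries(flat_log)
        if entry.startswith("[W]") or "error" in entry.lower()
    ]
```

Output printed by libraries without a prefix (such as the whisper and rocBLAS lines) stays attached to the preceding entry, so the rocBLAS message shows up as part of the "creating whisper model" entry.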

mkiol commented 7 months ago

Thank you so much. This is very helpful.

I need to ask you for one more try. I promise this is the last one. Could you please enable override GPU version and collect logs in the same way as previously?

Kentoseth commented 7 months ago

You can ask me for 10 more tries and I will do it. This application is amazing (and has made my life easier) and any way I can make it better, I will do so.

Start logs when override is active:

Qt: Session management error: Could not open network socket [I] 14:10:18.170 0x7f46e7841d00 init:49 - logging to stderr enabled [D] 14:10:18.170 0x7f46e7841d00 () - version: 4.4.0 [D] 14:10:18.170 0x7f46e7841d00 parse_cpuinfo:117 - cpu flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm debug_swap [D] 14:10:18.171 0x7f46e7841d00 parse_cpuinfo:125 - cpuinfo: processor-count=16, flags=[avx, avx2, fma, f16c, ] [D] 14:10:18.171 0x7f46e7841d00 () - translation: "en_ZA" [W] 14:10:18.171 0x7f46e7841d00 () - failed to install translation [D] 14:10:18.171 0x7f46e7841d00 () - starting standalone app [D] 14:10:18.171 0x7f46e7841d00 () - app: net.mkiol dsnote [D] 14:10:18.171 0x7f46e7841d00 () - config location: "/home/client/.var/app/net.mkiol.SpeechNote/config" [D] 14:10:18.171 0x7f46e7841d00 () - data location: "/home/client/.var/app/net.mkiol.SpeechNote/data/net.mkiol/dsnote" [D] 14:10:18.171 0x7f46e7841d00 () - cache location: "/home/client/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote" [D] 14:10:18.171 0x7f46e7841d00 () - settings file: 
"/home/client/.var/app/net.mkiol.SpeechNote/config/net.mkiol/dsnote/settings.conf" [D] 14:10:18.171 0x7f46e7841d00 () - platform: "xcb" [D] 14:10:18.171 0x7f46e7841d00 () - amd addon exists [D] 14:10:18.171 0x7f46e7841d00 () - enforcing num threads: 0 [D] 14:10:18.335 0x7f46e7841d00 () - supported audio input devices: ALSA lib ../../oss/pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp [D] 14:10:18.337 0x7f46e7841d00 () - "pulse" [D] 14:10:18.339 0x7f46e7841d00 () - "default" ALSA lib ../../../src/pcm/pcm_direct.c:2045:(snd1_pcm_direct_parse_open_conf) The field ipc_gid must be a valid group (create group audio) [D] 14:10:18.396 0x7f46e7841d00 () - "alsa_input.pci-0000_07_00.6.HiFihw_acp__source" [D] 14:10:18.396 0x7f46e7841d00 () - "alsa_output.pci-0000_07_00.6.HiFihw_Generic_1sink.monitor" [D] 14:10:18.396 0x7f46e7841d00 () - "alsa_input.pci-0000_07_00.6.HiFihw_Generic_1source" [D] 14:10:18.396 0x7f46e7841d00 () - "alsa_output.pci-0000_03_00.1.HiFihw_HDMI_3__sink.monitor" [D] 14:10:18.415 0x7f46e7841d00 () - starting service: app-standalone [D] 14:10:18.417 0x7f46e7841d00 () - mbrola dir: "/app/bin" [D] 14:10:18.417 0x7f46e7841d00 () - espeak dir: "/app/bin" [D] 14:10:18.417 0x7f46e7841d00 () - overrided gpu version: "10.1.0" [D] 14:10:18.417 0x7f46e7841d00 () - HSA_OVERRIDE_GFX_VERSION: 10.1.0 [D] 14:10:18.417 0x7f46cc957600 loop:75 - py executor loop started [D] 14:10:18.428 0x7f46cd3fb600 () - config version: 65 65 [D] 14:10:18.436 0x7f46e7841d00 () - module already unpacked: "rhvoicedata" [D] 14:10:18.437 0x7f46e7841d00 () - module already unpacked: "rhvoiceconfig" [D] 14:10:18.446 0x7f46cc957600 libs_availability:61 - checking: torch cuda [D] 14:10:18.458 0x7f46e7841d00 () - module already unpacked: "espeakdata" [D] 14:10:18.458 0x7f46e7841d00 () - default stt model not found: "en_whisper_medium" [D] 14:10:18.458 0x7f46e7841d00 () - default tts model not found: "ar_coqui_fairseq_ara" [D] 14:10:18.458 0x7f46e7841d00 () - default mnt lang not 
found: "en" [D] 14:10:18.458 0x7f46e7841d00 () - new default mnt lang: "en" [D] 14:10:18.458 0x7f46e7841d00 () - service refresh status, new state: busy [D] 14:10:18.458 0x7f46e7841d00 () - service state changed: unknown => busy [D] 14:10:18.458 0x7f46e7841d00 () - delaying features availability [D] 14:10:18.460 0x7f46e7841d00 () - runtime prefix: "/app" [D] 14:10:18.460 0x7f46e7841d00 () - available styles: ("Default", "Fusion", "Imagine", "Material", "org.kde.breeze", "org.kde.desktop", "Plasma", "Universal") [D] 14:10:18.460 0x7f46e7841d00 () - style paths: ("/usr/lib/qml/QtQuick/Controls.2") [D] 14:10:18.460 0x7f46e7841d00 () - import paths: ("/usr/lib/qml", "/app/bin", "qrc:/qt-project.org/imports") [D] 14:10:18.460 0x7f46e7841d00 () - library paths: ("/usr/share/runtime/lib/plugins", "/usr/lib/plugins", "/app/bin") [D] 14:10:18.460 0x7f46e7841d00 () - using auto qt style [D] 14:10:18.460 0x7f46e7841d00 () - XDG_CURRENT_DESKTOP: KDE [D] 14:10:18.460 0x7f46e7841d00 () - switching to style: "org.kde.desktop" [D] 14:10:18.510 0x7f46cd3fb600 () - models changed [D] 14:10:19.114 0x7f46e7841d00 () - starting app: app-standalone [D] 14:10:19.115 0x7f46e7841d00 () - app service state: unknown => busy logger error: invalid format string qrc:/qml/main.qml:340:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. Use this syntax instead: function onFoo() { ... } logger error: invalid format string qrc:/qml/main.qml:331:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. Use this syntax instead: function onFoo() { ... } logger error: invalid format string qrc:/qml/Notepad.qml:24:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. Use this syntax instead: function onFoo() { ... } logger error: invalid format string qrc:/qml/Translator.qml:29:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. 
Use this syntax instead: function onFoo() { ... } logger error: invalid format string qrc:/qml/MainToolBar.qml:282:13: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. Use this syntax instead: function onFoo() { ... } [D] 14:10:19.291 0x7f46e7841d00 onCompleted:180 - default font pixel size: 16 [D] 14:10:19.303 0x7f46e7841d00 () - service refresh status, new state: busy [D] 14:10:19.303 0x7f46e7841d00 () - service refresh status, new state: busy [D] 14:10:19.359 0x7f46e7841d00 () - stt models changed [D] 14:10:19.362 0x7f46e7841d00 () - update listen [D] 14:10:19.362 0x7f46e7841d00 () - app stt configured: false => true [D] 14:10:19.364 0x7f46e7841d00 () - app active stt model: "" => "en_whisper_medium" [D] 14:10:19.364 0x7f46e7841d00 () - update listen [D] 14:10:19.364 0x7f46e7841d00 () - tts models changed [D] 14:10:19.365 0x7f46e7841d00 () - update listen [D] 14:10:19.365 0x7f46e7841d00 () - app tts configured: false => true [D] 14:10:19.365 0x7f46e7841d00 () - app active tts model: "" => "ar_coqui_fairseq_ara" [D] 14:10:19.365 0x7f46e7841d00 () - update listen [W] 14:10:19.365 0x7f46e7841d00 () - no available tts models for in mnt [W] 14:10:19.365 0x7f46e7841d00 () - no available tts models for out mnt [D] 14:10:19.365 0x7f46e7841d00 () - ttt models changed [D] 14:10:19.369 0x7f46e7841d00 () - mnt langs changed [D] 14:10:19.371 0x7f46e7841d00 () - update listen [D] 14:10:19.371 0x7f46e7841d00 () - app mnt configured: false => true [D] 14:10:19.371 0x7f46e7841d00 () - app active mnt lang: "" => "en" [D] 14:10:19.371 0x7f46e7841d00 () - app mnt available out langs: 0 => 2 [W] 14:10:19.372 0x7f46e7841d00 () - no available tts models for in mnt [D] 14:10:19.372 0x7f46e7841d00 () - app active mnt out lang: "" => "fa" [D] 14:10:19.372 0x7f46e7841d00 () - app tts available models for out mnt: 0 => 1 [D] 14:10:19.373 0x7f46e7841d00 () - app active tts model for out mnt: "" => "fa_espeak_mb_ir1" [D] 14:10:19.795 0x7f46e7841d00 () - 
trying features availability update: false [D] 14:10:20.3 0x7f46cc957600 libs_availability:69 - checking: coqui tts [D] 14:10:20.3 0x7f46cc957600 libs_availability:77 - checking: faster-whisper [D] 14:10:20.395 0x7f46cc957600 libs_availability:85 - checking: transformers [D] 14:10:20.395 0x7f46cc957600 libs_availability:87 - checking: accelerate [D] 14:10:20.795 0x7f46e7841d00 () - trying features availability update: false [D] 14:10:20.811 0x7f46cc957600 libs_availability:95 - checking: unikud [D] 14:10:20.811 0x7f46cc957600 libs_availability:106 - checking: mimic3 tts [D] 14:10:21.421 0x7f46cc957600 libs_availability:114 - checking: gruut [D] 14:10:21.421 0x7f46cc957600 libs_availability:118 - checking: gruut-de [D] 14:10:21.421 0x7f46cc957600 libs_availability:126 - checking: gruut-es [D] 14:10:21.422 0x7f46cc957600 libs_availability:134 - checking: gruut-fr [D] 14:10:21.422 0x7f46cc957600 libs_availability:142 - checking: gruut-it [D] 14:10:21.422 0x7f46cc957600 libs_availability:150 - checking: gruut-ru [D] 14:10:21.422 0x7f46cc957600 libs_availability:158 - checking: gruut-fa [D] 14:10:21.422 0x7f46cc957600 libs_availability:166 - checking: gruut-sw [D] 14:10:21.422 0x7f46cc957600 libs_availability:174 - checking: gruut-nl [D] 14:10:21.423 0x7f46cc957600 libs_availability:185 - checking: mecab [D] 14:10:21.425 0x7f46cc957600 libs_availability:187 - checking: unidic-lite [D] 14:10:21.426 0x7f46cc957600 libs_availability:194 - py libs availability: [coqui-tts=true, faster-whisper=true, mimic3-tts=true, transformers=true, unikud=true, gruut_de=true, gruut_es=true, gruut_fa=true, gruut_fr=true, gruut_nl=true, gruut_it=true, gruut_ru=true, gruut_sw=true, mecab=true, torch-cuda=true] [D] 14:10:21.795 0x7f46e7841d00 () - trying features availability update: true [D] 14:10:21.795 0x7f46e7841d00 () - features availability ready [W] 14:10:21.796 0x7f46e7841d00 has_lib:477 - failed to open libcudart.so: libcudart.so: cannot open shared object file: No such file or 
directory [W] 14:10:21.796 0x7f46e7841d00 has_lib:477 - failed to open libcudnn.so: libcudnn.so: cannot open shared object file: No such file or directory [W] 14:10:21.796 0x7f46e7841d00 has_lib:477 - failed to open libcudnn.so.8: libcudnn.so.8: cannot open shared object file: No such file or directory [W] 14:10:21.811 0x7f46e7841d00 has_cuda:56 - failed to open whisper-cublas lib: libwhisper-cublas.so: cannot open shared object file: No such file or directory [D] 14:10:21.820 0x7f46e7841d00 () - updating model using availability [D] 14:10:21.820 0x7f46e7841d00 () - updating model using availability internal [D] 14:10:21.822 0x7f46e7841d00 () - service refresh status, new state: idle [D] 14:10:21.822 0x7f46e7841d00 () - service state changed: busy => idle [D] 14:10:21.822 0x7f46e7841d00 () - scan cuda: false [D] 14:10:21.822 0x7f46e7841d00 () - scan hip: false [D] 14:10:21.822 0x7f46e7841d00 () - scan opencl: true false [D] 14:10:21.822 0x7f46e7841d00 add_opencl_devices:357 - scanning for opencl devices [D] 14:10:21.866 0x7f46e7841d00 add_opencl_devices:374 - opencl number of platforms: 2 [D] 14:10:21.866 0x7f46e7841d00 add_opencl_devices:399 - opencl platform: 0, name=Clover, vendor=Mesa [D] 14:10:21.866 0x7f46e7841d00 add_opencl_devices:413 - opencl number of devices: 2 [D] 14:10:21.866 0x7f46e7841d00 add_opencl_devices:437 - opencl device: 0, platform name=Clover, device name=AMD Radeon RX 5500M (radeonsi, navi14, LLVM 15.0.7, DRM 3.54, 6.6.10-1-MANJARO), types=[GPU, ] [D] 14:10:21.866 0x7f46e7841d00 add_opencl_devices:437 - opencl device: 1, platform name=Clover, device name=AMD Radeon Graphics (radeonsi, renoir, LLVM 15.0.7, DRM 3.54, 6.6.10-1-MANJARO), types=[GPU, ] [D] 14:10:21.866 0x7f46e7841d00 add_opencl_devices:399 - opencl platform: 1, name=AMD Accelerated Parallel Processing, vendor=Advanced Micro Devices, Inc. 
[D] 14:10:21.866 0x7f46e7841d00 add_opencl_devices:413 - opencl number of devices: 2 [D] 14:10:21.866 0x7f46e7841d00 add_opencl_devices:437 - opencl device: 0, platform name=AMD Accelerated Parallel Processing, device name=gfx1010:xnack-, types=[GPU, ] [D] 14:10:21.866 0x7f46e7841d00 add_opencl_devices:437 - opencl device: 1, platform name=AMD Accelerated Parallel Processing, device name=gfx1010:xnack-, types=[GPU, ] [D] 14:10:21.866 0x7f46e7841d00 () - service refresh status, new state: idle [D] 14:10:21.867 0x7f46e7841d00 () - app service state: busy => idle [W] 14:10:21.871 0x7f46e7841d00 () - no available tts models for in mnt [W] 14:10:21.871 0x7f46e7841d00 () - invalid task, reseting task state [D] 14:10:21.871 0x7f46e7841d00 () - app busy: true => false [D] 14:10:21.872 0x7f46e7841d00 () - stt models changed [D] 14:10:21.872 0x7f46e7841d00 () - update listen [D] 14:10:21.872 0x7f46e7841d00 () - tts models changed [D] 14:10:21.872 0x7f46e7841d00 () - update listen [W] 14:10:21.872 0x7f46e7841d00 () - no available tts models for in mnt [D] 14:10:21.872 0x7f46e7841d00 () - ttt models changed [D] 14:10:21.875 0x7f46e7841d00 () - mnt langs changed [D] 14:10:21.875 0x7f46e7841d00 () - update listen

Output when doing STT (Arabic):

[D] 14:12:18.839 0x7f46e7841d00 () - app active stt model: "en_whisper_medium" => "ar_whisper_medium" [D] 14:12:18.839 0x7f46e7841d00 () - update listen [D] 14:12:31.310 0x7f46e7841d00 () - stt start listen [D] 14:12:31.311 0x7f46e7841d00 () - choosing model for id: "ar_whisper_medium" "fa" [D] 14:12:31.311 0x7f46e7841d00 () - gpu device str: ("OpenCL", " AMD Accelerated Parallel Processing", " gfx1010:xnack-") [D] 14:12:31.311 0x7f46e7841d00 () - restart stt engine config: "lang=ar, lang_code=, model-files=[model-file=/home/client/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote/speech-models/multilang_whisper_medium.ggml, scorer-file=, ttt-model-file=], speech-mode=single-sentence, vad-mode=aggressiveness-3, speech-started=0, text-format=raw, options=t, use-gpu=1, gpu-device=[id=0, api=opencl, name=gfx1010:xnack-, platform-name=AMD Accelerated Parallel Processing], sub-config=[min-segment-dur=4, min-line-length=30, max-line-length=60]" [D] 14:12:31.311 0x7f46e7841d00 () - new stt engine required [D] 14:12:31.312 0x7f46e7841d00 open_whisper_lib:135 - using whisper-clblast [D] 14:12:31.312 0x7f46e7841d00 make_wparams:429 - cpu info: arch=x86_64, cores=16 [D] 14:12:31.312 0x7f46e7841d00 make_wparams:431 - using threads: 5/16 [D] 14:12:31.312 0x7f46e7841d00 make_wparams:433 - system info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0 | [D] 14:12:31.312 0x7f46e7841d00 start:225 - starting engine [D] 14:12:31.312 0x7f46e7841d00 start:234 - engine started [D] 14:12:31.312 0x7f46e7841d00 () - creating audio source [D] 14:12:31.312 0x7f46e7841d00 () - mic source created [D] 14:12:31.312 0x7f42da1fe600 start_processing:271 - processing started [D] 14:12:31.312 0x7f42da1fe600 set_processing_state:457 - processing state: idle => initializing [D] 14:12:31.312 0x7f42da1fe600 set_processing_state:464 - speech detection 
status: no-speech => initializing (no-speech) [D] 14:12:31.312 0x7f42da1fe600 () - service refresh status, new state: idle [D] 14:12:31.312 0x7f42da1fe600 () - task state changed: 0 => 3 [D] 14:12:31.312 0x7f42da1fe600 create_whisper_model:239 - creating whisper model whisper_init_from_file_with_params_no_state: loading model from '/home/client/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote/speech-models/multilang_whisper_medium.ggml' whisper_model_load: loading model whisper_model_load: n_vocab = 51865 whisper_model_load: n_audio_ctx = 1500 whisper_model_load: n_audio_state = 1024 whisper_model_load: n_audio_head = 16 whisper_model_load: n_audio_layer = 24 whisper_model_load: n_text_ctx = 448 whisper_model_load: n_text_state = 1024 whisper_model_load: n_text_head = 16 whisper_model_load: n_text_layer = 24 whisper_model_load: n_mels = 80 whisper_model_load: ftype = 9 whisper_model_load: qntvr = 2 whisper_model_load: type = 4 (medium) whisper_model_load: adding 1608 extra tokens whisper_model_load: n_langs = 99 ggml_opencl: selecting platform: 'AMD Accelerated Parallel Processing' ggml_opencl: selecting device: 'gfx1010:xnack-' ggml_opencl: device FP16 support: true [D] 14:12:31.483 0x7f46e7841d00 () - using audio input: "alsa_input.pci-0000_07_00.6.HiFi__hw_acp__source" [D] 14:12:31.484 0x7f46e7841d00 () - audio state: IdleState [D] 14:12:31.484 0x7f46e7841d00 set_speech_started:515 - speech started: false => true [D] 14:12:31.484 0x7f46e7841d00 set_speech_detection_status:537 - speech detection status: initializing => initializing (speech-detected) [D] 14:12:31.484 0x7f46e7841d00 () - service refresh status, new state: listening-single-sentence [D] 14:12:31.484 0x7f46e7841d00 () - service state changed: idle => listening-single-sentence [W] 14:12:31.485 0x7f46e7841d00 () - ignore TaskStatePropertyChanged signal [D] 14:12:31.485 0x7f46e7841d00 () - app current task: -1 => 0 [D] 14:12:31.485 0x7f46e7841d00 () - app speech state: idle => initializing [D] 
14:12:31.485 0x7f46e7841d00 () - app service state: idle => listening-single-sentence [W] 14:12:31.490 0x7f46e7841d00 () - no available tts models for in mnt [D] 14:12:31.674 0x7f46e7841d00 () - mic clear [D] 14:12:31.674 0x7f46e7841d00 () - audio state: ActiveState [D] 14:12:31.865 0x7f46e7841d00 () - mic clear whisper_model_load: CPU buffer size = 586.33 MB [D] 14:12:32.55 0x7f46e7841d00 () - mic clear whisper_model_load: model size = 585.95 MB whisper_init_state: kv self size = 132.12 MB whisper_init_state: kv cross size = 147.46 MB whisper_init_state: compute buffer (conv) = 25.61 MB whisper_init_state: compute buffer (encode) = 170.28 MB whisper_init_state: compute buffer (cross) = 7.85 MB whisper_init_state: compute buffer (decode) = 98.32 MB [D] 14:12:32.222 0x7f42da1fe600 create_whisper_model:249 - whisper model created [D] 14:12:32.222 0x7f42da1fe600 set_processing_state:457 - processing state: initializing => idle [D] 14:12:32.222 0x7f42da1fe600 set_processing_state:464 - speech detection status: initializing => speech-detected (speech-detected) [D] 14:12:32.222 0x7f42da1fe600 () - service refresh status, new state: listening-single-sentence [D] 14:12:32.222 0x7f42da1fe600 () - task state changed: 3 => 1 [D] 14:12:32.222 0x7f46e7841d00 () - app task state: initializing => speech-detected [D] 14:12:33.604 0x7f42da1fe600 process_buff:259 - process samples buf: mode=single-sentence, in-buf size=24000, speech-buf size=0, sof=true, eof=false [D] 14:12:33.628 0x7f42da1fe600 process_buff:279 - vad: speech detected [D] 14:12:35.208 0x7f42da1fe600 process_buff:259 - process samples buf: mode=single-sentence, in-buf size=24000, speech-buf size=24000, sof=false, eof=false [D] 14:12:35.233 0x7f42da1fe600 process_buff:279 - vad: speech detected [D] 14:12:36.604 0x7f42da1fe600 process_buff:259 - process samples buf: mode=single-sentence, in-buf size=24000, speech-buf size=45600, sof=false, eof=false [D] 14:12:36.628 0x7f42da1fe600 process_buff:279 - vad: speech 
detected [D] 14:12:38.208 0x7f42da1fe600 process_buff:259 - process samples buf: mode=single-sentence, in-buf size=24000, speech-buf size=68640, sof=false, eof=false [D] 14:12:38.233 0x7f42da1fe600 process_buff:294 - vad: no speech [D] 14:12:38.233 0x7f42da1fe600 set_processing_state:457 - processing state: idle => decoding [D] 14:12:38.233 0x7f42da1fe600 set_speech_detection_status:537 - speech detection status: speech-detected => decoding (no-speech) [D] 14:12:38.233 0x7f42da1fe600 () - service refresh status, new state: listening-single-sentence [D] 14:12:38.233 0x7f42da1fe600 () - task state changed: 1 => 2 [D] 14:12:38.233 0x7f42da1fe600 process_buff:362 - speech frame: samples=68640 [D] 14:12:38.233 0x7f42da1fe600 decode_speech:439 - speech decoding started [D] 14:12:38.236 0x7f46e7841d00 () - app task state: speech-detected => processing [D] 14:12:45.269 0x7f42da1fe600 decode_speech:453 - decoded segments: 1 [D] 14:12:45.269 0x7f42da1fe600 decode_speech:501 - speech decoded, stats: samples=68640, duration=7036ms (1.64009) [D] 14:12:45.269 0x7f42da1fe600 set_processing_state:457 - processing state: decoding => idle [D] 14:12:45.269 0x7f42da1fe600 set_processing_state:464 - speech detection status: decoding => no-speech (no-speech) [D] 14:12:45.269 0x7f46e7841d00 () - stt intermediate text decoded: "ar_whisper_medium" 0 [D] 14:12:45.270 0x7f42da1fe600 () - service refresh status, new state: listening-single-sentence [D] 14:12:45.270 0x7f42da1fe600 () - task state changed: 2 => 0 [D] 14:12:45.270 0x7f42da1fe600 flush:473 - flush: eof [D] 14:12:45.270 0x7f42da1fe600 set_speech_started:515 - speech started: true => false [D] 14:12:45.270 0x7f46e7841d00 () - app task state: processing => idle [D] 14:12:45.270 0x7f46e7841d00 () - stt stop listen [D] 14:12:45.270 0x7f46e7841d00 () - stop stt engine gracefully [D] 14:12:45.270 0x7f46e7841d00 () - mic source stop [D] 14:12:45.270 0x7f46e7841d00 () - audio state: SuspendedState [D] 14:12:45.270 0x7f46e7841d00 () - 
audio ended [D] 14:12:45.271 0x7f46e7841d00 () - stt text decoded: "ar_whisper_medium" 0 [D] 14:12:45.273 0x7f46e7841d00 () - stt intermediate text decoded: *** "ar_whisper_medium" 0 [D] 14:12:45.273 0x7f46e7841d00 () - engine eof [D] 14:12:45.273 0x7f46e7841d00 () - cancel [D] 14:12:45.274 0x7f46e7841d00 () - stop stt engine [D] 14:12:45.274 0x7f46e7841d00 stop:252 - stop requested [D] 14:12:45.274 0x7f46e7841d00 stop_processing_impl:230 - whisper cancel [D] 14:12:45.274 0x7f42da1fe600 flush:473 - flush: exit [D] 14:12:45.274 0x7f42da1fe600 reset_in_processing:383 - reset in processing [D] 14:12:45.274 0x7f42da1fe600 start_processing:306 - processing ended [D] 14:12:45.274 0x7f46e7841d00 stop:267 - stop completed [D] 14:12:45.274 0x7f46e7841d00 () - mic source dtor [D] 14:12:45.292 0x7f46e7841d00 () - service refresh status, new state: idle [D] 14:12:45.292 0x7f46e7841d00 () - service state changed: listening-single-sentence => idle [D] 14:12:45.292 0x7f46e7841d00 () - service refresh status, new state: idle [D] 14:12:45.292 0x7f46e7841d00 () - app current task: 0 => -1 [W] 14:12:45.292 0x7f46e7841d00 () - invalid task, reseting task state [D] 14:12:45.292 0x7f46e7841d00 () - app service state: listening-single-sentence => idle [W] 14:12:45.298 0x7f46e7841d00 () - no available tts models for in mnt [W] 14:12:45.298 0x7f46e7841d00 () - invalid task, reseting task state

mkiol commented 7 months ago

Thanks

It looks like you have disabled the "Use AMD ROCm" option. It should be enabled. Please make sure that the "Graphics card options" are exactly as below:

image

Kentoseth commented 7 months ago

The app still crashes when all options are ticked as per your screenshot. Here is the start output:

Qt: Session management error: Could not open network socket

[I] 11:33:20.790 0x7f0119e1ad00 init:49 - logging to stderr enabled [D] 11:33:20.790 0x7f0119e1ad00 () - version: 4.4.0 [D] 11:33:20.791 0x7f0119e1ad00 parse_cpuinfo:117 - cpu flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm debug_swap [D] 11:33:20.791 0x7f0119e1ad00 parse_cpuinfo:125 - cpuinfo: processor-count=16, flags=[avx, avx2, fma, f16c, ] [D] 11:33:20.791 0x7f0119e1ad00 () - translation: "en_ZA" [W] 11:33:20.791 0x7f0119e1ad00 () - failed to install translation [D] 11:33:20.791 0x7f0119e1ad00 () - starting standalone app [D] 11:33:20.792 0x7f0119e1ad00 () - app: net.mkiol dsnote [D] 11:33:20.792 0x7f0119e1ad00 () - config location: "/home/client/.var/app/net.mkiol.SpeechNote/config" [D] 11:33:20.792 0x7f0119e1ad00 () - data location: "/home/client/.var/app/net.mkiol.SpeechNote/data/net.mkiol/dsnote" [D] 11:33:20.792 0x7f0119e1ad00 () - cache location: "/home/client/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote" [D] 11:33:20.792 0x7f0119e1ad00 () - settings file: 
"/home/client/.var/app/net.mkiol.SpeechNote/config/net.mkiol/dsnote/settings.conf" [D] 11:33:20.792 0x7f0119e1ad00 () - platform: "xcb" [D] 11:33:20.792 0x7f0119e1ad00 () - amd addon exists [D] 11:33:20.792 0x7f0119e1ad00 () - enforcing num threads: 0 [D] 11:33:20.960 0x7f0119e1ad00 () - supported audio input devices: ALSA lib ../../oss/pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp [D] 11:33:20.962 0x7f0119e1ad00 () - "pulse" [D] 11:33:20.964 0x7f0119e1ad00 () - "default" ALSA lib ../../../src/pcm/pcm_direct.c:2045:(snd1_pcm_direct_parse_open_conf) The field ipc_gid must be a valid group (create group audio) [D] 11:33:21.8 0x7f0119e1ad00 () - "alsa_input.pci-0000_07_00.6.HiFihw_acp__source" [D] 11:33:21.8 0x7f0119e1ad00 () - "alsa_output.pci-0000_07_00.6.HiFihw_Generic_1sink.monitor" [D] 11:33:21.8 0x7f0119e1ad00 () - "alsa_input.pci-0000_07_00.6.HiFihw_Generic_1source" [D] 11:33:21.8 0x7f0119e1ad00 () - "alsa_output.pci-0000_03_00.1.HiFihw_HDMI_3__sink.monitor" [D] 11:33:21.78 0x7f0119e1ad00 () - starting service: app-standalone [D] 11:33:21.79 0x7f0119e1ad00 () - mbrola dir: "/app/bin" [D] 11:33:21.79 0x7f0119e1ad00 () - espeak dir: "/app/bin" [D] 11:33:21.79 0x7f0119e1ad00 () - overrided gpu version: "10.1.0" [D] 11:33:21.79 0x7f0119e1ad00 () - HSA_OVERRIDE_GFX_VERSION: 10.1.0 [D] 11:33:21.79 0x7f0100b12600 loop:75 - py executor loop started [D] 11:33:21.83 0x7f0119e1ad00 () - module already unpacked: "rhvoicedata" [D] 11:33:21.83 0x7f0119e1ad00 () - module already unpacked: "rhvoiceconfig" [D] 11:33:21.86 0x7f0119e1ad00 () - module already unpacked: "espeakdata" [D] 11:33:21.86 0x7f0119e1ad00 () - default stt model not found: "en_whisper_medium" [D] 11:33:21.86 0x7f0119e1ad00 () - default tts model not found: "ar_coqui_fairseq_ara" [D] 11:33:21.86 0x7f0119e1ad00 () - default mnt lang not found: "en" [D] 11:33:21.86 0x7f0119e1ad00 () - new default mnt lang: "en" [D] 11:33:21.86 0x7f0119e1ad00 () - service refresh status, new state: busy [D] 
11:33:21.86 0x7f0119e1ad00 () - service state changed: unknown => busy [D] 11:33:21.86 0x7f0119e1ad00 () - delaying features availability [D] 11:33:21.88 0x7f0119e1ad00 () - runtime prefix: "/app" [D] 11:33:21.88 0x7f01013fb600 () - config version: 65 65 [D] 11:33:21.88 0x7f0119e1ad00 () - available styles: ("Default", "Fusion", "Imagine", "Material", "org.kde.breeze", "org.kde.desktop", "Plasma", "Universal") [D] 11:33:21.88 0x7f0119e1ad00 () - style paths: ("/usr/lib/qml/QtQuick/Controls.2") [D] 11:33:21.88 0x7f0119e1ad00 () - import paths: ("/usr/lib/qml", "/app/bin", "qrc:/qt-project.org/imports") [D] 11:33:21.88 0x7f0119e1ad00 () - library paths: ("/usr/share/runtime/lib/plugins", "/usr/lib/plugins", "/app/bin") [D] 11:33:21.88 0x7f0119e1ad00 () - using auto qt style [D] 11:33:21.88 0x7f0119e1ad00 () - XDG_CURRENT_DESKTOP: KDE [D] 11:33:21.88 0x7f0119e1ad00 () - switching to style: "org.kde.desktop" [D] 11:33:21.90 0x7f0100b12600 libs_availability:61 - checking: torch cuda [D] 11:33:21.113 0x7f01013fb600 () - models changed [D] 11:33:21.723 0x7f0119e1ad00 () - starting app: app-standalone [D] 11:33:21.723 0x7f0119e1ad00 () - app service state: unknown => busy logger error: invalid format string qrc:/qml/main.qml:340:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. Use this syntax instead: function onFoo() { ... } logger error: invalid format string qrc:/qml/main.qml:331:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. Use this syntax instead: function onFoo() { ... } logger error: invalid format string qrc:/qml/Notepad.qml:24:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. Use this syntax instead: function onFoo() { ... } logger error: invalid format string qrc:/qml/Translator.qml:29:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. Use this syntax instead: function onFoo() { ... 
} logger error: invalid format string qrc:/qml/MainToolBar.qml:282:13: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. Use this syntax instead: function onFoo() { ... } [D] 11:33:21.891 0x7f0119e1ad00 onCompleted:180 - default font pixel size: 16 [D] 11:33:21.903 0x7f0119e1ad00 () - service refresh status, new state: busy [D] 11:33:21.903 0x7f0119e1ad00 () - service refresh status, new state: busy [D] 11:33:21.959 0x7f0119e1ad00 () - stt models changed [D] 11:33:21.962 0x7f0119e1ad00 () - update listen [D] 11:33:21.962 0x7f0119e1ad00 () - app stt configured: false => true [D] 11:33:21.963 0x7f0119e1ad00 () - app active stt model: "" => "en_whisper_medium" [D] 11:33:21.964 0x7f0119e1ad00 () - update listen [D] 11:33:21.964 0x7f0119e1ad00 () - tts models changed [D] 11:33:21.965 0x7f0119e1ad00 () - update listen [D] 11:33:21.965 0x7f0119e1ad00 () - app tts configured: false => true [D] 11:33:21.965 0x7f0119e1ad00 () - app active tts model: "" => "ar_coqui_fairseq_ara" [D] 11:33:21.965 0x7f0119e1ad00 () - update listen [W] 11:33:21.965 0x7f0119e1ad00 () - no available tts models for in mnt [W] 11:33:21.965 0x7f0119e1ad00 () - no available tts models for out mnt [D] 11:33:21.965 0x7f0119e1ad00 () - ttt models changed [D] 11:33:21.968 0x7f0119e1ad00 () - mnt langs changed [D] 11:33:21.969 0x7f0119e1ad00 () - update listen [D] 11:33:21.969 0x7f0119e1ad00 () - app mnt configured: false => true [D] 11:33:21.970 0x7f0119e1ad00 () - app active mnt lang: "" => "en" [D] 11:33:21.970 0x7f0119e1ad00 () - app mnt available out langs: 0 => 2 [W] 11:33:21.971 0x7f0119e1ad00 () - no available tts models for in mnt [D] 11:33:21.971 0x7f0119e1ad00 () - app active mnt out lang: "" => "fa" [D] 11:33:21.971 0x7f0119e1ad00 () - app tts available models for out mnt: 0 => 1 [D] 11:33:21.972 0x7f0119e1ad00 () - app active tts model for out mnt: "" => "fa_espeak_mb_ir1" [D] 11:33:22.336 0x7f0119e1ad00 () - trying features availability update: false [D] 
11:33:22.543 0x7f0100b12600 libs_availability:69 - checking: coqui tts [D] 11:33:22.544 0x7f0100b12600 libs_availability:77 - checking: faster-whisper [D] 11:33:22.940 0x7f0100b12600 libs_availability:85 - checking: transformers [D] 11:33:22.940 0x7f0100b12600 libs_availability:87 - checking: accelerate [D] 11:33:23.336 0x7f0119e1ad00 () - trying features availability update: false [D] 11:33:23.361 0x7f0100b12600 libs_availability:95 - checking: unikud [D] 11:33:23.361 0x7f0100b12600 libs_availability:106 - checking: mimic3 tts [D] 11:33:23.972 0x7f0100b12600 libs_availability:114 - checking: gruut [D] 11:33:23.972 0x7f0100b12600 libs_availability:118 - checking: gruut-de [D] 11:33:23.973 0x7f0100b12600 libs_availability:126 - checking: gruut-es [D] 11:33:23.973 0x7f0100b12600 libs_availability:134 - checking: gruut-fr [D] 11:33:23.973 0x7f0100b12600 libs_availability:142 - checking: gruut-it [D] 11:33:23.973 0x7f0100b12600 libs_availability:150 - checking: gruut-ru [D] 11:33:23.974 0x7f0100b12600 libs_availability:158 - checking: gruut-fa [D] 11:33:23.974 0x7f0100b12600 libs_availability:166 - checking: gruut-sw [D] 11:33:23.974 0x7f0100b12600 libs_availability:174 - checking: gruut-nl [D] 11:33:23.974 0x7f0100b12600 libs_availability:185 - checking: mecab [D] 11:33:23.976 0x7f0100b12600 libs_availability:187 - checking: unidic-lite [D] 11:33:23.977 0x7f0100b12600 libs_availability:194 - py libs availability: [coqui-tts=true, faster-whisper=true, mimic3-tts=true, transformers=true, unikud=true, gruut_de=true, gruut_es=true, gruut_fa=true, gruut_fr=true, gruut_nl=true, gruut_it=true, gruut_ru=true, gruut_sw=true, mecab=true, torch-cuda=true] [D] 11:33:24.336 0x7f0119e1ad00 () - trying features availability update: true [D] 11:33:24.336 0x7f0119e1ad00 () - features availability ready [W] 11:33:24.336 0x7f0119e1ad00 has_lib:477 - failed to open libcudart.so: libcudart.so: cannot open shared object file: No such file or directory [W] 11:33:24.337 0x7f0119e1ad00 
has_lib:477 - failed to open libcudnn.so: libcudnn.so: cannot open shared object file: No such file or directory [W] 11:33:24.337 0x7f0119e1ad00 has_lib:477 - failed to open libcudnn.so.8: libcudnn.so.8: cannot open shared object file: No such file or directory [W] 11:33:24.342 0x7f0119e1ad00 has_cuda:56 - failed to open whisper-cublas lib: libwhisper-cublas.so: cannot open shared object file: No such file or directory [D] 11:33:24.350 0x7f0119e1ad00 () - updating model using availability [D] 11:33:24.350 0x7f0119e1ad00 () - updating model using availability internal [D] 11:33:24.352 0x7f0119e1ad00 () - service refresh status, new state: idle [D] 11:33:24.352 0x7f0119e1ad00 () - service state changed: busy => idle [D] 11:33:24.352 0x7f0119e1ad00 () - scan cuda: true [D] 11:33:24.352 0x7f0119e1ad00 () - scan hip: true [D] 11:33:24.352 0x7f0119e1ad00 () - scan opencl: true false [D] 11:33:24.352 0x7f0119e1ad00 add_cuda_devices:281 - scanning for cuda devices [W] 11:33:24.352 0x7f0119e1ad00 cuda_api:168 - failed to open cudart lib: libcudart.so: cannot open shared object file: No such file or directory [D] 11:33:24.352 0x7f0119e1ad00 add_hip_devices:318 - scanning for hip devices [D] 11:33:24.356 0x7f0119e1ad00 add_hip_devices:327 - hip version: driver=50631062, runtime=50631062 [D] 11:33:24.356 0x7f0119e1ad00 add_hip_devices:336 - hip number of devices: 2 [D] 11:33:24.356 0x7f0119e1ad00 add_hip_devices:345 - hip device: 0, name=AMD Radeon RX 5500M, gcn-arch=1010, gcn-arch-name=gfx1010:xnack- [D] 11:33:24.356 0x7f0119e1ad00 add_hip_devices:345 - hip device: 1, name=AMD Radeon Graphics, gcn-arch=1010, gcn-arch-name=gfx1010:xnack- [D] 11:33:24.356 0x7f0119e1ad00 () - service refresh status, new state: idle [D] 11:33:24.356 0x7f0119e1ad00 () - app service state: busy => idle [W] 11:33:24.360 0x7f0119e1ad00 () - no available tts models for in mnt [W] 11:33:24.360 0x7f0119e1ad00 () - invalid task, reseting task state [D] 11:33:24.360 0x7f0119e1ad00 () - app busy: true => 
false [D] 11:33:24.360 0x7f0119e1ad00 () - stt models changed [D] 11:33:24.360 0x7f0119e1ad00 () - update listen [D] 11:33:24.360 0x7f0119e1ad00 () - tts models changed [D] 11:33:24.360 0x7f0119e1ad00 () - update listen [W] 11:33:24.360 0x7f0119e1ad00 () - no available tts models for in mnt [D] 11:33:24.360 0x7f0119e1ad00 () - ttt models changed [D] 11:33:24.363 0x7f0119e1ad00 () - mnt langs changed [D] 11:33:24.363 0x7f0119e1ad00 () - update listen

and the crash output:

[D] 11:39:09.189 0x7fd15f65ed00 () - stt start listen
[D] 11:39:09.190 0x7fd15f65ed00 () - choosing model for id: "en_whisper_medium" "fa"
[D] 11:39:09.190 0x7fd15f65ed00 () - gpu device str: ("ROCm", " 0", " AMD Radeon RX 5500M")
[D] 11:39:09.190 0x7fd15f65ed00 () - restart stt engine config: "lang=en, lang_code=, model-files=[model-file=/home/client/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote/speech-models/en_whisper_medium.ggml, scorer-file=, ttt-model-file=], speech-mode=single-sentence, vad-mode=aggressiveness-3, speech-started=0, text-format=raw, options=, use-gpu=1, gpu-device=[id=0, api=rocm, name=AMD Radeon RX 5500M, platform-name=], sub-config=[min-segment-dur=4, min-line-length=30, max-line-length=60]"
[D] 11:39:09.190 0x7fd15f65ed00 () - new stt engine required
[D] 11:39:09.190 0x7fd15f65ed00 open_whisper_lib:122 - using whisper-hipblas
[D] 11:39:09.191 0x7fd15f65ed00 make_wparams:429 - cpu info: arch=x86_64, cores=16
[D] 11:39:09.191 0x7fd15f65ed00 make_wparams:431 - using threads: 5/16
[D] 11:39:09.191 0x7fd15f65ed00 make_wparams:433 - system info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 1 | COREML = 0 | OPENVINO = 0 |
[D] 11:39:09.191 0x7fd15f65ed00 start:225 - starting engine
[D] 11:39:09.191 0x7fd15f65ed00 start:234 - engine started
[D] 11:39:09.191 0x7fd15f65ed00 () - creating audio source
[D] 11:39:09.191 0x7fd15f65ed00 () - mic source created
[D] 11:39:09.192 0x7fcd715fe600 start_processing:271 - processing started
[D] 11:39:09.192 0x7fcd715fe600 set_processing_state:457 - processing state: idle => initializing
[D] 11:39:09.192 0x7fcd715fe600 set_processing_state:464 - speech detection status: no-speech => initializing (no-speech)
[D] 11:39:09.192 0x7fcd715fe600 () - service refresh status, new state: idle
[D] 11:39:09.192 0x7fcd715fe600 () - task state changed: 0 => 3
[D] 11:39:09.192 0x7fcd715fe600 create_whisper_model:239 - creating whisper model
whisper_init_from_file_with_params_no_state: loading model from '/home/client/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote/speech-models/en_whisper_medium.ggml'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1024
whisper_model_load: n_text_head = 16
whisper_model_load: n_text_layer = 24
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 9
whisper_model_load: qntvr = 2
whisper_model_load: type = 4 (medium)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs = 99
rocBLAS error: Cannot read /app/extensions/amd/rocm/lib/rocblas/library/TensileLibrary.dat: No such file or directory
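
The last line of the crash output is rocBLAS failing to read a kernel library file. One way to look into this is to list which GPU architectures the rocBLAS library directory carries files for. Below is a minimal Python sketch of that check. It assumes that rocBLAS names its per-architecture files with a `gfx…` token somewhere in the file name; that naming is an assumption and may differ between ROCm versions. The function name `tensile_archs` is made up for illustration, and the directory path is the one from the error message above.

```python
import re
from pathlib import Path

# Assumption: per-architecture rocBLAS/Tensile files carry a token such as
# "gfx1030" or "gfx90a" in their file name. This is not guaranteed.
ARCH_TOKEN = re.compile(r"gfx\d+[a-f]?(?![0-9a-z])")


def tensile_archs(library_dir):
    """Return the set of gfx tokens found in file names under library_dir."""
    directory = Path(library_dir)
    if not directory.is_dir():
        # Directory missing entirely (for example, add-on not installed).
        return set()
    archs = set()
    for entry in directory.iterdir():
        archs.update(ARCH_TOKEN.findall(entry.name))
    return archs


if __name__ == "__main__":
    # Path taken from the rocBLAS error in the log; run inside the sandbox.
    found = tensile_archs("/app/extensions/amd/rocm/lib/rocblas/library")
    print(sorted(found) if found else "no per-architecture files found")
```

If the architecture reported by the HIP scan (`gfx1010` in the log) is not in the printed list, that would be consistent with the crash, although it does not prove the cause.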

mkiol commented 7 months ago

Many thanks for checking.

My final conclusion is that with your GPU you can't use AMD ROCm. To be fair, according to AMD docs, your card is not officially supported on ROCm, so this is not very surprising :/

Another option is GPU acceleration with OpenCL. To enable it, please disable "Use AMD ROCm" and "Override GPU version" (like below) and restart the app. After the restart, make sure that an "OpenCL" device is selected under "Graphics card" in the "Speech to Text" settings. Most likely "Auto" will select the right card automatically - I think you've already tested it.

image

OpenCL provides quicker STT on all "Whisper" models (not on "Faster Whisper"). It is not as fast as ROCm but still much faster than CPU. TTS does not support OpenCL right now, therefore you don't see "Use GPU acceleration" on the "Text to Speech" tab.
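
To summarize the comment above as a lookup, here is a small Python sketch. It encodes only what is stated in this thread about OpenCL. The engine labels and the `pick_device` helper are hypothetical names chosen for illustration, not identifiers from Speech Note.

```python
# OpenCL support as described in the comment above. The keys are illustrative
# labels (assumptions), not names used by Speech Note itself.
OPENCL_SUPPORT = {
    "whisper-stt": True,          # "quicker STT on all Whisper models"
    "faster-whisper-stt": False,  # "not on Faster Whisper"
    "tts": False,                 # "TTS does not support OpenCL right now"
}


def pick_device(engine, opencl_available):
    """Return "opencl" if the thread says the engine can use it, else "cpu"."""
    if opencl_available and OPENCL_SUPPORT.get(engine, False):
        return "opencl"
    return "cpu"


for engine in OPENCL_SUPPORT:
    print(engine, "->", pick_device(engine, opencl_available=True))
```

Unknown engines fall back to CPU here, which seemed the safer default for a sketch.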

hobbesjaap commented 6 months ago

Just found this thread because my Radeon RX 7800 XT kept crashing Speech Note too.

Following the suggestions above got it working for me (overriding the GPU version did the trick). So ROCm on Manjaro Linux with the RX 7800 XT works within Speech Note.
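
In case it helps others pick an override value: the log further down shows `HSA_OVERRIDE_GFX_VERSION: 11.0.0` next to a reported `gcn-arch-name=gfx1100`. The Python sketch below converts a `gfx` name into that dotted form. It assumes the common reading of these names (decimal major, then one hex digit each for minor and stepping). `gfx_to_override_version` is a hypothetical helper, and I have not checked how Speech Note builds its own list of versions.

```python
import re


def gfx_to_override_version(gfx_name):
    """Convert e.g. "gfx1100" to "11.0.0".

    Assumption: the name is "gfx" + decimal major + one hex digit for the
    minor version + one hex digit for the stepping.
    """
    m = re.fullmatch(r"gfx(\d+)([0-9a-f])([0-9a-f])", gfx_name)
    if m is None:
        raise ValueError(f"unrecognized gfx name: {gfx_name!r}")
    major, minor, stepping = m.groups()
    return f"{int(major)}.{int(minor, 16)}.{int(stepping, 16)}"


# The architecture name reported in the log of this comment.
print(gfx_to_override_version("gfx1100"))
```

Which override actually works for a given card is a separate question. This only reformats the name.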

Not sure if it's helpful, but I gave it some simple Dutch text-to-speech to run with and the log is as follows:

❯ flatpak run net.mkiol.SpeechNote --verbose
QSocketNotifier: Can only be used with threads started with QThread
qt.qpa.qgnomeplatform: Could not find color scheme  ""
[I] 10:49:06.117 0x726489e41d00 init:49 - logging to stderr enabled
[D] 10:49:06.117 0x726489e41d00 () - version: 4.4.0
[D] 10:49:06.117 0x726489e41d00 parse_cpuinfo:117 - cpu flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp vnmi pku ospke md_clear flush_l1d arch_capabilities
[D] 10:49:06.118 0x726489e41d00 parse_cpuinfo:125 - cpuinfo: processor-count=16, flags=[avx, avx2, fma, f16c, ]
[D] 10:49:06.118 0x726489e41d00 () - translation: "en_US"
[W] 10:49:06.118 0x726489e41d00 () - failed to install translation
[D] 10:49:06.118 0x726489e41d00 () - starting standalone app
[D] 10:49:06.118 0x726489e41d00 () - app: net.mkiol dsnote
[D] 10:49:06.118 0x726489e41d00 () - config location: "/home/jaap/.var/app/net.mkiol.SpeechNote/config"
[D] 10:49:06.118 0x726489e41d00 () - data location: "/home/jaap/.var/app/net.mkiol.SpeechNote/data/net.mkiol/dsnote"
[D] 10:49:06.118 0x726489e41d00 () - cache location: "/home/jaap/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote"
[D] 10:49:06.118 0x726489e41d00 () - settings file: "/home/jaap/.var/app/net.mkiol.SpeechNote/config/net.mkiol/dsnote/settings.conf"
[D] 10:49:06.118 0x726489e41d00 () - platform: "wayland"
[D] 10:49:06.118 0x726489e41d00 () - amd addon exists
[D] 10:49:06.118 0x726489e41d00 () - enforcing num threads: 0
[D] 10:49:06.222 0x726489e41d00 () - supported audio input devices:
ALSA lib ../../oss/pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib ../../pulse/pcm_pulse.c:758:(pulse_prepare) PulseAudio: Unable to create stream: Input/Output error

[D] 10:49:06.404 0x726489e41d00 () - "upmix"
ALSA lib ../../pulse/pcm_pulse.c:758:(pulse_prepare) PulseAudio: Unable to create stream: Input/Output error

ALSA lib ../../../src/pcm/pcm_direct.c:2045:(snd1_pcm_direct_parse_open_conf) The field ipc_gid must be a valid group (create group audio)
[D] 10:49:06.412 0x726489e41d00 () - "alsa_input.pci-0000_00_1f.3.analog-stereo"
[D] 10:49:06.412 0x726489e41d00 () - "alsa_output.pci-0000_00_1f.3.iec958-stereo.monitor"
[D] 10:49:06.412 0x726489e41d00 () - "alsa_output.usb-Jieli_Technology_UACDemoV1.0_50346805930D279F-00.iec958-stereo.monitor"
[D] 10:49:06.425 0x726489e41d00 () - starting service: app-standalone
[D] 10:49:06.426 0x726489e41d00 () - mbrola dir: "/app/bin"
[D] 10:49:06.426 0x726489e41d00 () - espeak dir: "/app/bin"
[D] 10:49:06.426 0x726489e41d00 () - overrided gpu version: "11.0.0"
[D] 10:49:06.426 0x726489e41d00 () - HSA_OVERRIDE_GFX_VERSION: 11.0.0
[D] 10:49:06.427 0x726477400600 loop:75 - py executor loop started
[D] 10:49:06.430 0x726489e41d00 () - module already unpacked: "rhvoicedata"
[D] 10:49:06.430 0x726489e41d00 () - module already unpacked: "rhvoiceconfig"
[D] 10:49:06.432 0x726477e00600 () - config version: 65 65
[D] 10:49:06.434 0x726489e41d00 () - module already unpacked: "espeakdata"
[D] 10:49:06.434 0x726489e41d00 () - default stt model not found: "en_vosk_large"
[D] 10:49:06.434 0x726489e41d00 () - default tts model not found: "nl_coqui_css100_vits"
[D] 10:49:06.434 0x726489e41d00 () - default mnt lang not found: "nl"
[D] 10:49:06.434 0x726489e41d00 () - new default mnt lang: "nl"
[D] 10:49:06.434 0x726489e41d00 () - service refresh status, new state: busy
[D] 10:49:06.434 0x726489e41d00 () - service state changed: unknown => busy
[D] 10:49:06.434 0x726489e41d00 () - delaying features availability
[D] 10:49:06.436 0x726489e41d00 () - runtime prefix: "/app"
[D] 10:49:06.436 0x726477400600 libs_availability:61 - checking: torch cuda
[D] 10:49:06.436 0x726489e41d00 () - available styles: ("Default", "Fusion", "Imagine", "Material", "org.kde.breeze", "org.kde.desktop", "Plasma", "Universal")
[D] 10:49:06.436 0x726489e41d00 () - style paths: ("/usr/lib/qml/QtQuick/Controls.2")
[D] 10:49:06.436 0x726489e41d00 () - import paths: ("/usr/lib/qml", "/app/bin", "qrc:/qt-project.org/imports")
[D] 10:49:06.436 0x726489e41d00 () - library paths: ("/usr/share/runtime/lib/plugins", "/usr/lib/plugins", "/app/bin")
[D] 10:49:06.436 0x726489e41d00 () - using auto qt style
[D] 10:49:06.436 0x726489e41d00 () - XDG_CURRENT_DESKTOP: GNOME
[D] 10:49:06.436 0x726489e41d00 () - switching to style: "org.kde.breeze"
[D] 10:49:06.454 0x726477e00600 () - models changed
[D] 10:49:07.111 0x726489e41d00 () - starting app: app-standalone
[D] 10:49:07.111 0x726489e41d00 () - app service state: unknown => busy
[W] 10:49:07.112 0x726489e41d00 () - hot keys are supported only under x11
[W] 10:49:07.133 0x726489e41d00 ():36 - file:///usr/lib/qml/QtQuick/Controls.2/org.kde.breeze/ScrollView.qml:36:25: QML ScrollBar: Binding loop detected for property "x"
[W] 10:49:07.141 0x726489e41d00 ():36 - file:///usr/lib/qml/QtQuick/Controls.2/org.kde.breeze/ScrollView.qml:36:25: QML ScrollBar: Binding loop detected for property "x"
[W] 10:49:07.146 0x726489e41d00 ():36 - file:///usr/lib/qml/QtQuick/Controls.2/org.kde.breeze/ScrollView.qml:36:25: QML ScrollBar: Binding loop detected for property "x"
logger error: invalid format string
qrc:/qml/main.qml:340:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. Use this syntax instead: function onFoo(<arguments>) { ... }
logger error: invalid format string
qrc:/qml/main.qml:331:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. Use this syntax instead: function onFoo(<arguments>) { ... }
[W] 10:49:07.219 0x726489e41d00 virtual QVariant ModelSource::item(int) const:81 - ModelSource: Invalid role  -1 "color"
[W] 10:49:07.219 0x726489e41d00 virtual QVariant ModelSource::item(int) const:81 - ModelSource: Invalid role  -1 "color"
logger error: invalid format string
qrc:/qml/Notepad.qml:24:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. Use this syntax instead: function onFoo(<arguments>) { ... }
logger error: invalid format string
qrc:/qml/Translator.qml:29:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. Use this syntax instead: function onFoo(<arguments>) { ... }
logger error: invalid format string
qrc:/qml/MainToolBar.qml:282:13: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. Use this syntax instead: function onFoo(<arguments>) { ... }
[W] 10:49:07.252 0x726489e41d00 ():173 - qrc:/qml/MainToolBar.qml:173:21: QML MenuItem: Binding loop detected for property "__reserveSpaceForIcon"
[W] 10:49:07.252 0x726489e41d00 ():173 - qrc:/qml/MainToolBar.qml:173:21: QML MenuItem: Binding loop detected for property "__reserveSpaceForIcon"
[W] 10:49:07.255 0x726489e41d00 ():59 - qrc:/qml/MainToolBar.qml:59:21: QML MenuItem: Binding loop detected for property "__reserveSpaceForIcon"
[W] 10:49:07.255 0x726489e41d00 ():59 - qrc:/qml/MainToolBar.qml:59:21: QML MenuItem: Binding loop detected for property "__reserveSpaceForIcon"
[D] 10:49:07.258 0x726489e41d00 onCompleted:180 - default font pixel size: 14
[D] 10:49:07.270 0x726489e41d00 () - service refresh status, new state: busy
[D] 10:49:07.270 0x726489e41d00 () - service refresh status, new state: busy
[D] 10:49:07.434 0x726489e41d00 () - stt models changed
[D] 10:49:07.436 0x726489e41d00 () - update listen
[D] 10:49:07.436 0x726489e41d00 () - app stt configured: false => true
[D] 10:49:07.437 0x726489e41d00 () - app active stt model: "" => "en_vosk_large"
[D] 10:49:07.437 0x726489e41d00 () - update listen
[D] 10:49:07.437 0x726489e41d00 () - tts models changed
[D] 10:49:07.438 0x726489e41d00 () - update listen
[D] 10:49:07.438 0x726489e41d00 () - app tts configured: false => true
[D] 10:49:07.438 0x726489e41d00 () - app active tts model: "" => "nl_coqui_css100_vits"
[D] 10:49:07.438 0x726489e41d00 () - update listen
[W] 10:49:07.438 0x726489e41d00 () - no available tts models for in mnt
[W] 10:49:07.438 0x726489e41d00 () - no available tts models for out mnt
[D] 10:49:07.438 0x726489e41d00 () - ttt models changed
[D] 10:49:07.438 0x726489e41d00 () - app ttt configured: false => true
[D] 10:49:07.441 0x726489e41d00 () - mnt langs changed
[D] 10:49:07.443 0x726489e41d00 () - update listen
[D] 10:49:07.443 0x726489e41d00 () - app mnt configured: false => true
[D] 10:49:07.443 0x726489e41d00 () - app active mnt lang: "" => "nl"
[D] 10:49:07.443 0x726489e41d00 () - app mnt available out langs: 0 => 1
[D] 10:49:07.443 0x726489e41d00 () - app tts available models for in mnt: 0 => 2
[D] 10:49:07.444 0x726489e41d00 () - app active tts model for in mnt: "" => "nl_coqui_css100_vits"
[D] 10:49:07.444 0x726489e41d00 () - app active mnt out lang: "" => "en"
[D] 10:49:07.444 0x726489e41d00 () - app tts available models for out mnt: 0 => 2
[D] 10:49:07.445 0x726489e41d00 () - app active tts model for out mnt: "" => "en_coqui_fairseq_eng"
[D] 10:49:07.672 0x726489e41d00 () - trying features availability update: false
[D] 10:49:07.842 0x726477400600 libs_availability:69 - checking: coqui tts
[D] 10:49:07.843 0x726477400600 libs_availability:77 - checking: faster-whisper
[D] 10:49:08.191 0x726477400600 libs_availability:85 - checking: transformers
[D] 10:49:08.191 0x726477400600 libs_availability:87 - checking: accelerate
[D] 10:49:08.562 0x726477400600 libs_availability:95 - checking: unikud
[D] 10:49:08.563 0x726477400600 libs_availability:106 - checking: mimic3 tts
[D] 10:49:08.571 0x726489e41d00 () - trying features availability update: false
[D] 10:49:09.113 0x726477400600 libs_availability:114 - checking: gruut
[D] 10:49:09.113 0x726477400600 libs_availability:118 - checking: gruut-de
[D] 10:49:09.114 0x726477400600 libs_availability:126 - checking: gruut-es
[D] 10:49:09.114 0x726477400600 libs_availability:134 - checking: gruut-fr
[D] 10:49:09.114 0x726477400600 libs_availability:142 - checking: gruut-it
[D] 10:49:09.114 0x726477400600 libs_availability:150 - checking: gruut-ru
[D] 10:49:09.114 0x726477400600 libs_availability:158 - checking: gruut-fa
[D] 10:49:09.115 0x726477400600 libs_availability:166 - checking: gruut-sw
[D] 10:49:09.115 0x726477400600 libs_availability:174 - checking: gruut-nl
[D] 10:49:09.115 0x726477400600 libs_availability:185 - checking: mecab
[D] 10:49:09.117 0x726477400600 libs_availability:187 - checking: unidic-lite
[D] 10:49:09.117 0x726477400600 libs_availability:194 - py libs availability: [coqui-tts=true, faster-whisper=true, mimic3-tts=true, transformers=true, unikud=true, gruut_de=true, gruut_es=true, gruut_fa=true, gruut_fr=true, gruut_nl=true, gruut_it=true, gruut_ru=true, gruut_sw=true, mecab=true, torch-cuda=true]
[D] 10:49:09.556 0x726489e41d00 () - trying features availability update: true
[D] 10:49:09.556 0x726489e41d00 () - features availability ready
[W] 10:49:09.556 0x726489e41d00 has_lib:477 - failed to open libcudart.so: libcudart.so: cannot open shared object file: No such file or directory
[W] 10:49:09.557 0x726489e41d00 has_lib:477 - failed to open libcudnn.so: libcudnn.so: cannot open shared object file: No such file or directory
[W] 10:49:09.557 0x726489e41d00 has_lib:477 - failed to open libcudnn.so.8: libcudnn.so.8: cannot open shared object file: No such file or directory
[W] 10:49:09.564 0x726489e41d00 has_cuda:56 - failed to open whisper-cublas lib: libwhisper-cublas.so: cannot open shared object file: No such file or directory
[D] 10:49:09.571 0x726489e41d00 () - updating model using availability
[D] 10:49:09.571 0x726489e41d00 () - updating model using availability internal
[D] 10:49:09.572 0x726489e41d00 () - service refresh status, new state: idle
[D] 10:49:09.572 0x726489e41d00 () - service state changed: busy => idle
[D] 10:49:09.572 0x726489e41d00 () - scan cuda: false
[D] 10:49:09.572 0x726489e41d00 () - scan hip: true
[D] 10:49:09.572 0x726489e41d00 () - scan opencl: true false
[D] 10:49:09.572 0x726489e41d00 add_hip_devices:318 - scanning for hip devices
[D] 10:49:09.574 0x726489e41d00 add_hip_devices:327 - hip version: driver=50631062, runtime=50631062
[D] 10:49:09.574 0x726489e41d00 add_hip_devices:336 - hip number of devices: 1
[D] 10:49:09.574 0x726489e41d00 add_hip_devices:345 - hip device: 0, name=AMD Radeon RX 7800 XT, gcn-arch=1100, gcn-arch-name=gfx1100
[D] 10:49:09.575 0x726489e41d00 () - service refresh status, new state: idle
[D] 10:49:09.588 0x726489e41d00 () - app service state: busy => idle
[W] 10:49:09.593 0x726489e41d00 () - invalid task, reseting task state
[D] 10:49:09.595 0x726489e41d00 () - app busy: true => false
[D] 10:49:09.595 0x726489e41d00 () - stt models changed
[D] 10:49:09.595 0x726489e41d00 () - update listen
[D] 10:49:09.595 0x726489e41d00 () - tts models changed
[D] 10:49:09.595 0x726489e41d00 () - update listen
[D] 10:49:09.595 0x726489e41d00 () - ttt models changed
[D] 10:49:09.599 0x726489e41d00 () - mnt langs changed
[D] 10:49:09.599 0x726489e41d00 () - update listen
[D] 10:49:13.214 0x726489e41d00 () - tts play speech
[D] 10:49:13.215 0x726489e41d00 () - choosing model for id: "nl_coqui_css100_vits" "en"
[D] 10:49:13.215 0x726489e41d00 () - options: QMap(("speech_speed", QVariant(uint, 10))("text_format", QVariant(int, 0)))
[D] 10:49:13.215 0x726489e41d00 () - gpu device str: ("ROCm", " 0", " AMD Radeon RX 7800 XT")
[D] 10:49:13.215 0x726489e41d00 () - restart tts engine config: "lang=nl, speaker=, model-files=[model-path=/home/jaap/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote/speech-models/nl_coqui_css100_vits, vocoder-path=, diacritizer=], speaker=, ref_voice_file=, text-format=raw, options=nc, lang_code=, share-dir=/app/share, cache-dir=/home/jaap/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote, data-dir=, speech-speed=10, use-gpu=1, gpu-device=[id=0, api=rocm, name=AMD Radeon RX 7800 XT, platform-name=], audio-format=ogg-opus"
[D] 10:49:13.215 0x726489e41d00 () - new tts engine required
[D] 10:49:13.215 0x726489e41d00 start:183 - tts start
[D] 10:49:13.215 0x726489e41d00 start:194 - tts start completed
[D] 10:49:13.216 0x726295e00600 process:536 - tts prosessing started
[D] 10:49:13.216 0x726489e41d00 encode_speech:269 - task pushed
[D] 10:49:13.216 0x726295e00600 set_state:287 - tts engine state: idle => initializing
[D] 10:49:13.216 0x726295e00600 () - tts engine state changed
[D] 10:49:13.216 0x726295e00600 execute:51 - task pushed
[D] 10:49:13.217 0x726489e41d00 () - service refresh status, new state: playing-speech
[D] 10:49:13.217 0x726489e41d00 () - service state changed: idle => playing-speech
[D] 10:49:13.217 0x726477400600 operator():176 - model files: /home/jaap/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote/speech-models/nl_coqui_css100_vits/model_file.pth.tar /home/jaap/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote/speech-models/nl_coqui_css100_vits/config.json
[D] 10:49:13.217 0x726489e41d00 () - app current task: -1 => 0
[D] 10:49:13.217 0x726477400600 fix_config_file:143 - path replace: /root/.cache/huggingface/hub/models--neongeckocom--tts-vits-css10-nl/snapshots/be7a7c7bee463588626b10777d7fc14ed8c07a3e => /home/jaap/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote/speech-models/nl_coqui_css100_vits
[D] 10:49:13.217 0x726489e41d00 () - service refresh status, new state: playing-speech
[D] 10:49:13.217 0x726489e41d00 () - task state changed: 0 => 3
[D] 10:49:13.217 0x726489e41d00 () - app service state: idle => playing-speech
[D] 10:49:13.217 0x726477400600 operator():198 - using device: cuda 0
[D] 10:49:13.224 0x726489e41d00 () - app speech state: idle => initializing
 > Using model: vits
 > Setting up Audio Processor...
 | > sample_rate:22050
 | > resample:False
 | > num_mels:80
 | > log_func:np.log10
 | > min_level_db:0
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:None
 | > fft_size:1024
 | > power:None
 | > preemphasis:0.0
 | > griffin_lim_iters:None
 | > signal_norm:None
 | > symmetric_norm:None
 | > mel_fmin:0
 | > mel_fmax:None
 | > pitch_fmin:None
 | > pitch_fmax:None
 | > spec_gain:20.0
 | > stft_pad_mode:reflect
 | > max_norm:1.0
 | > clip_norm:True
 | > do_trim_silence:False
 | > trim_db:60
 | > do_sound_norm:False
 | > do_amp_to_db_linear:True
 | > do_amp_to_db_mel:True
 | > do_rms_norm:False
 | > db_level:None
 | > stats_path:None
 | > base:10
 | > hop_length:256
 | > win_length:1024
 > initialization of speaker-embedding layers.
 > initialization of language-embedding layers.
[D] 10:49:17.316 0x726477400600 operator():239 - model class name: Vits
[D] 10:49:17.316 0x726477400600 operator():244 - initial length scale: 1
[D] 10:49:17.316 0x726295e00600 set_state:287 - tts engine state: initializing => encoding
[D] 10:49:17.316 0x726295e00600 () - tts engine state changed
[D] 10:49:17.316 0x726295e00600 preprocess:1043 - numbers-to-words pre-processing needed
[D] 10:49:17.316 0x726295e00600 preprocess:1058 - char replace pre-processing needed
[D] 10:49:17.316 0x726295e00600 execute:51 - task pushed
[D] 10:49:17.317 0x726477400600 operator():291 - speed: length_scale=1
 > Text splitted to sentences.
[D] 10:49:17.317 0x726489e41d00 () - service refresh status, new state: playing-speech
[D] 10:49:17.317 0x726489e41d00 () - task state changed: 3 => 2
[D] 10:49:17.317 0x726489e41d00 () - app task state: initializing => processing
['Fouten maken.']
 > Processing time: 3.6276984214782715
 > Real-time factor: 1.7848703631202225
[D] 10:49:20.948 0x726477400600 operator():336 - voice synthesized successfully
[D] 10:49:20.948 0x726295e00600 compress_internal:1021 - task compress: format=ogg-opus, quality=vbr-high, async=false
[D] 10:49:20.948 0x726295e00600 init_av_in_format:391 - opening file: /home/jaap/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote/16753004613740131314.opus.wav
[D] 10:49:20.949 0x726295e00600 best_sample_format:148 - sample fmt exact match
[D] 10:49:20.949 0x726295e00600 init_av:686 - sample-rate change: 22050 => 24000
[D] 10:49:20.977 0x726295e00600 decode_frame:1232 - demuxer eof
[D] 10:49:20.977 0x726295e00600 encode_frame:1406 - encoder eof
[D] 10:49:20.977 0x726295e00600 compress_internal:1066 - task compress finished
[D] 10:49:20.977 0x726295e00600 compress:1014 - compress stats: format=ogg-opus, duration=29ms
[D] 10:49:20.978 0x726295e00600 preprocess:1043 - numbers-to-words pre-processing needed
[D] 10:49:20.978 0x726295e00600 preprocess:1058 - char replace pre-processing needed
[D] 10:49:20.978 0x726295e00600 execute:51 - task pushed
[D] 10:49:20.978 0x726477400600 operator():291 - speed: length_scale=1
 > Text splitted to sentences.
['We kunnen er zo bang voor zijn, dat het door kan slaan in een pathologisch probleem waar professionele hulp bij nodig is.']
[D] 10:49:20.985 0x726489e41d00 decompress:923 - task decompress
[D] 10:49:20.985 0x726489e41d00 init_av_in_format:391 - opening file: /home/jaap/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote/16753004613740131314.opus
[D] 10:49:20.991 0x726489e41d00 decode_frame:1321 - demuxer eof
[D] 10:49:20.991 0x726489e41d00 encode_frame:1406 - encoder eof
[D] 10:49:21.1 0x726489e41d00 () - player new state: QMediaPlayer::PlayingState
[D] 10:49:21.1 0x726489e41d00 () - service refresh status, new state: playing-speech
[D] 10:49:21.1 0x726489e41d00 () - task state changed: 2 => 4
[D] 10:49:21.1 0x726489e41d00 () - tts partial speech: *** 0
[D] 10:49:21.1 0x726489e41d00 () - app task state: processing => speech-playing
[D] 10:49:23.39 0x726489e41d00 () - player new state: QMediaPlayer::StoppedState
[D] 10:49:23.39 0x726489e41d00 () - service refresh status, new state: playing-speech
[D] 10:49:23.39 0x726489e41d00 () - task state changed: 4 => 2
[D] 10:49:23.39 0x726489e41d00 () - app task state: speech-playing => processing
[D] 10:49:23.39 0x726489e41d00 () - tts partial speech: *** 0
 > Processing time: 2.0933759212493896
 > Real-time factor: 0.2870866445886969
[D] 10:49:23.85 0x726477400600 operator():336 - voice synthesized successfully
[D] 10:49:23.85 0x726295e00600 compress_internal:1021 - task compress: format=ogg-opus, quality=vbr-high, async=false
[D] 10:49:23.85 0x726295e00600 init_av_in_format:391 - opening file: /home/jaap/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote/18142420550802533788.opus.wav
[D] 10:49:23.86 0x726295e00600 best_sample_format:148 - sample fmt exact match
[D] 10:49:23.86 0x726295e00600 init_av:686 - sample-rate change: 22050 => 24000
[D] 10:49:23.180 0x726295e00600 decode_frame:1232 - demuxer eof
[D] 10:49:23.180 0x726295e00600 encode_frame:1406 - encoder eof
[D] 10:49:23.180 0x726295e00600 compress_internal:1066 - task compress finished
[D] 10:49:23.180 0x726295e00600 compress:1014 - compress stats: format=ogg-opus, duration=95ms
[D] 10:49:23.181 0x726295e00600 preprocess:1043 - numbers-to-words pre-processing needed
[D] 10:49:23.181 0x726295e00600 preprocess:1058 - char replace pre-processing needed
[D] 10:49:23.181 0x726295e00600 execute:51 - task pushed
[D] 10:49:23.181 0x726477400600 operator():291 - speed: length_scale=1
 > Text splitted to sentences.
['Eén van de bekendste vormen in het onderwijs is waarschijnlijk faalangst.']
[D] 10:49:23.185 0x726489e41d00 decompress:923 - task decompress
[D] 10:49:23.185 0x726489e41d00 init_av_in_format:391 - opening file: /home/jaap/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote/18142420550802533788.opus
[D] 10:49:23.201 0x726489e41d00 decode_frame:1321 - demuxer eof
[D] 10:49:23.201 0x726489e41d00 encode_frame:1406 - encoder eof
[D] 10:49:23.208 0x726489e41d00 () - player new state: QMediaPlayer::PlayingState
[D] 10:49:23.208 0x726489e41d00 () - service refresh status, new state: playing-speech
[D] 10:49:23.208 0x726489e41d00 () - task state changed: 2 => 4
[D] 10:49:23.208 0x726489e41d00 () - tts partial speech: *** 0
[D] 10:49:23.224 0x726489e41d00 () - app task state: processing => speech-playing
 > Processing time: 1.8863964080810547
 > Real-time factor: 0.3667216885155457
[D] 10:49:25.76 0x726477400600 operator():336 - voice synthesized successfully
[D] 10:49:25.76 0x726295e00600 compress_internal:1021 - task compress: format=ogg-opus, quality=vbr-high, async=false
[D] 10:49:25.76 0x726295e00600 init_av_in_format:391 - opening file: /home/jaap/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote/3949185448911910736.opus.wav
[D] 10:49:25.77 0x726295e00600 best_sample_format:148 - sample fmt exact match
[D] 10:49:25.77 0x726295e00600 init_av:686 - sample-rate change: 22050 => 24000
[D] 10:49:25.143 0x726295e00600 decode_frame:1232 - demuxer eof
[D] 10:49:25.143 0x726295e00600 encode_frame:1406 - encoder eof
[D] 10:49:25.143 0x726295e00600 compress_internal:1066 - task compress finished
[D] 10:49:25.143 0x726295e00600 compress:1014 - compress stats: format=ogg-opus, duration=66ms
[D] 10:49:25.144 0x726295e00600 preprocess:1043 - numbers-to-words pre-processing needed
[D] 10:49:25.144 0x726295e00600 preprocess:1058 - char replace pre-processing needed
[D] 10:49:25.144 0x726295e00600 execute:51 - task pushed
[D] 10:49:25.144 0x726477400600 operator():291 - speed: length_scale=1
 > Text splitted to sentences.
['Het is heel mooi dat, met de juiste interventie, faalangst geen levenslang probleem hoeft te zijn.']
 > Processing time: 2.0029165744781494
 > Real-time factor: 0.31249512104638283
[D] 10:49:27.158 0x726477400600 operator():336 - voice synthesized successfully
[D] 10:49:27.158 0x726295e00600 compress_internal:1021 - task compress: format=ogg-opus, quality=vbr-high, async=false
[D] 10:49:27.158 0x726295e00600 init_av_in_format:391 - opening file: /home/jaap/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote/11914056610857808336.opus.wav
[D] 10:49:27.159 0x726295e00600 best_sample_format:148 - sample fmt exact match
[D] 10:49:27.159 0x726295e00600 init_av:686 - sample-rate change: 22050 => 24000
[D] 10:49:27.242 0x726295e00600 decode_frame:1232 - demuxer eof
[D] 10:49:27.242 0x726295e00600 encode_frame:1406 - encoder eof
[D] 10:49:27.242 0x726295e00600 compress_internal:1066 - task compress finished
[D] 10:49:27.242 0x726295e00600 compress:1014 - compress stats: format=ogg-opus, duration=84ms
[D] 10:49:27.243 0x726295e00600 preprocess:1043 - numbers-to-words pre-processing needed
[D] 10:49:27.243 0x726295e00600 preprocess:1058 - char replace pre-processing needed
[D] 10:49:27.243 0x726295e00600 execute:51 - task pushed
[D] 10:49:27.243 0x726477400600 operator():291 - speed: length_scale=1
 > Text splitted to sentences.
['Het blijft een feit dat het onderwijs en de drang om te presteren (vanuit bijvoorbeeld thuis, de vriendengroep of intrinsiek) dergelijke angsten uit kan lokken en versterken.']
 > Processing time: 2.3569834232330322
 > Real-time factor: 0.2451670148798417
[D] 10:49:29.617 0x726477400600 operator():336 - voice synthesized successfully
[D] 10:49:29.617 0x726295e00600 compress_internal:1021 - task compress: format=ogg-opus, quality=vbr-high, async=false
[D] 10:49:29.617 0x726295e00600 init_av_in_format:391 - opening file: /home/jaap/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote/9915293361278363284.opus.wav
[D] 10:49:29.618 0x726295e00600 best_sample_format:148 - sample fmt exact match
[D] 10:49:29.618 0x726295e00600 init_av:686 - sample-rate change: 22050 => 24000
[D] 10:49:29.741 0x726295e00600 decode_frame:1232 - demuxer eof
[D] 10:49:29.741 0x726295e00600 encode_frame:1406 - encoder eof
[D] 10:49:29.741 0x726295e00600 compress_internal:1066 - task compress finished
[D] 10:49:29.742 0x726295e00600 compress:1014 - compress stats: format=ogg-opus, duration=124ms
[D] 10:49:29.742 0x726295e00600 preprocess:1043 - numbers-to-words pre-processing needed
[D] 10:49:29.742 0x726295e00600 preprocess:1058 - char replace pre-processing needed
[D] 10:49:29.742 0x726295e00600 execute:51 - task pushed
[D] 10:49:29.742 0x726477400600 operator():291 - speed: length_scale=1
 > Text splitted to sentences.
['Maar fouten maken hoort bij het leven - en van onze grootste fouten leren we soms onze belangrijkste lessen.']
[D] 10:49:30.560 0x726489e41d00 () - player new state: QMediaPlayer::StoppedState
[D] 10:49:30.560 0x726489e41d00 () - service refresh status, new state: playing-speech
[D] 10:49:30.560 0x726489e41d00 () - task state changed: 4 => 2
[D] 10:49:30.560 0x726489e41d00 decompress:923 - task decompress
[D] 10:49:30.560 0x726489e41d00 init_av_in_format:391 - opening file: /home/jaap/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote/3949185448911910736.opus
[D] 10:49:30.572 0x726489e41d00 decode_frame:1321 - demuxer eof
[D] 10:49:30.572 0x726489e41d00 encode_frame:1406 - encoder eof
[D] 10:49:30.578 0x726489e41d00 () - app task state: speech-playing => processing
[D] 10:49:30.578 0x726489e41d00 () - player new state: QMediaPlayer::PlayingState
[D] 10:49:30.578 0x726489e41d00 () - service refresh status, new state: playing-speech
[D] 10:49:30.578 0x726489e41d00 () - task state changed: 2 => 4
[D] 10:49:30.578 0x726489e41d00 () - tts partial speech: *** 0
[D] 10:49:30.578 0x726489e41d00 () - app task state: processing => speech-playing
 > Processing time: 1.951267957687378
 > Real-time factor: 0.30834665224033
[D] 10:49:31.705 0x726477400600 operator():336 - voice synthesized successfully
[D] 10:49:31.705 0x726295e00600 compress_internal:1021 - task compress: format=ogg-opus, quality=vbr-high, async=false
[D] 10:49:31.705 0x726295e00600 init_av_in_format:391 - opening file: /home/jaap/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote/14390933979132872781.opus.wav
[D] 10:49:31.706 0x726295e00600 best_sample_format:148 - sample fmt exact match
[D] 10:49:31.706 0x726295e00600 init_av:686 - sample-rate change: 22050 => 24000
[D] 10:49:31.788 0x726295e00600 decode_frame:1232 - demuxer eof
[D] 10:49:31.788 0x726295e00600 encode_frame:1406 - encoder eof
[D] 10:49:31.788 0x726295e00600 compress_internal:1066 - task compress finished
[D] 10:49:31.788 0x726295e00600 compress:1014 - compress stats: format=ogg-opus, duration=83ms
[D] 10:49:31.789 0x726295e00600 set_state:287 - tts engine state: encoding => idle
[D] 10:49:31.789 0x726295e00600 () - tts engine state changed
[D] 10:49:31.789 0x726489e41d00 () - service refresh status, new state: playing-speech

** (dsnote:2): WARNING **: 10:49:32.507: atk-bridge: get_device_events_reply: unknown signature
[D] 10:49:35.756 0x726489e41d00 () - player new state: QMediaPlayer::StoppedState
[D] 10:49:35.757 0x726489e41d00 () - service refresh status, new state: playing-speech
[D] 10:49:35.757 0x726489e41d00 () - task state changed: 4 => 0
[D] 10:49:35.757 0x726489e41d00 decompress:923 - task decompress
[D] 10:49:35.757 0x726489e41d00 init_av_in_format:391 - opening file: /home/jaap/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote/11914056610857808336.opus
[D] 10:49:35.812 0x726489e41d00 decode_frame:1321 - demuxer eof
[D] 10:49:35.812 0x726489e41d00 encode_frame:1406 - encoder eof
[D] 10:49:35.820 0x726489e41d00 () - app task state: speech-playing => idle
[D] 10:49:35.820 0x726489e41d00 () - player new state: QMediaPlayer::PlayingState
[D] 10:49:35.820 0x726489e41d00 () - service refresh status, new state: playing-speech
[D] 10:49:35.820 0x726489e41d00 () - task state changed: 0 => 4
[D] 10:49:35.820 0x726489e41d00 () - tts partial speech: *** 0
[D] 10:49:35.820 0x726489e41d00 () - app task state: idle => speech-playing
[D] 10:49:42.272 0x726489e41d00 () - player new state: QMediaPlayer::StoppedState
[D] 10:49:42.273 0x726489e41d00 () - service refresh status, new state: playing-speech
[D] 10:49:42.273 0x726489e41d00 () - task state changed: 4 => 0
[D] 10:49:42.273 0x726489e41d00 decompress:923 - task decompress
[D] 10:49:42.273 0x726489e41d00 init_av_in_format:391 - opening file: /home/jaap/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote/9915293361278363284.opus
[D] 10:49:42.330 0x726489e41d00 decode_frame:1321 - demuxer eof
[D] 10:49:42.330 0x726489e41d00 encode_frame:1406 - encoder eof
[D] 10:49:42.337 0x726489e41d00 () - app task state: speech-playing => idle
[D] 10:49:42.338 0x726489e41d00 () - player new state: QMediaPlayer::PlayingState
[D] 10:49:42.338 0x726489e41d00 () - service refresh status, new state: playing-speech
[D] 10:49:42.338 0x726489e41d00 () - task state changed: 0 => 4
[D] 10:49:42.338 0x726489e41d00 () - tts partial speech: *** 0
[D] 10:49:42.339 0x726489e41d00 () - app task state: idle => speech-playing
[D] 10:49:52.16 0x726489e41d00 () - player new state: QMediaPlayer::StoppedState
[D] 10:49:52.16 0x726489e41d00 () - service refresh status, new state: playing-speech
[D] 10:49:52.16 0x726489e41d00 () - task state changed: 4 => 0
[D] 10:49:52.17 0x726489e41d00 decompress:923 - task decompress
[D] 10:49:52.17 0x726489e41d00 init_av_in_format:391 - opening file: /home/jaap/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote/14390933979132872781.opus
[D] 10:49:52.33 0x726489e41d00 decode_frame:1321 - demuxer eof
[D] 10:49:52.33 0x726489e41d00 encode_frame:1406 - encoder eof
[D] 10:49:52.39 0x726489e41d00 () - app task state: speech-playing => idle
[D] 10:49:52.39 0x726489e41d00 () - player new state: QMediaPlayer::PlayingState
[D] 10:49:52.40 0x726489e41d00 () - service refresh status, new state: playing-speech
[D] 10:49:52.40 0x726489e41d00 () - task state changed: 0 => 4
[D] 10:49:52.40 0x726489e41d00 () - tts partial speech: *** 0
[D] 10:49:52.40 0x726489e41d00 () - app task state: idle => speech-playing
[D] 10:49:58.400 0x726489e41d00 () - player new state: QMediaPlayer::StoppedState
[D] 10:49:58.401 0x726489e41d00 () - service refresh status, new state: playing-speech
[D] 10:49:58.401 0x726489e41d00 () - task state changed: 4 => 0
[D] 10:49:58.401 0x726489e41d00 () - stop tts engine
[D] 10:49:58.401 0x726489e41d00 request_stop:211 - tts stop requested
[D] 10:49:58.401 0x726489e41d00 () - service refresh status, new state: idle
[D] 10:49:58.401 0x726489e41d00 () - service state changed: playing-speech => idle
[D] 10:49:58.401 0x726489e41d00 () - app task state: speech-playing => idle
[D] 10:49:58.401 0x726489e41d00 () - player new state: QMediaPlayer::PausedState
[D] 10:49:58.401 0x726489e41d00 () - service refresh status, new state: idle
[D] 10:49:58.401 0x726489e41d00 () - app current task: 0 => -1
[W] 10:49:58.401 0x726489e41d00 () - invalid task, reseting task state
[D] 10:49:58.401 0x726489e41d00 () - app service state: playing-speech => idle
[W] 10:49:58.405 0x726489e41d00 () - invalid task, reseting task state
[D] 10:49:58.405 0x726489e41d00 () - player new state: QMediaPlayer::StoppedState
[D] 10:49:58.405 0x726489e41d00 () - service refresh status, new state: idle
[D] 10:49:58.405 0x726489e41d00 () - tts partial speech: *** 0
[D] 10:50:40.275 0x726489e41d00 () - exiting
mkiol commented 6 months ago

@hobbesjaap Thank you for reporting. I'm glad it works! Just curious: did the app also crash on STT (Whisper model) without the override?