mkiol / dsnote

Speech Note Linux app. Note taking, reading and translating with offline Speech to Text, Text to Speech and Machine translation.
Mozilla Public License 2.0

dsnote is not stopping STT while still being usable for STT #70

Closed devSJR closed 9 months ago

devSJR commented 11 months ago

Whenever I invoke dsnote 4.3 (Linux, Flatpak, CUDA with 4 GB VRAM) for STT (German, Whisper, large, v2) via a shortcut (it also happens with the 'listen' button), it does not really stop listening. It indicates 'busy …', yet STT still works in this state. Cancel cannot be pressed (it is greyed out). If I leave it in this state, I often end up with very long text in the notepad (possibly things I said while talking to others in the room). To really stop it, I have to restart dsnote.

mkiol commented 11 months ago

Thank you for the report.

Would you be able to provide a log when this problem occurs?

You can enable logging with the --verbose option. Start the app with the following command:

flatpak run net.mkiol.SpeechNote --verbose
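
A log captured this way can be long, so a small helper may make it easier to read. The following is a minimal sketch, not part of dsnote: it assumes each entry starts with a level tag, a timestamp and a thread id (for example `[D] 08:39:40.327 0x7ff9f8e12d00`), which is the shape seen in this thread. The names `split_entries` and `state_changes` are hypothetical.

```python
import re

# Assumed entry prefix: "[D] 08:39:40.327 0x7ff9f8e12d00 " (level, time, thread id).
ENTRY_START = re.compile(
    r"\[([A-Z])\] (\d{1,2}:\d{2}:\d{2}\.\d+) (0x[0-9a-fA-F]+) "
)


def split_entries(raw):
    """Split a log (even one pasted as a single run of text) into entries."""
    matches = list(ENTRY_START.finditer(raw))
    entries = []
    for i, m in enumerate(matches):
        # An entry runs until the next prefix, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(raw)
        body = raw[m.end():end].strip()
        # The body looks like "<source> - <message>", e.g. "() - app busy: ...".
        source, sep, message = body.partition(" - ")
        if not sep:
            source, message = "", body
        entries.append({
            "level": m.group(1),
            "time": m.group(2),
            "thread": m.group(3),
            "source": source,
            "message": message,
        })
    return entries


def state_changes(entries):
    """Keep only entries whose message reports an 'old => new' transition."""
    return [(e["time"], e["message"]) for e in entries if " => " in e["message"]]
```

Output that does not carry the assumed prefix (such as ALSA or Qt messages) stays attached to the preceding entry. Filtering for `=>` leaves only the state transitions, which is usually the quickest way to see where the service and the app disagree about being busy or idle.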

devSJR commented 11 months ago

flatpak run net.mkiol.SpeechNote --verbose
Qt: Session management error: Could not open network socket
[I] 08:39:38.420 0x7ff9f8e12d00 init:49 - logging to stderr enabled
[D] 08:39:38.420 0x7ff9f8e12d00 () - version: 4.3.0
[D] 08:39:38.420 0x7ff9f8e12d00 () - translation: "en_US"
[W] 08:39:38.420 0x7ff9f8e12d00 () - failed to install translation
[D] 08:39:38.420 0x7ff9f8e12d00 () - starting standalone app
[D] 08:39:38.421 0x7ff9f8e12d00 () - app: net.mkiol dsnote
[D] 08:39:38.421 0x7ff9f8e12d00 () - config location: "/home/randomuser/.var/app/net.mkiol.SpeechNote/config"
[D] 08:39:38.421 0x7ff9f8e12d00 () - data location: "/home/randomuser/.var/app/net.mkiol.SpeechNote/data/net.mkiol/dsnote"
[D] 08:39:38.421 0x7ff9f8e12d00 () - cache location: "/home/randomuser/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote"
[D] 08:39:38.421 0x7ff9f8e12d00 () - settings file: "/home/randomuser/.var/app/net.mkiol.SpeechNote/config/net.mkiol/dsnote/settings.conf"
[D] 08:39:38.421 0x7ff9f8e12d00 () - platform: "xcb"
[D] 08:39:38.456 0x7ff9f8e12d00 () - supported audio input devices:
ALSA lib ../../oss/pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
[D] 08:39:38.464 0x7ff9f8e12d00 () - "pulse"
[D] 08:39:38.567 0x7ff9f8e12d00 () - "default"
ALSA lib ../../../src/pcm/pcm_direct.c:2045:(snd1_pcm_direct_parse_open_conf) The field ipc_gid must be a valid group (create group audio)
[D] 08:39:38.568 0x7ff9f8e12d00 () - "alsa_input.pci-0000_00_1f.3.analog-stereo"
[D] 08:39:38.568 0x7ff9f8e12d00 () - "alsa_output.pci-0000_00_1f.3.analog-stereo.monitor"
[D] 08:39:38.637 0x7ff9f8e12d00 () - starting service: app-standalone
[D] 08:39:38.641 0x7ff9f8e12d00 () - mbrola dir: "/app/bin"
[D] 08:39:38.641 0x7ff9f8e12d00 () - espeak dir: "/app/bin"
[D] 08:39:38.641 0x7ff9ddffe600 loop:56 - py executor loop started
[D] 08:39:38.646 0x7ff9f8e12d00 () - module already unpacked: "rhvoicedata"
[D] 08:39:38.646 0x7ff9f8e12d00 () - module already unpacked: "rhvoiceconfig"
[D] 08:39:38.648
0x7ff9de7ff600 () - config version: 51 51 [D] 08:39:38.649 0x7ff9f8e12d00 () - module already unpacked: "espeakdata" [D] 08:39:38.649 0x7ff9f8e12d00 () - default stt model not found: "de_fasterwhisper_large2" [D] 08:39:38.649 0x7ff9f8e12d00 () - default tts model not found: "en_piper_us_ryan_high" [D] 08:39:38.649 0x7ff9f8e12d00 () - default mnt lang not found: "de" [D] 08:39:38.649 0x7ff9f8e12d00 () - new default mnt lang: "de" [D] 08:39:38.649 0x7ff9f8e12d00 () - service refresh status, new state: busy [D] 08:39:38.649 0x7ff9f8e12d00 () - service state changed: unknown => busy [D] 08:39:38.649 0x7ff9f8e12d00 () - delaying features availability [W] 08:39:38.650 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.650 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.650 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.650 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.650 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.650 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.650 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [D] 08:39:38.650 0x7ff9f8e12d00 () - available styles: ("Default", "Fusion", "Imagine", "Material", "org.kde.breeze", "org.kde.desktop", "Plasma", "Universal") [D] 08:39:38.650 0x7ff9f8e12d00 () - style paths: ("/usr/lib/qml/QtQuick/Controls.2") [W] 08:39:38.650 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [D] 08:39:38.650 0x7ff9f8e12d00 () - import paths: ("/usr/lib/qml", "/app/bin", "qrc:/qt-project.org/imports") [D] 
08:39:38.650 0x7ff9f8e12d00 () - library paths: ("/usr/share/runtime/lib/plugins", "/usr/lib/plugins", "/app/bin") [W] 08:39:38.650 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.650 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [D] 08:39:38.650 0x7ff9f8e12d00 () - switching to style: "org.kde.desktop" [W] 08:39:38.650 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.650 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.650 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.651 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.651 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.651 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.651 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.651 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.651 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.651 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.651 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.651 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.651 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) 
"multilang_whisper_large.ggml" [W] 08:39:38.651 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.651 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.651 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.651 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.651 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.652 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.652 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.652 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.652 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.652 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.652 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.652 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.652 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.652 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.652 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.652 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.652 
0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.652 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.652 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.653 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.653 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.653 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.653 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.653 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.653 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.653 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.653 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.653 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.653 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.653 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.653 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.653 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.653 0x7ff9de7ff600 () - checksum mismatch: 
"dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.653 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.653 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.654 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.654 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.654 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.654 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [W] 08:39:38.654 0x7ff9de7ff600 () - checksum mismatch: "dfa47af8" (expected: "b4157ea9" ) "multilang_whisper_large.ggml" [D] 08:39:38.663 0x7ff9de7ff600 () - models changed [D] 08:39:39.257 0x7ff9f8e12d00 () - starting app: app-standalone [D] 08:39:39.258 0x7ff9f8e12d00 () - app service state: unknown => busy logger error: invalid format string qrc:/qml/main.qml:269:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. Use this syntax instead: function onFoo() { ... } logger error: invalid format string qrc:/qml/main.qml:260:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. Use this syntax instead: function onFoo() { ... } logger error: invalid format string qrc:/qml/Notepad.qml:24:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. Use this syntax instead: function onFoo() { ... } logger error: invalid format string qrc:/qml/Translator.qml:29:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. Use this syntax instead: function onFoo() { ... 
} logger error: invalid format string qrc:/qml/MainToolBar.qml:221:13: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. Use this syntax instead: function onFoo() { ... } [D] 08:39:39.333 0x7ff9f8e12d00 onCompleted:155 - default font pixel size: 14 [D] 08:39:39.341 0x7ff9f8e12d00 () - service refresh status, new state: busy [D] 08:39:39.341 0x7ff9f8e12d00 () - service refresh status, new state: busy [D] 08:39:39.348 0x7ff9f8e12d00 () - trying features availability update: false [D] 08:39:39.413 0x7ff9f8e12d00 () - stt models changed [D] 08:39:39.414 0x7ff9f8e12d00 () - update listen [D] 08:39:39.414 0x7ff9f8e12d00 () - app stt configured: false => true [D] 08:39:39.417 0x7ff9f8e12d00 () - app active stt model: "" => "de_fasterwhisper_large2" [D] 08:39:39.417 0x7ff9f8e12d00 () - update listen [D] 08:39:39.417 0x7ff9f8e12d00 () - tts models changed [D] 08:39:39.417 0x7ff9f8e12d00 () - update listen [D] 08:39:39.417 0x7ff9f8e12d00 () - app tts configured: false => true [D] 08:39:39.417 0x7ff9f8e12d00 () - app active tts model: "" => "en_piper_us_ryan_high" [D] 08:39:39.417 0x7ff9f8e12d00 () - update listen [W] 08:39:39.417 0x7ff9f8e12d00 () - no available tts models for in mnt [W] 08:39:39.417 0x7ff9f8e12d00 () - no available tts models for out mnt [D] 08:39:39.417 0x7ff9f8e12d00 () - ttt models changed [D] 08:39:39.417 0x7ff9f8e12d00 () - app ttt configured: false => true [D] 08:39:39.419 0x7ff9f8e12d00 () - mnt langs changed [D] 08:39:39.420 0x7ff9f8e12d00 () - update listen [D] 08:39:39.420 0x7ff9f8e12d00 () - app mnt configured: false => true [D] 08:39:39.420 0x7ff9f8e12d00 () - app active mnt lang: "" => "de" [D] 08:39:39.420 0x7ff9f8e12d00 () - app mnt available out langs: 0 => 1 [D] 08:39:39.420 0x7ff9f8e12d00 () - app tts available models for in mnt: 0 => 2 [D] 08:39:39.421 0x7ff9f8e12d00 () - app active tts model for in mnt: "" => "de_piper_thorsten_high" [D] 08:39:39.421 0x7ff9f8e12d00 () - app active mnt out lang: "" => 
"en" [D] 08:39:39.421 0x7ff9f8e12d00 () - app tts available models for out mnt: 0 => 3 [D] 08:39:39.421 0x7ff9f8e12d00 () - app active tts model for out mnt: "" => "en_piper_us_ryan_high" [D] 08:39:40.68 0x7ff9ddffe600 libs_availability:171 - py libs availability: [coqui-tts=true, faster-whisper=true, mimic3-tts=true, transformers=true, unikud=true, gruut_de=true, gruut_es=true, gruut_fa=true, gruut_fr=true, gruut_nl=true, gruut_it=true, gruut_ru=true, gruut_sw=true, mecab=true, torch-cuda=true] [D] 08:39:40.298 0x7ff9f8e12d00 () - trying features availability update: true [D] 08:39:40.298 0x7ff9f8e12d00 () - features availability ready [W] 08:39:40.298 0x7ff9f8e12d00 has_lib:423 - failed to open libcudnn.so: libcudnn.so: Kann die Shared-Object-Datei nicht öffnen: Datei oder Verzeichnis nicht gefunden [W] 08:39:40.317 0x7ff9f8e12d00 has_hip:79 - failed to open whisper-hipblas lib: libwhisper-hipblas.so: Kann die Shared-Object-Datei nicht öffnen: Datei oder Verzeichnis nicht gefunden [D] 08:39:40.325 0x7ff9f8e12d00 () - updating model using availability [D] 08:39:40.325 0x7ff9f8e12d00 () - updating model using availability internal [D] 08:39:40.327 0x7ff9f8e12d00 () - service refresh status, new state: idle [D] 08:39:40.327 0x7ff9f8e12d00 () - service state changed: busy => idle [D] 08:39:40.327 0x7ff9f8e12d00 () - scan cuda: true [D] 08:39:40.327 0x7ff9f8e12d00 () - scan hip: true [D] 08:39:40.327 0x7ff9f8e12d00 () - scan opencl: true false [D] 08:39:40.328 0x7ff9f8e12d00 add_cuda_devices:229 - scanning for cuda devices [D] 08:39:40.328 0x7ff9f8e12d00 add_cuda_devices:238 - cuda version: driver=12020, runtime=11070 [D] 08:39:40.328 0x7ff9f8e12d00 add_cuda_devices:247 - cuda number of devices: 1 [D] 08:39:40.328 0x7ff9f8e12d00 add_cuda_devices:256 - cuda device: 0, name=NVIDIA GeForce RTX 3050 Ti Laptop GPU [D] 08:39:40.328 0x7ff9f8e12d00 add_hip_devices:266 - scanning for hip devices [W] 08:39:40.328 0x7ff9f8e12d00 hip_api:170 - failed to open hip lib: 
libamdhip64.so: Kann die Shared-Object-Datei nicht öffnen: Datei oder Verzeichnis nicht gefunden [D] 08:39:40.328 0x7ff9f8e12d00 () - service refresh status, new state: idle [D] 08:39:40.328 0x7ff9f8e12d00 () - app service state: busy => idle [W] 08:39:40.333 0x7ff9f8e12d00 () - invalid task, reseting task state [D] 08:39:40.333 0x7ff9f8e12d00 () - app busy: true => false [D] 08:39:40.333 0x7ff9f8e12d00 () - stt models changed [D] 08:39:40.333 0x7ff9f8e12d00 () - update listen [D] 08:39:40.333 0x7ff9f8e12d00 () - tts models changed [D] 08:39:40.333 0x7ff9f8e12d00 () - update listen [D] 08:39:40.333 0x7ff9f8e12d00 () - ttt models changed [D] 08:39:40.335 0x7ff9f8e12d00 () - mnt langs changed [D] 08:39:40.336 0x7ff9f8e12d00 () - update listen [D] 08:39:46.258 0x7ff9f8e12d00 () - hot key activated: start-listening [D] 08:39:46.258 0x7ff9f8e12d00 () - executing action: start-listening [D] 08:39:46.258 0x7ff9f8e12d00 () - stt start listen [D] 08:39:46.261 0x7ff9f8e12d00 () - choosing model for id: "de_fasterwhisper_large2" "en" [D] 08:39:46.261 0x7ff9f8e12d00 () - found ttt model for stt: "de_hftc_kredor" [D] 08:39:46.261 0x7ff9f8e12d00 () - gpu device str: ("CUDA", " 0", " NVIDIA GeForce RTX 3050 Ti Laptop GPU") [D] 08:39:46.261 0x7ff9f8e12d00 () - restart stt engine config: "lang=de, model-files=[model-file=/home/randomuser/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote/speech-models/multilang_fasterwhisper_large2, scorer-file=, ttt-model-file=/home/randomuser/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote/speech-models/multilang_hftc_kredor], speech-mode=manual, vad-mode=aggressiveness-3, speech-started=0, options=, use-gpu=1, gpu-device=[id=0, api=cuda, name=NVIDIA GeForce RTX 3050 Ti Laptop GPU, platform-name=]" [D] 08:39:46.261 0x7ff9f8e12d00 () - new stt engine required [D] 08:39:46.271 0x7ff9f8e12d00 start:199 - starting engine [D] 08:39:46.272 0x7ff9f8e12d00 start:207 - engine started [D] 08:39:46.272 0x7ff9f8e12d00 () - creating audio source [D] 
08:39:46.272 0x7ff9f8e12d00 () - mic source created [D] 08:39:46.272 0x7ff8caffe600 start_processing:244 - processing started [D] 08:39:46.272 0x7ff8caffe600 set_processing_state:430 - processing state: idle => initializing [D] 08:39:46.272 0x7ff8caffe600 set_processing_state:437 - speech detection status: no-speech => initializing (no-speech) [D] 08:39:46.272 0x7ff8caffe600 () - service refresh status, new state: idle [D] 08:39:46.272 0x7ff8caffe600 () - task state changed: 0 => 3 [D] 08:39:46.272 0x7ff8caffe600 create_model:81 - creating fasterwhisper model [D] 08:39:46.272 0x7ff8caffe600 execute:48 - task pushed [W] 08:39:46.273 0x7ff9ddffe600 has_lib:423 - failed to open libcudnn.so: libcudnn.so: Kann die Shared-Object-Datei nicht öffnen: Datei oder Verzeichnis nicht gefunden [D] 08:39:46.273 0x7ff9ddffe600 operator():97 - cpu info: arch=x86_64, cores=20 [D] 08:39:46.273 0x7ff9ddffe600 operator():100 - using threads: 8/20 [D] 08:39:46.273 0x7ff9ddffe600 operator():103 - using device: cuda 0 [D] 08:39:46.316 0x7ff9f8e12d00 () - using audio input: "alsa_input.pci-0000_00_1f.3.analog-stereo" [D] 08:39:46.409 0x7ff9f8e12d00 () - audio state: IdleState [D] 08:39:46.409 0x7ff9f8e12d00 set_speech_started:486 - speech started: false => true [D] 08:39:46.409 0x7ff9f8e12d00 set_speech_detection_status:508 - speech detection status: initializing => initializing (speech-detected) [D] 08:39:46.409 0x7ff9f8e12d00 () - service refresh status, new state: listening-manual [D] 08:39:46.409 0x7ff9f8e12d00 () - service state changed: idle => listening-manual [W] 08:39:46.409 0x7ff9f8e12d00 () - ignore TaskStatePropertyChanged signal [D] 08:39:46.409 0x7ff9f8e12d00 () - app current task: -1 => 0 [D] 08:39:46.409 0x7ff9f8e12d00 () - app speech state: idle => initializing [D] 08:39:46.410 0x7ff9f8e12d00 () - app service state: idle => listening-manual [D] 08:39:46.650 0x7ff9f8e12d00 () - mic clear [D] 08:39:46.650 0x7ff9f8e12d00 () - audio state: ActiveState [D] 08:39:46.829 
0x7ff9f8e12d00 () - mic clear [D] 08:39:47.39 0x7ff9f8e12d00 () - mic clear [D] 08:39:47.249 0x7ff9f8e12d00 () - mic clear [D] 08:39:47.463 0x7ff9f8e12d00 () - mic clear [D] 08:39:47.669 0x7ff9f8e12d00 () - mic clear [D] 08:39:47.741 0x7ff8caffe600 create_model:130 - fasterwhisper model created [D] 08:39:47.741 0x7ff8caffe600 set_processing_state:430 - processing state: initializing => idle [D] 08:39:47.741 0x7ff8caffe600 set_processing_state:437 - speech detection status: initializing => speech-detected (speech-detected) [D] 08:39:47.741 0x7ff8caffe600 () - service refresh status, new state: listening-manual [D] 08:39:47.741 0x7ff8caffe600 () - task state changed: 3 => 1 [D] 08:39:47.741 0x7ff9f8e12d00 () - app task state: initializing => speech-detected [D] 08:39:49.311 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=0, sof=true, eof=false [D] 08:39:49.350 0x7ff8caffe600 process_buff:161 - vad: speech detected [D] 08:39:50.715 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=24000, sof=false, eof=false [D] 08:39:50.743 0x7ff8caffe600 process_buff:161 - vad: speech detected [D] 08:39:52.308 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=48000, sof=false, eof=false [D] 08:39:52.337 0x7ff8caffe600 process_buff:161 - vad: speech detected [D] 08:39:53.712 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=71520, sof=false, eof=false [D] 08:39:53.737 0x7ff8caffe600 process_buff:161 - vad: speech detected [D] 08:39:55.305 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=98400, sof=false, eof=false [D] 08:39:55.332 0x7ff8caffe600 process_buff:161 - vad: speech detected [D] 08:39:56.709 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=122400, sof=false, 
eof=false [D] 08:39:56.738 0x7ff8caffe600 process_buff:161 - vad: speech detected [D] 08:39:58.313 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=146400, sof=false, eof=false [D] 08:39:58.338 0x7ff8caffe600 process_buff:161 - vad: speech detected [D] 08:39:59.705 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=171360, sof=false, eof=false [D] 08:39:59.733 0x7ff8caffe600 process_buff:161 - vad: speech detected [D] 08:40:01.309 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=183360, sof=false, eof=false [D] 08:40:01.341 0x7ff8caffe600 process_buff:161 - vad: speech detected [D] 08:40:02.713 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=212160, sof=false, eof=false [D] 08:40:02.732 0x7ff8caffe600 process_buff:161 - vad: speech detected [D] 08:40:04.306 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=236160, sof=false, eof=false [D] 08:40:04.326 0x7ff8caffe600 process_buff:161 - vad: speech detected [D] 08:40:05.709 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=258720, sof=false, eof=false [D] 08:40:05.732 0x7ff8caffe600 process_buff:161 - vad: speech detected [D] 08:40:07.314 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=282720, sof=false, eof=false [D] 08:40:07.337 0x7ff8caffe600 process_buff:161 - vad: speech detected [D] 08:40:08.704 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=306720, sof=false, eof=false [D] 08:40:08.727 0x7ff8caffe600 process_buff:161 - vad: speech detected [D] 08:40:10.310 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=330720, sof=false, eof=false 
[D] 08:40:10.332 0x7ff8caffe600 process_buff:172 - vad: no speech [D] 08:40:10.332 0x7ff8caffe600 set_processing_state:430 - processing state: idle => decoding [D] 08:40:10.332 0x7ff8caffe600 set_speech_detection_status:508 - speech detection status: speech-detected => decoding (no-speech) [D] 08:40:10.332 0x7ff8caffe600 () - service refresh status, new state: listening-manual [D] 08:40:10.332 0x7ff8caffe600 () - task state changed: 1 => 2 [D] 08:40:10.332 0x7ff8caffe600 process_buff:231 - speech frame: samples=330720 [D] 08:40:10.332 0x7ff8caffe600 decode_speech:255 - speech decoding started [D] 08:40:10.332 0x7ff8caffe600 execute:48 - task pushed [D] 08:40:10.344 0x7ff9f8e12d00 () - app task state: speech-detected => processing [D] 08:40:13.371 0x7ff8caffe600 decode_speech:316 - speech decoded, stats: samples=330720, duration=3039ms (0.147025) [D] 08:40:13.371 0x7ff8caffe600 set_processing_state:430 - processing state: decoding => idle [D] 08:40:13.371 0x7ff8caffe600 set_processing_state:437 - speech detection status: decoding => no-speech (no-speech) [D] 08:40:13.372 0x7ff8caffe600 () - service refresh status, new state: listening-manual [D] 08:40:13.372 0x7ff8caffe600 () - task state changed: 2 => 0 [D] 08:40:13.372 0x7ff8caffe600 flush:446 - flush: regular [D] 08:40:13.372 0x7ff8caffe600 set_speech_started:486 - speech started: true => false [D] 08:40:13.373 0x7ff9f8e12d00 () - stt intermediate text decoded: "de_fasterwhisper_large2" 0 [D] 08:40:13.374 0x7ff9f8e12d00 () - app task state: processing => idle [D] 08:40:13.374 0x7ff9f8e12d00 () - stt text decoded: "de_fasterwhisper_large2" 0 [D] 08:40:13.375 0x7ff9f8e12d00 () - stt intermediate text decoded: "de_fasterwhisper_large2" 0 [D] 08:40:13.497 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=0, sof=false, eof=false [D] 08:40:13.516 0x7ff8caffe600 process_buff:172 - vad: no speech [D] 08:40:13.516 0x7ff8caffe600 flush:446 - flush: regular [D] 
08:40:13.693 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=0, sof=false, eof=false [D] 08:40:13.714 0x7ff8caffe600 process_buff:172 - vad: no speech [D] 08:40:13.714 0x7ff8caffe600 flush:446 - flush: regular [D] 08:40:14.693 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=0, sof=false, eof=false [D] 08:40:14.714 0x7ff8caffe600 process_buff:172 - vad: no speech [D] 08:40:14.714 0x7ff8caffe600 flush:446 - flush: regular [D] 08:40:16.293 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=0, sof=false, eof=false [D] 08:40:16.313 0x7ff8caffe600 process_buff:172 - vad: no speech [D] 08:40:16.313 0x7ff8caffe600 flush:446 - flush: regular [D] 08:40:17.693 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=0, sof=false, eof=false [D] 08:40:17.719 0x7ff8caffe600 process_buff:172 - vad: no speech [D] 08:40:17.719 0x7ff8caffe600 flush:446 - flush: regular [D] 08:40:19.292 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=0, sof=false, eof=false [D] 08:40:19.313 0x7ff8caffe600 process_buff:172 - vad: no speech [D] 08:40:19.313 0x7ff8caffe600 flush:446 - flush: regular [D] 08:40:20.692 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=0, sof=false, eof=false [D] 08:40:20.720 0x7ff8caffe600 process_buff:172 - vad: no speech [D] 08:40:20.720 0x7ff8caffe600 flush:446 - flush: regular [D] 08:40:22.274 0x7ff9f8e12d00 () - hot key activated: start-listening [D] 08:40:22.274 0x7ff9f8e12d00 () - executing action: start-listening [D] 08:40:22.274 0x7ff9f8e12d00 () - stt start listen [D] 08:40:22.277 0x7ff9f8e12d00 () - choosing model for id: "de_fasterwhisper_large2" "en" [D] 08:40:22.277 0x7ff9f8e12d00 () - found ttt model for stt: "de_hftc_kredor" [D] 08:40:22.277 
0x7ff9f8e12d00 () - gpu device str: ("CUDA", " 0", " NVIDIA GeForce RTX 3050 Ti Laptop GPU") [D] 08:40:22.277 0x7ff9f8e12d00 () - restart stt engine config: "lang=de, model-files=[model-file=/home/randomuser/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote/speech-models/multilang_fasterwhisper_large2, scorer-file=, ttt-model-file=/home/randomuser/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote/speech-models/multilang_hftc_kredor], speech-mode=manual, vad-mode=aggressiveness-3, speech-started=0, options=, use-gpu=1, gpu-device=[id=0, api=cuda, name=NVIDIA GeForce RTX 3050 Ti Laptop GPU, platform-name=]" [D] 08:40:22.277 0x7ff9f8e12d00 () - new stt engine not required, only restart [D] 08:40:22.277 0x7ff9f8e12d00 stop:225 - stop requested [D] 08:40:22.277 0x7ff9f8e12d00 stop_processing_impl:72 - fasterwhisper cancel [D] 08:40:22.277 0x7ff8caffe600 flush:446 - flush: exit [D] 08:40:22.278 0x7ff8caffe600 reset_in_processing:356 - reset in processing [D] 08:40:22.278 0x7ff8caffe600 start_processing:279 - processing ended [D] 08:40:22.278 0x7ff9f8e12d00 stop:240 - stop completed [D] 08:40:22.278 0x7ff9f8e12d00 start:199 - starting engine [D] 08:40:22.278 0x7ff9f8e12d00 start:207 - engine started [D] 08:40:22.278 0x7ff9f8e12d00 () - creating audio source [D] 08:40:22.278 0x7ff9f8e12d00 () - mic source created [D] 08:40:22.278 0x7ff8caffe600 start_processing:244 - processing started [D] 08:40:22.278 0x7ff8caffe600 set_processing_state:430 - processing state: idle => initializing [D] 08:40:22.278 0x7ff8caffe600 set_processing_state:437 - speech detection status: no-speech => initializing (no-speech) [D] 08:40:22.278 0x7ff8caffe600 () - service refresh status, new state: idle [D] 08:40:22.278 0x7ff8caffe600 () - service state changed: listening-manual => idle [D] 08:40:22.278 0x7ff8caffe600 () - task state changed: 0 => 3 [D] 08:40:22.278 0x7ff8caffe600 set_processing_state:430 - processing state: initializing => idle [D] 08:40:22.278 0x7ff8caffe600 
set_processing_state:437 - speech detection status: initializing => no-speech (no-speech) [D] 08:40:22.278 0x7ff8caffe600 () - service refresh status, new state: idle [D] 08:40:22.278 0x7ff8caffe600 () - task state changed: 3 => 0 [D] 08:40:22.321 0x7ff9f8e12d00 () - using audio input: "alsa_input.pci-0000_00_1f.3.analog-stereo" [D] 08:40:22.334 0x7ff9f8e12d00 () - audio state: IdleState [D] 08:40:22.334 0x7ff9f8e12d00 () - mic source dtor [D] 08:40:22.334 0x7ff9f8e12d00 () - audio state: SuspendedState [D] 08:40:22.334 0x7ff9f8e12d00 () - audio ended [D] 08:40:22.335 0x7ff9f8e12d00 () - app service state: listening-manual => idle [D] 08:40:22.337 0x7ff9f8e12d00 () - app current task: 0 => 1 [D] 08:40:22.337 0x7ff9f8e12d00 () - app another app connected: false => true [W] 08:40:22.337 0x7ff9f8e12d00 () - invalid task, reseting task state [D] 08:40:22.337 0x7ff9f8e12d00 () - app busy: false => true [W] 08:40:22.337 0x7ff9f8e12d00 () - ignore TaskStatePropertyChanged signal [W] 08:40:22.337 0x7ff9f8e12d00 () - ignore TaskStatePropertyChanged signal [D] 08:40:22.337 0x7ff9f8e12d00 set_speech_started:486 - speech started: false => true [D] 08:40:22.337 0x7ff9f8e12d00 set_speech_detection_status:508 - speech detection status: no-speech => speech-detected (speech-detected) [D] 08:40:22.337 0x7ff9f8e12d00 () - service refresh status, new state: listening-manual [D] 08:40:22.337 0x7ff9f8e12d00 () - service state changed: idle => listening-manual [D] 08:40:22.337 0x7ff9f8e12d00 () - task state changed: 0 => 1 [D] 08:40:22.337 0x7ff9f8e12d00 () - service refresh status, new state: listening-manual [D] 08:40:22.337 0x7ff9f8e12d00 () - app service state: idle => listening-manual [D] 08:40:22.339 0x7ff9f8e12d00 () - app speech state: idle => speech-detected [D] 08:40:22.542 0x7ff9f8e12d00 () - audio state: ActiveState [D] 08:40:23.911 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=0, sof=true, eof=false [D] 08:40:23.931 
0x7ff8caffe600 process_buff:161 - vad: speech detected [D] 08:40:25.504 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=21120, sof=false, eof=false [D] 08:40:25.525 0x7ff8caffe600 process_buff:161 - vad: speech detected [D] 08:40:26.908 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=45120, sof=false, eof=false [D] 08:40:26.930 0x7ff8caffe600 process_buff:161 - vad: speech detected [D] 08:40:28.312 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=58080, sof=false, eof=false [D] 08:40:28.339 0x7ff8caffe600 process_buff:172 - vad: no speech [D] 08:40:28.339 0x7ff8caffe600 set_processing_state:430 - processing state: idle => decoding [D] 08:40:28.339 0x7ff8caffe600 set_speech_detection_status:508 - speech detection status: speech-detected => decoding (no-speech) [D] 08:40:28.339 0x7ff8caffe600 () - service refresh status, new state: listening-manual [D] 08:40:28.339 0x7ff8caffe600 () - task state changed: 1 => 2 [D] 08:40:28.339 0x7ff8caffe600 process_buff:231 - speech frame: samples=58080 [D] 08:40:28.339 0x7ff8caffe600 decode_speech:255 - speech decoding started [D] 08:40:28.339 0x7ff8caffe600 execute:48 - task pushed [D] 08:40:28.345 0x7ff9f8e12d00 () - app task state: speech-detected => processing [D] 08:40:29.447 0x7ff8caffe600 decode_speech:316 - speech decoded, stats: samples=58080, duration=1108ms (0.305234) [D] 08:40:29.447 0x7ff8caffe600 set_processing_state:430 - processing state: decoding => idle [D] 08:40:29.447 0x7ff8caffe600 set_processing_state:437 - speech detection status: decoding => no-speech (no-speech) [D] 08:40:29.447 0x7ff9f8e12d00 () - stt intermediate text decoded: "de_fasterwhisper_large2" 1 [D] 08:40:29.447 0x7ff8caffe600 () - service refresh status, new state: listening-manual [D] 08:40:29.447 0x7ff8caffe600 () - task state changed: 2 => 0 [D] 08:40:29.447 0x7ff8caffe600 
flush:446 - flush: regular [D] 08:40:29.447 0x7ff8caffe600 set_speech_started:486 - speech started: true => false [D] 08:40:29.466 0x7ff9f8e12d00 () - app task state: processing => idle [D] 08:40:29.466 0x7ff9f8e12d00 () - stt text decoded: "de_fasterwhisper_large2" 1 [D] 08:40:29.466 0x7ff9f8e12d00 () - stt intermediate text decoded: "de_fasterwhisper_large2" 1 [D] 08:40:29.892 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=0, sof=false, eof=false [D] 08:40:29.913 0x7ff8caffe600 process_buff:172 - vad: no speech [D] 08:40:29.913 0x7ff8caffe600 flush:446 - flush: regular [D] 08:40:31.296 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=0, sof=false, eof=false [D] 08:40:31.321 0x7ff8caffe600 process_buff:172 - vad: no speech [D] 08:40:31.321 0x7ff8caffe600 flush:446 - flush: regular [D] 08:40:32.892 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=0, sof=false, eof=false [D] 08:40:32.920 0x7ff8caffe600 process_buff:172 - vad: no speech [D] 08:40:32.920 0x7ff8caffe600 flush:446 - flush: regular [D] 08:40:34.22 0x7ff9f8e12d00 () - hot key activated: start-listening [D] 08:40:34.22 0x7ff9f8e12d00 () - executing action: start-listening [D] 08:40:34.22 0x7ff9f8e12d00 () - stt start listen [D] 08:40:34.24 0x7ff9f8e12d00 () - choosing model for id: "de_fasterwhisper_large2" "en" [D] 08:40:34.24 0x7ff9f8e12d00 () - found ttt model for stt: "de_hftc_kredor" [D] 08:40:34.24 0x7ff9f8e12d00 () - gpu device str: ("CUDA", " 0", " NVIDIA GeForce RTX 3050 Ti Laptop GPU") [D] 08:40:34.24 0x7ff9f8e12d00 () - restart stt engine config: "lang=de, model-files=[model-file=/home/randomuser/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote/speech-models/multilang_fasterwhisper_large2, scorer-file=, ttt-model-file=/home/randomuser/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote/speech-models/multilang_hftc_kredor], 
speech-mode=manual, vad-mode=aggressiveness-3, speech-started=0, options=, use-gpu=1, gpu-device=[id=0, api=cuda, name=NVIDIA GeForce RTX 3050 Ti Laptop GPU, platform-name=]" [D] 08:40:34.24 0x7ff9f8e12d00 () - new stt engine not required, only restart [D] 08:40:34.24 0x7ff9f8e12d00 stop:225 - stop requested [D] 08:40:34.24 0x7ff9f8e12d00 stop_processing_impl:72 - fasterwhisper cancel [D] 08:40:34.24 0x7ff8caffe600 flush:446 - flush: exit [D] 08:40:34.24 0x7ff8caffe600 reset_in_processing:356 - reset in processing [D] 08:40:34.24 0x7ff8caffe600 start_processing:279 - processing ended [D] 08:40:34.25 0x7ff9f8e12d00 stop:240 - stop completed [D] 08:40:34.25 0x7ff9f8e12d00 start:199 - starting engine [D] 08:40:34.25 0x7ff9f8e12d00 start:207 - engine started [D] 08:40:34.25 0x7ff9f8e12d00 () - creating audio source [D] 08:40:34.25 0x7ff9f8e12d00 () - mic source created [D] 08:40:34.25 0x7ff8caffe600 start_processing:244 - processing started [D] 08:40:34.25 0x7ff8caffe600 set_processing_state:430 - processing state: idle => initializing [D] 08:40:34.25 0x7ff8caffe600 set_processing_state:437 - speech detection status: no-speech => initializing (no-speech) [D] 08:40:34.25 0x7ff8caffe600 () - service refresh status, new state: idle [D] 08:40:34.25 0x7ff8caffe600 () - service state changed: listening-manual => idle [D] 08:40:34.25 0x7ff8caffe600 () - task state changed: 0 => 3 [D] 08:40:34.25 0x7ff8caffe600 set_processing_state:430 - processing state: initializing => idle [D] 08:40:34.25 0x7ff8caffe600 set_processing_state:437 - speech detection status: initializing => no-speech (no-speech) [D] 08:40:34.25 0x7ff8caffe600 () - service refresh status, new state: idle [D] 08:40:34.25 0x7ff8caffe600 () - task state changed: 3 => 0 [D] 08:40:34.59 0x7ff9f8e12d00 () - using audio input: "alsa_input.pci-0000_00_1f.3.analog-stereo" [D] 08:40:34.67 0x7ff9f8e12d00 () - audio state: IdleState [D] 08:40:34.67 0x7ff9f8e12d00 () - mic source dtor [D] 08:40:34.67 0x7ff9f8e12d00 () - 
audio state: SuspendedState [D] 08:40:34.67 0x7ff9f8e12d00 () - audio ended [D] 08:40:34.67 0x7ff9f8e12d00 () - app service state: listening-manual => idle [D] 08:40:34.72 0x7ff9f8e12d00 () - app current task: 1 => 2 [D] 08:40:34.72 0x7ff9f8e12d00 () - app another app connected: false => true [W] 08:40:34.72 0x7ff9f8e12d00 () - invalid task, reseting task state [D] 08:40:34.72 0x7ff9f8e12d00 () - app busy: false => true [W] 08:40:34.72 0x7ff9f8e12d00 () - ignore TaskStatePropertyChanged signal [W] 08:40:34.72 0x7ff9f8e12d00 () - ignore TaskStatePropertyChanged signal [D] 08:40:34.79 0x7ff9f8e12d00 set_speech_started:486 - speech started: false => true [D] 08:40:34.79 0x7ff9f8e12d00 set_speech_detection_status:508 - speech detection status: no-speech => speech-detected (speech-detected) [D] 08:40:34.79 0x7ff9f8e12d00 () - service refresh status, new state: listening-manual [D] 08:40:34.79 0x7ff9f8e12d00 () - service state changed: idle => listening-manual [D] 08:40:34.79 0x7ff9f8e12d00 () - task state changed: 0 => 1 [D] 08:40:34.79 0x7ff9f8e12d00 () - service refresh status, new state: listening-manual [D] 08:40:34.79 0x7ff9f8e12d00 () - app service state: idle => listening-manual [D] 08:40:34.81 0x7ff9f8e12d00 () - app speech state: idle => speech-detected [D] 08:40:34.293 0x7ff9f8e12d00 () - audio state: ActiveState [D] 08:40:35.708 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=0, sof=true, eof=false [D] 08:40:35.736 0x7ff8caffe600 process_buff:161 - vad: speech detected [D] 08:40:37.112 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=22080, sof=false, eof=false [D] 08:40:37.140 0x7ff8caffe600 process_buff:161 - vad: speech detected [D] 08:40:38.705 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=37440, sof=false, eof=false [D] 08:40:38.731 0x7ff8caffe600 process_buff:172 - vad: no speech [D] 
08:40:38.731 0x7ff8caffe600 set_processing_state:430 - processing state: idle => decoding [D] 08:40:38.731 0x7ff8caffe600 set_speech_detection_status:508 - speech detection status: speech-detected => decoding (no-speech) [D] 08:40:38.731 0x7ff8caffe600 () - service refresh status, new state: listening-manual [D] 08:40:38.731 0x7ff8caffe600 () - task state changed: 1 => 2 [D] 08:40:38.731 0x7ff8caffe600 process_buff:231 - speech frame: samples=37440 [D] 08:40:38.731 0x7ff8caffe600 decode_speech:255 - speech decoding started [D] 08:40:38.731 0x7ff8caffe600 execute:48 - task pushed [D] 08:40:38.738 0x7ff9f8e12d00 () - app task state: speech-detected => processing [D] 08:40:39.676 0x7ff8caffe600 decode_speech:316 - speech decoded, stats: samples=37440, duration=944ms (0.403419) [D] 08:40:39.676 0x7ff8caffe600 set_processing_state:430 - processing state: decoding => idle [D] 08:40:39.676 0x7ff8caffe600 set_processing_state:437 - speech detection status: decoding => no-speech (no-speech) [D] 08:40:39.676 0x7ff8caffe600 () - service refresh status, new state: listening-manual [D] 08:40:39.676 0x7ff8caffe600 () - task state changed: 2 => 0 [D] 08:40:39.676 0x7ff8caffe600 flush:446 - flush: regular [D] 08:40:39.676 0x7ff8caffe600 set_speech_started:486 - speech started: true => false [D] 08:40:39.682 0x7ff9f8e12d00 () - stt intermediate text decoded: "de_fasterwhisper_large2" 2 [D] 08:40:39.682 0x7ff9f8e12d00 () - app task state: processing => idle [D] 08:40:39.682 0x7ff9f8e12d00 () - stt text decoded: "de_fasterwhisper_large2" 2 [D] 08:40:39.683 0x7ff9f8e12d00 () - stt intermediate text decoded: *** "de_fasterwhisper_large2" 2 [D] 08:40:40.96 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=0, sof=false, eof=false [D] 08:40:40.122 0x7ff8caffe600 process_buff:172 - vad: no speech [D] 08:40:40.122 0x7ff8caffe600 flush:446 - flush: regular [D] 08:40:41.692 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, 
in-buf size=24000, speech-buf size=0, sof=false, eof=false [D] 08:40:41.716 0x7ff8caffe600 process_buff:172 - vad: no speech [D] 08:40:41.716 0x7ff8caffe600 flush:446 - flush: regular [D] 08:40:43.93 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=0, sof=false, eof=false [D] 08:40:43.121 0x7ff8caffe600 process_buff:172 - vad: no speech [D] 08:40:43.121 0x7ff8caffe600 flush:446 - flush: regular [D] 08:40:44.697 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=0, sof=false, eof=false [D] 08:40:44.723 0x7ff8caffe600 process_buff:172 - vad: no speech [D] 08:40:44.723 0x7ff8caffe600 flush:446 - flush: regular [D] 08:40:46.93 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=0, sof=false, eof=false [D] 08:40:46.113 0x7ff8caffe600 process_buff:172 - vad: no speech [D] 08:40:46.113 0x7ff8caffe600 flush:446 - flush: regular [D] 08:40:47.693 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=0, sof=false, eof=false [D] 08:40:47.729 0x7ff8caffe600 process_buff:161 - vad: speech detected [D] 08:40:47.729 0x7ff8caffe600 flush:446 - flush: regular [D] 08:40:49.98 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=12000, sof=false, eof=false [D] 08:40:49.122 0x7ff8caffe600 process_buff:172 - vad: no speech [D] 08:40:49.122 0x7ff8caffe600 flush:446 - flush: regular [W] 08:40:49.132 0x7ff9f8e12d00 ():171 - file:///usr/lib/qml/QtQuick/Controls.2/org.kde.desktop/private/TextFieldContextMenu.qml:171:9: Unable to assign [undefined] to bool [W] 08:40:49.132 0x7ff9f8e12d00 ():160 - file:///usr/lib/qml/QtQuick/Controls.2/org.kde.desktop/private/TextFieldContextMenu.qml:160:9: Unable to assign [undefined] to bool [W] 08:40:49.132 0x7ff9f8e12d00 ():171 - 
file:///usr/lib/qml/QtQuick/Controls.2/org.kde.desktop/private/TextFieldContextMenu.qml:171:9: Unable to assign [undefined] to bool [W] 08:40:49.134 0x7ff9f8e12d00 ():79 - file:///usr/lib/qml/QtQuick/Controls.2/org.kde.desktop/MenuItem.qml:79:13: QML Shortcut: Shortcut: Only binding to one of multiple key bindings associated with 8. Use 'sequences: [ ]' to bind to all of them. [W] 08:40:49.134 0x7ff9f8e12d00 ():79 - file:///usr/lib/qml/QtQuick/Controls.2/org.kde.desktop/MenuItem.qml:79:13: QML Shortcut: Shortcut: Only binding to one of multiple key bindings associated with 9. Use 'sequences: [ ]' to bind to all of them. [W] 08:40:49.135 0x7ff9f8e12d00 ():79 - file:///usr/lib/qml/QtQuick/Controls.2/org.kde.desktop/MenuItem.qml:79:13: QML Shortcut: Shortcut: Only binding to one of multiple key bindings associated with 10. Use 'sequences: [ ]' to bind to all of them. [D] 08:40:50.693 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=12000, sof=false, eof=false [D] 08:40:50.719 0x7ff8caffe600 process_buff:172 - vad: no speech [D] 08:40:50.719 0x7ff8caffe600 flush:446 - flush: regular [D] 08:40:52.94 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=12000, sof=false, eof=false [D] 08:40:52.124 0x7ff8caffe600 process_buff:172 - vad: no speech [D] 08:40:52.124 0x7ff8caffe600 flush:446 - flush: regular [D] 08:40:53.698 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=12000, sof=false, eof=false [D] 08:40:53.730 0x7ff8caffe600 process_buff:172 - vad: no speech [D] 08:40:53.730 0x7ff8caffe600 flush:446 - flush: regular [D] 08:40:55.93 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=12000, sof=false, eof=false [D] 08:40:55.118 0x7ff8caffe600 process_buff:172 - vad: no speech [D] 08:40:55.118 0x7ff8caffe600 flush:446 - flush: regular [D] 08:40:56.695 0x7ff8caffe600 
process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=12000, sof=false, eof=false [D] 08:40:56.720 0x7ff8caffe600 process_buff:172 - vad: no speech [D] 08:40:56.720 0x7ff8caffe600 flush:446 - flush: regular [D] 08:40:58.99 0x7ff8caffe600 process_buff:140 - process samples buf: mode=manual, in-buf size=24000, speech-buf size=12000, sof=false, eof=false [D] 08:40:58.131 0x7ff8caffe600 process_buff:172 - vad: no speech [D] 08:40:58.131 0x7ff8caffe600 flush:446 - flush: regular [D] 08:40:58.743 0x7ff9f8e12d00 () - exiting

devSJR commented 11 months ago

I think the issue starts as soon as I invoke STT a second time via the shortcut (Shift+Ctrl+Alt+L). The issue does not occur when I use the Listen button.
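
To check this against a verbose log like the one above, it can help to pull out only the hotkey and state-change entries. Below is a minimal Python sketch, assuming every entry follows the `[level] time thread function - message` shape seen in the log and that entries may be run together on one line. The function name `interesting_events`, the keyword list and the `E` level are illustrative assumptions, not part of dsnote.

```python
import re

# One entry looks like:
#   [D] 08:40:34.22 0x7ff9f8e12d00 () - hot key activated: start-listening
# Levels D, I and W appear in the log; E is an assumption.
ENTRY = re.compile(
    r"\[(?P<level>[DIWE])\] (?P<time>\d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<thread>0x[0-9a-f]+) (?P<where>\S+) - "
    # The message runs until the next entry header or the end of the text.
    r"(?P<msg>.*?)(?= \[[DIWE]\] \d{2}:\d{2}:|$)",
    re.DOTALL,
)

# Substrings worth keeping when looking at the shortcut problem (illustrative).
KEYWORDS = ("hot key activated", "service state changed", "app busy")


def interesting_events(log_text):
    """Return (time, message) pairs for entries containing any keyword."""
    events = []
    for match in ENTRY.finditer(log_text):
        msg = match.group("msg").strip()
        if any(keyword in msg for keyword in KEYWORDS):
            events.append((match.group("time"), msg))
    return events


if __name__ == "__main__":
    # Entries copied from the log above, joined the way they were pasted.
    sample = (
        "[D] 08:40:34.22 0x7ff9f8e12d00 () - hot key activated: start-listening "
        "[D] 08:40:34.22 0x7ff9f8e12d00 () - executing action: start-listening "
        "[D] 08:40:34.25 0x7ff8caffe600 () - service state changed: listening-manual => idle "
        "[D] 08:40:34.72 0x7ff9f8e12d00 () - app busy: false => true "
        "[W] 08:40:34.72 0x7ff9f8e12d00 () - ignore TaskStatePropertyChanged signal"
    )
    for time, msg in interesting_events(sample):
        print(time, msg)
```

Saving the output of `flatpak run net.mkiol.SpeechNote --verbose` to a file and passing its contents to `interesting_events` should show each shortcut activation next to the state changes that follow it.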

mkiol commented 11 months ago

@devSJR Many thanks for the log and for catching this bug.

The problem occurred because you are using the Press and hold option for Listening mode. This listening mode was not handled correctly with keyboard shortcuts.

Fix: 4258fa0d52fa8016c9309aaa36ce4fe539695ce7

devSJR commented 11 months ago

I just tested it. Am I right that this is not yet in beta 4.4.0?

mkiol commented 11 months ago

Am I right that this is not yet in beta 4.4.0?

Indeed, not yet. I will try to push a new beta next week.

PS: The new beta might be delayed because I split the Flatpak package into three smaller sub-packages (add-ons): the base one, an NVIDIA-only one and an AMD-only one. I need to request separate Flathub repositories for all add-ons, which might be complicated. Thanks to this modular approach, the main package will be much smaller.

mkiol commented 9 months ago

Fix: 4258fa0d52fa8016c9309aaa36ce4fe539695ce7

Fix is included in v4.4.0.