mkiol / dsnote

Speech Note Linux app. Note taking, reading and translating with offline Speech to Text, Text to Speech and Machine translation.
Mozilla Public License 2.0

Text-only Option for text-to-speech-to-active-window to disable special key press like CTRL #149

Closed dengesCU closed 3 months ago

dengesCU commented 4 months ago

When I use text-to-speech to active window and say Ctrl-S, it saves the current document. While this might be a nice-to-have feature, it has led to rather random behavior for me a few times, as random key combinations were pressed. Thus, I cannot safely use the text-to-speech-to-active-window option. It would therefore be nice to have an option to disable this.

PS: Thanks for the great and easy-to-use app (and apologies in case I missed this option somewhere) edit: For now, the text-to-speech-to clipboard seems to be a safe alternative.

mkiol commented 4 months ago

Hi. Thank you for reporting.

You can disable any key binding by setting an empty value in the key combination field. As in the screenshot below. I hope this solves the problem.

image

dengesCU commented 4 months ago

Thanks for your reply. My problem is that when I use the text to active window option, random keyboard shortcuts are sometimes triggered instead of text being sent to the active window. So what I am looking for is an option that limits text to active window to actual text, meaning it should not trigger Ctrl, Meta, Alt etc. As I mentioned, for now, the text to clipboard function has solved this issue for me, and an extra plus is that it works on Wayland as well.

mkiol commented 4 months ago

> text to active window is limited to actual text, meaning it should not trigger Ctrl, Meta, Alt etc

That is super weird. Most likely a bug. What exactly should I do to replicate this problem?

dengesCU commented 4 months ago

For me, opening a file in a text editor, changing it, and saying "Control S" after invoking text-to-speech-to-active-window will save the file. Is this the same for you? And with longer sentences, I sometimes get random behavior, as part of the text is understood as a sequence of key combinations.

mkiol commented 4 months ago

Sorry for replying so late.

Now I understand what the problem is! It looks like a pretty amazing bug. I don't know yet if this is a bug or a feature ;-)

Unfortunately, I could not reproduce this problem on my computer, but I believe it works this way. It is possible that this depends on the model or the application you are using. Can you provide more details? I mean exactly which model you use and which application the problem occurs in.

dengesCU commented 4 months ago

No worries. Thanks for getting back to me and no need to apologize. Thanks for the great work.

Yeah, if this worked reliably, I would definitely call it a feature, not a bug :)

I'm using FasterWhisper Large-v3 English as the model, and I ran into this problem with VS Code. I am on the latest Flatpak version of Speech Note on Ubuntu 20.04. I just tried it again and started doubting myself, because numerous times in a row I got "Control S" as the output, but eventually the Save dialog opened, so it seems to be a bit of a game of chance whether the model returns "Ctrl" or "Control". My initial encounter with this issue was that VS Code would (sometimes) do all kinds of things when I entered a longer sentence with text-to-speech in which at some point something was recognized as a key combination. I don't know whether this might also affect it, but I am using KeyD (https://github.com/rvaiya/keyd) for sticky keys.

mkiol commented 4 months ago

Thanks for the additional information.

I was able to reproduce this with VSCode, but it is not specific to VSCode. The problem is not the key combinations, but where the focus is when key events are received by VSCode. When a text file is focused and you press left Alt (e.g. as part of the Ctrl+Shift+Alt+K shortcut), the focus changes to the "File" menu in VSCode. When the "File" menu is in focus, the "S" key triggers the "Save" action. I can reproduce this with other editors as well by manually changing the focus to the "File" menu before the voice processing is complete. So it looks like the problem is not "Ctrl S" or "Control S", but simply "S" or any other letter that triggers an action.
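To illustrate the mechanism described above, here is a minimal toy model in Python. It is only a sketch under stated assumptions: `ToyEditor`, its mnemonic table, and the rule that a lone Alt press moves focus to the "File" menu are hypothetical simplifications, not the actual logic of VSCode or Speech Note.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEditor:
    """Toy model of an editor window (hypothetical, for illustration only)."""

    focus: str = "document"  # either "document" or "file_menu"
    text: str = ""
    actions: list = field(default_factory=list)
    # Assumed mnemonics of the "File" menu; real editors define many more.
    menu_mnemonics: dict = field(
        default_factory=lambda: {"s": "Save", "o": "Open"}
    )

    def press_alt(self) -> None:
        # Assumption: a lone Alt press moves focus to the "File" menu.
        self.focus = "file_menu"

    def type_text(self, text: str) -> None:
        """Deliver each character as a key event to whatever has focus."""
        for ch in text:
            if self.focus == "document":
                self.text += ch
                continue
            # The menu has focus: a matching letter triggers an action,
            # any other key is swallowed by the menu.
            action = self.menu_mnemonics.get(ch.lower())
            if action is not None:
                self.actions.append(action)
                self.focus = "document"  # the menu closes after an action


# Focus in the document: the dictated words arrive as plain text.
ok = ToyEditor()
ok.type_text("Control S")

# Focus moved to the menu first: the leading "s" is taken as "Save".
broken = ToyEditor()
broken.press_alt()
broken.type_text("save it")
```

In this model the same stream of key events is harmless or destructive depending only on where the focus is when it arrives, which matches the observation that the letter itself, not the word "Ctrl", triggers the action.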

I'm looking for a possible fix or workaround... but it's possible that I won't be able to fix this.

dengesCU commented 3 months ago

Interesting. I didn't think about this. Thank you for the analysis. That makes a lot of sense now.

Thankfully, my computer is actually slow enough that I can manually put the focus into the right place before the text is pasted. I'll keep using this for now and report back if I find any issues. As for a possible workaround, this sounds like a tough one. Since this is an Xorg-only problem, it sounds fair if this remains open, to be honest. Quick googling found me this: https://lists.freedesktop.org/archives/xorg/2009-March/044296.html I know it's very old, but the logic here might be similar to on-screen keyboard logic. However, even if you can detect that something non-writable is selected, I don't know what the right thing to do with the text would be in this case.
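One possible answer to that last question, sketched in Python: fall back to the clipboard whenever the focused element does not accept text. This is purely an assumption about how such a policy could look. `deliver_text` and the three callables passed to it are hypothetical placeholders, and nothing here reflects how Speech Note is implemented or whether focus detection is feasible on Xorg.

```python
from typing import Callable


def deliver_text(
    text: str,
    focus_is_editable: Callable[[], bool],
    type_into_window: Callable[[str], None],
    copy_to_clipboard: Callable[[str], None],
) -> str:
    """Send text as key events only if the focused element accepts text.

    All three callables are hypothetical hooks supplied by the caller.
    Returns "typed" or "clipboard" so the caller can notify the user.
    """
    if focus_is_editable():
        type_into_window(text)
        return "typed"
    # Something non-writable (e.g. a menu) has focus: do not emit key events.
    copy_to_clipboard(text)
    return "clipboard"


# Usage with in-memory stand-ins for the real window and clipboard.
typed, clipboard = [], []
result = deliver_text(
    "Control S",
    focus_is_editable=lambda: False,  # pretend the "File" menu has focus
    type_into_window=typed.append,
    copy_to_clipboard=clipboard.append,
)
```

The design choice is that nothing is typed when the target is unknown, so the text is never lost and no menu action can be triggered by accident.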

Should I close this issue?

mkiol commented 3 months ago

> Quick googling found me this: https://lists.freedesktop.org/archives/xorg/2009-March/044296.html I know it's very old, but the logic here might be similar to on-screen keyboard logic.

I checked how it works. There is a special API provided by the DE (e.g. KDE Plasma, GNOME) that allows you to track where the user clicks on the screen. Interesting... but I also don't know what this could be used for in Speech Note 🤔. It's certainly good to know that such a thing exists.

> Should I close this issue?

I think so. Thanks for reporting this issue and don't hesitate to report more bugs if you see them.

dengesCU commented 3 months ago

Okay, I will close the issue now. I truly appreciate all the work that you're doing.