mkiol / dsnote

Speech Note Linux app. Note taking, reading and translating with offline Speech to Text, Text to Speech and Machine translation.
Mozilla Public License 2.0

Use Dbus for Desktop Integration #59

Open nerumo opened 10 months ago

nerumo commented 10 months ago

For global desktop integration under Wayland and X11, it would be great to use D-Bus hooks to activate speech input (or even to show a small UI).

I checked on KDE and there's an org.kde.kwin.VirtualKeyboard D-Bus interface, which gets triggered as soon as an input field gets focus. So the actual Wayland/X11 hooks are implemented by the compositor.

Unfortunately, there's no generic XDG D-Bus interface for this purpose, but I'm sure the other DEs also implement something similar.
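
One way to check whether this interface is present in a given session is to introspect the compositor's D-Bus object. What follows is a minimal sketch, assuming the `gdbus` command-line tool is installed. Only the interface name comes from this thread; the service name `org.kde.KWin` and the object path `/VirtualKeyboard` are assumptions that should be verified on your own system, and the helper names are hypothetical.

```python
import shutil
import subprocess

# Interface name taken from this thread.
INTERFACE = "org.kde.kwin.VirtualKeyboard"
# Assumptions, not confirmed in this thread: service name and object path.
SERVICE = "org.kde.KWin"
OBJECT_PATH = "/VirtualKeyboard"


def introspect_command(service=SERVICE, object_path=OBJECT_PATH):
    """Build the gdbus command that introspects an object on the session bus."""
    return [
        "gdbus", "introspect", "--session",
        "--dest", service,
        "--object-path", object_path,
    ]


def has_interface(introspection_text, interface=INTERFACE):
    """Return True if the introspection output mentions the interface."""
    return f"interface {interface}" in introspection_text


def check_virtual_keyboard():
    """Return True/False, or None if gdbus is missing or the call fails."""
    if shutil.which("gdbus") is None:
        return None
    result = subprocess.run(introspect_command(), capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return has_interface(result.stdout)
```

Calling `check_virtual_keyboard()` from a terminal inside the desktop session returns `None` when the object cannot be queried at all, which would fit a session where the compositor does not expose it.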

mkiol commented 10 months ago

Interesting indeed. For some reason I don't see this interface on my system (Plasma X11). Maybe it is available only on Wayland? Or maybe it has to be activated somehow.

Let's assume it is available and can be used. This API provides the "trigger" part. To have full functionality, you also need an API that allows you to insert text into the selected window. On X11, I can use a hack with key events, but on Wayland I have no idea.

This problem is definitely worth investigating.
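
The X11 key-event approach mentioned above can be illustrated with the `xdotool` command-line tool, which synthesizes key presses for the focused window. This is only a sketch under stated assumptions and not how Speech Note itself is implemented: the function names are hypothetical, and the code deliberately refuses to act outside X11, using the `XDG_SESSION_TYPE` environment variable as the check.

```python
import os
import shutil
import subprocess


def session_type(environ=None):
    """Return the session type ('x11', 'wayland', ...) or '' if unknown."""
    env = os.environ if environ is None else environ
    return env.get("XDG_SESSION_TYPE", "").lower()


def xdotool_type_command(text):
    """Build an xdotool command that types `text` into the focused window."""
    # --clearmodifiers releases held modifiers (e.g. from a global shortcut)
    # while typing; "--" stops option parsing so the text may start with "-".
    return ["xdotool", "type", "--clearmodifiers", "--", text]


def type_into_active_window(text, environ=None):
    """Type text on X11 via xdotool; return False if that is not possible."""
    if session_type(environ) != "x11" or shutil.which("xdotool") is None:
        return False
    return subprocess.run(xdotool_type_command(text)).returncode == 0
```

On a Wayland session `type_into_active_window("hello")` simply returns `False`, which mirrors the open question in this thread: there is no equivalent generic mechanism assumed here.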

nerumo commented 10 months ago

I tried to find something, but I couldn't find proper (searchable) documentation on D-Bus interfaces. Then I asked myself how virtual keyboards solve this, and I stumbled upon the Maliit project, which implements a Wayland-capable virtual keyboard.

https://github.com/search?q=repo%3Amaliit%2Fkeyboard%20wayland&type=code

I also stumbled upon IBus Wayland, which offers the following: [image]

This can be activated on Plasma. But still, I have no clue how these things work. Maliit doesn't seem to work with D-Bus; they probably have a better way of integrating.

mkiol commented 10 months ago

As far as I know, Maliit is used primarily in Plasma Mobile. It is a virtual keyboard implementation, so it is not needed when you have a hardware keyboard.

danboid commented 6 months ago

I found out about Speech Note today. I'm impressed by how well it works but, like @nerumo, I came here to request better desktop integration. It would be much nicer if we could use it within any X11 or Wayland app without having to copy and paste. The lack of this feature puts me off using it as much as I otherwise might.

mkiol commented 6 months ago

I understand the need, but it is difficult :/

What is the main use case you are looking for? Speech-to-text, Text-to-speech?

Some things are already possible thanks to "actions" (see the Accessibility tab in the settings). The following is supported:

danboid commented 6 months ago

Speech-to-text directly to any focused window (works only in X11)

This is what I wanted to do. I didn't think this was supported because there's no mention of this functionality in the README.

How do I use this feature (under X11)?

mkiol commented 6 months ago

Firstly, you need to enable "Use global keyboard shortcuts" in the settings.

The default shortcut to "Start listening, text to active window" is Ctrl+Alt+Shift+K. You can change it, but try to choose something that doesn't conflict with system-level keyboard shortcuts.

Put the cursor in a text field in any window (e.g. the Firefox address bar). Press Ctrl+Alt+Shift+K and say something. Wait for "silence detection" or press Ctrl+Alt+Shift+S. The recognized text will be inserted into the active text field.

A few remarks:

@danboid If you test it out, I'd love to hear your opinion. Is this feature even useful?

danboid commented 6 months ago

Thanks for explaining how to use this feature, @mkiol! Yes, it's very useful, but it isn't 100% working yet, as you say.

I have tested it with the MATE desktop under Debian. It works with Kate and Firefox (Gmail), but I was unable to get it to work with the MATE and XFCE terminals or LibreOffice Writer.

Ctrl+Alt+Shift+S to stop listening also doesn't work for me; I just have to shut up and wait for the silence detection to kick in. If I choose "Stop listening" from the SN system tray context menu, the text is not inserted into the current window for that recording.

Under MATE, a mic icon appears in the desktop panel when it's recording, even when the Speech Note "Use system tray icon" option isn't enabled. This is good.

It would be handy if SN displayed an hourglass or something similar in the system tray after you stop recording, while it processes the last recorded clip.

How long is it "safe" to listen/record for and still expect SN to work? Does SN display a system tray warning icon to let the user know when they are close to reaching the time limit for a recording?

I would advertise / document this feature in the README if I were you, so others don't miss it.

Thanks