
Dictation for Assistant (inline and chat-based) #16410

Open mknw opened 4 weeks ago

mknw commented 4 weeks ago

Check for existing issues

Describe the feature

I have started using the inline assistant tool more and more, for both longer and shorter tasks, and it has a lot of advantages.

I would like to be able to dictate to the assistant while seeing the transcribed text as it comes in, and to approve or edit that input before the instructions are submitted to the LLM.

This could be surfaced in the inline assistant element (⌃ctrl+↩return) as a microphone icon next to the "configure" icon on the left. It would shorten the time it takes to give directions to the assistant, while still letting one adjust the input to correct variable names and similar mistakes.

Additionally, I would suggest adding keyboard shortcuts to:

  1. start inline assistant dictation (starting the inline assistant tool as it works right now, but with the microphone already turned on for dictation); and
  2. approve the input (when the dictation is correct) without having to use the trackpad/mouse. The existing enter binding could get the job done here, though an additional modifier (e.g. shift+enter) could help ensure a dictation is not submitted by mistake. A keymap sketch follows below.
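
For illustration only, such bindings might look roughly like this in Zed's keymap.json. Everything named here is hypothetical: neither the `assistant::ToggleDictation` / `assistant::ConfirmDictation` actions nor an `InlineAssistant` context exist in Zed today; this is just a sketch of the shape the feature could take.

```json
[
  {
    "context": "Editor",
    "bindings": {
      // hypothetical: open the inline assistant with the microphone already on
      "ctrl-shift-enter": "assistant::ToggleDictation"
    }
  },
  {
    "context": "InlineAssistant",
    "bindings": {
      // hypothetical: submit the reviewed transcription to the model
      "shift-enter": "assistant::ConfirmDictation"
    }
  }
]
```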

Nice to have

It would be nice if the dictation itself were aware of the context (instead of the context only being sent when the prompt is submitted). This would make it easier for the speech-to-text model to pick up variable names and other code-specific keywords.
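
As a rough illustration of what context-aware dictation could mean in practice, here is a minimal sketch against OpenAI's hosted Whisper endpoint, whose `prompt` parameter biases transcription toward the supplied vocabulary. The function and its inputs are hypothetical and not part of Zed; an integrated implementation would presumably gather identifiers from the open buffer itself.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def transcribe_with_context(audio_path: str, identifiers: list[str]) -> str:
    """Transcribe dictated audio, biased toward code identifiers.

    `identifiers` would come from the surrounding buffer (symbols near
    the cursor, names in scope, etc.) so the model is more likely to
    spell e.g. `fetch_user_by_id` correctly instead of guessing.
    """
    with open(audio_path, "rb") as audio:
        result = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio,
            # Whisper treats the prompt as preceding context and tends
            # to reuse its vocabulary in the transcription.
            prompt=" ".join(identifiers),
        )
    return result.text
```

The same idea should carry over to local engines; whisper.cpp, for instance, accepts an initial prompt that serves a similar biasing purpose.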

notpeter commented 3 weeks ago

Very interesting use case!

If you are running on macOS you can do some of this today with the built-in speech-to-text. Make sure dictation is enabled under System Preferences -> Keyboard -> Dictation (toggle it on) and set a dictation keyboard shortcut if desired (defaults to Fn+F5).

Then just trigger the dictation shortcut (Fn+F5) and talk into the assistant. You can freely edit the text before submitting it to the model.

A more integrated approach would likely leverage one or more of the following: