Dictation support for visual studio code

JoleCameron commented 6 years ago

Hi,

I wish to lodge a request to have VS Code updated so that it can accept dictation input. Currently, if you try to dictate into VS Code using software like Dragon (the industry standard), nothing happens.

This is important to fix for people like myself who have long term hand injuries and are trying to figure out ways to program by voice. People have managed programming by voice in these situations, but the solutions are difficult to develop and not pretty.

To be clear, I'm not asking that you develop voice commands to input symbols by voice, only that the text boxes in VS Code (and/or Visual Studio) can accept dictation input by Dragon (preferably with full 'select-and-say' support). Voice programmers can take care of the rest.

Does it have to be Dragon? Not necessarily. It could be any local speech recognition engine with good accuracy (I'd argue that the decade old Windows Voice Recognition isn't quite there yet) and the ability to write custom voice commands.

While there are few people using such technologies today, it is a subject of interest to all programmers, because they may need it in the future.

Jole

cleidigh commented 6 years ago

@JoleCameron I am a Code user and contributor. I am also a Dragon user as I have ALS and can only program by voice. For over a year now I've been doing contributions all the while using and programming in Code. Some of my contributions have to do with improving accessibility. That said, I have been able to set up a pretty usable scenario. As a longtime programer I had the ability to put together this set up, I'm sure you can do the same and apologies if I'm telling you anything you already know:

Windows7 Dragon vDPI 14 (Dragon 15 has limitations, albeit a better recognition) SpeechMatic directional high-performance microphone with USB-AGC, around the neck twist type (Critical !) Natlink (open source Python framework and API for Dragon) Dragonfly (Python grammar and rules engine allowing flexible custom commands) AutoHotKey Custom Python grammars for Code Various user contribution grammars

The key to making this all work well is to have grammars that can seamlessly enter text either in the main editors or input boxes. Also I have command set up for almost all of the common Code keyBindings Utilizing everything possible to avoid using the mouse makes everything faster.

I do all of the above with no changes to Code.

I would be happy to walk you through what I have. It probably would be a good start to understand better what you have now and how you use it.

Cheers

Update: from your repositories I see you use Vocala and therefore Natlink - you already have most of what you need. (And now I know I was telling you things you already knew :-(

JoleCameron commented 6 years ago

@cleidigh

Thanks for your prompt response. May I start by saying that it's nice to talk about this problem with someone who themselves programs by voice.

My journey towards hands-free programming developed a little differently than yours. In my case, I developed a severe case of RSI when typing up my Honours thesis in late 2013. In order to finish my mathematics thesis, I developed basic macros using Vocola 2.

I was an inexperienced self-taught programmer before developing this injury, so didn't want to start developing a system to program by voice until my hands could do a little bit of typing to write the commands. Between that, full-time work in a different industry and a couple of years of poor health, and have only returned to my goal to set up programming by voice now.

For the PC, I have licenses for Dragon 12.5 Preferred and Dragon 15. As you know, Natlink does not work with Dragon 15 and, given it never had official support, may not work with future versions. Since the compatibility issues have not been resolved in the year since Dragon 15 was released, I have no reason to believe that they will be resolved in the future. Because of this, I will develop a system using inbuilt DVC commands. Note that it is actually possible to write DVC commands like camel , provided that the open-ended variable as at the end of the command.

Back to the main point: there are a few reasons why I think that Code need Select-and-Say capabilities.

While I admit that it's possible to program effectively by using commands to emulate keystrokes, this solution has its problems. First, it cuts the user off from being able to use "correct that" to improve recognition with time. Depending on the particular person, this can be more or less important. Second, it makes the learning curve steeper than necessary, because if a person was able to start by developing voice commands for some things, and they could use ordinary dictation to write the rest of their code, albeit slowly. What we don't see is the number of programmers who lose the ability to use a keyboard and mouse and then a force to change careers.
Your solution is impractical for markup languages like LaTeX. You may well be aware that LaTeX is the industry standard for scientific and mathematical publication. These documents are a mix of prose, and encoding for equations and pictures. It is impractical to dictate ordinary prose using only custom commands, so you need Select-and-Say, but you also need an editor with sufficient power to efficiently navigate the various symbols by voice.
Finally, Visual Studio and VS Code fail to live up to Microsoft's own standards for disability access. And if Microsoft fails to live up to its own accessibility standards, what do you think everyone else will do? I note that you're using Windows 7. I'm using Windows 10. Windows 10 is actually less accessible by speech than Windows 7. For example, consider Edge in Windows 10 vs Internet Explorer in Windows 7 (Internet Explorer on Windows 10 is too unstable to use).

Anyway, I hope this helps explain why I think that VS Code should incorporate this change.

Cheers

cleidigh commented 6 years ago

@JoleCameron Thanks for the detailed response.

First let me say that despite the fact that we have arrived at where we are from slightly different paths, one thing I think is probably very common; everyone starts off frustrated using voice control / dictation for programming. I was very reluctant to use Dragon in the beginning given its peculiarities , limitations. Necessity changed that and I went crazy to try to make the best of it. I think there are some objective realities that one should start with:

System Requirements: a fast system, 16gb Ram et cetera, very good microphone - did not catch what you use?
Dragons is not meant for programming and it never will be unfortunately as you know they do not support Natlinks and they broke "continuous recognition" in DPI 15 - more on that later
Utilizing programming support elements is really a requirement not an option.
Code cannot really add much directly to the puzzle, Dragon would have to implement more direct support for anything special, they will never do that.
I believe I am accomplishing everything you mention without much more set up then you are the have.
I think you see too many limitations with the current approach you are using with just vocala
I can do everything that Select and Say does, while I do not use correct that some of those facilities should be possible . also
Doing MarkDown is no problem using a mix of custom commands or using Emmetts and normal dictated text.
Key factor is using continuous recognition commands that allow you to chain both symbols, words and Code commands. For this you need Natlink+Dragonfly and some off-the-shelf grammars
Using some very basic Python you can add almost anything with little effort. I can share with you all my grammars both personal and collected.

I would strongly suggest giving Dragonfly grammars a try, and I would be happy to help with this. Let me know if you'd like to do this.

BTW I think one way Code could support this more as with a combination of recipes and perhaps an extension to help with setup. I believe this is the most likely path knowing both code and Dragon.

JoleCameron commented 6 years ago

Thanks for the offer, but as I am choosing to stick with the current version of Dragon (for reasons of employability and to make sure my system works long-term) your method won't work in my case. It's easy enough to fake continuous command recognition - that's not my issue here. And my setup is fine.

When it comes down to it: yes, I think I can get it to work without any changes to Code. However, this will require using workarounds that I wouldn't need to use if Microsoft lived up to its own accessibility standards for speech input. Sure, Dragon's not designed for programming, but I'm not asking for a special method to program by voice, just that the text box is designed according to Microsoft's own standards.

cleidigh commented 6 years ago

@JoleCameron Happy New Year's !

After all my work with this done without help, I'd like to help you get the most out of voice programming.

I think you have a couple options:

1) I believe you can install both 12 and 15, I cannot test this because I can never be without voice. I think you could try this to be able to use 12 for programming and 15 for other things and future compatibility. After 40+ years of programming and a lot of research and working with Dragon, any article on programming by voice will point you to one of the frameworks like dragonfly, vocala etc . Your flexibility and power cannot be matched by dragging alone. FWIW

2) while I highly recommend the above approach, if you're absolutely determined to use 15, I would still like to make this work better for you.

Natlink has been made to work partially with 15. Multiple people are trying to make this work better, however most likely will be with limitations but I think you might be able to get a fair amount out of it. This might be in between approach.

3) Lastly a pure 15 approach:

I want to point out a couple of things on your comments about compatibility
I am not sure why you are not able to use Dragon alone to enter dictation into an input box in Code. --- With my 14 I can use just Dragon commands to open the Find widget and enter search text. Press Control+F (default key binding for search) Put Dragon in to Normal or dictation mode "Start Normal Mode" (dictate search text)
you can optimize the above with DVC commands
The above is the standard way that Dragon interacts with any input item with no knowledge of the application
It is important to understand the several ways Dragon interacts with programs
Dragon really only has custom interactions in a couple of ways
It understands and can interact with Menus and dialogue buttons that utilize the Win32 Windows API
Many new applications utilize WPF which I think are currently not fully supported by Dragon, this is a Dragon issue not an application issue.
With respect to Code in particular it is somewhat of a special application itself. It does not use Win32 for anything other than the menu bar and a couple native dialogues. Code is an Electron app centered around the Chromium standalone browser engine.
This architecture means the entire application is browser like not native application like. Dragon{needs to interact with the application in the same manner that it typically interacts with a browser.
While Dragon has some add-ons for interacting with a few popular programs including browsers, these are done by Dragon personally I believe are not that great. I have created a few things to get much more out of Chrome than the Dragon add-ons. My browsing experience is quite good . this way
You mentioned correction, I use the built-in suggestions in Code as well as Undo, I do not Dragons Correction as it will rarely ever help with code. I believe it should work just fine . anyway.

Finally without sounding defensive (I am not) , Code does not violate nor does it incorrectly implement "input boxes", these are implemented as HTML5 input elements which we focused, will accept any input from Dragon. I have actually done extensions to Natlink and I have a pretty good idea of how it works. I have not actually determined how Dragon could be made more compatible, it almost always comes down to keyboard commands. I switched to Code after using many other editors for years in particular because of its accessibility. It supports things like screen readers and contrast modes , not necessary for us but nonetheless I think it makes Code the most accessible editor out there.

Let me know if you'd like to do some of these experiments.

cleidigh commented 6 years ago

@JoleCameron

Any thoughts on the above? Did you try my suggestion for dictation into input boxes?

Is there's something very specific to address given my comments?

JoleCameron commented 6 years ago

Sorry for not getting back to your sooner. I've been both busy and unwell this week, and I let this slide. My microphone also died last night, so I'm having to type this by hand. Hence, I'll be brief.

My concerns with VS Code boil down to the fact that I can't even dictate into the main text box without using a Dragon command, let alone have Select-and-Say access. Given that Microsoft provides an essential service (Windows), I think that the problem isn't entirely Nuance's fault. Beyond that, things go beyond my level of knowledge.

I would appreciate input on setting up programming by voice using Dragon 15, but I'd rather not do that through a public forum. To that end, I sent you a private message on the knowbrainer forum. I'll probably want input at about the one month mark.

LexiconCode commented 6 years ago

@claudioc

I also program by voice. Outside of Select-and-Say capabilities which would be a blessing to have VS Code. There are a number of other ways VS code could improve accessibility as well. First a little bit about my set up.

Edited 2/11/2020 - Updated information and links

Windows 10 64-bit -(8GB) of RAM - i5 7200U Dragon vDPI 15 SpeechWare FlexyMike Dual Ear Cardiod high-performance microphone with SpeechMatic MultiAdaptermulti

Natlink - NatLink is an OpenSource extension module for the speech recognition program Dragon.
Caster - Caster is a collection of tools aimed at enabling programming and accessibility entirely by voice. It runs on top of Dragonfly.
Dragonfly - A fork of dragonfly that utilizes CMU Sphinx, Dragon NaturallySpeaking, Windows Speech Recognition, Kaldi as a backend.

Microsoft and VS code contributors could at empower the voice to code community to develop extensions that facilitate accessibility. There are some outstanding limitations with Castor and Dragonfly both interact by emulating keystrokes in VS code. A uses example. Which is why we need A method to expose the VSCode active 'when Clause Contexts'.
https://github.com/Microsoft/vscode/issues/10471 https://github.com/Microsoft/vscode/issues/26882

zachgibson commented 6 years ago

I’m trying to use Dictation on a Mac and it doesn’t handle actually dictating text. I can perform commands such as open new file and such using Dictation in VSCode.

cece554 commented 6 years ago

having this problem as well with mac dictation, I tried saving snippets in dictation under commands VScode appears to be unable to handle them, but when I say worries are not under commands VScode prints those

sethwilsonUS commented 5 years ago

I'm legally blind, and while I can program reasonably well through conventional means I'm still excited about the possibilities of voice programming.

I've been working on a voice programming web abb using an open-source JavaScript library called AnnYang. It uses the web standard SpeechRecognition API, which at present only works in Chrome (and apparently also Firefox now, though I haven't tested that). I'm wondering if, since VSCode uses Chromium, I/we can integrate AnnYang into a VSCode extension. If this could work, it would be awesome, because it'd be a free, integrated, cross-platform solution. But I'm not sure how smooth the integration would be, or if AnnYang is powerful enough. But the idea has potential I think...

ryan-zheng-teki commented 5 years ago

I think many developers will really like it when we could use voice command to write code. Especially when people are back at home after a whole day's work. Now voice recognition accuracy is improving, and Microsoft is promoting the "remote-development".With the adoption of 5G, I really hope that voice coding could be integrated into VSCode.At least we could decrease 70% of our time sitting down every day which is really good for our health condition as a developer.

irasanchez commented 5 years ago

As a student who is developing wrist pain, I'd also appreciate this.

rbavery commented 4 years ago

Just want to chime in to support this.

niemyjski commented 4 years ago

It is super important to have accessible tools for everyone to use.

LexiconCode commented 4 years ago

I've been investigating alternatives that don't require reliance on the editor to expose information for accessibility via extensions. Microsoft's Accessibility Insights for Windows as a tool to investigate exposing and testing Windows accessibility API UI Automation. Currently there is no official UI Automation bindings for Python or standardize support from a community performance project. I've worked with a few people to expose some other editors Scintilla. From there we been able to expose menus and editable text, cursor position, and so on. My hope is that this could be done from UI Automation but there needs to be better support from Microsoft.

CJohnDesign commented 4 years ago

I support this too. Wrist pain.

isidorn commented 4 years ago

Hi, VS Code developer here 👋
First thanks a lot for the great feedback. We definetly want to have a nice dictation expereince in VS Code so let's try to get some concise info here. I do not use dictation software so I appologise for the simple questions:

What are the dictation software used on Win / Mac / Linx. Is Dragon used everywhere? I plan to try it out on my mac.
Does this dictation software have a GitHub page where we can interect with the developers?
Do these dictation software work well with Google Chrome, for example when you want to dictate into this GitHub input box
What is the experience with VS Code? It simply does not work? I know @cleidigh uses it with Dragon

Then we can try to figure out what should be done on the VS Code side and what should be done on the dictation software side.

Thanks!

niemyjski commented 4 years ago

I don't have answers to most of your questions :( but you have a whole team @ Microsoft (https://blogs.microsoft.com/accessibility/) who does nothing but accessibility. I'd recommend reaching out to Jessica Rafuse (She's pretty awesome).

LexiconCode commented 4 years ago

Hi, VS Code developer here 👋 First thanks a lot for the great feedback. We definetly want to have a nice dictation expereince in VS Code so let's try to get some concise info here. I do not use dictation software so I appologise for the simple questions:
1. What are the dictation software used on Win / Mac / Linx. Is Dragon used everywhere? I plan to try it out on my mac.

Dragonfly Cross-platform Win / Mac / Linx and can be used on a number of engines (like kaldi) on Mac/Linx where DNS is not supported. Supported engines Dragon NaturallySpeaking (DNS/DPI), Windows Speech Recognition (WSR), Kaldi and CMU Pocket Sphinx

2. Does this dictation software have a GitHub page where we can interect with the developers?

You can get in touch with the voice coding community at https://gitter.im/dictation-toolbox/home. From there you can interact with individual projects such as Dragonfly/Caster/Unimacro/Vocala/Natlink and more

3. Do these dictation software work well with Google Chrome, for example when you want to dictate into this GitHub input box

Yes they do for general dictation and commands. There's a little more in the future for Select-and-Say capabilities being reimplemented. More on that a little bit later.

4. What is the experience with VS Code? It simply does not work? I know @cleidigh uses it with Dragon.

While I use DNS/DPI I have chosen to go with the Dragonfly framework because it leverages the Python language. I develop my grammar and it works on any speech recognition backend that Dragonfly supports and for the most part automatically cross-platform. Thus an important aspect is I'm not locked into the Nuance ecosystem or into any particular speech recognition backend. A disturbing trend overall is the death of on premise speech recognition software.

Nuance dropped support for Mac. As developers we really don't have access to define advanced client-side grammars for cloud-based speech recognition at the time of writing designed for coding. Therefore as a community we are also investing in leveraging and developing open source technologies for speech recognition backends.

I primarily code in VS Code and it works just fine. Select-and-Say functionality is not there yet but as a community we are working towards re-implementing that functionality independent of DNS. Previously due to electron/other quirks we were not able to access the necessary controls to start implementing Select-and-Say capability. It is in the beginning stages but thanks to the microsoft development team with VS Code enhancing their support for screen readers.

Medium goal is to leverage as much information from the OS accessibility APIs such as UIautomation. Then develop a minimal plug-in to communicate VS Code specific information to allow bidirectional communication grammars and application/OS. Even without these enhancements many people successfully code by voice.

Note: NatLink is now fully compatible with DPI v15. In addition it is also very close to being released for Python 3.

isidorn commented 4 years ago

@LexiconCode great, thank you very much for the information.

So it would be great if we would file issues to VS Code regading all those quirks so we know what is missing from making the voice coding experience even better. Out of those issues we can then figure out what should be forwarded to Chrome and Electron. From what you are saying feels like most of the things should be done on the Electron side and there is no low hanging fruit we can tackle in VS Code.

I will also join the Gitter channels you posted. Thanks! I posted in the Caster channel since that one had the most particpants :)

lunixbochs commented 4 years ago

There is another voice programming community around my Talon project, with a very active Slack. I know a lot of people voice code with VSCode there on all of windows/linux/mac, and there is a #vscode channel. It might be worth getting feedback from those users.

Serenade also supports VSCode, but I can't really speak for them.

isidorn commented 4 years ago

@lunixbochs thanks. I have posted a message in your Talon slack project.

adiabatic commented 4 years ago

What are the dictation software used on Win / Mac / Linx. Is Dragon used everywhere? I plan to try it out on my mac.

I'm using the built-in Dictation on macOS Catalina.

Does this dictation software have a GitHub page where we can interect with the developers?

Possible, but I doubt it.

Do these dictation software work well with Google Chrome, for example when you want to dictate into this GitHub input box

No. When I open a web page in Safari and say "show numbers", a number pops up for every toolbar item and every visible element on the page. In Chrome, by contrast, I only see numbers pop up for UI elements on the browser chrome (close/minimize/fullscreen, each tab, new tab, back, etc.)

What is the experience with VS Code? It simply does not work? I know @cleidigh uses it with Dragon

When I say "show numbers", the only UI elements that get numbers are the close, minimize, and full-screen buttons on the top left.

lunixbochs commented 4 years ago

You can make Electron apps respond to macOS accessibility, there's an extended attribute you can set on it with AXUIElementSetAttributeValue iirc

But that's not a great solution, walking the accessibility hierarchy of Chrome can be super slow, so native accessibility is better.

tangela19 commented 3 years ago

Hello, its worth noting that dictation also does not work with standard windows speech recognission.

problemSolvingProgramming commented 3 years ago

How Can I Use The Dragon Naturally Speaking In Speech Recognition Project Using C# ?

How can i use the dragon naturally speaking and tools in speech recognition project using c# in Visual Studio 2017

pokey commented 3 years ago

isidorn commented 3 years ago

Just FYI there is a voice-assistant VS Code extension for Windows. You can find it here https://github.com/b4rtaz/voice-assistant I tried it out and feels like it is in the early stages and still needs a lot of polish, but nevertheless it looks interesting.

fusentasticus commented 2 years ago

@isidorn Thanks for following this thread on automation needs for those of us who prefer or have to command our computer by voice!

Now that there is a dedicated subdirectory for automation in the source tree, should we as dictation users go and vote for https://github.com/microsoft/vscode/issues/136121 so that at least this part becomes easily user installable?
And with the automation already in place would it be a big deal to write a full UIAutomation driver on top? By the real thing, I'm of course thinking about the excellent conceptual framework https://docs.microsoft.com/en-us/windows/win32/winauto/entry-uiautocore-overview, which very nicely Microsoft's Edge browser already supports and that Microsoft has given to the community per official commitments.
So, it is as if all the technology pieces are in place for something like word-under-the-mouse and custom select-and-say mechanisms to be easily implemented by dictation systems --- if we could just get a full built-in UIAutomation service for VS Code!! Specifically, we're looking for goodies like FromPoint, RangeFromPoint and Select from Text pattern, the TextEdit patterns, and all the well-designed stuff for automation of panels, tabs, and buttons etc.
My comments here should include: that I do see at least partial UIAutomation support when the VS Code window is focused (active window). However, the TextEdit control is disabled unless "Accessibility support" is turned on. Unfortunately, turning this setting on forces text wrapping to be off (which is not always good for visual users!). Also, when vscode is unfocused, the automation elements returned by FromPoint appear to be an internal VS Code hierarchy not related to the UIAutomation model, which is why I am confused about the status of automation for VS Code! I'm not sure for example how much of the current automation in VS code has bubbled up from underlying automation work on Chromium/Electron [My preliminary testing is done via FlaUI in UIA3 mode.]

isidorn commented 2 years ago

@fusentasticus thanks for your reply. Let me try to answer:

I would not suggest voting on that, that is just testing infrastructure we use and we do not have any plans to add this. I hope we can achieve this not using Playwright
I am not an expert in UIAutomation framework, so I do not really know how to best answer this. If something can be written that interacts with VS Code that would be great. VS Code is using Chromium underneath, so theoretically is this UIAutomation works with Chrome or Edge it should be possible for it to work with VS Code
I see how this UIAutomation would enable a lot of scenarios, and that sounds great!
Word wrapping being disabled is covered by this issue https://github.com/microsoft/vscode/issues/95428 We can fine tune this behaviour. And yes I believe VS Code simple bubble up from underlying automation work on Chromium/Electron

meganrogge commented 1 year ago

Exploring a different, though related idea in #170554. Please let us know what you think there.

meganrogge commented 1 year ago

Hi @JoleCameron, it has been a while since we last touched base with you. How are you finding the dictation support in VS Code these days? Is there anything we can do to help?

bpasero commented 8 months ago

Fyi I am splitting this issue into the part that is actually being worked on: dictation support in the editor (https://github.com/microsoft/vscode/issues/205263).

I think this issue here in particular asks for voice-to-text support in all locations that accept textual input, which is not in scope for February.

bpasero commented 8 months ago

With our February release, there is now support to use your voice to dictate into the editor: https://code.visualstudio.com/updates/v1_87#_use-dictation-in-the-editor

ezgif-5-74147b0701

After installing the VS Code Speech extension you can use the keyboard shortcut Ctrl+Alt+V (Cmd+Alt+V on macOS) to start it.

Can people in this issue try it out and report back how it goes? Thanks!

meganrogge commented 6 months ago

In reading through this issue, here are my findings:

Dragon does work with VS Code, though beginner dictation users can find configuring their setups to do so challenging
there is interest in using voice commands to write code. We now support that with copilot chat and with editor dictation.
It would improve the experience to expose VS Code when clauses so extensions could know context/focus and tell Caster, a dragonfly based programming toolkit that enables running commands/writing code. However, we now have copilot chat and Hey Code for those.

cc @isidorn, I think this issue can be closed given these findings.

isidorn commented 6 months ago

Thank you very much for those insights.

I agree that we can go ahead and close this issue. But I think we should create a follow up feature request for voice to trigger VS Code commands. Something that we currently do not support well, and it would be good to understand the need better.

For other requests (when close through API) there are already issues capturing this.

Users of the Voice extensions - we plan to do a user study at end of May. If you would like to help, more details can be found here https://github.com/microsoft/vscode-discussions/discussions/1144

bpasero commented 6 months ago

I think we have that as https://github.com/microsoft/vscode/issues/209906

meganrogge commented 6 months ago

I have assigned https://github.com/microsoft/vscode/issues/209906 to myself and added the accessibility label

microsoft / vscode

Dictation support for visual studio code #40976