microsoft / vscode

Visual Studio Code
https://code.visualstudio.com
MIT License

Speech To Text in VS code is awkward on MacOS #213149

Open p-i- opened 1 month ago

p-i- commented 1 month ago

Type: Bug

Just try using the macOS built-in Dictation tool in VS Code.

(This tool can be activated under System Settings -> Keyboard -> Dictation.)

Many problems:

I think that the fundamental problem here is with this macOS tool. I think its design is overly complex and intricate, and it often falls over.

Given that most VS Code users spend most of their day entering text into VS Code, it would be really nice to have a solution that takes care of speech-to-text. Maybe a fix to interop with this Dictation tool, maybe an extension, maybe core VS Code functionality.

I'm not bothered about speech-to-code. I'm quite happy to type my code. But if I am editing text files (.txt, .md, .nt, etc.) or modifying text content within the code (e.g. AI prompts, docstrings, strings, comments, etc.), I would like something simple and reliable.

VS Code version: Code 1.89.1 (dc96b837cf6bb4af9cd736aa3af08cf8279f7685, 2024-05-07T05:14:32.757Z)
OS version: Darwin arm64 23.4.0
Modes:

System Info

|Item|Value|
|---|---|
|CPUs|Apple M2 (8 x 24)|
|GPU Status|2d_canvas: enabled<br>canvas_oop_rasterization: enabled_on<br>direct_rendering_display_compositor: disabled_off_ok<br>gpu_compositing: enabled<br>multiple_raster_threads: enabled_on<br>opengl: enabled_on<br>rasterization: enabled<br>raw_draw: disabled_off_ok<br>skia_graphite: disabled_off<br>video_decode: enabled<br>video_encode: enabled<br>webgl: enabled<br>webgl2: enabled<br>webgpu: enabled|
|Load (avg)|2, 2, 2|
|Memory (System)|24.00GB (2.49GB free)|
|Process Argv|--crash-reporter-id f10d97cd-2115-4dba-a34a-07be9312995a|
|Screen Reader|no|
|VM|0%|

Extensions (21)

Extension|Author (truncated)|Version
---|---|---
dvt-remote-ssh|ami|1.0.0
nestedtext|bma|2.0.0
githistory|don|0.6.20
copilot|Git|1.194.886
copilot-chat|Git|0.15.2024043005
vsc-python-indent|Kev|1.18.0
rainbow-csv|mec|3.11.0
vscode-docker|ms-|1.29.1
debugpy|ms-|2024.6.0
python|ms-|2024.6.0
vscode-pylance|ms-|2024.5.1
jupyter|ms-|2024.4.0
jupyter-keymap|ms-|1.1.2
jupyter-renderers|ms-|1.0.17
vscode-jupyter-cell-tags|ms-|0.1.9
vscode-jupyter-slideshow|ms-|0.1.6
remote-containers|ms-|0.362.0
remote-ssh|ms-|0.110.1
remote-ssh-edit|ms-|0.86.0
remote-explorer|ms-|0.4.3
vscode-speech|ms-|0.8.0

(1 theme extensions excluded)

A/B Experiments

```
vsliv368cf:30146710
vspor879:30202332
vspor708:30202333
vspor363:30204092
tftest:31042121
vstes627:30244334
vscorecescf:30445987
vscod805cf:30301675
binariesv615:30325510
vsaa593cf:30376535
py29gd2263:31024239
vscaac:30438847
c4g48928:30535728
azure-dev_surveyone:30548225
2i9eh265:30646982
962ge761:30959799
pythongtdpath:30769146
welcomedialog:30910333
pythonidxpt:30866567
pythonnoceb:30805159
asynctok:30898717
pythontestfixt:30902429
pythonregdiag2:30936856
pythonmypyd1:30879173
pythoncet0:30885854
2e7ec940:31000449
pythontbext0:30879054
accentitlementst:30995554
dsvsc016:30899300
dsvsc017:30899301
dsvsc018:30899302
cppperfnew:31000557
dsvsc020:30976470
pythonait:31006305
chatpanelt:31048053
dsvsc021:30996838
jg8ic977:31013176
pythoncenvptcf:31049071
a69g1124:31046351
pythonprc:31047982
dwnewjupytercf:31046870
26j00206:31048877
```

p-i- commented 1 month ago

If you could just hook the did_complete of the Dictation tool and use AI to post-process and re-render the affected text, maybe this would do the job. If that's possible...

p-i- commented 1 month ago

Here's an example of the duplicate-text bug.

I'm speaking "test 123", optionally followed by "full stop" or "new paragraph", and then hitting BACKSPACE, ENTER, LEFT-ARROW, 'a', or pretty much anything, it seems.

It seems that if I don't allow enough silence for it to 'settle down' after I've said 'full stop', the utterance text gets double-injected into the window.

In TextEdit I can't replicate this particular failure, but it isn't 100% right there either: it inserts unwanted newline characters.

https://github.com/microsoft/vscode/assets/693495/4b517d5c-6938-4ae3-8830-4c1e4b64a1ca

p-i- commented 1 month ago

Here's a demo of the wordwrap + superposition issue:

https://github.com/microsoft/vscode/assets/693495/cb098b5f-e6d6-492e-af7c-7917ce421d30

p-i- commented 1 month ago

Here's an example of the capitalization-of-start-of-new-phrase problem:

There are other situations where I get a Capitalization fail, e.g. inserting the cursor into a sentence and speaking.

https://github.com/microsoft/vscode/assets/693495/9fafdb22-f227-4c10-baac-dd838215d283

This one is probably a really tricky fix, as the macOS dictation assistant is clearly scraping the text of the active window and operating over that.

I think a VS Code-native speech tool would be a much appreciated feature!
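
To make the capitalization problem above concrete, here is a rough sketch of the kind of rule a post-processing step could apply. It is only an illustration: `adjustDictatedCase` is a made-up helper, not part of VS Code or of the macOS Dictation tool, and it assumes the editor can see the text immediately before the cursor and the phrase that was just dictated.

```typescript
// Hypothetical helper (not a VS Code or macOS API).
// Chooses the case of the first letter of a dictated phrase from the text
// that sits just before the insertion point.
function adjustDictatedCase(before: string, phrase: string): string {
  if (phrase.length === 0) {
    return phrase;
  }
  // Ignore trailing spaces and tabs, but keep newlines so that the start
  // of a new line still counts as the start of a sentence.
  const trimmed = before.replace(/[ \t]+$/, "");
  const startsSentence = trimmed.length === 0 || /[.!?\n]$/.test(trimmed);
  const first = phrase.charAt(0);
  const rest = phrase.slice(1);
  // Limitation: lowercasing mid-sentence is wrong for proper nouns and "I".
  return (startsSentence ? first.toUpperCase() : first.toLowerCase()) + rest;
}
```

For example, `adjustDictatedCase("Hello world. ", "this is new")` gives `"This is new"`, while `adjustDictatedCase("Hello ", "World again")` gives `"world again"`. A real implementation would need a smarter rule for names and acronyms.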

p-i- commented 1 month ago

Here's a nice repeatable minimal testcase for duplication.

All I do here is double-tap Fn to invoke the macOS speech-to-text assistant and speak "Test 123" followed by a couple of seconds of silence followed by "New paragraph".

And then I just wait.

Firstly, it DOESN'T create a new paragraph, just a couple of spaces. Secondly, once it times out, it dumps a duplicate of the utterance.

https://github.com/microsoft/vscode/assets/693495/6b58590b-aa13-47f7-b766-4f9547a9c6f2
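
As a rough sketch of how the duplication in this testcase could be cleaned up after the fact: if an extension somehow knew the last utterance (I don't know of a hook that exposes this, so that part is an assumption), it could check whether the text ends with that utterance twice in a row and drop the second copy. `collapseDuplicateUtterance` is a hypothetical name used only for illustration.

```typescript
// Hypothetical helper (not a VS Code or macOS API).
// If `text` ends with the same utterance twice in a row, optionally separated
// by whitespace, return the text with the second copy (and that whitespace)
// removed. Otherwise return the text unchanged.
function collapseDuplicateUtterance(text: string, utterance: string): string {
  const u = utterance.trim();
  if (u.length === 0 || !text.endsWith(u)) {
    return text;
  }
  // Text before the final copy of the utterance.
  const head = text.slice(0, text.length - u.length);
  // Drop whitespace between the two copies, e.g. the stray spaces seen
  // where "New paragraph" should have produced a paragraph break.
  const headTrimmed = head.replace(/\s+$/, "");
  // Limitation: a deliberately repeated phrase would also be collapsed.
  return headTrimmed.endsWith(u) ? headTrimmed : text;
}
```

With the testcase above, `collapseDuplicateUtterance("Test 123  Test 123", "Test 123")` returns `"Test 123"`, and a single `"Test 123"` is left untouched.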