microsoft / vscode

Visual Studio Code
https://code.visualstudio.com
MIT License

Lay of the land / help / overview command #201898

Open meganrogge opened 10 months ago

meganrogge commented 10 months ago

@kieferrm had an idea where we would use Copilot to describe, to a screen reader user, the current state and the actions that can be taken. Imagine what we do currently using the contextual help dialogs, but with more information.

In thinking about this, I wondered if we could have a command that would describe the current lay of the land / overview. For example, if a user is unsure where the focus has gone or if something is not working as expected, it might say "There are 2 editor groups open. The first has a bash terminal and the second has 4 TypeScript files. The extensions view is focused and contains 3 extensions that suggest reloading the window." This would save time, as otherwise a user would have to navigate all over the workbench for this information.
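As a rough illustration of the non-Copilot version of this idea, here is a minimal sketch that turns a plain snapshot of the workbench into a spoken-style summary. The `WorkbenchSnapshot` shape and the `describeWorkbench` function are hypothetical names made up for this example; an extension would presumably fill the snapshot from the real workbench state, which is not shown here.

```typescript
// Hypothetical snapshot types: these are NOT VS Code APIs, just plain data
// that an extension could populate from the workbench (assumption).
interface EditorGroupSnapshot {
  // Human-readable labels of what the group holds, e.g. "bash terminal".
  contents: string[];
}

interface WorkbenchSnapshot {
  editorGroups: EditorGroupSnapshot[];
  // Name of the focused view, if any, e.g. "extensions view".
  focusedView?: string;
}

const ORDINALS = ["first", "second", "third", "fourth", "fifth"];

// Naive pluralization: appends "s" when count is not 1.
function pluralize(count: number, noun: string): string {
  return count === 1 ? `${count} ${noun}` : `${count} ${noun}s`;
}

// Collapses repeated labels, e.g. four "TypeScript file" entries
// become "4 TypeScript files".
function summarizeContents(contents: string[]): string {
  const counts = new Map<string, number>();
  for (const label of contents) {
    counts.set(label, (counts.get(label) || 0) + 1);
  }
  const parts: string[] = [];
  counts.forEach((n, label) => parts.push(pluralize(n, label)));
  return parts.length > 0 ? parts.join(" and ") : "nothing";
}

// Builds the overview text, one sentence per fact.
function describeWorkbench(snapshot: WorkbenchSnapshot): string {
  const groups = snapshot.editorGroups;
  const sentences: string[] = [];
  sentences.push(
    groups.length === 1
      ? "There is 1 editor group open."
      : `There are ${groups.length} editor groups open.`
  );
  groups.forEach((group, i) => {
    const ordinal = i < ORDINALS.length ? ORDINALS[i] : `number ${i + 1}`;
    sentences.push(`The ${ordinal} has ${summarizeContents(group.contents)}.`);
  });
  if (snapshot.focusedView) {
    sentences.push(`The ${snapshot.focusedView} is focused.`);
  }
  return sentences.join(" ");
}

// Example usage with a hand-written snapshot.
const example: WorkbenchSnapshot = {
  editorGroups: [
    { contents: ["bash terminal"] },
    { contents: ["TypeScript file", "TypeScript file", "TypeScript file", "TypeScript file"] },
  ],
  focusedView: "extensions view",
};
console.log(describeWorkbench(example));
```

Keeping the description logic as a pure function over plain data would make it easy to unit test and to feed the same text to either an accessible view or a speech backend.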

@jooyoungseo, @rperez030 please let us know what you think of these ideas.

One thing we'd want to consider is should this content be read using Copilot voice or presented in an accessible view? I imagine the latter.

rperez030 commented 10 months ago

That sounds like a super powerful idea. My initial thought is that this should be part of the accessible view, simply because not everyone will have access to GitHub Copilot, which is a paid product. But Copilot would make much more sense because it would allow the user to ask more specific questions, for example, "tell me more about the TypeScript files" or "how can I close that terminal?". If, on top of that, Copilot could perform actions on the editor on behalf of the user... combine that with the speech feature, and we'll have a powerful assistant that would help multiple user groups in a number of situations.

jooyoungseo commented 10 months ago

@meganrogge I would vote for Copilot voice instead of accessible view so that we can give a clear cue for users to distinguish the response from other text info in the accessible view.

rperez030 commented 10 months ago

> @meganrogge I would vote for Copilot voice instead of accessible view so that we can give a clear cue for users to distinguish the response from other text info in the accessible view.

What about Braille users? Is copilot voice available to users who do not have access to GitHub copilot? Is it possible to interact with messages coming from copilot voice using the screen reader?

I personally haven't tried copilot voice.

meganrogge commented 10 months ago

That is a really great point @rperez030. I'm not certain, but would hope that Copilot Voice works with braille devices already. @bpasero do you know this?

rperez030 commented 10 months ago

Braille support is really a screen reader feature. I don't know much about Copilot voice. If it is supposed to be a voice-only experience, it probably doesn't make sense for it to work with Braille, but then all the features that are available through Copilot voice should also be available through Copilot chat.

bpasero commented 10 months ago

@meganrogge can you clarify what you are asking me? Today the vscode-speech extension provides the capability of transcribing voice to text and filling it into the Copilot chat input boxes. We have not explored any further capabilities such as speaking text back to the user, though that is a feature that the underlying speech library provides. Maybe check out https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-sdk if that helps.

I'm missing some context on what it would mean to support braille from the extension; maybe someone can clarify for me.

meganrogge commented 10 months ago

I was confused. I thought Copilot Voice allowed for text to be read to the user.

The link you shared looks like what I'd actually want to use, thanks.

So, a user would trigger the help command, we'd ask Copilot to describe the lay of the land, and we'd use that library to read the response aloud. We'd have to see if the library integrates with braille. If it does not, I would guess we'd want to alert that text instead, so we would need a setting, as I do worry about how this would interplay with screen readers.
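The flow above could be sketched roughly as follows. Everything here is an assumption for illustration: `OverviewDeps`, `chooseOutput`, and `runOverviewCommand` are hypothetical names, and `describe`, `speak`, and `alert` stand in for a Copilot request, a text-to-speech call, and a screen reader alert respectively. None of them are real VS Code or Speech SDK APIs.

```typescript
// Hypothetical setting values: read the overview aloud, or alert it as text
// so the screen reader (and a braille display) can handle it.
type OutputMode = "speech" | "alert";

// Hypothetical dependencies, injected so the flow can be tested with fakes.
interface OverviewDeps {
  describe: () => Promise<string>;         // e.g. ask Copilot for the overview (assumed)
  speak?: (text: string) => Promise<void>; // text-to-speech, if available (assumed)
  alert: (text: string) => void;           // screen reader alert (assumed)
}

// Falls back to "alert" whenever speech is not preferred or not available.
function chooseOutput(preferred: OutputMode, speechAvailable: boolean): OutputMode {
  return preferred === "speech" && speechAvailable ? "speech" : "alert";
}

// Runs the help command: fetch the description, then deliver it.
async function runOverviewCommand(
  deps: OverviewDeps,
  preferred: OutputMode
): Promise<OutputMode> {
  const text = await deps.describe();
  const mode = chooseOutput(preferred, deps.speak !== undefined);
  if (mode === "speech" && deps.speak) {
    await deps.speak(text);
  } else {
    deps.alert(text);
  }
  return mode;
}
```

Returning the mode that was actually used is just a convenience for testing; the key design choice in this sketch is that the text alert is always available as a fallback.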

bpasero commented 10 months ago

The ability to read out text is currently not exposed by the library or the extension, but I think it could be added:

meganrogge commented 10 months ago

On a Teams call, @rperez030 used an NVDA extension to get this lay of the land:

"This image appears to be a screenshot of an online video conference call interface. There are four participants shown in the conference, each with a distinct video or profile image and name displayed.

Starting from the top left corner of the window, you can see a row of various icons and the duration of the current call, which is 03:35. Next to the call duration, there are options for engaging in the video conference call, such as Chat, People (highlighted with a number indicating there are 4 people in the call), Raise hand, React, View options, Notes, Whiteboard, Apps, and More options. There is a prominent Leave button in red, indicating the option to exit the call.

In the main area of the interface, there are four participants:

  1. At the top left, there is a man named Roberto Perez, who is currently in a live video feed. He is wearing a polo shirt with horizontal stripes and headphones. He is in a room with a cream-colored wall and is visible from the chest up.

  2. At the top right, there is a woman named Megan Rogge, also shown in a live video feed. She has curly hair and is wearing a plaid shirt. She is in a well-lit room with white doors and a window showing daylight in the background.

  3. At the bottom left, there is a participant named Debra Shelton (Northwest Center), but instead of a live video feed, there is a profile picture displayed. The picture shows a person with a fair complexion and dark hair that covers part of their face, looking towards the camera with a light source in the background.

  4. At the bottom right, there is a participant named Jen Agarwal, who is also represented by a profile picture instead of a live video. The image features a person with medium-dark skin and dark hair, looking to the side with a natural backdrop.

It's important to note that all names and any personal characteristics described here are derived from the visual content of the screenshot and do not violate the privacy rules set forth. The video conference interface itself bears a strong resemblance to Microsoft Teams or a similar application."