Hay Say Accessibility Feedback

hydrusbeta / hay_say_ui

A unified, browser-based interface for pony voice generation

Apache License 2.0


Hay Say Accessibility Feedback #1

Open queenslight opened 1 year ago

queenslight commented 1 year ago

As a blind/visually impaired MLP Stallion, I was able to successfully install and run the Docker application, along with running said script. The issue I am having is with the character comboboxes. They do not work properly with the VoiceOver screen reader (http://apple.com/accessibility/vision/); I tried with both the Microsoft Edge and Safari browsers.

I am running a 2020 M1 Mac with 8 GB of RAM and 256 GB of storage.

I am also wondering whether this program will work in a Windows 11 ARM virtual machine.

That'll be something I shall test with screen readers for said operating system.

hydrusbeta commented 1 year ago

Thank you for the feedback, queenslight. I think I have reproduced the issue you are referring to. When I navigate up and down the character options in the comboboxes/dropdowns using the arrow keys, the screen reader does not read off the character names, which would make it impossible to know which character you are selecting. Does that accurately describe the issue you are facing? I'll see what I can do to make the comboboxes work with a screen reader.

I have not tried Hay Say in a Windows 11 virtual machine, so I can't definitively say whether doing that would work, but I can't think of any technical reason why it wouldn't work off the top of my head. I would be concerned about its performance, because there are at least 2 levels of virtualization happening. First, Windows itself is being executed in a virtual machine. Second, Windows uses WSL (Windows Subsystem for Linux) to run Docker containers, which is a sort of light virtual machine. If you do decide to try it, I would be interested to hear how it goes.

queenslight commented 1 year ago

Yes, that describes it perfectly!

Also, if possible, could there be a notification when the generation of the voice is complete? That would be most helpful.

Also, could the buttons for each available model be actual buttons for tapping? At the moment they respond to mouseover, but they are not identified as true buttons/toggles.

Thanks for the quick reply, and for this wonderful project!

PS. As for trying this in a VM, I'm wondering if I should wait until support for the Mac version is better, or if I should just go for it. And yes, you're correct: Windows screen readers (and Linux ones for that matter) should all act the same once said fix is applied to the character selection.

PPS. Due to PonyChan not having an audio captcha, I have not been able to participate over there, though I am on their Mastodon (@.***/), and I always check the thread daily!

hydrusbeta commented 1 year ago

I was using the Dash Core Component "dcc.Dropdown" for the comboboxes. Apparently, these don't work so well with screen readers. Fortunately, I can replace those with standard HTML "select" elements, which do work with screen readers. The resulting comboboxes look only slightly different. I also noticed that many of the options are missing HTML labels, so the screen reader recognizes that there is a combobox or checkbox but does not read off the option's name (e.g. "Character" or "Disable Audio Input"). I should be able to add those labels in. I'll start implementing these changes.

When you say "the buttons for each available model", I assume you are referring to the tabs that you can click to load the options for each architecture? It seems to be impossible to select any of the tabs using the keyboard alone, so I can see that being an issue. Once again, I was using a Dash Core Component, this time the "dcc.Tabs" component. I should be able to replace those with html buttons so you can select them with a keyboard and have the screenreader read off the names.

As for notifying the screen reader when the generation is complete, that may be a little tricky and I'll need to do more experimentation. I might be able to play a sound. There's also an html attribute called "role" that I can set to "alert" which supposedly informs the screen reader when an element is hidden or unhidden; maybe I can do something with that.

queenslight commented 1 year ago

All sounds fantastic! Yes, the HTML screen reader alert should do just nicely. And yes, I was indeed talking about the tabs for each particular model.

As you probably know too, the M2 Mac Mini (depending on the configuration) has slower read and write speeds than the M1. I was thinking about that when ya mentioned there were issues with voice generations.

hydrusbeta commented 1 year ago

I've run into a technical hurdle, which I wanted to document here for future reference.

Unfortunately, the Select class from the dash_html_components module (i.e. html.Select) does not have a "value" property, which makes it difficult for the Python code to determine which item is selected.

I'll see whether I can listen for the n_clicks property on all the Option elements, examine the callback context to determine which Option was selected, and save the result to a Store object. Failing that, I'll try to get the dcc.Dropdown component to work better with a screen reader or pursue alternatives (like, maybe a scrollable list of buttons?).

Update: listening for the n_clicks property on Option elements does not work; that attribute does not trigger callbacks. I can listen for n_clicks on the Select element, but there's no way to tell which option was selected when the callback is triggered.

Update 5/20/2023: After trying lots of different things, including using custom JavaScript callbacks, I believe it is impossible to determine which Option element is selected within a Select element in Plotly Dash. This makes Select/Option elements a no-go. I have also been unable to get the dcc.Dropdown component to work any better with screen readers. However, I have good news. The Dash Bootstrap Components library defines its own dropdown menu, dbc.DropdownMenu, which does work with screen readers. I will now work on replacing the dcc.Dropdown component with dbc.DropdownMenu. In the meantime, I have successfully replaced all dcc.Tab components with html.Button elements and have added labels for all options in the accessibility_enhancements branch.

Another update 5/20/2023: I was able to replace all the dcc.Dropdown components with dbc.DropdownMenu, but there was a small flaw that was bugging me. The callback for selecting the first item in the audio selection dropdown was automatically firing when the user uploaded a new audio file, and I don't know why. This made it impossible to auto-select the file that the user just uploaded. While trying to find a solution, I stumbled across another Dash Bootstrap Component called dbc.Select, which provides yet another way to create dropdown menus. After experimenting with it a little, it seems to be the perfect solution. It is screenreader-friendly, searchable, and should provide a cleaner solution than dbc.DropdownMenu because I can simply access its "value" property to determine which item is selected, rather than listening for an n_clicks event on every item within a dbc.DropdownMenu. It will take a bit of work to style the dbc.Select element to my liking, but this component looks very promising.

queenslight commented 1 year ago

Yes, a scrollable list of buttons should do the trick. Thanks for keeping me up to date with progress.

hydrusbeta commented 1 year ago

Hello queenslight,

I have implemented the following accessibility enhancements:

The old "Tabs" for selecting an architecture have been replaced with buttons which can be navigated by keyboard.
Dropdowns for selecting a character and for selecting an audio input file should be screenreader-friendly now.
When audio finishes generating, screenreaders should now say "new output generated".
Screenreaders should now read off the name of each input control (e.g. "Character" or "Shift Pitch (semitones)") when they have focus.

I put up a demo webpage at the URL below. You can't upload your own audio files to it and it doesn't generate real output, but it should otherwise be a faithful demo of the UI. If you get a chance, would you mind checking it out and letting me know whether it works with your screen reader and whether there are any issues you notice?

http://hydrusbeta.pythonanywhere.com/

queenslight commented 1 year ago

Greetings!

I am happy to say, your interface changes are an amazing success! Thanks for letting Twilight do said messages. I give my thanks to both of y'all!

The only weird thing I found (and it's probably a Chromium issue really) is that if you switch between the models, both Firefox and Safari will say "updating" to let ya know you switched models, while Microsoft Edge says nothing. The 'highly accessible' menus (regardless of what browser ya use) read absolutely fabulously! Also, all browsers do announce when the output has been generated.

My only other suggestion is a way to optionally download the necessary things for one specific model. As an example, I would most likely use so-vits-svc version 4, since that's got the most characters. Hmmm... Should I mention I've had a crush on Twilight in her Equestria Girls form? 🤔

Anyway, I can't wait to test more! Keep up the fantastic work!

hydrusbeta commented 1 year ago

It is really great to hear that you find it a success!

I have merged all of these changes to the main branch and have begun building a new Docker image. I'll post here again when it is available for download, along with instructions on how to update Hay Say.

I'm going to switch my focus to some other tasks on my to-do list for now, but I'll take a look at the issue with Microsoft Edge not announcing when you've switched architectures in the future.

It is possible to download only one specific model, but you have to manually edit the docker-compose file. There are more details in the Readme, here: Installing Only Specific Architectures (https://github.com/hydrusbeta/hay_say_ui#installing-only-specific-architectures). Basically, you would need to comment out all sections that begin with "XXX_server" except for so_vits_svc_4_server and also comment out all sections that begin with "XXX_model_pack_#" except for so_vits_svc_4_model_pack_0, 1, and 2.

In the future, I plan to make a "Hay Say Launcher" application where you can use checkboxes to select which architectures and model packs you want, and then the launcher will dynamically build the docker-compose file for you. That will be much easier for everyone to use, I think.
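To illustrate the selection rule described above, here is a minimal Python sketch of the filtering step. It is not taken from Hay Say: the function name `keep_architecture` is made up, and the only service names that come from this thread are so_vits_svc_4_server and so_vits_svc_4_model_pack_0 through 2. The other names, including the UI service name, are placeholders. It also assumes the compose file's services have already been loaded into a plain dictionary.

```python
def keep_architecture(services, architecture):
    """Return only the services needed for one architecture.

    Services whose names end in "_server" or contain "_model_pack_" are
    dropped unless they belong to the chosen architecture. Everything else
    (for example the UI itself) is kept untouched.
    """
    prefix = architecture + "_"
    kept = {}
    for name, config in services.items():
        is_server = name.endswith("_server")
        is_model_pack = "_model_pack_" in name
        if (is_server or is_model_pack) and not name.startswith(prefix):
            continue  # belongs to another architecture, so leave it out
        kept[name] = config
    return kept


# Hypothetical service table; only the so_vits_svc_4_* names appear in this thread.
example_services = {
    "hay_say_ui": {},                   # placeholder name for the UI service
    "controllable_talknet_server": {},  # placeholder name
    "so_vits_svc_3_server": {},         # placeholder name
    "so_vits_svc_3_model_pack_0": {},   # placeholder name
    "so_vits_svc_4_server": {},
    "so_vits_svc_4_model_pack_0": {},
    "so_vits_svc_4_model_pack_1": {},
    "so_vits_svc_4_model_pack_2": {},
}

selected = keep_architecture(example_services, "so_vits_svc_4")
print(sorted(selected))
```

A real compose file may also contain volumes or dependency entries that refer to the dropped services, which this sketch does not handle.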

queenslight commented 1 year ago

I'm very glad I'm able to help with all of this, and happy that my feedback is deeply valuable. And yes, that application interface will come in handy in the future for sure!

I'll be sure to watch out for your email once the new update arrives.

Thanks for the tip about commenting out said lines in the file, by the way.

Kind regards,
Trenton Matthews

PS. Those guys over at the 'Pony Preservation Project' may be an odd bunch, but their talented work (including your own) is truly appreciated!

hydrusbeta commented 1 year ago

Yes, your feedback is very valuable and much appreciated! It is almost certain that there are other users who would benefit from these accessibility enhancements but never reach out. Thank you for taking the time to test the Demo webpage too.

I have uploaded the Docker image, so you should be able to update Hay Say now by executing the following 3 commands in a terminal. The first will shut down your containers if they are running, the second will pull the latest image, and the third will start Hay Say again:

docker compose stop
docker compose pull
docker compose up

Best, HydrusBeta

queenslight commented 1 year ago

Greetings!

After updating the app (even before accessibility got fixed), I keep getting this error, and sadly I'm not sure what the issue is.

An error has occurred. Please send the software maintainers the following information (please review and remove any private info before sending!):

Traceback (most recent call last):
  File "/root/hay_say/hay_say_ui/main.py", line 327, in generate
    hash_preprocessed = preprocess_if_needed(selected_file, semitone_pitch, debug_pitch, reduce_noise, crop_silence)
  File "/root/hay_say/hay_say_ui/main.py", line 436, in preprocess_if_needed
    hash_preprocessed = preprocess(selected_file, semitone_pitch, debug_pitch, reduce_noise, crop_silence)
  File "/root/hay_say/hay_say_ui/main.py", line 642, in preprocess
    preprocess_file(hash_raw, hash_preprocessed, semitone_pitch, debug_pitch, reduce_noise,
  File "/root/hay_say/hay_say_ui/main.py", line 658, in preprocess_file
    data_raw, sr_raw = read_audio_from_cache(RAW_DIR, hash_raw)
  File "/usr/local/lib/python3.10/site-packages/hay_say_common/file_integration.py", line 77, in read_audio_from_cache
    path = os.path.join(folder, filename_sans_extension + CACHE_EXTENSION)
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

Also, due to me only having 256 GB on this 8 GB RAM Mac Mini M1, here's hoping I'm not out of luck with trying to run these models.

I have 140 GB of space left at the moment.

I also have Homebrew installed.

hydrusbeta commented 1 year ago

Hello Queenslight,

This error indicates that either:

An invalid file was somehow selected when you tried to generate audio
A certain metadata file got corrupted, or
There is a bug in the code's logic for handling the case where no file is selected.

To help narrow down the root cause, can you tell me which architecture you had selected when you encountered this error (e.g. Controllable TalkNet, so-vits-svc 3.0 or so-vits-svc 4.0)? Also, did you have an audio input file selected? If so, do you see this issue when any audio input file is selected or just a particular one?
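For reference, the TypeError in the traceback comes from concatenating None with a string while building the cache path. A guard for the "no file selected" case could look roughly like the sketch below. This is an illustration only, not the actual fix: `build_cache_path` is a made-up helper, and the value given to CACHE_EXTENSION here is an assumption (the real constant lives in hay_say_common).

```python
import os

CACHE_EXTENSION = ".flac"  # assumed value for illustration only


def build_cache_path(folder, filename_sans_extension):
    """Build the path of a cached audio file, failing clearly when no file is given."""
    if filename_sans_extension is None:
        # Without this check, None + CACHE_EXTENSION raises the TypeError
        # shown in the traceback.
        raise ValueError("No audio file is selected, so there is nothing to read from the cache.")
    return os.path.join(folder, filename_sans_extension + CACHE_EXTENSION)


print(build_cache_path("raw", "abc123"))
```

Raising a descriptive error early would let the UI show a message such as "please select an audio file" instead of a raw traceback.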

queenslight commented 1 year ago

Greetings!

I was using an .mp3 file, and it happened with both so-vits-svc 3.0 and 4.0.

Also, each time I try commenting out the so-vits-svc 3 and TalkNet options, the Docker file continues giving me errors and never starts.

If I pull down the entire file by updating it, it keeps telling me I don't have enough space. I think I said this before, but I only have a Mac Mini M1 with 8 GB of RAM and a 256 GB SSD.

I'll gather all of the error messages I get and put them in a separate message/comment.

queenslight commented 1 year ago

Update!

Here is the error I get. NB: Out of the 245 GB I have on the machine itself, there is currently 147 GB of space left. That's after commenting out so-vits-svc 3 and TalkNet, only keeping so-vits-svc 4:

requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested 0.0s

Error response from daemon: failed to copy files: copy file range failed: no space left on device

Models shouldn't take up that much space, you'd think... Hmm... I am truly not sure.

The only way I know how to start from scratch every time is to use CCleaner and clear the space that way.

After that, I usually have around 206-212 GB of space available, out of the 245 GB of hard drive space.
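As a way to double-check free-space figures like the ones quoted above from a script, here is a small sketch using Python's standard library. The helper name `free_space_report` is made up for illustration. Note that it measures the host volume only; Docker Desktop may enforce its own, smaller disk limit for images, which this sketch does not inspect.

```python
import shutil


def free_space_report(path="/"):
    """Return (total_gb, free_gb) for the volume that contains `path`."""
    usage = shutil.disk_usage(path)
    gigabyte = 1000 ** 3  # decimal gigabytes, the unit disk vendors advertise
    return usage.total / gigabyte, usage.free / gigabyte


total_gb, free_gb = free_space_report(".")
print(f"{free_gb:.0f} GB free of {total_gb:.0f} GB")
```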

queenslight commented 1 year ago

Time for another update!

Using a .wav file and testing on my Mac Mini M1 with TalkNet and so-vits-svc 4, I now get the below error with Hay Say:

An error has occurred. Please send the software maintainers the following information (please review and remove any private info before sending!):

Traceback (most recent call last):
  File "/root/hay_say/hay_say_ui/main.py", line 334, in generate
    hash_output = process(user_text, hash_preprocessed, selected_tab_object, relevant_inputs)
  File "/root/hay_say/hay_say_ui/main.py", line 526, in process
    send_payload(payload, host, port)
  File "/root/hay_say/hay_say_ui/main.py", line 554, in send_payload
    message = extract_message(response)
  File "/root/hay_say/hay_say_ui/main.py", line 559, in extract_message
    json_response = json.loads(response.read().decode('utf-8'))
  File "/usr/local/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/local/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
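For context, "Expecting value: line 1 column 1 (char 0)" is what json.loads raises when the text it receives is empty or does not start with a JSON value, so the architecture server most likely replied with an empty or non-JSON body. A more defensive way to parse such a response might look like the sketch below. It is illustrative only: `parse_server_response` is a made-up name and is not how Hay Say's extract_message is actually written.

```python
import json


def parse_server_response(raw_body):
    """Decode a response body, returning (payload, error_message).

    Exactly one of the two values is None. An empty or non-JSON body yields
    a readable error message instead of an uncaught JSONDecodeError.
    """
    text = raw_body.decode("utf-8", errors="replace")
    try:
        return json.loads(text), None
    except json.JSONDecodeError as error:
        preview = text[:200] if text.strip() else "<empty body>"
        return None, f"Response was not valid JSON ({error.msg}); body started with: {preview}"


print(parse_server_response(b'{"message": "ok"}'))
print(parse_server_response(b""))
```

Including the start of the body in the message makes it easier to tell an empty reply apart from, say, an HTML error page.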

I'm going to recreate a Windows 11 ARM machine in the meantime, and will let you know how that goes using Hay Say that way. At least it'll be a good workaround until the Mac side of things gets fixed.

Keep up the fantastic work! I shall keep up with progress reports when needed.