nvaccess / nvda

NVDA, the free and open source Screen Reader for Microsoft Windows
https://www.nvaccess.org/

Content not part of the body (for example ARIA labels) reads non-English words character by character #16124

Open cevangelougovcy opened 8 months ago

cevangelougovcy commented 8 months ago

Steps to reproduce:

  1. Open Chrome
  2. Browse to https://gov-cy.github.io/dsf-sample-page/ucd-accordion.html
  3. Use the Tab key to navigate to the accordion headings.

Actual behavior:

NOTE: The lang attribute of the document is set to el for Greek. I have also noticed the same behaviour when NVDA reads the page's <title> in Greek.

This issue is also linked to https://github.com/nvaccess/nvda/issues/8206. I have tried changing the synthesizer, but with no effect.

Expected behavior:

NVDA should read Greek content as words instead of character by character.

NVDA logs, crash dumps and other attachments:

System configuration

NVDA installed/portable/running from source:

Installed

NVDA version:

NVDA version 2023.3.1

Windows version:

Windows 11

Other questions

Does the issue still occur after restarting your computer?

Yes

Have you tried any other versions of NVDA? If so, please report their behaviors.

No

If NVDA add-ons are disabled, is your problem still occurring?

No add-ons

Does the issue still occur after you run the COM Registration Fixing Tool in NVDA's tools menu?

I am not sure.

CyrilleB79 commented 8 months ago

Welcome @cevangelougovcy

Next time, so that we can reliably reproduce this issue, we need a code sample. Please provide a minimal HTML sample that reproduces it, either on CodePen or directly as an HTML snippet.

Please also indicate the synthesizer(s) and voice(s) you are using and the corresponding result. If you are not using a Greek voice by default, make sure that your synthesizer has a Greek voice and is able to switch language.

My assumption is that aria-label attributes do not honour language switching, as demonstrated by the following HTML snippet used with an eSpeak English voice:

data:text/html,<body lang="it"><h2 aria-label="Cigliegie e ceci">Title 2</h2><p>Cigliegie e ceci</p></body>

With this snippet, we can see that the paragraph's text uses the Italian voice but the aria-label of the heading does not.
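
Under the HTML specification, an element's language comes from the lang attribute on the element itself or on its nearest ancestor that has one. By that rule, the heading carrying the aria-label in the snippet above should be treated as Italian, exactly like the paragraph. The following is a minimal, simplified sketch of that lookup. It assumes a hypothetical `Node` class and `resolve_lang` helper (not NVDA's or any browser's actual data structures) and ignores xml:lang and the HTTP Content-Language fallback.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical DOM-like node: a tag name, its attributes and a parent link."""
    tag: str
    attrs: dict = field(default_factory=dict)
    parent: Optional["Node"] = None


def resolve_lang(node: Optional[Node], default: str = "") -> str:
    """Return the lang attribute of `node` or of its nearest ancestor that has
    one; return `default` (language unknown) if no ancestor sets lang."""
    while node is not None:
        if "lang" in node.attrs:
            return node.attrs["lang"]
        node = node.parent
    return default


# Tree for: <body lang="it"><h2 aria-label="Cigliegie e ceci">Title 2</h2><p>...</p></body>
body = Node("body", {"lang": "it"})
h2 = Node("h2", {"aria-label": "Cigliegie e ceci"}, parent=body)
p = Node("p", parent=body)

# Both the paragraph text and the heading's aria-label inherit "it" from <body>.
print(resolve_lang(p))   # it
print(resolve_lang(h2))  # it
```

In other words, the language information needed for the aria-label is already present in the document; the question in this issue is whether it is passed along to the speech output.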

This issue seems related to #4396, if not even a duplicate. @Adriani90, what do you think?

cevangelougovcy commented 8 months ago

Thanks for the reply @CyrilleB79. I haven't used CodePen much, but I understand that its preview template sets the lang to en by default.

The synthesizer I use is "Windows OneCore voices", but I have also tested "eSpeak NG" and "Microsoft Speech API version 5", and they behave the same.

I played a little with the voice setting. The only voice that read the Greek in the aria-label correctly was "Microsoft Stefanos", which is the voice that takes over for Greek content in other cases. However, this voice has a really bad English accent, and I would hate to force our users to change their favourite settings.

FYI, Apple's VoiceOver works as expected and reads the Greek word correctly.

Adriani90 commented 8 months ago

Actually, I couldn't reproduce this issue when using the German or Romanian voice of eSpeak. However, the German OneCore and SAPI 5 voices did not recognize these characters at all in my case. This is related to, or might be a duplicate of, #9181 or #10488.

@michaelDCurran, do you have any technical advice on how to conceptualize a solution? I guess it might be tricky to always force correct pronunciation, given how much voice phonetics differ between synthesizers. Is this something NVDA can handle at all? I think that even if the lang attribute in HTML were supported properly, the result would still depend on how the synthesizer vendors implement phonetics in their voices. Right?

My only suggestion is to add the whole Greek alphabet to symbols.dic with the correct phoneme for each Unicode character (not the full Unicode character name). The Greek alphabet has only one possible pronunciation per symbol, so it should not be a big problem. This has also been requested in #5194. In that case, all voices, whatever the synthesizer, will pronounce it as defined in the symbols dictionary. I guess it is possible to build the phonemes ourselves with the help of native speakers. There are also a lot of videos about how Greek letters sound: https://www.youtube.com/watch?v=VOSvqiaJN2c

CyrilleB79 commented 8 months ago

@cevangelougovcy, on my side I cannot reproduce this on Firefox. Can you confirm that the issue only occurs with Chrome (or probably any other Chromium-based browser), not with Firefox?

> Actually, I couldn't reproduce this issue when using the German or Romanian voice of eSpeak. However, the German OneCore and SAPI 5 voices did not recognize these characters at all in my case.

> My only suggestion is to add the whole Greek alphabet to symbols.dic with the correct phoneme for each Unicode character (not the full Unicode character name). The Greek alphabet has only one possible pronunciation per symbol, so it should not be a big problem.

That does not seem to be true, at least for modern Greek, which is the language of the sample provided in this issue. For example, see the "Digraphs and diphthongs" section of the Wikipedia page on Greek orthography.
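
To illustrate why one pronunciation per letter is not enough, here is a rough sketch that compares a naive letter-by-letter mapping, which is what a per-symbol dictionary amounts to, with one that first checks a few modern Greek vowel digraphs. The Latin spellings are simplified approximations chosen for this example, not a real phoneme set or NVDA symbol-file syntax. The digraph table is also deliberately incomplete: it omits αυ and ευ, whose pronunciation depends on the following sound.

```python
# Simplified, illustrative Latin approximations (not IPA, not an NVDA symbol file).
LETTERS = {
    "α": "a", "ε": "e", "η": "i", "ι": "i", "ο": "o", "υ": "i", "ω": "o",
    "ά": "a", "έ": "e", "ί": "i", "ό": "o",
    "κ": "k", "λ": "l", "μ": "m", "ν": "n", "π": "p", "ρ": "r",
    "σ": "s", "ς": "s", "τ": "t",
}

# A few modern Greek vowel digraphs that are pronounced as a single vowel.
DIGRAPHS = {
    "ου": "u", "ού": "u", "αι": "e", "αί": "e",
    "ει": "i", "εί": "i", "οι": "i", "οί": "i",
}


def per_letter(word: str) -> str:
    """Map each character independently, as a per-symbol dictionary would."""
    return "".join(LETTERS.get(ch, ch) for ch in word)


def digraph_aware(word: str) -> str:
    """Try a two-character digraph first, then fall back to single letters."""
    out, i = [], 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            out.append(DIGRAPHS[pair])
            i += 2
        else:
            out.append(LETTERS.get(word[i], word[i]))
            i += 1
    return "".join(out)


print(per_letter("και"), digraph_aware("και"))          # kai ke
print(per_letter("ουρανός"), digraph_aware("ουρανός"))  # oiranos uranos
```

Even this toy version needs context-sensitive rules, which are part of the synthesizer's language model rather than something a per-character table can express.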

More generally, you should forget the idea of making a synthesizer speak another language through a dictionary or a symbol file. That is not the job of the screen reader; it is the job of the synthesizer.

cevangelougovcy commented 8 months ago

@CyrilleB79 and @Adriani90, thank you for your replies. This issue is really important to us: we are implementing a Design System to be used by all Government websites and services in Cyprus, and we were really hoping to offer a good screen reader experience to our users.

  • Did you use Chrome? The issue does not seem to be present in Firefox

I confirm that the issue with aria-label does not occur on Firefox. However, other Greek content that is not part of the <body> still behaves on Firefox as it does on Chrome. For example, if the page's <title> includes Greek words, it is read character by character (I have updated the sample page so you can see for yourself).

  • Do you confirm that you have automatic language switching enabled?

Yes, I confirm it. In general, for any other part of the page body, language switching seems to work properly.

  • For OneCore or SAPI tests, do you confirm that these synthesizers are able to switch to Greek on your platform? E.g. for OneCore, you should have Stefanos installed, which seems to be the only Greek OneCore voice available.

Yes, I confirm it; Stefanos is installed.

CyrilleB79 commented 8 months ago

> I confirm that the issue with aria-label does not occur on Firefox. However, other Greek content that is not part of the <body> still behaves on Firefox as it does on Chrome. For example, if the page's <title> includes Greek words, it is read character by character (I have updated the sample page so you can see for yourself).

How do you have the title read? If you use the NVDA+T command to have the title read, it just reads the title of the Firefox window. In this case, you are outside the document. The window title is not tagged with a specific language, so how it is pronounced depends only on the voice you have selected. E.g. if you use Microsoft OneCore Stefanos or eSpeak Greek, it should be OK; if you use an eSpeak English voice or an English OneCore voice (e.g. Zira, David), those voices are not designed to read Greek.
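
The reasoning above can be modelled roughly as follows: a voice change is only possible when the text carries a language tag and automatic language switching is on; otherwise the currently selected voice speaks the text, whatever its language. This is a hypothetical sketch with made-up names (`pick_voice`, `INSTALLED_VOICES`); it is not NVDA's actual speech code, and matching on the primary language subtag is an assumption made for the example.

```python
from typing import Optional

# Hypothetical mapping from primary language code to an installed voice.
INSTALLED_VOICES = {"en": "Microsoft Zira", "el": "Microsoft Stefanos"}


def pick_voice(text_lang: Optional[str], default_voice: str,
               auto_language_switching: bool) -> str:
    """Choose which voice speaks a piece of text.

    text_lang is None when the text has no language tag, e.g. a window title
    read with NVDA+T, which lies outside the document.
    """
    if auto_language_switching and text_lang:
        # Compare only the primary subtag, so "el-CY" also finds the "el" voice.
        primary = text_lang.split("-")[0].lower()
        if primary in INSTALLED_VOICES:
            return INSTALLED_VOICES[primary]
    # No tag, switching disabled, or no matching voice: keep the current voice.
    return default_voice


print(pick_voice("el", "Microsoft Zira", True))   # Microsoft Stefanos
print(pick_voice(None, "Microsoft Zira", True))   # Microsoft Zira
print(pick_voice("el", "Microsoft Zira", False))  # Microsoft Zira
```

With an English default voice such as Zira, untagged Greek text falls through to the last line, which matches the character-by-character reading described in this thread.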

cevangelougovcy commented 8 months ago

> How do you have the title read? If you use the NVDA+T command to have the title read, it just reads the title of the Firefox window. In this case, you are outside the document. The window title is not tagged with a specific language, so how it is pronounced depends only on the voice you have selected. E.g. if you use Microsoft OneCore Stefanos or eSpeak Greek, it should be OK; if you use an eSpeak English voice or an English OneCore voice (e.g. Zira, David), those voices are not designed to read Greek.

I don't use the NVDA+T command. I just let NVDA read the title when navigating with the browser. For example, if I click on https://gov-cy.github.io/dsf-sample-page/ucd-accordion.html, NVDA starts by reading the title of the document.

I have noticed that the same issue occurs with <label> (both on Firefox and Chrome).

If this has never come up before, or if you cannot replicate it, it might have something to do with my synthesizer settings. I am not sure how to share them, but here is a screenshot of my settings.

Details below:

Synthesizer: Windows OneCore voices
Voice: Microsoft Zira
Automatic language switching: True
Automatic dialect switching: False