nvaccess / nvda

NVDA, the free and open source Screen Reader for Microsoft Windows

builtin.dic breaks a synth's ability to handle plural acronyms #11472

Open ultrasound1372 opened 4 years ago

ultrasound1372 commented 4 years ago

the problem:

Examine the output after NVDA's built-in processing of a string like "I have all their CDs". The case I mean is an all-caps acronym followed by 's' to indicate plurality. Some synthesizers are able to pronounce this properly, making it sound as if you had put an apostrophe there. As a result of this behavior, many blind people add an apostrophe-s to the end of an acronym when writing to indicate plurality, but this is not proper writing. I would test how each of my synthesizers handles this, but I don't know of an easy way to disable NVDA's built-in processing; I could never find a setting for it.

the proposal:

I am suggesting a minor alteration to builtin.dic, specifically to the expression that breaks a word starting with a capital away from a fully uppercase word. The lowercase letter that follows the second capital should be anything but s.
Specifically, this is the regex modification I'm proposing on line 4 of builtin.dic:

([A-Z])([A-Z][a-rt-z])
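
The effect of the proposed pattern can be sketched in Python. This is only an illustration, not NVDA's actual code: the "current" pattern `([A-Z])([A-Z][a-z])` and the replacement `\1 \2` are assumptions about what builtin.dic contains, since neither is quoted in this issue, and `apply_rule` is a hypothetical helper.

```python
import re

# Assumed form of the existing builtin.dic rule (not quoted in this issue).
CURRENT_RULE = re.compile(r"([A-Z])([A-Z][a-z])")
# The pattern proposed above: the lowercase letter may be anything but "s".
PROPOSED_RULE = re.compile(r"([A-Z])([A-Z][a-rt-z])")
# Assumed replacement: insert a space between the two captured groups.
REPLACEMENT = r"\1 \2"


def apply_rule(rule, text):
    """Hypothetical helper: apply one mixed-case rule to a string."""
    return rule.sub(REPLACEMENT, text)


samples = ["I have all their CDs", "NVDAHelper", "XMLEscape"]
for sample in samples:
    print(repr(sample))
    print("  current: ", repr(apply_rule(CURRENT_RULE, sample)))
    print("  proposed:", repr(apply_rule(PROPOSED_RULE, sample)))
```

Under these assumptions, "CDs" becomes "C Ds" with the current pattern and stays intact with the proposed one, while "NVDAHelper" is split into "NVDA Helper" either way. The "XMLEscape" sample shows the trade-off: a word whose second letter is 's' is no longer split from a preceding acronym.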

Blocking questions:

NVDA supports many languages and I don't know the syntax of all of them. Are there any languages where words with 's' as the second letter would be a problem here? How many other languages that use the Latin script indicate a plural acronym in this way? If this does pose a problem for languages with words that have s as the second letter, which NVDA-usable synthesizers would handle this gracefully if it behaved the way I propose?

misc questions:

Does default/voice/temporary dictionary processing occur before NVDA's builtin.dic processing? If so, and this proposal is not implemented for whatever reason, a simple workaround would be a regex entry in the speech dictionary that adds the apostrophe manually, which would actually make more synthesizers behave this way. I would not propose adding an apostrophe in builtin.dic.
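
A minimal sketch of that workaround, and of why the processing order matters, follows. Everything here is an assumption for illustration: `PLURAL_ACRONYM` is a hypothetical user-dictionary pattern, `BUILTIN_RULE` is the assumed form of the builtin.dic mixed-case rule, and the two functions only simulate the two passes rather than calling NVDA.

```python
import re

# Hypothetical user-dictionary entry: two or more capitals followed by a plural "s".
PLURAL_ACRONYM = re.compile(r"\b([A-Z]{2,})s\b")
# Assumed form of the builtin.dic mixed-case rule (not quoted in this issue).
BUILTIN_RULE = re.compile(r"([A-Z])([A-Z][a-z])")


def add_apostrophe(text):
    """Simulated user-dictionary pass: 'CDs' -> "CD's"."""
    return PLURAL_ACRONYM.sub(r"\1's", text)


def builtin_pass(text):
    """Simulated builtin.dic pass: split a capitalized word from an acronym."""
    return BUILTIN_RULE.sub(r"\1 \2", text)


text = "I have all their CDs"
# User dictionary first: the apostrophe prevents the built-in rule from matching.
print(builtin_pass(add_apostrophe(text)))
# Built-in rule first: the acronym is already split, so the entry no longer matches.
print(add_apostrophe(builtin_pass(text)))
```

If the user dictionaries run first, the acronym reaches the synthesizer as "CD's"; if the built-in rule runs first, the string is already "C Ds" and this entry cannot repair it.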

amirsol81 commented 4 years ago

I second this. I've seen it quite frequently with terms like CEOs, FAQs, UFOs, CFOs, GMOs, and GUIs.

Mohamed00 commented 4 years ago

I mentioned something similar in #11368, but I'll close that in favor of this issue, since it has more detail. The dictionary can be disabled with this code:

import globalVars
globalVars.speechDictionaryProcessing = False

CyrilleB79 commented 4 years ago

In French, we should normally write "CD" for both the singular and the plural form; adding an 's' at the end of an abbreviation is not the recommended way to form the plural. Still, the English convention seems to be becoming more and more frequent despite being incorrect. So I would recommend having it announced properly in French too, for smoother reading.

Note also that modifying this rule may affect expressions such as "XIXe siècle", i.e. "19th century", which is usually written with Roman numerals in French. It seems that some synth dictionaries have partially worked around the issue caused by NVDA's mixed-case word rule.

ultrasound1372 commented 4 years ago

The case that @Mohamed00 raised about names like McAdam is another problem this causes. However, that fix would be specific to English, while this dictionary is applied to all languages, and I'm not sure what the regex for it would look like. So perhaps we need a different solution than just adding s as an exception to mixed-case processing. Knowing the order of processing would be very helpful for crafting dictionary entries that work around this if it is not fixed in core, but if they were processed first, I'd have to deliberately break the string in some other way.