nvaccess / nvda

NVDA, the free and open source Screen Reader for Microsoft Windows
https://www.nvaccess.org/

Associating a group of symbols (or any symbols in a particular group) with one pronunciation under symbols.dic #2676

Open nvaccessAuto opened 12 years ago

nvaccessAuto commented 12 years ago

Reported by nvdakor on 2012-09-19 06:28 Hi, In symbols.dic for some languages, apart from using regex, is it possible (or would it be possible) to perform the following:

nvaccessAuto commented 12 years ago

Comment 1 by jteh (in reply to comment description) on 2012-09-19 08:29 Thanks for the suggestion. My thoughts:

Replying to nvdakor:

In symbols.dic for some languages, apart from using regex,

Why is regex a problem? It's not particularly difficult to do this with a regex. You just enclose the symbols in square brackets; e.g. [abc]. The only disadvantage is that you can only have 90 or so complex symbols.
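To illustrate the square-bracket approach, here is a minimal Python sketch of a character class that gives several symbols one shared pronunciation. The names `L_VARIANTS`, `GROUP_PATTERN` and `speak_group` and the chosen characters are assumptions made for this example; this is not NVDA's actual symbol processing code.

```python
import re

# Hypothetical group: bold, italic, script and double-struck capital L.
L_VARIANTS = "\U0001D40B\U0001D43F\u2112\U0001D543"

# A character class matches any single member of the group.
# re.escape guards against members that are special inside a regex.
GROUP_PATTERN = re.compile("[" + re.escape(L_VARIANTS) + "]")


def speak_group(text, replacement="L"):
    """Replace every member of the group with one shared pronunciation."""
    return GROUP_PATTERN.sub(replacement, text)
```

Calling `speak_group("\u2112 = \U0001D40B")` would return `"L = L"`, since both characters fall inside the same class.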

{a,b}(tab)A(tab)none

One problem with this is that we'd have to escape {. For example, to match just a literal {, the user would have to write an escaped form (something like \{) to distinguish it from a grouping. This would break existing user symbol files. Also, users would not be able to configure the pronunciation of the individual symbols if they wanted to, although maybe this is intended.

This could also help with faster symbols processing, as a translator doesn't have to define same pronunciation (one per line) for any number of individual symbols.

Internally, it won't really be any faster. The individual symbols will still be treated the same way.
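A rough Python sketch of why grouping would not change anything internally: a group line of the proposed form would most likely just be expanded into one entry per symbol at load time. The function name `expand_group_line` and the three-field line layout (identifier, replacement, level) are assumptions based on the example in this ticket, not an existing NVDA function.

```python
def expand_group_line(line):
    # Assumed layout: identifier <tab> replacement <tab> level,
    # e.g. "{a,b}<tab>A<tab>none" as proposed in this ticket.
    identifier, replacement, level = line.split("\t")
    is_group = (
        len(identifier) > 2
        and identifier.startswith("{")
        and identifier.endswith("}")
    )
    if is_group:
        # Each member becomes its own entry, exactly as if it had
        # been written on a separate line.
        symbols = identifier[1:-1].split(",")
    else:
        symbols = [identifier]
    return [(symbol, replacement, level) for symbol in symbols]
```

This sketch does not handle escaping, so a group cannot contain a comma and a symbol that itself starts with { and ends with } would be misread as a group; that is the compatibility problem described in the comment above.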

For providing punctuation levels for each individual symbols in the braces, I'd like to propose:

{(A(tab)puncLevel),(b}(tab)A(tab)none

Even if we implement symbol grouping, I think symbol grouping with levels adds complexity (both to the code and for translators) with no advantage. If the level is defined separately, other parameters might be defined separately too. There's definitely no speed advantage here.

To summarise:

Given the above, the big question is: do you think this is hugely necessary or was it just a nice idea?

bhavyashah commented 7 years ago

@josephsl As the original author of this ticket, could you please respond to the questions asked in @jcsteh's https://github.com/nvaccess/nvda/issues/2676#issuecomment-155300661?

Adriani90 commented 6 years ago

@josephsl, any updates regarding this issue? Is anyone working on it?

Adriani90 commented 8 months ago

Having this feature in symbols.dic would really make things simpler and would reduce the complexity of the file, especially for mathematical alphanumeric characters. If I implement these in symbols.dic (over 900 characters), there are multiple versions of the letters a, b, c (double-struck, script, etc.) and multiple versions of numbers, such as subscript and superscript. We don't need the full details of a character's name in symbols.dic, so, for example, all versions of the small letter a could be associated with the pronunciation "a", and so on.

Adriani90 commented 8 months ago

The full name of a character could then be retrieved on demand from the Unicode databases with the help of an add-on (i.e. the character information add-on, which already exists).

Adriani90 commented 2 months ago

Now that Unicode normalization has made it into the core, this got much better for many symbols. However, I still think the capability to assign a common pronunciation to a group of symbols would be very useful. At least it would make the symbols.dic file much easier to read, especially when adding different versions of a mathematical symbol, e.g.

All these should just be pronounced as L

Or maybe it is possible, at least for math, to also include these extended math symbols in the normalization algorithm? https://www.classe.cornell.edu/~dms79/LectureNotes/formulae/list-of-math-symbols-extended.htm
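For reference, a small Python sketch of what compatibility normalization already does for styled letters. It assumes the NFKC form is used (an assumption here, not confirmed in this thread), and the helper name `fold` is made up for the example.

```python
import unicodedata

# Bold, italic, script and double-struck capital L.
STYLED_L = ["\U0001D40B", "\U0001D43F", "\u2112", "\U0001D543"]


def fold(text):
    """Apply Unicode compatibility normalization (NFKC)."""
    return unicodedata.normalize("NFKC", text)


# Styled letters carry a compatibility decomposition to the plain letter.
folded = [fold(ch) for ch in STYLED_L]

# Operators such as U+2200 (FOR ALL) have no decomposition, so
# normalization leaves them unchanged.
for_all = fold("\u2200")
```

Here every entry of `folded` is a plain "L", while `for_all` is still U+2200. So styled letters and digits are covered by normalization alone, but symbols without a decomposition would still need their own symbols.dic entries or a grouping mechanism.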

cc: @LeonarddeR, @SaschaCowley, @ABuffEr any ideas?