Associating a group of symbols (or any symbols in a particular group) with one pronunciation under symbols.dic

nvaccessAuto commented 12 years ago

Reported by nvdakor on 2012-09-19 06:28

Hi,

In symbols.dic for some languages, apart from using regex, is it possible (or would it be possible) to perform the following:

Create a group of symbols.
The group of these individual symbols would be given a single pronunciation.
Any individual symbol within this pronunciation group would be using this one pronunciation when spoken.

This could be useful for tonal languages such as Korean and Vietnamese, which associate one pronunciation with multiple individual symbols. This could also help with faster symbols processing, as a translator doesn't have to define same pronunciation (one per line) for any number of individual symbols.

For example, the current symbols.dic syntax is:

sym(tab)pronunciation(tab)punctuationLevel

Suppose we wish to assign the letters "a" and "b" to be pronounced as "A":

a(tab)A(tab)none
b(tab)A(tab)none

Following the proposal above, we could say:

{a,b}(tab)A(tab)none

For providing punctuation levels for each individual symbols in the braces, I'd like to propose:

{(A(tab)puncLevel),(b}(tab)A(tab)none

with priority given to puncLevel for symbols surrounded by parentheses.

Thanks.
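A minimal sketch of how the proposed `{a,b}` grouping could be expanded into ordinary per-symbol entries. The function name `expand_group_line` is hypothetical and is not part of NVDA; the sketch assumes the three tab-separated fields described above and ignores the per-symbol level variant.

```python
def expand_group_line(line):
    """Expand '{a,b}<TAB>replacement<TAB>level' into one entry per symbol.

    Hypothetical helper for illustration only; not NVDA code.
    """
    symbols_field, replacement, level = line.split("\t")
    is_group = (
        symbols_field.startswith("{")
        and symbols_field.endswith("}")
        and len(symbols_field) > 2
    )
    if is_group:
        # Strip the braces and split the members on commas.
        symbols = symbols_field[1:-1].split(",")
    else:
        # An ordinary single-symbol line passes through unchanged.
        symbols = [symbols_field]
    return [(symbol, replacement, level) for symbol in symbols]


grouped = expand_group_line("{a,b}\tA\tnone")
separate = expand_group_line("a\tA\tnone") + expand_group_line("b\tA\tnone")
```

Under these assumptions, `grouped` and `separate` hold the same entries, so the grouped form is shorthand for the two separate lines.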

nvaccessAuto commented 12 years ago

Comment 1 by jteh (in reply to comment description) on 2012-09-19 08:29

Thanks for the suggestion. My thoughts:

Replying to nvdakor:

In symbols.dic for some languages, apart from using regex,

Why is regex a problem? It's not particularly difficult to do this with a regex. You just enclose the symbols in square brackets; e.g. [abc]. The only disadvantage is that you can only have 90 or so complex symbols.
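As a rough illustration of the character-class approach, here is a sketch using Python's standard `re` module. The pattern, the replacement "A", and the function name `speak` are illustrative assumptions; how such a pattern is registered as a complex symbol in symbols.dic is not shown.

```python
import re

# A character class matches any single symbol listed inside the brackets,
# so one pattern (and one replacement) covers the whole group.
group_pattern = re.compile(r"[ab]")


def speak(text, replacement="A"):
    """Replace every symbol in the group with the shared pronunciation."""
    return group_pattern.sub(replacement, text)


result = speak("a b c")
```

Only "a" and "b" are replaced; "c" is outside the class and is left alone.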

{a,b}(tab)A(tab)none

One problem with this is that we'd have to escape {. For example, to match just {, the user would have to do \{ to distinguish it from a grouping. This would break user symbol files. Also, users would not be able to configure the pronunciation of the individual symbols if they wanted to, although maybe this is intended.
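The escaping concern can be sketched as follows. This assumes a strict reading of the proposal in which an unescaped `{` always opens a group and a backslash marks a literal brace; `parse_symbol_field` is a hypothetical name, not an NVDA function.

```python
def parse_symbol_field(field):
    """Return the list of symbols named by the first field of a line.

    Hypothetical strict grammar: '\\{' is a literal brace, '{...}' is a group.
    """
    if field.startswith("\\{"):
        # Escaped brace: drop the backslash and treat the rest literally.
        return [field[1:]]
    if field.startswith("{"):
        if not field.endswith("}") or len(field) < 3:
            # An existing entry for a bare '{' now looks like a broken group.
            raise ValueError("unterminated symbol group: %r" % field)
        return field[1:-1].split(",")
    return [field]


literal = parse_symbol_field("\\{")
group = parse_symbol_field("{a,b}")
```

With this grammar, a user file that already defines a bare `{` entry would fail to parse until it is rewritten with the escape, which is the compatibility break described above.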

This could also help with faster symbols processing, as a translator doesn't have to define same pronunciation (one per line) for any number of individual symbols.

Internally, it won't really be any faster. The individual symbols will still be treated the same way.

For providing punctuation levels for each individual symbols in the braces, I'd like to propose:

{(A(tab)puncLevel),(b}(tab)A(tab)none

Even if we implement symbol grouping, I think symbol grouping with levels adds complexity (both to the code and for translators) with no advantage. If the level is defined separately, other parameters might be defined separately too. There's definitely no speed advantage here.

To summarise:

This is a fairly significant change that will have to be carefully implemented to avoid breaking user symbol files.
There is little to no speed advantage.

Given the above, the big question is: do you think this is hugely necessary or was it just a nice idea?

bhavyashah commented 7 years ago

@josephsl As the original author of this ticket, could you please respond to the questions asked in @jcsteh's https://github.com/nvaccess/nvda/issues/2676#issuecomment-155300661?

Adriani90 commented 6 years ago

@josephsl, any updates regarding this issue? Is anyone working on it?

Adriani90 commented 8 months ago

Having this feature in symbols.dic would really make things simpler and would reduce the complexity of the file, especially for mathematical alphanumeric characters. If I implement these in symbols.dic (over 900 characters), there are multiple versions of the letters a, b, c (double-struck, script, etc.) and multiple versions of numbers, such as subscript and superscript. We don't need the full details of a character's name in symbols.dic, so, for example, all versions of the small letter a could be associated with the pronunciation "a", and so on.

Adriani90 commented 8 months ago

The full name of a character could then be retrieved with the help of an add-on from the Unicode databases etc. on demand (i.e. character information add-on which already exists).
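A small sketch of that split between a short spoken form and an on-demand full name, using Python's standard `unicodedata` module. The `SPOKEN` mapping and the `describe` function are hypothetical and stand in for a grouped symbols.dic entry and an add-on lookup respectively.

```python
import unicodedata

# Hypothetical grouping: several styled variants share the spoken form "a".
SPOKEN = {
    "\U0001D41A": "a",  # bold small a
    "\U0001D4B6": "a",  # script small a
    "\U0001D552": "a",  # double-struck small a
}


def describe(char):
    """Return (short spoken form, full Unicode name) for one character."""
    spoken = SPOKEN.get(char, char)
    full_name = unicodedata.name(char, "unknown")
    return spoken, full_name


spoken, full_name = describe("\U0001D552")
```

Here the short form is what would be spoken by default, while the full name ("MATHEMATICAL DOUBLE-STRUCK SMALL A" for this character) is only looked up when requested.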

Adriani90 commented 2 months ago

Now that Unicode normalization has made it into the core, this got much better for many symbols. However, I still think the capability to assign a common pronunciation to a group of symbols would be very useful. At least it would make the symbols.dic file much easier to read, especially when adding different versions of a mathematical symbol, e.g.

TURNED SANS-SERIF CAPITAL L: ⅂
REVERSED SANS-SERIF CAPITAL L: ⅃

All these should just be pronounced as L

Or maybe it is possible, at least for math, to also include these math extended symbols in the normalization algorithm? https://www.classe.cornell.edu/~dms79/LectureNotes/formulae/list-of-math-symbols-extended.htm
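To see which characters standard normalization already covers, here is a quick check with Python's `unicodedata` module. It assumes the compatibility form NFKC is the one in question, which the thread does not state; the `normalized` helper is an illustrative name.

```python
import unicodedata


def normalized(char, form="NFKC"):
    """Return the given Unicode normalization form of one character."""
    return unicodedata.normalize(form, char)


samples = [
    "\U0001D552",  # MATHEMATICAL DOUBLE-STRUCK SMALL A
    "\u00B2",      # SUPERSCRIPT TWO
    "\u2142",      # TURNED SANS-SERIF CAPITAL L
    "\u2143",      # REVERSED SANS-SERIF CAPITAL L
]

for char in samples:
    changed = normalized(char) != char
    print(unicodedata.name(char), "changed by NFKC:", changed)
```

Styled letters and superscript digits carry compatibility decompositions, so NFKC folds them to the plain character. The turned and reversed L have no decomposition in the Unicode data, so normalization leaves them as they are, and they would still need their own entries or a grouping mechanism.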

cc: @LeonarddeR, @SaschaCowley, @ABuffEr any ideas?
