Create doc listing all the important characters with possible speech for each character.

NSoiffer commented 6 months ago

This is the start of the agreed resolution of #480.

Because the experience for AT users of hearing nothing more than, at best, a hex number for a character is really poor, we agreed that the Math WG should come up with a large list that maps a very inclusive set of characters that might need to be spoken to potential speech. This speech is not a concept name and so does not belong in the concept lists. For example, ↻ (U+21BB) might have the listed potential speech "clockwise open circle arrow". Often, these names are based on the Unicode description.
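To make the fallback concrete, here is a minimal Python sketch of how an AT might avoid reading out a bare hex number: it prefers a curated entry, then the lowercased Unicode character name from the standard `unicodedata` module, and only then the code point. The function `speak_char` and the `SPEECH_OVERRIDES` table are hypothetical stand-ins for whatever list the WG produces, not part of it.

```python
import unicodedata

# Hypothetical curated overrides (a stand-in for the WG list), keyed by character.
SPEECH_OVERRIDES: dict[str, str] = {
    "\u2208": "an element of",  # ∈ ELEMENT OF
}

def speak_char(ch: str) -> str:
    """Return a speakable string for a single character.

    Prefers the curated override, then the lowercased Unicode character
    name, and only as a last resort the hex code point.
    """
    if ch in SPEECH_OVERRIDES:
        return SPEECH_OVERRIDES[ch]
    name = unicodedata.name(ch, None)  # None for unnamed code points (e.g. PUA)
    if name is not None:
        return name.lower()
    return f"U+{ord(ch):04X}"  # the poor experience the list is meant to avoid

print(speak_char("\u21BB"))  # clockwise open circle arrow
```

With this design, the curated list only needs entries where the Unicode name is a poor fit for speech; everything else degrades to the name rather than a number.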

The path forward is:

Neil will verify that his MathCAT list of 4000+ characters includes all the Unicode characters with math properties.
Neil will pass that on to @davidcarlisle for adding to his unicode.xml list (used for XML Entities rec).
David will then produce a draft W3C note or some other document for reference by AT vendors.

davidcarlisle commented 5 months ago

The initial idea was to add the data to unicode.xml, but there are several conditional templates that don't really fit the existing unicode.xml style. It seems more in the spirit of the intent concept and property lists to manage the source data as YAML, which is also the format of the MathCAT list.

A slightly modified version of the MathCAT list

https://github.com/NSoiffer/MathCAT/blob/main/Rules/Languages/en/unicode-full.yaml

has been added to mathml-docs as

https://github.com/w3c/mathml-docs/blob/main/_data/unicode-speech.yml

with a GitHub Pages rendering at

https://w3c.github.io/mathml-docs/unicode-speech/

There is some minor restructuring of the YAML to aid rendering in Jekyll, but the substantive changes are:

All Private Use Area (PUA) characters have been dropped.
The pitch: field has been dropped (it only occurred twice after dropping the PUA characters).

Some of the nested conditions are not fully handled by the GitHub template (they show the raw object data as ... => ...). It may make more sense to simplify the conditions.
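The two substantive changes amount to a filtering pass. This is a minimal Python sketch that assumes the data has already been loaded into a dictionary keyed by character, with each value a dictionary of fields; that shape, the `speech` field name and the function `prune_speech_data` are assumptions for illustration, not the actual layout of unicode-speech.yml. Private Use Area characters are recognized by their Unicode general category `Co`.

```python
import unicodedata
from typing import Any

def is_private_use(ch: str) -> bool:
    # General category "Co" covers U+E000..U+F8FF and the PUA planes 15 and 16.
    return unicodedata.category(ch) == "Co"

def prune_speech_data(entries: dict[str, dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Drop PUA characters, and drop the pitch field from the remaining entries."""
    pruned: dict[str, dict[str, Any]] = {}
    for ch, fields in entries.items():
        if is_private_use(ch):
            continue
        pruned[ch] = {key: value for key, value in fields.items() if key != "pitch"}
    return pruned

# Hypothetical sample data (field names and the pitch value are illustrative).
sample = {
    "\u21BB": {"speech": "clockwise open circle arrow"},
    "\uE000": {"speech": "private glyph"},
    "\u2208": {"speech": "an element of", "pitch": 1},
}
print(prune_speech_data(sample))
```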

davidcarlisle commented 5 months ago

Possible changes:

Drop the CJK Compatibility block, which starts at https://w3c.github.io/mathml-docs/unicode-speech/#U3371
Re-order to be in Unicode order
Break up the math alphabetic ranges so that they do not include the "holes" for pre-existing base plane characters
Drop some more fields, such as audio:
Simplify some of the spell/translate markup
Simplify some of the pseudo-XPath conditions (especially when they query custom MathCAT elements rather than MathML)
Document the external parameters used, e.g. "$SpeechStyle != 'ClearSpeak'" (and probably don't use so many if they are more closely tied to the MathCAT implementation)
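Breaking up the math alphabetic ranges could be automated. A minimal Python sketch, assuming ranges are given as inclusive code point pairs: it treats any code point with no Unicode name as a hole and returns the maximal runs of assigned characters. The function name `split_at_holes` is hypothetical; the example uses the Mathematical Italic letters, where U+1D455 is reserved because italic small h already exists in the base plane as U+210E PLANCK CONSTANT.

```python
import unicodedata

def split_at_holes(start: int, end: int) -> list[tuple[int, int]]:
    """Split the inclusive range start..end into maximal runs of named
    (assigned) code points, skipping reserved holes."""
    runs: list[tuple[int, int]] = []
    run_start = None  # first code point of the current run, if any
    for cp in range(start, end + 1):
        assigned = unicodedata.name(chr(cp), None) is not None
        if assigned and run_start is None:
            run_start = cp
        elif not assigned and run_start is not None:
            runs.append((run_start, cp - 1))
            run_start = None
    if run_start is not None:
        runs.append((run_start, end))
    return runs

# Mathematical italic capital A (U+1D434) through small z (U+1D467).
for lo, hi in split_at_holes(0x1D434, 0x1D467):
    print(f"U+{lo:X}..U+{hi:X}")  # U+1D434..U+1D454, then U+1D456..U+1D467
```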

davidcarlisle commented 5 months ago

@NSoiffer I checked in a second version with much simpler handling of conditional texts. All nested tests, XPath tests and other tests are replaced by (arbitrarily named) states, so the YAML still records all possible suggested speech strings, but the detailed mechanism for choosing between them is left to implementations.

so

    - test:
        if: ancestor::m:modified-variable and preceding-sibling::*[1][self::m:mi]
        then:
          - t: bar
        else:
          - t: line

becomes

    - choose:
        - modified-variable: bar
        - default: line

and

    - test:
        if: $SpeechStyle != 'ClearSpeak' or $ClearSpeak_MultSymbolDot = 'Auto'
        then:
          - t: times
        else:
          - t: dot

becomes

    - choose:
        - dot-times: times
        - default: dot

and

    - test:
        if: $SpeechStyle != 'ClearSpeak'
        then:
          - t: an element of
        else_test:
          if: ../../self::m:set or ../../../self::m:set
          then_test:
            - if: $ClearSpeak_SetMemberSymbol = 'Auto' or $ClearSpeak_SetMemberSymbol = 'In'
              then:
                - t: in
            - else_if: $ClearSpeak_SetMemberSymbol = 'Member'
              then:
                - t: member of
            - else_if: $ClearSpeak_SetMemberSymbol = 'Element'
              then:
                - t: element of
            - else:
                - t: belonging to
          else_test:
            - if: $ClearSpeak_SetMemberSymbol = 'Auto' or $ClearSpeak_SetMemberSymbol = 'Member'
              then:
                - t: is a member of
            - else_if: $ClearSpeak_SetMemberSymbol = 'Element'
              then:
                - t: is an element of
            - else_if: $ClearSpeak_SetMemberSymbol = 'In'
              then:
                - t: is in
            - else:
                - t: belongs to

becomes

    - choose:
        - element-member-verbose: is a member of
        - element-member: member of
        - element-belonging: belonging to
        - element-belongs: belongs to
        - element-in-verbose: is in
        - element-in: in
        - element-verbose: is an element of
        - default: an element of
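Since the choice between states is left to implementations, here is one possible policy as a minimal Python sketch (not MathCAT's actual code). It takes a choose: list as it would look after YAML parsing, a list of single-key mappings, and returns the speech for the first entry whose state the implementation has switched on, falling back to default. The helper `mathcat_dot_states` is a hypothetical example of how an implementation could derive the arbitrary dot-times state from MathCAT-style preferences so that it reproduces the original test; the preference value "Dot" is likewise an assumption.

```python
def choose_speech(options: list[dict[str, str]], active_states: set[str]) -> str:
    """Return the speech for the first active state in a choose: list.

    `options` mirrors the YAML: a list of one-entry mappings such as
    [{"dot-times": "times"}, {"default": "dot"}].
    """
    default = None
    for option in options:
        ((state, speech),) = option.items()  # each entry has exactly one key
        if state == "default":
            default = speech
        elif state in active_states:
            return speech
    if default is None:
        raise ValueError("choose: list has no default entry")
    return default


def mathcat_dot_states(speech_style: str, mult_symbol_dot: str) -> set[str]:
    # Hypothetical mapping mirroring the original test:
    # $SpeechStyle != 'ClearSpeak' or $ClearSpeak_MultSymbolDot = 'Auto'
    if speech_style != "ClearSpeak" or mult_symbol_dot == "Auto":
        return {"dot-times"}
    return set()


DOT = [{"dot-times": "times"}, {"default": "dot"}]
print(choose_speech(DOT, mathcat_dot_states("ClearSpeak", "Dot")))   # dot
print(choose_speech(DOT, mathcat_dot_states("SimpleSpeak", "Dot")))  # times
```

In this sketch the order of entries acts as the priority when more than one state is active; that is a design choice of the sketch, not something the YAML specifies.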

YAML

https://github.com/w3c/mathml-docs/blob/main/_data/unicode-speech2.yml

HTML rendering

https://w3c.github.io/mathml-docs/unicode-speech/index2.html

These are currently named ...2 to allow side-by-side comparison in the gh-pages view.

NSoiffer commented 5 months ago

I like those changes, and also using "map". It is much more readable. Let's discuss this with the rest of the group at the start of the meeting on Thursday.

davidcarlisle commented 5 months ago

The index2 version is now published at

https://w3c.github.io/mathml-docs/unicode-speech

and the index2 URL above is no longer active. The exact list of characters and their speech strings can still be edited, but the basic mechanism is in place with a public document, so I'm closing this issue.

w3c/mathml #495