Closed polx closed 6 months ago
Some extra related context: we have currently hidden a number of zero-argument concepts outside of Core, while adding Core properties to represent them.
Notably:
- :unit, which likely implies 20+ concepts such as second, meter, volt, watt, ampere, ...
- :chemical-element, which likely implies 100+ element names such as helium, oxygen, tin, antimony, ...

As Paul mentioned, we also have a (currently uncounted) selection of concepts in the "self-voicing Unicode" category, such as planck-constant ℎ (U+210E), degree-celsius ℃ (U+2103), ohm-sign Ω (U+2126), end-of-proof ∎ (U+220E), n-ary-product ∏ (U+220F), ...
I think we may not yet have been sufficiently clear about whether some of these are "outside of Core" (= Open concepts), or "skipped from the Core list" (= Core concepts, which are only skipped for visual brevity). Ideally this issue also clears up that question.
I think the main practical difference for a concept being Core, for this discussion, is whether AT support is expected as a minimal requirement. And what does such support entail for concepts that match a simple literal reading (e.g. radius and _radius, or second and _second)? Is translation the key feature which is enabled? E.g. second is clearly to be translated as the unit, while _second can be an ordinal (first, second, third...).
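To make the question concrete, here is a minimal sketch of how an AT might treat a concept name versus its underscore-prefixed literal. Everything in it is an assumption for illustration: the `CONCEPT_SPEECH` table, its entries, and the `speak_intent` helper are hypothetical and not taken from any published list or implementation.

```python
# Hypothetical translation table for concept names; the entries are
# illustrative only and not taken from any published Core list.
CONCEPT_SPEECH = {
    "de": {"second": "Sekunde", "radius": "Radius"},
    "fr": {"second": "seconde", "radius": "rayon"},
}


def speak_intent(name: str, lang: str = "en") -> str:
    """Return speech text for a zero-argument intent name."""
    if name.startswith("_"):
        # Literal: drop the underscore and read the word as written,
        # without consulting any concept dictionary.
        return name[1:].replace("-", " ")
    # Concept: translate when the language table knows the name,
    # otherwise fall back to reading the name itself.
    table = CONCEPT_SPEECH.get(lang, {})
    return table.get(name, name.replace("-", " "))
```

Under these assumptions, second is translated as the unit in German, while _second is passed through untranslated and left for the listener to interpret.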
Edit: apologies, misclicked.
I freely admit my feeling on this is based on my implementation design, which separates out speech for Unicode chars from speech for notations. However, I think that it is a fairly common design. I know that @MurrayIII had that separation in his math editor.
What I've found is that my Unicode implementation needs to be more complete than my notation rules. That's because the notation rules fall back to reading the underlying syntax or, in bad cases, capture more than they should. An example of the latter is saying "power" when something is just a superscript. Even when something is misread as a power, it can be understood. For example, "x to the star power" will make someone stop and think "what!?". But they will understand and move on. However, not having a name for a Unicode character makes the speech extremely hard to understand. For example, "Unicode 2 5 a b, A B C" is next to useless.
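As a rough illustration of that separation, the sketch below looks a character up in a curated speech table first, then falls back to the formal Unicode character name, and only spells out the code point as a last resort. The `SPEECH_NAMES` table and the `speak_char` helper are hypothetical; the only real API used is `unicodedata.name` from the Python standard library, and this is not how any particular AT is implemented.

```python
import unicodedata

# Hypothetical hand-curated speech names; a real table would be far larger.
SPEECH_NAMES = {
    "\u220E": "end of proof",
    "\u2126": "ohm",
}


def speak_char(ch: str) -> str:
    """Return speech text for a single character."""
    if ch in SPEECH_NAMES:
        return SPEECH_NAMES[ch]
    # Fallback 1: the formal Unicode character name, lowercased.
    name = unicodedata.name(ch, "")
    if name:
        return name.lower()
    # Fallback 2: spell out the code point digit by digit,
    # which is the hard-to-follow case described above.
    return "unicode " + " ".join(f"{ord(ch):04x}")
```

With this sketch, U+25AB is read by its Unicode name rather than as a string of hex digits; only characters with no name at all (for example private-use code points) reach the last fallback.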
On the other hand, in practice, very few characters are used up through Calculus. In a paper I wrote, I found that 50 non-keyboard characters (non-ASCII is a good approximation) covered 99.95% of the characters that were used. Still, encountering one of the remaining 0.05% of characters is a very poor experience.
A few years ago, @davidfarmer sent me more textbooks to analyze, but I haven't found the time to do that yet.
In the absence of more analysis, I would encourage a more complete listing of characters to include in core rather than a more restrictive listing. Both creating the list and implementing the speech are much simpler than documenting and implementing a notation that should be handled. It also means authoring tools can mostly worry about intents on characters when they have a special way of being spoken or are highly ambiguous (| comes to mind). My guess is that, not counting alphabets and script variants, 200-300 characters would be sufficient to cover >99.999% of all characters encountered. But lacking data, that's just a guess.
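For anyone who wants to test that guess against a corpus, a coverage figure of this kind could be computed along the following lines. This is only a sketch under stated assumptions: it treats "non-keyboard" as non-ASCII, measures coverage relative to non-ASCII occurrences only, and the `chars_needed` helper is hypothetical rather than the method used in the paper mentioned above.

```python
from collections import Counter


def chars_needed(text: str, target: float) -> int:
    """Smallest number of distinct non-ASCII characters whose combined
    occurrences reach `target` (0..1) of all non-ASCII occurrences."""
    counts = Counter(ch for ch in text if ord(ch) > 127)
    total = sum(counts.values())
    if total == 0:
        return 0
    covered = 0
    # Walk the characters from most to least frequent, accumulating
    # their share until the target coverage is reached.
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered / total >= target:
            return rank
    return len(counts)
```

Running it over the extracted text of several textbooks with targets such as 0.9995 and 0.99999 would give numbers directly comparable to the estimates above.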
At the May 16 meeting, we agreed that we should create a list separate from the core-concept list, that lists all the characters that either have the (Unicode) math property or otherwise have some reason they might show up. This list should include some suggested speech names for the characters. Potentially there will be fields for different languages.
Neil will verify his MathCAT list of 4000+ characters includes all the Unicode chars with math properties and then pass that on to @davidcarlisle for adding to his unicode.xml list (used for XML Entities rec). From that, he will then produce a draft W3C note or some other document for reference by AT vendors.
Lots of atomic symbols are being considered within the explorations towards the core list of intents. We need to find arguments for or against including them.
For single characters whose pronunciation is equivalent to the Unicode name, there is agreement that there is no need to include them in the core list.
For symbols that carry a conceptual value, it is not clear what the advantage would be of including them as an intent property or as an intent, as opposed to, say, letting the author (or the author's producing system) output an explicit intent name.
There could be value in adding properties for typical letters which, through their usage, refer to a common concept. Radius, volume, area, angle- or segment-length values, or similar concepts, could be defined in the core list. Would that bring a better speak-aloud? Would a perplexed user ask for a more verbose speak-aloud when lost, and would the more verbose name kick in then?
It appears that there are textbooks and reading environments where the formula for the area inside a circle, A = π∙r², is pronounced using the complete detailed words "area is equal to pi times radius squared". How could this be operationalised without making every appearance of A or r be spoken as area or radius?