w3c / mathml

MathML4 editors draft
https://w3c.github.io/mathml/

List of core intents #432

Open NSoiffer opened 1 year ago

NSoiffer commented 1 year ago

The WG has discussed what needs to be part of core for intent (i.e., those intents that AT should know about and potentially have "special" speech for). The "special" speech might be differing ways of speaking the intent based on:

@dginev created a spreadsheet with lots of options. This issue is here for people to discuss what should be part of core and also to provide their own lists (as per the WG meeting today).

dginev commented 1 year ago

This comment tracks the newly contributed collection of 361 "Symbol speech strings" and 128 "Function speech strings" used in OfficeMath:

https://devblogs.microsoft.com/math-in-office/math-speech-strings-and-localization/

Murray Sargent shared their list via the www-math mailing list yesterday (archive)

davidfarmer commented 1 year ago

I have compiled a list of common expressions in K-14 math which have more than one meaning/pronunciation. Approximately 55 entries.

When the K-14 meaning is already ambiguous, in a couple cases I also added some possibilities from more advanced math.

https://docs.google.com/spreadsheets/d/1PhjYFEz3PhRTsE5U4RiPH84pnTTTwilh0wx-ujnprqg/

brucemiller commented 1 year ago

How many of these symbols actually need an entry in the core list (or would only need one for translation to other languages)? For example, many of the first cases could be addressed by <mo intent="times">*</mo> (or "cross", "by", "dot", "convolved-with", etc.). That is, they don't need any special treatment by the AT other than pasting the words together.

Of the ones that do need special treatment, what would be good keywords for the core dictionary?
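To make the "pasting the words together" point concrete, here is a toy Python sketch. It is not part of the spec or of any real AT. The speak helper, and its rule of turning hyphens in an intent name into spaces, are assumptions made for illustration. A leaf intent such as times or convolved-with is spoken as its own words, with no dictionary lookup:

```python
import xml.etree.ElementTree as ET


def speak(node: ET.Element) -> str:
    """Toy reader (hypothetical): a leaf with an intent is spoken as the
    intent name, a leaf without one is spoken as its text, and any other
    element is read by speaking its children left to right.
    Intents on non-leaf elements are ignored in this sketch."""
    intent = node.get("intent")
    if len(node) == 0:  # token element such as mi, mo, mn
        if intent is not None:
            # Only intent names get hyphens turned into spaces:
            # "convolved-with" -> "convolved with".
            return intent.replace("-", " ")
        return (node.text or "").strip()
    return " ".join(speak(child) for child in node)


for name in ("times", "cross", "convolved-with"):
    markup = f'<mrow><mi>a</mi><mo intent="{name}">*</mo><mi>b</mi></mrow>'
    print(speak(ET.fromstring(markup)))
```

Only names that need more than this (reordering arguments, extra words, translation) would then be candidates for the core list.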

NSoiffer commented 1 year ago

I hope we all agree that AT should translate Unicode chars to the appropriate language. These are a fixed "known" set of characters and so can be translated. The defaults should be based on what the Unicode spec describes the characters as, although the words used probably need to be modified. For example “×” (U+00D7) is listed as "Multiplication sign". AT should be free to say "multiplied by", "times", etc. But if someone wants some other meaning/speech such as "cross", they need to provide that.

One way to provide that is as @brucemiller suggests: add it to the mo. The other way, which @davidfarmer favors(?), is to put it on an mrow. For example (using the "star" entry in his table):

<mrow intent="star@infix($arg1, $arg2)">
   <mi arg="arg1">a</mi>
   <mo>*</mo>
   <mi arg="arg2">b</mi>
</mrow>

Is one method preferable to the other? I can imagine that a \star{a}{b} macro would easily be able to do either. The mrow version has the advantage of representing the semantics more clearly, but intent is not about semantics.

Should the spec suggest one way is preferable?

davidfarmer commented 1 year ago

My interpretation of @davidcarlisle's suggestion last Thursday is that we mock up several end-to-end intents.

That is, propose what is the preferred MathML, what is acceptable MathML, and how is intent indicated in each case. I think that is better than discussing just one or two examples in isolation.

Many of the examples involve an "a" and a "b", but (in addition to other words in the voicing) it is not always true that the "a" and "b" are spoken in the order they appear in the markup. Having a variety of full examples to look at may make things more clear.

It is not always possible to have the intent on the mo. It may always be possible to put it on the mrow. (I am not claiming that argument should be decisive.)

NSoiffer commented 1 year ago

It is not always possible to have the intent on the mo

Yes, true for some bracketing notations such as "absolute value". Also true for 2D notations such as "transpose". But as you say, it's not clear that this means intent should always be on the mrow when it is possible to put it on an mo.

brucemiller commented 1 year ago

One way to provide that is as @brucemiller suggests: add it to the mo. The other way, which @davidfarmer favors(?), is to put it on an mrow.

I wasn't suggesting that one was preferable to the other: both are valid and should be equivalent. My main point was that either <mo intent="star">... or <mrow ***@***.***($a,$b)">... should yield the speech "a star b", without "star" needing to be in any dictionary.
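Here is a toy sketch of the claimed equivalence. It rests on stated assumptions. The reader below is hypothetical Python, not spec behaviour. It speaks a leaf intent as words, and it reads an mrow intent of the form name@infix($x, $y) as "x name y", finding the arguments by their arg attribute. It ignores the scoping of nested intents and the rest of the real intent grammar.

```python
import re
import xml.etree.ElementTree as ET

# Matches a call-style intent such as "star@infix($arg1, $arg2)":
# group 1 = name, group 2 = optional hint, group 3 = argument list.
CALL = re.compile(r"^\s*([\w-]+)(?:@([\w-]+))?\s*\((.*)\)\s*$")


def words(name: str) -> str:
    """Turn an intent name like 'convolved-with' into speakable words."""
    return name.replace("-", " ")


def find_arg(node: ET.Element, name: str) -> ET.Element:
    """Return the first descendant whose arg attribute equals name."""
    for el in node.iter():
        if el is not node and el.get("arg") == name:
            return el
    raise KeyError(f"no descendant with arg={name!r}")


def speak(node: ET.Element) -> str:
    intent = node.get("intent")
    if intent is not None:
        call = CALL.match(intent)
        if call:
            name, hint, arglist = call.groups()
            args = [speak(find_arg(node, a.strip().lstrip("$")))
                    for a in arglist.split(",") if a.strip()]
            if hint == "infix" and len(args) == 2:
                return f"{args[0]} {words(name)} {args[1]}"
            # Assumed fallback reading for any other hint: "name of x and y".
            return f"{words(name)} of {' and '.join(args)}"
        if len(node) == 0:
            return words(intent)  # leaf intent: speak the name itself
    if len(node) == 0:
        return (node.text or "").strip()
    # No usable intent: read the children left to right.
    return " ".join(speak(child) for child in node)


on_mo = '<mrow><mi>a</mi><mo intent="star">*</mo><mi>b</mi></mrow>'
on_mrow = ('<mrow intent="star@infix($arg1, $arg2)">'
           '<mi arg="arg1">a</mi><mo>*</mo><mi arg="arg2">b</mi></mrow>')
print(speak(ET.fromstring(on_mo)))
print(speak(ET.fromstring(on_mrow)))
```

Both forms come out as "a star b", and "star" never has to appear in a lookup table.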

davidcarlisle commented 1 year ago

@davidfarmer's table at https://github.com/w3c/mathml/issues/432#issuecomment-1419361000 now has some possible MathML with and without intent. It may be interesting to see in how many cases the intent improves the default. Cases where it is just making the default behaviour explicit could perhaps be dropped from a minimal list.

see https://texlive.net/david/intent.html

The current draft uses 96 names (the actual names are not so important and can be changed, but it gives an indication of the number of names needed for @davidfarmer's list of notations).

Delta
O
a-dagger
a-star
absolute-value
absolutely-continuous
adjoin
adjoint
antiparticle
asymptotic
augmented
boundary
bra-ket
cardinality
center-of-mass
choose
closed-interval
closed-open-interval
closure
commutator
complement
conjugate
connected-sum
convolution
coordinate
cross-product
cycle
dimension-over
discriminant
distribution
divided-by
divides
dual
empty-set
equivalent
evaluated-at
exactly-divides
floor
function-from-to
gcd
given
group-generated 
group-generated-with
hermitian-conjugate
ideal-generated 
index-of
inner-product
inverse
isomorphic
jacobi
kronecker
lcm
legendre
lie-bracket
limit-from-left
limit-from-right
line
magnitude
mean
member
much-greater
much-less
multiplicative-subgroup
negation
norm
normal-distribution
o
open-closed-interval
open-interval
or
pair
parallel
partial-derivative
permutations
point-at
power
probability
ratio
repeated
restricted-to
root
sequence
set
similar
span
subset
subset-or-equal
such-that
superset
superset-or-equal
times
to
transpose
unary-minus
units
vector
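The criterion suggested above is to drop names whose intent only makes the default reading explicit. One way to apply it is a simple filter. This is a sketch only. The Case record and minimal_core helper are hypothetical, and the speech strings are made-up placeholders, not output from any real AT.

```python
from typing import NamedTuple


class Case(NamedTuple):
    intent_name: str
    default_speech: str  # reading without any intent (placeholder text)
    intent_speech: str   # reading with the intent applied (placeholder text)


def minimal_core(cases: list[Case]) -> list[str]:
    """Names for which at least one case is read differently with the intent."""
    def norm(s: str) -> str:
        # Compare case-insensitively and ignore extra whitespace.
        return " ".join(s.lower().split())
    return sorted({c.intent_name for c in cases
                   if norm(c.default_speech) != norm(c.intent_speech)})


cases = [
    Case("times", "a times b", "a times b"),
    Case("transpose", "x superscript T", "x transpose"),
    Case("absolute-value", "vertical bar x vertical bar", "absolute value of x"),
]
print(minimal_core(cases))  # ['absolute-value', 'transpose']
```

A name is kept as soon as any one of its examples changes the reading. A name that only appears in "default-equivalent" examples drops out.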
davidfarmer commented 1 year ago

I have started a new list of proposed core intents.

Some more advanced examples were omitted because they are self-voicing. Some were not ambiguous (but we still hope that AT says them correctly). Some maybe can be handled in other ways.

I may have missed some that others can add.

https://docs.google.com/spreadsheets/d/1cLPaIy9kX5K-67RG6rjSAXErDSB-_iYmgZaTKQjShVg/edit#gid=0

davidcarlisle commented 1 year ago

@davidfarmer a Google doc is a bit hard to work with, as the entries are unstructured. I think most of your cases are in a list I made, which shows generated speech for intent both in the function@hint style of the current spec (column 3, green) and using a new MathCAT implementation of the template with the funcall style of issue 446 (column 4, blue).

https://mathml-refresh.github.io/intent-lists/intent4.html

e.g. ${}^{\mathrm{T}}x$ is shown at https://mathml-refresh.github.io/intent-lists/intent4.html#IDxtransposepre-sup

Also you might want to comment on the indexed examples such as $H^2$ https://mathml-refresh.github.io/intent-lists/intent4.html#ID2ndCohomology

If there are examples you think are missing let me know and I'll add.

I'll try to add some possible variants for your Ackerman example during the day.

davidcarlisle commented 1 year ago

possible A(m,n) markup added at https://mathml-refresh.github.io/intent-lists/intent4.html#IDAckermanorA

davidfarmer commented 1 year ago

The purpose of my new intents spreadsheet is to work toward a decision on what will be in core. I thought that a fresh list which basically only had the proposed elements would be helpful.

I don't think any of the other ways of saying "Ackerman" carry the information that AT in a terse mode should ignore the intent and just use the content of the "mi".

On Thu, 16 Mar 2023, David Carlisle wrote:

possible A(m,n) markup added at https://mathml-refresh.github.io/intent-lists/intent4.html#IDAckermanorA


davidcarlisle commented 1 year ago

The purpose of my new intents spreadsheet is to work toward a decision on what will be in core. I thought that a fresh list which basically only had the proposed elements would be helpful.

Yes, sure, but as one criterion is (or seems to be) that things shouldn't be in core if they would get the desired reading by default, it seems useful to try the examples. The first two doc spreadsheets were sufficiently structured that it made sense to derive the examples semi-automatically. All I meant was that this one is more of a discussion list, which is fine, but any examples from there that people want to see "in action" I will copy by hand to list4.

I don't think any of the other ways of saying "Ackerman" carry the information that AT in a terse mode should ignore the intent and just use the content of the "mi".

I am not sure I agree here. I don't see anything special about Ackerman. For more or less every symbol, you sometimes say its meaning and sometimes just read the notation. You might read ${}_1F_2$ as "hypergeometric function", or you might get tired of saying that and read it as "1-F-2". If there is an existing global option to read the notation, do we need that on every identifier? If we do need it, then it has to fit into the general scheme. Core functions are names for which the system should have speech rules, not a different syntactic class, so currently

<mi intent='named-function(Ackerman)'>A</mi>

being in core would mean the system should have a rule for pronouncing named-function given the argument Ackerman, and it should produce the same speech for

<mi intent='named-function(Ackerman)'>B</mi>

We could change the rules so the element content is part of the input but what rules are you proposing?

If intent is being used and the document has <mo intent='rabbit'>+</mo> can the system choose to read that as plus, or does the intent mask the element content ?
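The two answers to that question can be sketched as two policies of a toy reader. This is a hypothetical illustration, not spec behaviour. RULES is an assumed table of speech rules keyed only on the intent string. CHAR_NAMES is an assumed table of default character names. read_notation stands in for the global "read the notation" option mentioned above. When the intent masks the content, A and B are spoken identically, which is the point about named-function(Ackerman).

```python
import xml.etree.ElementTree as ET

# Hypothetical speech-rule table, keyed only on the intent string.
RULES = {"named-function(Ackerman)": "Ackerman function"}
# Hypothetical default names for characters read from the notation.
CHAR_NAMES = {"+": "plus"}


def speak_leaf(markup: str, read_notation: bool = False) -> str:
    """Read a single token element.

    read_notation=False: the intent masks the element content.
    read_notation=True: the intent is ignored and the content is read.
    """
    el = ET.fromstring(markup)
    intent = el.get("intent")
    content = (el.text or "").strip()
    if intent is None or read_notation:
        return CHAR_NAMES.get(content, content)
    return RULES.get(intent, intent.replace("-", " "))


for markup in ("<mi intent='named-function(Ackerman)'>A</mi>",
               "<mi intent='named-function(Ackerman)'>B</mi>",
               "<mo intent='rabbit'>+</mo>"):
    print(speak_leaf(markup), "|", speak_leaf(markup, read_notation=True))
```

Under the masking policy the rabbit example is read as "rabbit". When the notation is read, it comes out as "plus". Which of these an AT may choose is exactly what the group still has to decide.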

davidfarmer commented 1 year ago

"If intent is being used and the document has <mo intent='rabbit'>+</mo> can the system choose to read that as plus, or does the intent mask the element content?"

I think that is the key question, which the group should discuss.

My impression was that AT would always say the intent and never the actual content. That is why I am proposing a special type of intent that specifically indicates to AT that it is okay to skip the intent and just say the content.

The general question: how is AT supposed to know when it is okay to ignore intent and just say what is there?

dginev commented 1 year ago

Two notes from me:

davidcarlisle commented 1 year ago

I agree chemistry will need a concept list, and hopefully some at least semi-formal way of switching in that list via <math intent=":chemistry"> or similar, but I can't agree it would be good to include double bond in a core math concept list. Lots of math layout has a different meaning and reading in chemistry, so why single out = as double bond?

dginev commented 1 year ago
  1. K12 STEM generally covers (and I may be missing some)

    • biology,
    • chemistry,
    • computer science,
    • engineering,
    • earth sciences (geology, geography),
    • mathematics,
    • physics (astronomy, Newtonian mechanics, ...)
    • (?) economics - which would e.g. motivate currency notations

    Are you suggesting each of these 7-8 areas gets a separate Core list? I assumed a single list can carry all of them, maybe with some visual marker separating them, but not necessarily. So no "singling out" of one concept over another - my previous comment just anchored two examples that have become clearer to me.

  2. As to intent=":chemistry", I hoped the suggestion for "subject area annotation" was put to rest after the discussions in #93 and #426, but we can rehash it. It is more reliable to mark up a known kind of object, as in intent=":chemical-formula" or even a more specific kind of chemical formula, so that AT has finer grasp of what needs to be guessed inside the expression.

    That shines especially in cases such as intent=":si-unit", which are not clearly bound to one field (or we end up e.g. assuming all of :physics). It's a curious question whether there is a need for a "Core list of properties", but that may benefit from a new github issue, to tease it apart.

davidcarlisle commented 1 year ago

There is a big difference between physics and engineering using mathematics, and chemistry using a layout sufficiently similar to math that a math typesetter can be used. Just including double bond in isolation would be weird, to say the least. Chemists can't use that without some basic collection of chemical concepts, and it's unrelated to mathematics.

NSoiffer commented 1 year ago

Having talked to the Chemistry CG a few times, we came to the conclusion that there are three basic things to mark up, which I think should be properties:

Note that in a chemical-equation, "=" means "equals", but things like "[...]" mean concentration of, etc. This is different from a chemical-formula where the = (or ::) means double bond and brackets are just grouping symbols.

The Chemistry CG also identified some specialized areas of chemistry that have conflicting notations (I don't remember what they were), but I think that is probably best resolved via intent, as these are nowhere near as widely used and (although my memory may be faulty) were much more limited in meaning (i.e., they could be handled by annotating a leaf).

I think there is some analogy between these (sort of) inherited properties and the table properties we have discussed. So I don't see this as introducing a new concept.

polx commented 1 year ago

A first fork is here:

dginev commented 1 year ago

As part of seeking more data/hints towards a "useful Core list", I completed a first case study today, reading through a full K-12 STEM book, which I picked at random through a search in my local (US) library. That took a 6-hour reading block to complete, so it should be possible for me to cover another few books in the near future.

My notes may be a bit on the brief side this time, but here is a list of concept names (and properties) that MAY have been beneficial as intent values in some equations in the book: https://gist.github.com/dginev/825078ae316c32c312436f42061b3d05

I found 6 cases that I had no clear answers for, marked with ?. The good news is that my final list is about 150 entries long while being generous, and almost all of the entries have already been discussed or recorded in the WG's work on Intent.

That could be an early indication that a Core of that rough size should already be "useful" for some classes of materials (math+physics).

dginev commented 1 year ago

Update: I have now also completed an intent-oriented review of a K-12 level chemistry refresher (targeted at the curriculum of New York state). I kept my notes brief and exploratory, as with my previous "mathematics for physics students" review.

The chemistry review is here: https://gist.github.com/dginev/ff7e6e090b79a0389fc2eff2b9961331

There was almost complete overlap on the ~30 math-near intents. I recorded just about 50 chemistry-related intents, not counting units (~50, depending on how one counts) and mentioned chemical elements (50+). About 10 different structures may benefit from a dedicated :property, 6 of which are chemistry-specific.

I am probably about a couple of book reviews away from being able to do synthesis, compressing the different lists into PRs for Core. But that is just one possible course of action. I'd ideally like to collect more notations in K-12 engineering, biology, geology and computer science.