davidfarmer commented 1 year ago

Proposal for a functional grammar for intent

This seems workable on the examples we have discussed, with proper markup (meaning judicious use of mrows) and recognizing that micromanaging the pronunciation often makes things worse.

Challenging cases welcome, of course! In particular, examples that were headless or leading underscore under other proposals.

{comments in curly brackets}

knownintent       = we have to decide on the list {e.g., 'absolute-value', "superscript", "number", ...}
reflist           = '(' '$' identifier (',' '$' identifier)* ')'
namereflist       = '(' names (',' '$' identifier)* ')'
identifier        = letter+
letter            = we have to decide what is a "letter"
name              = letter+ (('-'|\s) letter+)*   {note: spaces are allowed.  Not a deal-breaker.}
names             = name ('|' name)*
numbervalue       = we have to decide if . and , are both allowed, and minus sign { why not? }
type              = 'named' | 'adhoc' | 'value' { "isa" is sort-of a type, but is treated differently }
category          = 'function' | 'group' | 'number' | 'operation' | 'system-of-equations' | ...  {several more things}

intent            = (knownintent reflist?) |
                    (isa ":" category) |
                    (type ":" category namereflist) |
                    (type ":number" "(" numbervalue ")")
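
A minimal sketch of how the grammar above could be checked mechanically. Everything the proposal leaves open is an assumption here: "letter" is taken to be an ASCII letter, the known-intent and category sets are small samples drawn from the examples in this thread, and numbervalue allows an optional minus sign with either "." or "," as a separator. The number(3.14) form used in a later example is not literally covered by the knownintent production, so the sketch gives it a rule of its own. The function name classify is hypothetical.

```python
import re
from typing import Optional

# Sample sets only; the proposal says these lists still have to be decided.
KNOWN_INTENTS = {"absolute-value", "superscript", "number"}
CATEGORIES = {"function", "group", "number", "operation",
              "system-of-equations", "matrix", "cases", "extension"}

IDENT = r"[A-Za-z]+"                      # identifier = letter+ (ASCII assumed)
NAME = rf"{IDENT}(?:[-\s]{IDENT})*"       # name = letter+ (('-'|\s) letter+)*
NAMES = rf"{NAME}(?:\|{NAME})*"           # names = name ('|' name)*
REF = rf"\${IDENT}"
REFLIST = rf"\({REF}(?:,{REF})*\)"
NAMEREFLIST = rf"\({NAMES}(?:,{REF})*\)"
NUMBER = r"-?[0-9]+(?:[.,][0-9]+)*"       # assumed numbervalue syntax

KNOWN_RE = re.compile(rf"(?P<head>[A-Za-z-]+)(?:{REFLIST})?")
ISA_RE = re.compile(r"isa:(?P<category>[a-z-]+)")
TYPED_NUMBER_RE = re.compile(rf"(?:named|adhoc|value):number\({NUMBER}\)")
TYPED_RE = re.compile(rf"(?:named|adhoc|value):(?P<category>[a-z-]+){NAMEREFLIST}")
NUMBER_INTENT_RE = re.compile(rf"number\({NUMBER}\)")


def classify(intent: str) -> Optional[str]:
    """Return the name of the matching production, or None if rejected."""
    m = KNOWN_RE.fullmatch(intent)
    if m and m["head"] in KNOWN_INTENTS:
        return "knownintent"
    m = ISA_RE.fullmatch(intent)
    if m and m["category"] in CATEGORIES:
        return "isa"
    if TYPED_NUMBER_RE.fullmatch(intent):
        return "typed-number"
    m = TYPED_RE.fullmatch(intent)
    if m and m["category"] in CATEGORIES:
        return "typed"
    if NUMBER_INTENT_RE.fullmatch(intent):
        return "number-intent"
    return None
```

Under these assumptions a nested call such as median(index($x,$i)) is rejected, because only $identifier references may appear as arguments.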

Example of knownintent

<mrow intent="absolute-value($x)"><mo>(</mo><mi arg="x">A</mi><mo>)</mo></mrow>

Example of isa

<mrow intent="isa:system-of-equations">ABC</mrow> tells AT that ABC is a system of equations. Similarly for isa:matrix and isa:cases.

Things like isa:operation or isa:group probably have no effect initially.

Examples of named

<mi intent="named:extension(free algebra)">R</mi>

<mi intent="named:function(Bessel function of the first kind|Bessel-J)">J</mi>

The "named" type is used to indicate that the item has an existing name. The "|" separates different names, with the more verbose coming first. (We can consider omitting the option to have multiple names.)

The "named" type tells AT that it can use the literal value if desired.

Examples of adhoc

<mo intent='adhoc:operation(foo)'>⊞</mo>

(that symbol is a plus in a box)

The "adhoc" type is used to indicate that the author is making up the name, or that the name is nonstandard. There should not be "|" alternatives for an adhoc name.

The "adhoc" type tells AT that it can use the literal value if desired.

Examples of value

<mo intent='value:operation(times)'>*</mo>

The "value" type is used to indicate that AT should use that value instead of the literal content. There should not be "|" alternatives for a value name. (value is implicit for knownintent)

Some special cases

The "superscript" core intent is used to have the correct pronunciation for things like

<msup><mi>H</mi><mn intent="superscript">2</mn></msup>

While it is true that (in the context of (co)homology) a person would pronounce H^2 as "H 2", they also would pronounce H_2 the same way. The superscript intent tells AT that the 2 is just a superscript/index, not a power, so it will probably say "H sup 2", which is better.

The "number" intent (is it too confusing to have it as both an intent and a category?) covers cases which were mentioned on a call, such as:

<mrow intent="number(3.14)"><mn color="red">3</mn><mo>.</mo><mn color="blue">14</mn></mrow>

In this, and many examples, it is necessary to have suitable mrows in order to fit the proposed intent grammar. (This allows keeping numbers out of the arguments of intent, except inside the number intent or number category.)

Some special features

In many cases, at least with the initial implementations, the "category" is ignored and

intent="named:X(foo)" is probably pronounced "foo", no matter the category X.

In some cases the "category" can be a useful signal to AT. For example, if the category is "function" then AT can know to say "of" before the reference.

Otherwise, the AT just says the name and the references in order.
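
A minimal sketch of the reading rule just described, assuming the name list has already been split on "|" and each reference has already been turned into speech. The function name speak and the verbose flag are hypothetical; real AT would have its own rules.

```python
def speak(category, names, spoken_refs, verbose=True):
    """Say the name, then the references in order.

    names holds the "|" alternatives, most verbose first. For the
    "function" category the word "of" is placed before the references.
    """
    name = names[0] if verbose else names[-1]
    if not spoken_refs:
        return name
    if category == "function":
        return f"{name} of {' '.join(spoken_refs)}"
    return " ".join([name, *spoken_refs])
```

For example, speak("function", ["Bessel function of the first kind", "Bessel-J"], ["x"], verbose=False) gives "Bessel-J of x", while any other category simply yields the name followed by the references.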

davidcarlisle commented 1 year ago

In this, and many examples, it is necessary to have suitable mrows in order to fit the proposed intent grammar. (This allows keeping numbers out of the arguments of intent, except inside the number intent.)

in all the previous versions, the places where you end up with longish compound intents are places where you can't easily add mrows.

e.g., something like this with x=1.00 \\ y=10.50, where numbers are split to force decimal alignment but you can reconstitute them in intent, so something like intent="$op($x,1.00)" would currently be allowed.

This may not be a great example: if decimal alignment worked you wouldn't have to split the number. But you may want the split for coloring or other reasons, as you show in your mrow, and in a table row you cannot group subterms.

<mtable>
 <mtr>
  <mtd><mi intent="x">x</mi></mtd>
  <mtd><mo>=</mo></mtd>
  <mtd><mn>1</mn></mtd>
  <mtd><mn>.00</mn></mtd>
 </mtr>
 <mtr>
  <mtd><mi>y</mi></mtd>
  <mtd><mo>=</mo></mtd>
  <mtd><mn>10</mn></mtd>
  <mtd><mn>.50</mn></mtd>
 </mtr>
</mtable>

davidcarlisle commented 1 year ago

I think

type ":" category ":" namereflist

should be

type ":" category namereflist

with just one : to match the example

named:function(Bessel function of the first kind|Bessel-J)

brucemiller commented 1 year ago

Actually it looks like the example was intended to match type ":" category names, rather than namereflist, but I'm not sure. names doesn't seem to be used anywhere.

It also seems as if only references can be used as arguments to functions? (I'm kinda lost)

davidfarmer commented 1 year ago

Corrected.

davidfarmer commented 1 year ago

I corrected another typo: names now occurs on the 3rd line of the grammar.

And yes, I am proposing that, other than the first "names" entry of the namereflist, only identifiers occur as arguments.

This forces there to be a nice structure on the markup, so you can refer by identifier.

davidcarlisle commented 1 year ago

namesreflist has , not ( before the first ref, is that intentional?

intent="named:function(Bessel function of the first kind|Bessel-J, $x)"

I would have expected

intent="named:function(Bessel function of the first kind|Bessel-J)($x)"

davidcarlisle commented 1 year ago

there does not appear to be any equivalent of @infix ?

eg

 <mmultiscripts intent='choose@infix($n,$k)'>
  <mi>C</mi>
  <mi arg='k'>k</mi>
  <mrow/>
  <mprescripts/>
  <mrow/>
  <mi arg='n'>n</mi>
 </mmultiscripts>

from list4

(assume choose is not in the known list)

davidfarmer commented 1 year ago

I think namereflist is described correctly. I was trying to say:

(zeta function) (zeta function,$bcd) (zeta function,$x,$y)

I intended the example as:

ζ(z)

Not sure when I would want to put intent="named:function(Riemann zeta-function,$x)". The markup already says it is a function and what its argument is, so only the name/pronunciation of the function is in doubt.

brucemiller commented 1 year ago

@davidfarmer Seriously? You deleted my comment? Please don't do that.

davidcarlisle commented 1 year ago

you can not always put the function on the mo, for delimiters and other reasons, sometimes it has to be on the mrow, or as above, on mmultiscripts and so you need a functional form with arguments.

when would you use

intent="named:function(some name,$x,$y)"

?

davidfarmer commented 1 year ago

Sorry! I also should not have changed my comment.

davidfarmer commented 1 year ago

I agree that there needs to be something that lets you specify an infix or postfix reading.

davidcarlisle commented 1 year ago

I agree that there needs to be something that lets you specify an infix or postfix reading.

currently you have

<mmultiscripts intent='named:function(choose,$n,$k)'>

I suppose you could have

<mmultiscripts intent='named:infix-function(choose,$n,$k)'>

but it still looks very odd to me with ,$n,$k rather than ($n,$k)

davidfarmer commented 1 year ago

Do you mean

intent='named:infix-function(choose)($n,$k)'

That seems reasonable.

davidcarlisle commented 1 year ago

well I asked above if you intended to have , not ( before the refs in a namesreflist and you confirmed that was intentional, so here I just suggested infix-function but kept the (choose,$n,$k) form you specified

davidcarlisle commented 1 year ago

an important feature of previous versions is that there is no syntactic difference between core and open concept lists, as the list of "known concepts" will in practice be variable.

As far as I can tell you are using knownintent for colon-free references to the core list but value:operation(concept) for the open list. I would drop value: and allow

name reflist?

for possibly unknown intents.

davidcarlisle commented 1 year ago

intent="named.X(foo)" is probably pronounced "foo", no matter the category X.

did you mean named:X(foo) with : not . ?

davidfarmer commented 1 year ago

Yes, I meant named:X(foo). Edited to fix.

Just because I intended (choose,$n,$k) does not mean that I am against (choose)($n,$k) . That second choice is perhaps more natural because it sets out that ($n,$k) is the argument of the function.

And I definitely missed the need for infix- and postfix- .

As to dropping the value: : I can see doing that. But without the category , as in function(foo) or number(foo) there is missing information which may make it hard for AT to do the right thing in some cases.

davidcarlisle commented 1 year ago

As to dropping the value: : I can see doing that. But without the category , as in function(foo) or number(foo) there is missing information which may make it hard for AT to do the right thing in some cases.

well yes which is how we ended up with properties/hints in other variants, but largely dropped here (I think)

dginev commented 1 year ago

In this proposal, does the inverted "median of x at index i" example (TeX \overline{x}_i), MathML:

<msub intent="median(index($x,$i))">
  <mover accent="true">
    <mi arg="x">x</mi>
    <mo>¯</mo>
  </mover>
  <mi arg="i">i</mi>
</msub>

end up identical? And if the concepts weren't known, would it instead be the same structure but with intent:

intent="adhoc:operation(my-median)(adhoc:operation(my-index)($x,$i))"

or would these be named functions?

intent="named:function(my-median)(named:function(my-index)($x,$i))"

davidcarlisle commented 1 year ago

@dginev if I understand the proposal you could not do median(index($x,$i)) or the adhoc or named versions as you can not nest function calls you can only have $xxx as arguments of a function.

davidfarmer commented 1 year ago

There are several things going on in @dginev 's example.

1) As @davidcarlisle noted, this proposal does not allow nesting functions: you have to refer to the $arg .

2) Since "my-median" is not a standard name for an existing concept, this would be adhoc. If you wanted AT to say median or mean, and those have the usual meaning, then using named is appropriate. It is the concept and the name of the concept which matter, not the notation. That way named can be used for nonstandard notation of a common concept (or for standard notation if not in core). And adhoc can be used when the author introduces new terminology.

3) This is a good example because it points out that mover will need special treatment, just as msup does. To give a good answer I'll need more information about the rules AT uses for the default pronunciation of \overline{x}_i . But assuming it is "bar x sub i" or maybe "x bar sub i", then <mo intent="adhoc:decoration(my-median)">¯</mo> would have it say "my-median" instead of "bar".

But more likely we will encounter other situations where the preferred reading is in a different order. There are many reasonably common use cases for "decorated" objects, such as \hat{f} for Fourier transform. Adapting a suggestion from @davidcarlisle , to force the "my-median" before the x (whether or not the AT would do that anyhow), we could do

  <mover>
    <mi>x</mi>
    <mo intent="adhoc:prefix-decoration(my-median)">¯</mo>
  </mover>

I also realize that I unfortunately did not allow intent="$x $y $z". (I will put an updated schema in another comment, and also include the prefix-, etc suggestion). So, another way to do it which guarantees the "my-median" before the x is:

  <mover accent="true" intent="$y $x">
    <mi arg="x">x</mi>
    <mo arg="y" intent="adhoc:decoration(my-median)">¯</mo>
  </mover>

In this discussion it does not seem to matter that the "sub i" is there, because it is outside the mover, so AT should be trusted to handle that correctly.

davidcarlisle commented 1 year ago

@davidfarmer

In this discussion it does not seem to matter that the "sub i" is there, because it is outside the mover

that is surely the point of Deyan's example? It is not the "mean of $x$", it is the "mean of $x_i$", with just a typographical quirk of placing the bar over the x only, not extending it over the subscript. However, there is no container you can label with $xsubi to use as the argument to median, which is why you currently need a nested function call

intent="median(index($x,$i))"

That is, you want to read it as the logical markup

<mover accent="true"  intent="median($xsubi)">
  <msub arg="xsubi" intent="index($x,$i)">
    <mi arg="x">x</mi>
    <mi arg="i">i</mi>
  </msub>
  <mo>¯</mo>
</mover>

without forcing that layout.

davidcarlisle commented 1 year ago

In general, nested function calls and/or literal values are used in the previous proposals to handle cases where the mathematical structure does not match the presentation mathml element structure. It is hard to see how you can handle these cases while restricting function arguments to $argref.

You give an easy example re-constituting a coloured number where there is a containing mrow but a more realistic coloured example might be

<math>
 <mtable columnspacing="0pt">
  <mtr intent="$op($var,10.00)">
   <mtd><mi arg="var">x</mi></mtd><mtd><mo arg="op">=</mo></mtd><mtd><mn mathcolor="red">10</mn></mtd><mtd><mn mathcolor="green">.00</mn></mtd>
  </mtr>
  <mtr intent="$op($var,12.10)">
   <mtd><mi arg="var">y</mi></mtd><mtd><mo arg="op">=</mo></mtd><mtd><mn mathcolor="red">12</mn></mtd><mtd><mn mathcolor="green">.10</mn></mtd>
  </mtr>
 </mtable>
</math>

davidfarmer commented 1 year ago

As also discussed in #448 , we have to decide if intent is supposed to go beyond its original scope of allowing disambiguation of what is written. In particular, is it allowed to rearrange the presentation tree?

In the "my-median of x sub i" example, the markup clearly indicates (my-median x) sub i. And that is what the sighted person sees. If it means my-median(x sub i), then the sighted reader somehow has to figure that out on their own.

The intent should clarify, such as indicating to pronounce it "my-median" instead of "bar" or "overline". But to make the reading change what the markup says, and providing different information than the sighted reader sees, seems like asking for trouble.

For the example of mtable with numbers split across different mtds, that markup is bad for accessibility. I don't see that intent is there to remediate inaccessible markup. But, in this case the intended reading can be done without nesting, only having references as arguments, and only putting literal numbers inside a number intent:

  <mtr intent="$var $op $value">
   <mtd><mi arg="var">y</mi></mtd><mtd><mo arg="op">=</mo></mtd><mtd arg="value" intent="number(12.10)"><mn mathcolor="red">12</mn></mtd><mtd><mn mathcolor="green">.10</mn></mtd>
  </mtr>

The fact that the intent on $value is not the literal number value of its contents, seems forgivable because of the inaccessible markup.

dginev commented 1 year ago

But to make the reading change what the markup says, and providing different information than the sighted reader sees, seems like asking for trouble.

I think as the chief current trouble-maker I should clarify that the people who tend to ask for trouble don't go away, but find the trouble elsewhere. Which is fine really, as long as everyone expects that it will inevitably happen :)

Deciding that cases where "presentation and intent structures do not align" are out of scope for this syntax proposal is a reasonable outcome. But then you get the inevitable follow-up, where someone who is decided on using that presentation MathML will use the more restrictive syntax to achieve that as either a parallel tree, or a single tree with extra wrapping mrows:

parallel mrows:

<mrow intent="$intent-branch">
<mover accent="true">
<msub>
  <mi>x</mi>
  <mi>i</mi>
</msub>
<mo>¯</mo>
</mover>
<mrow arg="intent-branch" intent="median($indexed-arg)">
<mrow arg="indexed-arg" intent="index(x,i)"/>
</mrow>
</mrow>

wrapping mrows:

<mrow intent="median($indexed-arg)">
<mrow intent="index($x,$i)" arg="indexed-arg"> 
<msub>
  <mover accent="true">
    <mi arg="x">x</mi>
    <mo>¯</mo>
  </mover>
  <mi arg="i">i</mi>
</msub>
</mrow>
</mrow>

My main point being that a restricted syntax will mostly make it more awkward to "ask for trouble", but will not eliminate the possibility (as long as Presentation MathML remains as flexible as it currently is).

davidcarlisle commented 1 year ago

@davidfarmer

In the "my-median of x sub i" example, the markup clearly indicates (my-median x) sub i

No, sorry I do not see it that way at all.

If you start from a "semantic" TeX markup such as \mean{x_i}, then the macro definitions must typeset $\bar{x}_i$, not $\bar{x_i}$. If it makes the latter, it is simply bad TeX. So a primary aim of intent is to allow this while disambiguating the original meaning, hence intent="mean(index($x,$i))".

Note this happens all the time. If you have $X_i$ marked as <msub intent="foo($i)"><mi>X</mi><mi arg="i">i</mi></msub>

then need $X_i^2$ you have <msubsup intent="power(foo($i),$n)"><mi>X</mi><mi arg="i">i</mi><mn arg="n">2</mn></msubsup>

and again, you need nested function calls as neither foo nor power have an element that corresponds to an argument.
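
A minimal sketch of what resolving such a nested intent against the element tree could look like, assuming an intent syntax limited to names, $references, parentheses and commas (no literal numbers, no properties). The function names parse_intent, resolve and describe are hypothetical; only the standard library is used.

```python
import re
import xml.etree.ElementTree as ET

TOKEN = re.compile(r"\s*(\$?[A-Za-z][A-Za-z0-9-]*|[(),])")


def parse_intent(text):
    """Parse 'f(g($a),$b)' into nested tuples: ('f', [('g', ['$a']), '$b'])."""
    tokens = TOKEN.findall(text)
    if "".join(tokens) != re.sub(r"\s+", "", text):
        raise ValueError(f"unsupported character in {text!r}")
    pos = 0

    def term():
        nonlocal pos
        head = tokens[pos]
        pos += 1
        if pos < len(tokens) and tokens[pos] == "(":
            pos += 1                      # consume '('
            args = [term()]
            while tokens[pos] == ",":
                pos += 1                  # consume ','
                args.append(term())
            if tokens[pos] != ")":
                raise ValueError("expected ')'")
            pos += 1                      # consume ')'
            return (head, args)
        return head

    tree = term()
    if pos != len(tokens):
        raise ValueError("trailing input")
    return tree


def resolve(tree, element):
    """Replace each $name by the text of the descendant carrying arg=name."""
    if isinstance(tree, tuple):
        head, args = tree
        return f"{head}({', '.join(resolve(a, element) for a in args)})"
    if tree.startswith("$"):
        target = element.find(f".//*[@arg='{tree[1:]}']")
        if target is None:
            raise KeyError(tree)
        return "".join(target.itertext()).strip()
    return tree


def describe(markup):
    """Resolve the intent attribute on the root element of the markup."""
    root = ET.fromstring(markup)
    return resolve(parse_intent(root.get("intent")), root)
```

Applied to the msubsup example above, describe returns the string power(foo(i), 2): the inner call foo($i) has no element of its own, yet it still appears as the first argument of power.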

brucemiller commented 1 year ago

In the "my-median of x sub i" example, the markup clearly indicates (my-median x) sub i.

If by "markup" you mean the pure MathML without the intent, then: No, the markup indicates "(x with overbar) subscript i".

And that is what the sighted person sees. If it means my-median(x sub i), then the sighted reader somehow has to figure that out on their own.

Exactly; and they do. Knowing the hypothetical (but common) context, they would recognize that the overbar stands for median, and that whatever kind of collection "x" is (vector, array, list, whatever), such collections don't have medians but their elements do. So the sighted reader would figure out that the expression must mean "median(index(x,i))", and that "index(median(x),i)" would be wrong.

I don't see that intent is there to remediate inaccessible markup.

Hmm. I thought that was exactly what it was for.

To me, notation ambiguity is just a form of inaccessibility. Both sighted and non-sighted people are just as capable of figuring out that overbar stands for median, that "J" stands for Bessel, etc. But without the visual cues and context, it is much harder for the latter to do, unfairly so. Is that the wrong perspective?

davidfarmer commented 1 year ago

When I said "I don't see that intent is there to remediate inaccessible markup" I was referring to decimal numbers spanning multiple cells in a table. That is inaccessible to a level beyond ambiguous notation.

I accept that in the "my-median of x sub i" example, the presentation markup actually says "(x overbar) sub i". I don't see that as inaccessible markup.

Things like |x| require intent, because the literal reading, pronouncing all the symbols, imposes a cognitive burden. And AT guessing the wrong meaning is worse.

Is it better to hear "mean of quantity x sub i endquantity", which is what it means but not what the markup literally indicates? Or would the AT user prefer just hearing "mean" instead of "overbar" with the existing markup?

davidcarlisle commented 1 year ago

Is it better to hear "mean of quantity x sub i endquantity", which is what it means but not what the markup literally indicates?

The markup is not something the reader should be aware of at all, it is just a technical necessity.

I think $X_i^2$ should be pronounced however you are pronouncing $X_i$ followed by "squared". The fact that in MathML, as in TeX, a sub-sup combination is a separate markup than a nested subscript does not affect the reading.

I do not see how you can specify an intent for $X_i^2$ in this proposal as there is no element corresponding to $X_i$, but I don't see the restriction is needed for this proposal, you could allow nested arguments with minimal change to the grammar.

davidfarmer commented 1 year ago

I am hoping that this functional approach is workable, and I understand that if intent goes outside the presentation tree, then nested arguments are necessary.

davidcarlisle commented 1 year ago

I am hoping that this functional approach is workable, and I understand that if intent goes outside the presentation tree, then nested arguments are necessary.

I'm not sure what you mean by "outside" here but in any case I'd see specifying intents for $\bar{X}_i$ or $X_i^n$ as core motivating examples for intent, so if you could post a version of the grammar that supported that, there are other parts that probably need discussion, but without that it's hard to see how to make it workable.

davidcarlisle commented 1 year ago

Most of the above discussion was about arguments to functions, so some comments on the other parts of the proposal, with comparisons to https://w3c.github.io/mathml/#mixing_intent_grammar

knownintent       = we have to decide on the list {e.g., 'absolute-value', "superscript", "number", ...}

I think listing names in the grammar is too fragile, better to accept any identifier here, with the system handling "known intents" and just reading unknown ones as-is, so

concept-or-literal := NCName

reflist           = '(' '$' identifier (',' '$' identifier)* ')'

As noted above, I can't see any way to make a restriction to $argref so perhaps

arglist ='(' intent (',' intent)* ')'

namereflist       = '(' names (',' '$' identifier)* ')'

As discussed above (foo, $a, $b) is unusual syntax for a function call (lisp-like, but with commas). Despite a personal fondness for lisp I suggest

namereflist = '(' names ')' arglist

identifier        = letter+
letter            = we have to decide what is a "letter"

Probably should be NCName or [\pL][\pL\pMn\-\Md]+ or some such as discussed for other proposals

name              = letter+ (('-'|\s) letter+)*   {note: spaces are allowed.  Not a deal-breaker.}
names             = name ('|' name)*

This (long name | other name) proposal is the main new feature here, it could possibly be incorporated in to the other proposals if we decided to go that way.

numbervalue       = we have to decide if . and , are both allowed, and minus sign { why not? }
type              = 'named' | 'adhoc' | 'value' { "isa" is sort-of a type, but is treated differently }

I can't say I like the names adhoc or value but that's just details.

category          = 'function' | 'group' | 'number' | 'operation' | 'system-of-equations' | ...  {several more things}

as for knownintent, I think baking a fixed list in the grammar is too fragile, also as discussed for other proposals you end up needing multiple overlapping ones, so I would allow :function:infix:complex:whatever and use

property := ":" NCName

intent            = (knownintent reflist?) |

concept-or-literal arglist due to suggested name changes above, but why can you not have category/properties here?

                    (isa ":" category) |

(isa property+)

                    (type ":" category namereflist)

(type property+ namereflist)

(type ":number" "(" numbervalue ")")

If you make category/property an open list this just becomes a special case of the previous clause but with an interpretation that property number means the arglist has exactly one arg and any commas are part of the number
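
A minimal sketch of a parser for the revised grammar suggested in this comment, under several assumptions: NCName is approximated by an ASCII pattern, arguments may be $references or nested intents, the arglist after a name list is optional, and literal numbers are not handled. The class and function names are hypothetical.

```python
import re

# ASCII stand-in for NCName; the real production is Unicode-aware (assumption).
NCNAME = r"[A-Za-z_][A-Za-z0-9_.-]*"
TOKEN = re.compile(rf"\s*(\$?{NCNAME}|[():,|])")
TYPES = {"named", "adhoc", "value"}


class Parser:
    def __init__(self, text):
        self.tokens = TOKEN.findall(text)
        if "".join(self.tokens) != re.sub(r"\s+", "", text):
            raise ValueError(f"unsupported character in {text!r}")
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise ValueError(f"expected {expected or 'a token'}, got {token!r}")
        self.pos += 1
        return token

    def word(self):
        token = self.take()
        if token in "():,|":
            raise ValueError(f"expected a name, got {token!r}")
        return token

    def properties(self):
        # property := ":" NCName, repeated
        props = []
        while self.peek() == ":":
            self.take(":")
            props.append(self.word())
        return props

    def names(self):
        # '(' name ('|' name)* ')', where a name is one or more words
        self.take("(")
        alternatives, words = [], []
        while self.peek() != ")":
            if self.peek() == "|":
                self.take("|")
                alternatives.append(" ".join(words))
                words = []
            else:
                words.append(self.word())
        self.take(")")
        alternatives.append(" ".join(words))
        return alternatives

    def arglist(self):
        # arglist := '(' intent (',' intent)* ')'
        self.take("(")
        args = [self.intent()]
        while self.peek() == ",":
            self.take(",")
            args.append(self.intent())
        self.take(")")
        return args

    def intent(self):
        head = self.word()
        if head.startswith("$"):
            return {"ref": head[1:]}
        node = {"head": head, "properties": self.properties()}
        if head in TYPES:
            node["names"] = self.names()
        if head != "isa" and self.peek() == "(":
            node["args"] = self.arglist()
        return node


def parse(text):
    parser = Parser(text)
    node = parser.intent()
    if parser.peek() is not None:
        raise ValueError("trailing input")
    return node
```

With this shape, named:function:infix(choose)($n,$k) parses to a node with properties function and infix, the single name choose, and two reference arguments, and median(index($x,$i)) parses as a nested call.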

davidfarmer commented 1 year ago

There is a lot for me to unpack here. I will work on modifying the grammar, but it would help to clarify if nested arguments are really needed. I am submitting a separate issue for that.

brucemiller commented 1 year ago

I'm having a hard time getting an overview perspective of this proposal; Can you give a sense of the advantages of this proposal over the others?

brucemiller commented 1 year ago

but it would help to clarify if nested arguments are really needed.

Perhaps it isn't if you can get the same effect. Given the common MathML

<msub>
  <mover accent="true">
    <mi>x</mi>
    <mo>¯</mo>
  </mover>
  <mi>i</mi>
</msub>

Without modifying the MathML, how should an annotator that knows the meaning is "the mean of the i-th element of x" encode the intent in your system?

davidfarmer commented 1 year ago

I hope this is at least a partial answer to the question of what I was trying to propose and the advantages I hoped to get from it.

I am thinking about the interface I am creating which will convert user input to MathML with intent.

For core intents, I am not particularly concerned: those will have a specified markup which I can produce and which we can expect AT to handle properly. (There are a couple of key cases which may require more discussion, such as how to indicate that <msup><mo>H</mo><mn>2</mn></msup> is "H 2" and not "H squared", probably pronounced "H sup 2" by AT.)

The hard part is how I will enable authors to indicate special treatment for markup not in core. We have seen some examples of trying to include literal words, so that AT can say what the author would say if the formula were read aloud. Those examples convinced me that this is a bad idea, because quite often the result was worse. Thus, we need a functional syntax.

My other conclusion was that, as I figure out how I will allow authors to specify non-core intent, I do not want authors thinking in terms of how they pronounce the expression. A workable alternative is for them to indicate what something is (or what it is not, such as the "2" in "H^2" is not an exponent).

For example, a particular symbol may represent a function in one context for one author, and an infix operator in another. Knowing that something is a function helps with pronunciation, so I need a way for intent to specify that something is a function. (Maybe not the best example, because of the invisible function-application character U+2061.) And if that symbol has a name, the author will want to indicate the name. That is: a mathematical name which may be different than the Unicode name. (And as mentioned much earlier in this issue, I think it is good to distinguish between an established name (which did not make it into core) and a name which is not generally known and perhaps invented by the author.)

There also is the issue of what authors want to indicate, even if we might argue that it is not really necessary. For example, the author may want to say "J" is a Bessel function. They may complain if they are not allowed to supply that apparently useful information. So, I wanted a way to encode the name, but in a way that AT knows it is okay to just say "J". Similarly for authors specifying that "G" is a group. Maybe AT will not use that now, but if we allowed isa:group as the intent, that would make some authors happy. I would prefer not to do something like a data-isa="group" attribute.

The previous paragraph describes things like specifying that the content of an mo is a function. A related situation is specifying that a large multi-layered expression is a system of equations, or a matrix, or a "cases", or some other type of expression. That is important information for AT.

Another issue is numbers. I don't like the idea of requiring "." as the decimal separator, and there also are complex numbers and scientific notation. All of those are numbers. So I suggested a number intent as a wrapper, as in number(3,14159).

I'd like to be able to output those types of intent. And unless someone can figure out a way to only allow speech strings that make things better, I'd like to disallow those.

davidcarlisle commented 1 year ago

I'd like to be able to output those types of intent. And unless someone can figure out a way to only allow speech strings that make things better, I'd like to disallow those.

Actually I'd say a main effect of the proposal here is that it offers arbitrary speech strings for people who don't like _ .

I must admit I assumed that was the main motivation, as it's the main new feature.

intent="named:function(arbitrary English sentence here)"

seems valid (you could replace named with adhoc etc but as far as I can tell all allow the equivalent of

_(_arbitrary, _English, _sentence, _here)

without the ugly _

davidfarmer commented 1 year ago

I don't think the problems came from allowing arbitrary text for the name of a function. After all, "Bessel function of the first kind" is the name of the function commonly denoted "J". But nobody says that when pronouncing a formula. They say things like "Jay naught", which is bad because it hides the fact that the 0 is in a subscript. AT would indicate the subscript (I assume).

The named:function part of the markup makes a difference.

It is possible that I have not absorbed what is going on with the underscores.

davidcarlisle commented 1 year ago

ok so my example should have been that this proposal makes it easy to have

<msub intent="named:function(Jay naught)"><mi>J</mi><mn>0</mn></msub>

The named:function part of the markup makes a difference.

use adhoc:operation in my examples if you prefer. Unless I misunderstand you completely, the effect of

intent="adhoc:operation(an english sentence)"

is to ignore the MathML markup completely and generate the speech string "an english sentence".
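A minimal sketch of that reading, for illustration only. It assumes a toy speech generator (the `speak` function and its "sub" wording are hypothetical, not any real AT's rules) and assumes that when several `|`-separated names are given the first one is spoken. With a `named:` or `adhoc:` intent on the `msub`, the children are never visited.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern for type:category(names); $-references are ignored here.
TYPED = re.compile(r"(named|adhoc):[a-z-]+\(([^()$,]+)\)")

def speak(element):
    """Return a speech string for a MathML element (toy rules only)."""
    match = TYPED.fullmatch(element.get("intent", ""))
    if match:
        # The supplied name replaces whatever the children would have produced.
        # Assumption: speak the first of the '|'-separated names.
        return match.group(2).split("|")[0]
    if len(element) == 0:
        # Leaf element such as <mi> or <mn>: speak its text.
        return (element.text or "").strip()
    if element.tag == "msub":
        base, subscript = element
        return f"{speak(base)} sub {speak(subscript)}"
    return " ".join(speak(child) for child in element)

plain = ET.fromstring("<msub><mi>J</mi><mn>0</mn></msub>")
marked = ET.fromstring(
    '<msub intent="named:function(Jay naught)"><mi>J</mi><mn>0</mn></msub>'
)
```

Under these assumptions `speak(plain)` reports the subscript, while `speak(marked)` returns the author's string and the subscript structure is no longer audible.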

dginev commented 1 year ago

unless someone can figure out a way to only allow speech strings that make things better, I'd like to disallow those.

For the record I hold the opposite design bias:

Unless AT can generally guarantee great coverage of all edge cases we can expect to encounter in a broad sample of real-world uses of math syntax, I would like the authors to always have an "escape hatch" where they can remediate linguistic realities that were not foreseen during the WG's limited charter and survey scope.

davidfarmer commented 1 year ago

I agree that what I am asking for would allow intent="named:function(Jay naught)" on the msub.

My thoughts are slowly (maybe too slowly) turning toward wanting to be able to do what I think would be useful, and away from trying to prevent others from doing what would not be helpful.

davidcarlisle commented 1 year ago

Apart from making it easier to supply space-separated words (and |-separated choices for such strings), the other main feature is categories. These seem mostly a syntactic variant of :properties as used in the current draft, except more restricted, unless you take the suggestion to allow more than one. The syntax is a lot more complicated, though.

davidcarlisle commented 1 year ago

you ask

The "number" intent (is it too confusing to have it as both an intent and a category?)

I think the answer is yes as you give the example

<mrow intent="number(3.14)">

but that doesn't parse. number there is a knownintent, so its argument can only be a reflist of $ references and digits are not allowed.

To parse 3.14 as a number in the grammar above you would need something like

<mrow intent="value:number(3.14)">

but even this is a bit confusing as it looks like a type ":" category but is in fact a separate grammatical form with separate parse rules for the argument.

I think it would be clearer, if we want a separate grammatical form for numbers that allows a comma, to use a separate syntax, say [3,14], so you could use that anywhere, as in foo-bar($x,[3,14]). But the feeling on last week's call was not to allow a decimal comma in the syntax, which means quoting is not needed and foo-bar($x,3.14) works.
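The parse failure can be checked with a small recogniser for the proposed grammar. This is only a sketch under stated assumptions: "letter" is taken to be ASCII `[A-Za-z]`, the `KNOWN_INTENTS` and `CATEGORIES` sets are short hypothetical stand-ins for lists the proposal leaves open, and `numbervalue` is taken to be an optional minus sign, digits, and an optional `.` fraction (no decimal comma).

```python
import re

# Assumptions: ASCII letters only; the sets below are illustrative placeholders.
LETTERS = r"[A-Za-z]+"
REF = rf"\${LETTERS}"
NAME = rf"{LETTERS}(?:[-\s]{LETTERS})*"
NAMES = rf"{NAME}(?:\|{NAME})*"
NUMBERVALUE = r"-?\d+(?:\.\d+)?"

KNOWN_INTENTS = {"absolute-value", "superscript", "number"}
TYPES = {"named", "adhoc", "value"}
CATEGORIES = {"function", "group", "number", "operation", "system-of-equations"}

KNOWN_RE = re.compile(
    rf"(?P<head>[A-Za-z]+(?:-[A-Za-z]+)*)(?:\((?P<refs>{REF}(?:,{REF})*)\))?"
)
ISA_RE = re.compile(r"isa:(?P<category>[a-z-]+)")
NUMBER_RE = re.compile(rf"(?P<type>[a-z]+):number\((?P<value>{NUMBERVALUE})\)")
TYPED_RE = re.compile(
    rf"(?P<type>[a-z]+):(?P<category>[a-z-]+)"
    rf"\((?P<names>{NAMES})(?P<refs>(?:,{REF})*)\)"
)

def parse_intent(text):
    """Return a dict describing the parse, or None if no production matches."""
    m = NUMBER_RE.fullmatch(text)
    if m and m["type"] in TYPES:
        # type ":number" "(" numbervalue ")" has its own argument rules.
        return {"form": "number", "type": m["type"], "value": m["value"]}
    m = TYPED_RE.fullmatch(text)
    if m and m["type"] in TYPES and m["category"] in CATEGORIES:
        return {
            "form": "typed",
            "type": m["type"],
            "category": m["category"],
            "names": m["names"].split("|"),
            "refs": re.findall(REF, m["refs"]),
        }
    m = ISA_RE.fullmatch(text)
    if m and m["category"] in CATEGORIES:
        return {"form": "isa", "category": m["category"]}
    m = KNOWN_RE.fullmatch(text)
    if m and m["head"] in KNOWN_INTENTS:
        # knownintent reflist?: only $-references may appear in the parentheses.
        return {"form": "known", "head": m["head"],
                "refs": re.findall(REF, m["refs"] or "")}
    return None
```

With these assumptions `parse_intent("number(3.14)")` returns `None`, because the knownintent form accepts only `$` references, while `parse_intent("value:number(3.14)")` succeeds through the separate number production.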

davidfarmer commented 1 year ago

I think we were too hasty in our discussion of numbers.

It is not just a question of commas or periods. What is a "number"? Complex numbers? Scientific notation?

A wrapper for numbers, either number(***) or [***], is worth discussing. As is the possibility of push-back if we do not allow commas.

davidcarlisle commented 1 year ago

What is a "number"? Complex numbers? Scientific notation?

yes I wondered about that too. Certainly here (documenting a numerical software library) 0.314e1 is as much a number as 2

As is the possibility of push-back if we do not allow commas.

if we were discussing text strings I think there would be push back, but I know of no system using comma-separated function arguments that allows a decimal comma. That said, some quoting method or using spaces for argument separation would both work.

brucemiller commented 1 year ago

The current grammar would treat 0.314e1 as a literal (or depending on implementation, perhaps as a number 0.314 followed by a literal e1, likely an error). Other more liberal proposals that don't specifically call out number would also treat it as a literal. In either case, a :number property might be a reasonable clarification.

Comma remains a problem: 1,235 defaults to a list of two numbers (e.g., function arguments). But even if we had a way of quoting the comma, it might be a small number (less than 2) or a large number (greater than a thousand), depending on locality (of the author? of the listener?) and what the author expected, since we said they could use comma :>
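Both points can be sketched with a toy tokenizer. The assumptions here are mine, not the draft's: arguments are split on every comma with no quoting, and a number is an optional minus sign, digits, and an optional `.` fraction. The helper names are hypothetical.

```python
import re

# Assumed number syntax: optional minus, digits, optional '.' fraction.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")

def split_args(arglist):
    """Split the text between the parentheses on every comma (no quoting)."""
    return arglist.split(",")

def classify(token):
    """Whole-token reading: a token is a number only if all of it matches."""
    if token.startswith("$"):
        return "reference"
    return "number" if NUMBER.fullmatch(token) else "literal"

def longest_number_prefix(token):
    """Greedy reading: return (number prefix, leftover text)."""
    m = NUMBER.match(token)
    if m is None:
        return ("", token)
    return (m.group(), token[m.end():])
```

Here `split_args("$x,1,235")` yields three arguments, so `1,235` can never arrive as one number. `classify("0.314e1")` gives `"literal"`, while `longest_number_prefix("0.314e1")` gives `("0.314", "e1")`, the second reading mentioned above.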

davidcarlisle commented 1 year ago

@brucemiller yes at

https://mathml-refresh.github.io/intent-lists/intent4.html#IDdecimalcomma

I have <mn intent='1,234:decimal-comma'>1,234</mn> and <mn intent='1,234:thousands-comma'>1,234</mn>, although MathCAT doesn't like them (not sure I like them either, but they are a placeholder for whatever is decided)
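As a rough illustration of what such properties would buy a consumer, here is a sketch that resolves the comma using the two placeholder property names from the example. The `numeric_value` helper is hypothetical, and a bare comma with no property is treated as an error rather than guessed.

```python
from decimal import Decimal

def numeric_value(intent):
    """Interpret '<digits>:<property>' using the two placeholder properties."""
    text, _, prop = intent.partition(":")
    if prop == "decimal-comma":
        # The comma is the decimal separator.
        return Decimal(text.replace(",", "."))
    if prop == "thousands-comma":
        # The comma only groups digits.
        return Decimal(text.replace(",", ""))
    if "," in text:
        # Without a property the comma cannot be resolved.
        raise ValueError(f"ambiguous comma in {text!r}")
    return Decimal(text)
```

The same characters `1,234` come out as `Decimal("1.234")` with `decimal-comma` and `Decimal("1234")` with `thousands-comma`.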

NSoiffer commented 1 year ago

Both @physikerwelt and @polx were pretty clear last week that allowing two different forms of a "decimal separator" has turned out to be a bad idea in practice. I was worried about imposing my cultural bias on others, but it seems that everyone has accepted the "." in practice and not only are ok with it, but strongly want it to stay that way to keep numbers simpler.

Note: this is about intent values, not the actual display value.

davidfarmer commented 1 year ago

To be replaced by a new issue listing a few of the desirable features which maybe should be possible with functional intent.

w3c / mathml

Proposed intent with functional notation #451
