Closed davidcarlisle closed 2 years ago
Interesting synthesis. But something went wrong with the nemptyintent
production. Perhaps:
nemptyintent := literal | selector | nemptyintent '(' arglist ')'
arglist := wildcard | intent [ ',' intent]*
What does emptyintent signify? I'm guessing it makes the node invisible to AT (and defaulting)? Although I hadn't actually allowed it in my grammar, I found it necessary to allow empty intents within an arglist to signify missing or implied arguments (otherwise you'd need many special case literals for derivative, for example (though maybe there are better solutions?)).
Although I understand the appeal of implicit arglists, I'm still sceptical of the added complexity. Although it helps a+b+c..., the current proposals don't help a+b-c+d..., which are at least as common. And I've noticed that offering @ seems to tempt people to use it where they end up with various opaque tricks to make it work.
@brucemiller I added emptyintent as (a) you'd mentioned it on the call and (b) here I'd written that empty could be used instead of the intent="/" in Sam's grammar, so I thought I'd better allow it. It was more disruptive than I wanted as I didn't want to allow empty in the nested cases intent=divide(,,) so I had to have a nonempty production for the recursive cases.
Actually, the recursive case is exactly where I wanted to allow emptyintent (eg $f^{(n)}$ has intent=derivative(f,,n)), but presumably not for the operator itself; I hadn't needed it at the top-level.
And, coming back to "implicit arguments", why does $a+b+c+d$ or $a+b-c+d$ need an intent value at all? If the goal is accessibility, in almost all cases it would be read as "aye plus bee minus see plus dee". At most, the operators might need an intent.
On empty: oh yes, so you do. OK, I'll adjust.
On n-ary arguments: I think I'd agree that special-casing the case where all operators are the same is probably not right. You really want to have intent = "natural sequence of arguments and infix operators", which would be the default anyway, so then it comes down to a more philosophical design question as to whether the default should correspond to an explicit intent in all cases.
At the moment, I personally only see the need for a restricted subset of Bruce's grammar that holds a bare minimum for writing down
which appears to fully suffice for AT remediation. Namely:
intent ::= name | expression
name ::= concept | argref
concept ::= [letter|digit|-]*
arg ::= [digit]+
argref ::= '$' arg
expression ::= name '(' intent [ ',' intent ]* ')'
with an example use:
<mrow intent="$1($2)">
<mi arg="2">x</mi>
<mo arg="1" intent="double-factorial">!!</mo>
</mrow>
Edit: update with a link to my group meeting talk on this approach.
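To illustrate how the numbered argrefs in this example could be resolved, here is a minimal Python sketch. It is not taken from any implementation: the function names `find_arg` and `resolve` are hypothetical, and it assumes that `$n` selects the first descendant carrying `arg="n"` and that an element without an intent falls back to its text content.

```python
import re
import xml.etree.ElementTree as ET

# argref ::= '$' arg, arg ::= [digit]+ (from the grammar above)
ARGREF = re.compile(r"\$([0-9]+)")

EXAMPLE = """
<mrow intent="$1($2)">
  <mi arg="2">x</mi>
  <mo arg="1" intent="double-factorial">!!</mo>
</mrow>
""".strip()


def find_arg(node, name):
    """Return the first descendant of node with arg == name, or None.

    Assumption: a plain document-order search is enough for this sketch.
    """
    for candidate in node.iter():
        if candidate is not node and candidate.get("arg") == name:
            return candidate
    return None


def resolve(node):
    """Expand every $n in node's intent into the intent of the referenced element."""
    intent = node.get("intent")
    if intent is None:
        # Assumption: with no intent, fall back to the element's text.
        return "".join(node.itertext()).strip()

    def substitute(match):
        target = find_arg(node, match.group(1))
        if target is None:
            raise ValueError(f"${match.group(1)} does not match any arg attribute")
        return resolve(target)

    return ARGREF.sub(substitute, intent)


resolved = resolve(ET.fromstring(EXAMPLE))
print(resolved)
```

Under these assumptions the example expands to the operator tree double-factorial(x); how an AT then speaks that tree is a separate question.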
intent="" seems reasonable, though I wonder if ARIA interop isn't a better way to allow it via aria-label="" or probably even better aria-hidden="true".
I think we need to do something about long n-ary argument lists. Yes, defaults do take care of the ones I'm aware of, but I think it is important to be able to specify a default value as an intent value so that one can describe the defaults as an intent (basically eating your own dog food). Or more practically, not having to invent something new in software to handle the default. I envision a system to interpret intent working along the lines:
If intent is specified, use it
'2' or '3' might have document-level or user "style sheets" for intent in addition to whatever defaults the software has.
I'm using "style sheet" in a very generic sense of being a collection of patterns that match to return an intent value. They could be CSS-like, or they could be something very different.
I mostly agree with @davidcarlisle about the special values being unnecessary.
@samdooley: if you have some specific use cases in mind where they are actually needed, you should add them to your doc where you introduce/explain the syntax. The examples given merely show that your statement "it is rarely needed" is true; what they don't show is that "never needed" is false.
I do see some merit in @, although (again agreeing with @davidcarlisle), it seems overly complicated. I want to come back to my favorite example: a+b-c+d (all in a single mrow) that @brucemiller brought up. If @ just meant use the arguments as is (I think * would be better), then this would speak just fine, but if it has to point to an argument and use the speech for that argument in place of all of the mos, then that doesn't work. For what @samdooley wants to do, I think the answer is that a complicated (nested) intent value is needed because the mapping between presentation and content is more complicated.
Although I complained about @ seeming overly complicated, I want to throw out a suggestion for a complicated version that is potentially powerful, at least for speech: "prefix@infix@postfix" (prefix/infix/postfix would be given as the actual words to speak) where "prefix", "infix" and "postfix" are optional. Examples:
intent="@@" would be used for my a+b-c+d example and other similar n-ary examples.
intent="fraction@over@end fraction" could be used for an mfrac.
I don't have a lot of good examples, so maybe this is too much complication for not much gain.
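One way to read the "prefix@infix@postfix" suggestion is sketched below in Python. This is only an interpretation: the function name `speak` is hypothetical, and the sketch assumes that the children have already been turned into words, that an empty infix means "read the children as they are", and that a non-empty infix is spoken between consecutive children.

```python
def speak(spec, children):
    """Build speech from a 'prefix@infix@postfix' spec.

    spec     -- the intent value, e.g. "@@" or "fraction@over@end fraction"
    children -- words for each child, in document order (assumed precomputed)
    """
    parts = spec.split("@")
    if len(parts) != 3:
        raise ValueError("expected exactly two '@' separators")
    prefix, infix, postfix = (part.strip() for part in parts)

    if infix:
        # Assumption: the infix words are spoken between consecutive children.
        body = f" {infix} ".join(children)
    else:
        # Assumption: an empty infix reads the children as they are.
        body = " ".join(children)

    # Drop whichever of prefix/postfix were left empty.
    return " ".join(piece for piece in (prefix, body, postfix) if piece)


nary = speak("@@", ["a", "plus", "b", "minus", "c", "plus", "d"])
frac = speak("fraction@over@end fraction", ["a", "b"])
print(nary)
print(frac)
```

With these assumptions the a+b-c+d row is read straight through, and the mfrac case wraps its two children in the given words.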
Mainly a question for @dginev...
With your example
<mrow intent="$1($2)">
<mi arg="2">x</mi>
<mo arg="1" intent="double-factorial">!!</mo>
</mrow>
If the AT doesn't know about double-factorial, wouldn't the likely speech be "double factorial of x"? That's not terrible, but probably not what is intended.
My question: how do we distinguish between wanting to hear "x double factorial" and "double factorial of x"?
If we need an intent value as a place-keeper for "read the tree as it is" (ie. depth-first traversal) then we can make @ stand for that.
My question: how do we distinguish between wanting to hear "x double factorial" and "double factorial of x"?
Short: external speech hints via either a global list, and/or a document-level list and/or an AT-specific vendor list. As well as using AT contextual decisions.
Long:
looking at @dginev's simplification ...
I think dropping named arguments may be a step too far: allowing intent="..$foo" .. <mi arg=foo> as well as intent="..$2" ... <mi arg=2 makes some things look a bit nicer.
However dropping the $2/$3 path syntax might be a good idea. It perhaps isn't needed, as instead of $3/$2 you could put arg="zz" on the grandchild and replace $3/$2 by $zz.
As elsewhere I think its main benefit would be specifying defaults, but perhaps we can work round that. If we keep it, then specifying / will face the same issues that anyone accessing MathML with XPath finds: the child step in the XML isn't necessarily the logical child, and you would need to decide whether / strictly references a child in the XML tree or whether you step through redundant or implied mrows. That is, given
<mrow><mrow><mi>a</mi><mi>b</mi></mrow></mrow>
is b $2 or $1/2? The MathML spec has always said that such a single-child mrow is redundant, but if removing it means renumbering all intent paths that's a bit of a pain.
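The two readings of a positional step can be compared with a short Python sketch. The helper names `skip_redundant` and `step` are hypothetical, and "redundant" is assumed here to mean an mrow with exactly one element child; implied mrows are not modelled.

```python
import xml.etree.ElementTree as ET

DOC = "<mrow><mrow><mi>a</mi><mi>b</mi></mrow></mrow>"


def skip_redundant(node):
    """Step through mrows that have exactly one element child (assumed redundant)."""
    while node.tag == "mrow" and len(node) == 1:
        node = node[0]
    return node


def step(node, position, logical=False):
    """Return the child at a 1-based position, or None if there is none.

    logical=False counts children strictly in the XML tree;
    logical=True first steps through redundant mrows.
    """
    if logical:
        node = skip_redundant(node)
    if 1 <= position <= len(node):
        return node[position - 1]
    return None


root = ET.fromstring(DOC)
strict_2 = step(root, 2)                 # strict $2: the outer mrow has one child
strict_1_2 = step(step(root, 1), 2)      # strict path: child 1, then child 2
logical_2 = step(root, 2, logical=True)  # $2 after skipping the redundant mrow
```

Under the strict reading b is only reachable as a two-step path; under the logical reading it is $2, and removing the outer mrow would leave that reference unchanged.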
As @davidcarlisle suggests, a main advantage of the path syntax ($2/$3 or $2/3, whichever) might be for defaulting. Think of several patterns to match the various forms of binomial (mfrac, marray, ..). When defaulting matches a given pattern to a mathml subtree it could simply add the corresponding intent to the root of that subtree and be done, without having to invent names and find and modify the children.
OTOH, since it would presumably also restrict its recursion to only those children that were referenced (ignoring children that were invisible, such as the binomial's parentheses, the mfrac itself, etc.), it would still have to find & fiddle with the children; so maybe the "simplification" of the 1st paragraph is no real advantage.
I think dropping named arguments may be a step too far: allowing intent="..$foo" ..
as well as intent="..$2" ...<mi arg=2 makes some things look a bit nicer
They could look nicer or they could look worse, it depends on what values are used.
Long:
I may be missing your point, but to me your second typo seems harder to catch from the output than the first. Of course, I'm more comfortable with the pedestrian kind of mathematics that has numbers in it. Numeric literals will be needed, on occasion (eg. $f''$).
That said; automatic sanity checking would be a good thing.
harder to catch
Harder to notice, easier to figure out. But sure, this is why I'd lean towards automating the detection entirely, for this class of error. Spellcheckers are a huge help for text, so a good refchecker ought to be a huge help for these attributes.
My point was more focused on debugging what went wrong than about initial detection. How one names the argrefs could be a cause of confusion and surprise, in an analogous way to how one names their variables in a programming language could cause confusion and surprise. They could also be a source of great clarity when chosen judiciously, as you and David hope will be the usual case. But that optimistically assumes the remediator is in possession of that great clarity while annotating.
My intuition is that when one remediates+proofreads a book (~10,000 expressions), or a paper (~1,000 expressions), there will be a noticeable speedup from the boring predictability of only having numbers in the argument holes. Both when adding them in ("how do I name this?" is never asked), and when reading through them later ("what is this foo referring to and why did I name it foo?").
f''
I assume you mean "the second derivative of f" and not f-double-prime here. It's less obvious to me. Wikipedia has a page for second derivative, and has an alias mentioned - second-order-derivative. The appeal of derivative($1, 2) is understandable: from a computational standpoint, it's the second functional-power of the derivative. Then again, if that was the core motivation, we've also discussed that as functional-power(derivative, 2) which is what's really getting computed. Which annotation feels "natural" here depends on what one is trying to model, and I think we still lack a common vision in that regard.
@dginev I think my main concern is it is rather unusual to use numbers as identifiers, most (or at least some) systems force at least the first character to be non-numeric, eg the NCName in Bruce's grammar or [A-Za-z_][A-Za-z0-9_.]* in Sam's. In both of those, $2 references the second child, and while we could define it as you suggest to reference whatever element has arg="2", I suspect that people will still read it as positional.
Mainly, my point was that neither names nor numbers are inherently easier to debug, depending on the accidents of the names and numbers chosen and the exact situation. And my second point, independent of the merits of any particular encoding of derivatives, was that literal numbers are almost certain to be needed somewhere.
I suppose if we drop support for the path-style references, the named references can allow any alphanumeric for the name. Then you can use pure numbers for annotation if you prefer.
Alternatively, we could use a distinct modifier prefix for path references. That has the advantage of more clearly distinguishing different kinds of reference. In an ideal world, I would have suggested '@' (with "at" suggesting "position").
@dginev I think my main concern is it is rather unusual to use numbers as identifiers, most (or at least some) systems force at least the first character to be non-numeric
Not my fight, but HTML5 has allowed purely numeric ids since at least 2010 if this article is right. More to the point, the arg attribute is not an identifier - we can expect hundreds of identical ones in the same document, when the same notation is repeated. It's just as soft as class="intent-arg-1".
If it's a substantial problem, I could imagine some reworking where we do away with the dedicated arg attribute to avoid the issue and instead make a heavier intent attribute, as in (for my double-factorial example): intent="arg:2", or intent="arg:1; double-factorial", or something on those lines. Not too enthusiastic about getting in the territory of style-like values though.
@dginev hinted at what is likely a really useful tool: a validator that looks at the intent values in a document and gives errors and warnings.
Errors would include referencing something that doesn't have an arg that is in scope (e.g., could be missing or could be blocked by another intent).
Warnings would be for literals in the intent value that match arg values in scope. There are probably other examples that deserve warnings (e.g., referencing the same arg twice is likely a mistake).
I think with such a tool, concerns about mistakes involving numbers or alphanumeric refs being hard to spot go away. Of course, people can still make lots of other kinds of mistakes like using $2 when they really meant 2 and arg="2" actually does exist. Still, the tool would likely catch the vast majority of mistakes.
I wouldn't be surprised to see @davidcarlisle or @dginev demo such a tool next week. Hint, hint 😀
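A rough Python sketch of the checks described above follows. It is an illustration, not an existing tool: the names `args_in_scope` and `validate` are hypothetical, and "in scope" is assumed to mean any descendant reachable without passing through another element that has its own intent.

```python
import re
import xml.etree.ElementTree as ET

# Either an argref ($name) or a bare literal/concept name.
TOKEN = re.compile(r"\$?[A-Za-z0-9_.-]+")


def args_in_scope(node):
    """Collect arg values below node, not looking inside descendants with an intent."""
    found = []
    for child in node:
        arg = child.get("arg")
        if arg is not None:
            found.append(arg)
        if child.get("intent") is None:  # assumption: an intent blocks the scope
            found.extend(args_in_scope(child))
    return found


def validate(root):
    """Return a list of (level, message) pairs for every intent in the tree."""
    messages = []
    for node in root.iter():
        intent = node.get("intent")
        if intent is None:
            continue
        scope = args_in_scope(node)
        seen = set()
        for token in TOKEN.findall(intent):
            if token.startswith("$"):
                name = token[1:]
                if name not in scope:
                    messages.append(("error", f"${name} has no arg in scope"))
                elif name in seen:
                    messages.append(("warning", f"${name} is referenced twice"))
                seen.add(name)
            elif token in scope:
                messages.append(
                    ("warning", f"literal {token} matches arg='{token}'; did you mean ${token}?")
                )
    return messages


doc = ET.fromstring(
    '<mrow intent="power($1,2,$3)"><mi arg="1">x</mi><mn arg="2">2</mn></mrow>'
)
report = validate(doc)
for level, message in report:
    print(level, message)
```

In this made-up input the literal 2 triggers a warning because arg="2" exists, and $3 is an error because nothing in scope carries arg="3".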
@davidcarlisle @brucemiller @samdooley @NSoiffer I've opened a new issue to do a deeper dive on the n-ary expressions at #253, pinging here just in case you're not subscribed to the repository notifications.
Just for the record:
I think introducing a new grammar to describe the intent is not a good idea at all. Overall, it seems that the majority of the group insists on adding new complexity instead of "abusing" other existing methods, as described in the Leveraging Existing Technology section of the MathML Accessibility Gap Analysis document. Since this attribute describes mathematical semantics, I think the least harmful solution was to use some form of LaTeX-like syntax. At least it comes with selectors among other language features.
As briefly discussed on the call today, we need to get back to intent.
At https://w3c.github.io/mathml/spec.html#mixing_intent_grammar there are currently two versions of the grammar showing, the second one being more aggressively minimalist and similar to @dginev's version above in not having numeric paths or wildcards (but it does still separate out numeric literals from identifiers for the concept entries in the intent table).
I think the $4/$5 path syntax does simplify describing the default intent, as you can say <mfrac> has intent divide($1,$2) without having to invent a meta syntax that allows you to say divide($a,$b) where arg="a" and arg="b" are added in suitable places ...
But counting elements leads to all kinds of issues around redundant mrows and which elements to count and I'm not sure the benefits outweigh the costs, or that we could get a usable version agreed in reasonable time.
So I would propose going for the minimal version which is however structured such that numeric paths could be added back later if some pressing use cases are discovered.
For the record in case the spec version gets edited, the minimal version there is:
intent := number | NCName | argref | function
function := (NCName | argref) '(' intent [ ',' intent ]* ')'
number := '-'? digit+ ('.' digit+)?
argref :='$' NCName
That is, using NCName to denote intent concepts, allowing numeric literals, not allowing empty intent or empty function parameters.
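For experimenting with this minimal grammar, a small recursive-descent recognizer can be sketched in Python. Several things here are assumptions rather than part of the grammar: NCName is approximated by an ASCII-only pattern, whitespace between tokens is tolerated, and the names `tokenize`, `parse_intent` and `parse` are hypothetical.

```python
import re

# ASCII approximation of NCName (assumption: the real production allows far more letters).
NCNAME = r"[A-Za-z_][A-Za-z0-9_.-]*"
TOKEN = re.compile(
    rf"\s*(?:(?P<number>-?[0-9]+(?:\.[0-9]+)?)"
    rf"|(?P<argref>\${NCNAME})"
    rf"|(?P<name>{NCNAME})"
    rf"|(?P<punct>[(),]))"
)


def tokenize(text):
    """Split an intent value into (kind, value) pairs, or raise ValueError."""
    text = text.rstrip()
    pos, tokens = 0, []
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}: {text[pos:]!r}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens


def parse_intent(tokens, i=0):
    """intent := number | NCName | argref | function; returns (tree, next index)."""
    if i >= len(tokens):
        raise ValueError("unexpected end of intent")
    kind, value = tokens[i]
    if kind == "punct":
        raise ValueError(f"unexpected {value!r}")
    i += 1
    # function := (NCName | argref) '(' intent [ ',' intent ]* ')'
    if kind in ("name", "argref") and i < len(tokens) and tokens[i] == ("punct", "("):
        args = []
        while True:
            arg, i = parse_intent(tokens, i + 1)  # skips '(' or ','
            args.append(arg)
            if i < len(tokens) and tokens[i] == ("punct", ","):
                continue
            break
        if i >= len(tokens) or tokens[i] != ("punct", ")"):
            raise ValueError("expected ')'")
        return (value, args), i + 1
    return value, i


def parse(text):
    """Parse a whole intent value; leaves are strings, functions are (head, args)."""
    tokens = tokenize(text)
    tree, end = parse_intent(tokens)
    if end != len(tokens):
        raise ValueError("unexpected trailing input")
    return tree
```

With this approximation parse("pair($a,$b)") yields a function node, while "$1", "divide(,,)" and the empty string are all rejected, matching the remark that empty intents and empty parameters are not allowed.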
A comment on the minimal version of intent in the previous comment dropping numeric path references.
I think that they would be useful (especially for specifying defaults), and if we can bring them back later we may want to reconsider (which is one reason I'm not keen on allowing numeric names here). But experimenting with polyfills, they just seem too fragile.
Looking at the first example at https://mathml-refresh.github.io/mathml-polyfills/acid-test.html I added an intent to the <mfenced> to say it was a pair:
<mfenced open="[" separators=";" close="]" intent="pair($1,$2)">
<mn>0</mn>
<mn>1</mn>
</mfenced>
Once the polyfill is enabled so this renders as [0;1] the DOM is equivalent to:
<mrow intent="pair($1,$2)">
<mo fence="true" data-nesting-depth="1">[</mo>
<mrow><mn>0</mn><mo fence="true" data-nesting-depth="2">;</mo><mn>1</mn></mrow>
<mo fence="true" data-nesting-depth="1">]</mo>
</mrow>
where the intent is now basically wrong.
We could say that is just an implementation deficiency in the current polyfill, not fixing up intent, or we could specify that values always reference the initial document, or we could say intent is only supported on core, or ... but none of these choices seems particularly appealing.
Using arg references, the intent survives the mfenced polyfill, even though it was not coded to do anything special with intent:
<mfenced open="[" separators=";" close="]" intent="pair($a,$b)">
<mn arg="a">0</mn>
<mn arg="b">1</mn>
</mfenced>
becomes
<mrow intent="pair($a,$b)">
<mo fence="true" data-nesting-depth="1">[</mo>
<mrow><mn arg="a">0</mn><mo fence="true" data-nesting-depth="2">;</mo><mn arg="b">1</mn></mrow>
<mo fence="true" data-nesting-depth="1">]</mo>
</mrow>
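The difference between the two markups above can be checked mechanically. The Python sketch below uses trimmed copies of the before/after trees (the data-nesting-depth attributes are omitted) and two hypothetical helpers, `by_position` and `by_name`, that stand in for positional and named reference lookup.

```python
import xml.etree.ElementTree as ET

BEFORE = ET.fromstring(
    '<mfenced open="[" separators=";" close="]">'
    '<mn arg="a">0</mn><mn arg="b">1</mn>'
    '</mfenced>'
)
AFTER = ET.fromstring(
    '<mrow>'
    '<mo fence="true">[</mo>'
    '<mrow><mn arg="a">0</mn><mo fence="true">;</mo><mn arg="b">1</mn></mrow>'
    '<mo fence="true">]</mo>'
    '</mrow>'
)


def by_position(root, n):
    """Text of the n-th element child (1-based), as a positional $n would select."""
    return "".join(root[n - 1].itertext())


def by_name(root, name):
    """Text of the first descendant with arg == name, as a named $name would select."""
    for node in root.iter():
        if node is not root and node.get("arg") == name:
            return "".join(node.itertext())
    return None


positional_before = (by_position(BEFORE, 1), by_position(BEFORE, 2))
positional_after = (by_position(AFTER, 1), by_position(AFTER, 2))
named_before = (by_name(BEFORE, "a"), by_name(BEFORE, "b"))
named_after = (by_name(AFTER, "a"), by_name(AFTER, "b"))
```

After the polyfill, position 1 is the opening fence and position 2 is the whole inner row, whereas the named lookups still return 0 and 1.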
Ouch! That's a pretty compellingly annoying example!
I still think that relative paths would be useful when defining defaulting patterns, but that the application of such a pattern to a given MathML fragment should convert the path references to named ones and add the arg attributes appropriately.
But relative paths are perhaps just too risky within the normal intent values.
@brucemiller yes that was my thought too that the polyfill could convert the paths to named references which would then survive the transformation, but the whole thing made me uneasy. If we keep the grammar so that numeric paths could be added back if they prove to be needed, going with the minimal version without them might simplify things (and we might need a simple version to get this all signed off in reasonable time)
I may have been unclear: I do not think it is wise to require the polyfill (that would, eg. convert mfenced to an mrow) to deal with adjusting references. They're hard enough already, and kind of come in from the wild. Given that, we probably should not allow relative paths in the intent attribute.
What I was tentatively and speculatively proposing was that if and when we consider a defaulting mechanism (which would likely be outside of MathML proper), we might want to have an enhanced intent minilanguage which did include relative paths. Something handwavingly like:
match_fence : pair($1,$2)
and it would be the defaulting machine's responsibility to add the intent and arg attributes to end up with your
<mfenced open="[" separators=";" close="]" intent="pair($a,$b)">
<mn arg="a">0</mn>
<mn arg="b">1</mn>
</mfenced>
And then polyfills are free to mangle as they wish.
But that's all just recording a thought for the future. It's probably best that we omit the relative paths from intent for now.
@brucemiller ah yes having an enhanced meta syntax for describing defaults could work. I'll leave both versions in the spec for now but hopefully we can sign off the grammar on the call tomorrow, at least an initial draft stable enough to move forward with chapter 5
Nothing like looking at examples to sharpen one's thoughts. I originally liked the idea of positional arguments, but the mfenced polyfill example shows the danger of it. Although the polyfill could be modified to deal with it, that is more complication. So I "officially" change my mind and think we should ditch positional references for now.
Doesn't NCName allow for digits, so $1 would be valid in the minimal grammar? We should prevent that so that if in the future we want positional references, we can add them and not break existing usages.
@NSoiffer
Doesn't NCName allow for digits, so $1 would be valid in the minimal grammar?
No, it's the same as a no-namespace XML element name, so it has to start with a letter (for some definition of letter) but may contain digits later.
This comment doesn't really fit under "unifying" but I'm not sure we want a 10th open #intent issue, so adding it here...
We need somewhere to host the lists that is at least stable enough to use for the spec working drafts.
Currently we link to the original spreadsheet, and we could go with that, but the interface feels a bit wrong to me, especially as it doesn't make it so clear that people can add to level 3 but shouldn't change level 1 once it is signed off by the WG.
I experimented after the call last night with using github pages for levels 1 and 2 and a github wiki for level 3, more comments in the readme at
Currently we link to the original spreadsheet, and we could go with that, but the interface feels a bit wrong to me, especially as it doesn't make it so clear that people can add to level 3 but shouldn't change level 1 once it is signed off by the WG.
Thanks for starting on this David! It will kick me back into gear here as well on the long-term editing side. I have been surveying the available open spreadsheet implementations and I think I'll just start a React app using ag-grid that allows editing two (directories of) JSON files - one for Intent Core and one for Intent Open, using Github's OAuth, with a git commit for each persisted edit. I'm gearing to use the named levels, since Neil keeps talking about Level 0 on even days and Level 1 on odd days, which is too confusing :-)
I think we may want to keep each "sheet" as a separate JSON file, and the really big Unicode sheet (which doesn't exist yet in the Google Sheet) may be best kept isolated. This is just a preliminary plan (welcoming feedback), since the majority of both contributing to and editing of these lists of names remains to be done. I'll let you know how this goes in the meeting next Thursday.
@dginev thanks. As I put in the readme, the current repo is just a discardable experiment (and may already be enough to suggest it's not really the right direction:-) So feel free to build something else... It's not super urgent as we could reference the google sheet even for a FPWG if we don't have anything else, but at some point it may start to affect the words we want to put in the spec (especially around names and number of levels and which ones are open to contributions)
@dginev ag-grid looks interesting
The discussion of the format/location for the intent values should be in a separate issue.
The original topic has been resolved, so I'm closing this.
At the call on 2021-11-04 two intent proposals were presented (further discussion expected this week). This issue is the start at trying to propose a unified syntax.
Bruce
https://mathml-refresh.github.io/discussion-papers/semantics-mini
Sam
https://samdooley.github.io/mathml-docs/intent2cmml/intent.html
I suspect we don't need ! even though it may help forcing some specific content mathml forms such as <apply><plus/> rather than <apply><csymbol>+</csymbol>; we can probably arrange that the defaults "do the right thing" and so don't need that.
Similarly the literal prefix # is not I think needed as unprefixed names and numbers can serve this purpose. <mi intent="3">iii</mi> in Bruce's proposal seems equivalent to <mi intent="#3">iii</mi> in Sam's.
The intent=/ for empty is (normally) covered by having an intent on the parent that doesn't select the element. I think remaining cases could be covered by intent=""
Bruce confirmed in the call that the omission of a syntax for implicit list of arguments was intentional but I think that (+ 1 2 3 4 5 6) becomes unwieldy if we don't have this, and even if we could make the default intent of <mrow>... <mo>+</mo> ... <mo>+</mo> ... <mo>+</mo> ... <mo>+</mo> ... <mo>+</mo> ... be this, we can't write down the default value explicitly if there is no syntax for n-ary application.
However I'm not convinced by the suggestion to allow @ on its own or in suffix or prefix position; it seems sound but it is very hard to remember which is which. I think I would just add one wildcard symbol @ (although * would be another possibility) to mean "all argument children" for some definition of "all".
So Sam's
would become
and
would be
where @ means (to be defined explicitly) all the non mo children.
For child references both proposals allow $3 and $foo, Bruce also allows $3/$1 but I think you could also allow $foo/$2
So... (untested by any running code...)