Open NSoiffer opened 10 months ago
I'm generally skeptical that custom names offer a significant advantage (e.g. reduced list length) over consistently following a uniform naming convention. Uniform naming keeps the learning curve as small as possible, and aids adoption.
To an adopter, a new `large-operator` entry raises the question why we don't have `prefix-operator`, `postfix-operator` or `infix-operator`. To me this looks like the same kind of pattern that is addressed by the `:prefix`, `:infix`, etc. properties. I would have imagined those fond of fixity properties would have added yet another fixity construct, as in `:indexed-operator` or simply `:indexed`, and documented the list of (10?) Core large operators as being in that "default fixity".
In the absence of some consistent rule for choosing argument order, we'd need to document each case separately, which is why I had previously raised #478 .
I'm not advocating adding a "large-operator" concept name -- I'm merely advocating for an organizational arrangement of the names that groups the large operators together to avoid a lot of repetition. My goal is to reduce the size and apparent complexity of the spec.
I do think we need to add a few more "fixity" options, but that's not what this thread is about. This thread is about where the `intent` for large operators should be placed, how many arguments the intent takes, and how we should describe them.
I'll make a fork with an experimental rendering of the condensed list
At the last WG meeting I took an action item to look again at this.
Unicode 16 (and MathClass-15) has 66 characters classified as largeop (`mathclass="L"` in `unicode.xml`), of which 17 have a Unicode name containing `N-ARY`.
The full list is at the end of this post.
This shows several categories of concept/common character that could potentially be compressed
- `:infix` on a binary character: `intersection` may be used with U+22C2 ⋂ or as `:infix` with U+2229 ∩
- `sum` U+2211 ∑ and `plus` U+002B +

We could have one entry in the current style for each of these groups, then in each case list the concept names and default characters for the other entries in the group.
But split this way there are not really many in each group, and I wonder whether the indirection really helps or whether it would just be simpler to list each of them separately in the main list. That list would not be very long, as several of these characters probably do not correspond to any concept that we would have in the core list.
U00606 ARABIC-INDIC CUBE ROOT
U00607 ARABIC-INDIC FOURTH ROOT
U02140 DOUBLE-STRUCK N-ARY SUMMATION
U0220F N-ARY PRODUCT
U02210 N-ARY COPRODUCT
U02211 N-ARY SUMMATION
U0221A SQUARE ROOT
U0221B CUBE ROOT
U0221C FOURTH ROOT
U0222B INTEGRAL
U0222C DOUBLE INTEGRAL
U0222D TRIPLE INTEGRAL
U0222E CONTOUR INTEGRAL
U0222F SURFACE INTEGRAL
U02230 VOLUME INTEGRAL
U02231 CLOCKWISE INTEGRAL
U02232 CLOCKWISE CONTOUR INTEGRAL
U02233 ANTICLOCKWISE CONTOUR INTEGRAL
U022C0 N-ARY LOGICAL AND
U022C1 N-ARY LOGICAL OR
U022C2 N-ARY INTERSECTION
U022C3 N-ARY UNION
U027CC LONG DIVISION
U027D5 LEFT OUTER JOIN
U027D6 RIGHT OUTER JOIN
U027D7 FULL OUTER JOIN
U027D8 LARGE UP TACK
U027D9 LARGE DOWN TACK
U029F8 BIG SOLIDUS
U029F9 BIG REVERSE SOLIDUS
U02A00 N-ARY CIRCLED DOT OPERATOR
U02A01 N-ARY CIRCLED PLUS OPERATOR
U02A02 N-ARY CIRCLED TIMES OPERATOR
U02A03 N-ARY UNION OPERATOR WITH DOT
U02A04 N-ARY UNION OPERATOR WITH PLUS
U02A05 N-ARY SQUARE INTERSECTION OPERATOR
U02A06 N-ARY SQUARE UNION OPERATOR
U02A07 TWO LOGICAL AND OPERATOR
U02A08 TWO LOGICAL OR OPERATOR
U02A09 N-ARY TIMES OPERATOR
U02A0A MODULO TWO SUM
U02A0B SUMMATION WITH INTEGRAL
U02A0C QUADRUPLE INTEGRAL OPERATOR
U02A0D FINITE PART INTEGRAL
U02A0E INTEGRAL WITH DOUBLE STROKE
U02A0F INTEGRAL AVERAGE WITH SLASH
U02A10 CIRCULATION FUNCTION
U02A11 ANTICLOCKWISE INTEGRATION
U02A12 LINE INTEGRATION WITH RECTANGULAR PATH AROUND POLE
U02A13 LINE INTEGRATION WITH SEMICIRCULAR PATH AROUND POLE
U02A14 LINE INTEGRATION NOT INCLUDING THE POLE
U02A15 INTEGRAL AROUND A POINT OPERATOR
U02A16 QUATERNION INTEGRAL OPERATOR
U02A17 INTEGRAL WITH LEFTWARDS ARROW WITH HOOK
U02A18 INTEGRAL WITH TIMES SIGN
U02A19 INTEGRAL WITH INTERSECTION
U02A1A INTEGRAL WITH UNION
U02A1B INTEGRAL WITH OVERBAR
U02A1C INTEGRAL WITH UNDERBAR
U02A1D JOIN
U02A1E LARGE LEFT TRIANGLE OPERATOR
U02A1F Z NOTATION SCHEMA COMPOSITION
U02A20 Z NOTATION SCHEMA PIPING
U02A21 Z NOTATION SCHEMA PROJECTION
U02AFC LARGE TRIPLE VERTICAL BAR OPERATOR
U02AFF N-ARY WHITE VERTICAL BAR
U02140 DOUBLE-STRUCK N-ARY SUMMATION
U0220F N-ARY PRODUCT
U02210 N-ARY COPRODUCT
U02211 N-ARY SUMMATION
U022C0 N-ARY LOGICAL AND
U022C1 N-ARY LOGICAL OR
U022C2 N-ARY INTERSECTION
U022C3 N-ARY UNION
U02A00 N-ARY CIRCLED DOT OPERATOR
U02A01 N-ARY CIRCLED PLUS OPERATOR
U02A02 N-ARY CIRCLED TIMES OPERATOR
U02A03 N-ARY UNION OPERATOR WITH DOT
U02A04 N-ARY UNION OPERATOR WITH PLUS
U02A05 N-ARY SQUARE INTERSECTION OPERATOR
U02A06 N-ARY SQUARE UNION OPERATOR
U02A09 N-ARY TIMES OPERATOR
U02AFF N-ARY WHITE VERTICAL BAR
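The two counts quoted earlier (66 largeop characters, 17 of them with `N-ARY` in the name) can be cross-checked against the list above. The following is only a sketch: the ranges are transcribed by hand from that list rather than read from `unicode.xml`, and the names come from Python's bundled `unicodedata` tables, which may track a different Unicode version.

```python
import unicodedata

# Inclusive code point ranges, transcribed by hand from the list above.
# (Assumption: this transcription matches the mathclass="L" entries in unicode.xml.)
LARGEOP_RANGES = [
    (0x0606, 0x0607), (0x2140, 0x2140), (0x220F, 0x2211), (0x221A, 0x221C),
    (0x222B, 0x2233), (0x22C0, 0x22C3), (0x27CC, 0x27CC), (0x27D5, 0x27D9),
    (0x29F8, 0x29F9), (0x2A00, 0x2A21), (0x2AFC, 0x2AFC), (0x2AFF, 0x2AFF),
]

# Expand the ranges into individual code points.
largeops = [cp for first, last in LARGEOP_RANGES for cp in range(first, last + 1)]

# Keep only the characters whose Unicode name contains "N-ARY".
n_ary = [cp for cp in largeops if "N-ARY" in unicodedata.name(chr(cp), "")]

print(len(largeops), len(n_ary))
for cp in n_ary:
    # Same "Uxxxxx NAME" layout as the list in this post.
    print(f"U{cp:05X} {unicodedata.name(chr(cp))}")
```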
The size of the lists is actually smaller than those lists, because they would be split across core and open lists. So that might argue against having a special category that consolidates the lists.
However, as mentioned in the initial comment, each operator has three variants: unadorned, adorned with just a subscript/underscript, adorned with two scripts. All of those need to be listed. So that makes the lists 3 times larger than the number of characters, or maybe 5 times larger if we break out `msub`/`munder`, etc.
On top of that, we still (I think) need to decide whether the core concept for the intent goes on the adorned large operator or on mrow for the entire concept, or both. If both, that's two times more listings on top of the other multiplies. That's a lot of spec space for essentially identical prose. Based on a philosophy that we've agreed on over the years but I don't think written down, the intent should be as low as possible in the MathML tree. So I'm in favor of it only being shown on a potentially adorned script.
Because of this multiplicative effect, I'm in favor of a condensed list. If we agree on the 11 large operators mentioned in the first comment as what goes into core, that's potentially one extended listing versus 33 or maybe 55 individual listings. That's a lot of space savings. For the open list (assuming we add all or most of the large operators to that list), the space savings is huge.
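As a quick check of the arithmetic behind the multiplicative effect, here is a small sketch. The function and parameter names are made up for illustration; the only inputs taken from this thread are the 11 proposed core operators, the 3 or 5 script variants, and the possible doubling if the intent is listed on both the script element and the `mrow`.

```python
def listing_count(operators: int, variants: int, placements: int = 1) -> int:
    """Number of separate spec listings if every combination is written out.

    operators  -- large operators in the list (11 proposed for core)
    variants   -- script variants per operator (3, or 5 if msub/munder etc. are split out)
    placements -- 1 if the intent goes in one place, 2 if both script element and mrow
    """
    return operators * variants * placements


CORE_OPERATORS = 11  # count quoted in this thread

print(listing_count(CORE_OPERATORS, 3))     # three variants each
print(listing_count(CORE_OPERATORS, 5))     # five variants each
print(listing_count(CORE_OPERATORS, 5, 2))  # five variants, listed in both places
```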
OK I'll experiment on my fork, see what it looks like...
Practical suggestion: for very related cases, such as some integral signs, develop one concept fully and for the others just add 1 row with:
expressions follow the structure for the 'integral' concept
to the Comments column.
> the space savings is huge.
Is this an organizational question for the HTML concept list pages? There are standard approaches to manage length, for example pagination (e.g. max 100 concepts per page) and sub-pages (e.g. the different arity rows could be subpages linked from the outer page that has 1 row per concept).
If the Open list grows as much as it should, these techniques may become necessary on the frontend side. There are also js frameworks capable of navigating extremely large lists (100,000 rows with 22 columns in that example).
At least for the Open side of the question, it would be better to prepare for healthy growth, rather than try to constrain the space with custom conventions.
The condensed version of the table @NSoiffer is suggesting is starting to read like a math grammar to me. I hope that remains out of scope for the list pages, as it changes their character and makes it harder to contribute new concepts, as we no longer have a uniform organization.
Btw, one design philosophy we have written down is the Guidelines for core list curation. Note item 4.
> Practical suggestion: for very related cases, such as some integral signs, develop one concept fully and for the others just add 1 row with:
yes some possibly slightly more formalised version of that is the plan (I think)
Currently the issue is I think mostly about re-organising the yaml input (to make it possibly easier for implementations to deal with similar concepts with shared code)
However you are correct the html display may also become an issue
> Is this an organizational question for the HTML concept list pages? There are standard approaches to manage length, for example pagination (e.g. max 100 concepts per page
Yes the current display is very minimalist. On the other hand pagination is possibly less needed than it was, eg in previous iterations we always had the mathml spec split by chapter as the whole thing was too big to load in practice, but these days loading the whole spec isn't really an issue at all.
But jekyll does have some built in pagination features we could invoke without having to change the build too much if that does prove to be an issue in the open list (I can't see it being needed for core list)
@NSoiffer no PR yet but made a start
https://davidcarlisle.github.io/mathml-docs/intent-core-concepts/#default-large-operator-concepts
source diff
https://github.com/w3c/mathml-docs/compare/main...davidcarlisle:mathml-docs:main
Currently it pulls the `sum` template out of the main list (so it appears twice, but we could hide the second). Not sure whether the yaml is easier to understand if this is on its own at the start, or if it is in its "correct" mathematical section as now.
not sure yet how best to resolve conflicts with infix ops eg
https://davidcarlisle.github.io/mathml-docs/intent-core-concepts/#union
is currently double defined; the one that works actually goes to the default infix fixity list, not the largeop.
We could have the default infix line for `plus` / `+` and then for each character say whether it has an infix version or just the large-op, not sure....
the existing comments in the `sum` core entry seemed to indicate that the intent should or could go on the `munderover` but in the examples I added to the fork, I only managed to place it on the `mrow`.
you could use the
0-arity form on the mo
<mo intent="sum">∑</mo>
sum
1-arity form for implicit sum
<mrow intent="sum($f)"><mo>∑</mo><mrow arg="f"><mi>f</mi><mo>(</mo><mi>x</mi>...
sum of f of x
2-arity form for sum over a range
<mrow intent="sum($r,$f)"><munder><mo>∑</mo><mi arg="r">R</mi></munder><mrow arg="f"><mi>f</mi><mo>(</mo><mi>x</mi>...
sum over R of f of ..
3-arity form for sum between limits
<mrow intent="sum($a,$b,$f)"><munderover><mo>∑</mo><mi arg="a">a</mi><mi arg="b">b</mi></munderover><mrow arg="f"><mi>f</mi><mo>(</mo><mi>x</mi>...
sum from a to b of f of ..
But that doesn't really leave any good way to mark up the summation without the summand expression
Perhaps this?
<munderover intent="sum($a,$b,_)"><mo>∑</mo><mi arg="a">a</mi><mi arg="b">b</mi></munderover>
sum from a to b
or in principle we could give a different interpretation for the arguments in the 2-arity form with a different property so
<munderover intent="sum:prefix($a,$b)"><mo>∑</mo><mi arg="a">a</mi><mi arg="b">b</mi></munderover>
sum from a to b
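One way to picture the arity-based forms above is as a lookup from argument count to a speech template. This is a minimal sketch under stated assumptions: `SUM_TEMPLATES` and `speak_sum` are hypothetical names, the wording is copied from the readings given in this comment, and nothing here reflects how any particular AT implements it.

```python
# Hypothetical speech templates for `sum`, keyed by the number of intent arguments.
# The wording follows the readings listed in this comment.
SUM_TEMPLATES = {
    0: "sum",
    1: "sum of {0}",
    2: "sum over {0} of {1}",
    3: "sum from {0} to {1} of {2}",
}


def speak_sum(*args: str) -> str:
    """Return the spoken form for sum(...) with already-spoken arguments."""
    template = SUM_TEMPLATES.get(len(args))
    if template is None:
        raise ValueError(f"no template for sum with {len(args)} arguments")
    return template.format(*args)


print(speak_sum())
print(speak_sum("f of x"))
print(speak_sum("R", "f of x"))
print(speak_sum("a", "b", "f of x"))
```

The sketch also makes the gap visible: with a purely arity-keyed table, a 2-argument call cannot mean both "over R of f" and "from a to b", which is why the `_` placeholder or a distinguishing property is suggested.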
One additional markup variant that has been discussed in general and fits with David's sum examples is using a "higher-order" application. There the summation is first attached to its indexing signature. And then, a level up, is applied to an argument.
In the process of writing my example I also noticed that summations typically have an explicit indexing variable, which is also used in the argument being summed over. And wondered if that is motivation to reuse the same intent concept `index`? Maybe not, but here is some prototype markup for what I am wondering about:
$$ \sum_{i=a}^b f(x_i) $$
<mrow intent="$sum_op($arg)">
  <munderover arg="sum_op" intent="index($op, in($index_var, interval($from,$to)))">
    <mo intent="sum" arg="op">∑</mo>
    <mrow>
      <mi arg="index_var">i</mi>
      <mo>=</mo>
      <mi arg="from">a</mi>
    </mrow>
    <mi arg="to">b</mi>
  </munderover>
  <mrow arg="arg">
    <mi>f</mi><mo>(</mo>
    <msub intent="index(x,$index_var)">
      <mi>x</mi>
      <mi arg="index_var">i</mi>
    </msub>
    <mo>)</mo>
  </mrow>
</mrow>
Some of that was discussed in #454.
This is one of the cases where one can really feel that intent has a compact syntax. There are various extensions that jump to mind to tidy up and clearly mark the different kinds of arguments.
The way I wrote the intent expressions above backs out of any assumptions about "sum" being a known operator, and ought to be usable even without custom conventions. But it gets verbose and functional, much more so than a convention-based `sum(i,a,b,$arg)` or such.
Here are two additional examples, showing how the markup changes when `i` is left "to the reader", and when there is no variable at all.
$$ \sum_{i} f(x_i) $$
<mrow intent="$sum_op($arg)">
  <munder arg="sum_op" intent="index($op, $index_var)">
    <mo intent="sum" arg="op">∑</mo>
    <mi arg="index_var">i</mi>
  </munder>
  <mrow arg="arg">
    <mi>f</mi><mo>(</mo>
    <msub intent="index(x,$index_var)">
      <mi>x</mi>
      <mi arg="index_var">i</mi>
    </msub>
    <mo>)</mo>
  </mrow>
</mrow>
$$ \sum f(x_i) $$
<mrow intent="$sum_op($arg)">
  <mo intent="sum" arg="sum_op">∑</mo>
  <mrow arg="arg">
    <mi>f</mi><mo>(</mo>
    <msub intent="index(x,$index_var)">
      <mi>x</mi>
      <mi arg="index_var">i</mi>
    </msub>
    <mo>)</mo>
  </mrow>
</mrow>
To summarize: A sum may be indexed or bare, and its indexing variable may be constrained by a range or bare. And each of those cases may benefit from specialized speech.
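Hand-written markup like the examples in this thread is easy to get slightly wrong, for instance a `$name` reference with no matching `arg`. The following is a rough checker sketch, not a conforming intent processor: `unresolved_refs` is a hypothetical helper, it accepts any descendant carrying the `arg` (a simplification of the actual lookup rules), and it uses a plain regular expression rather than a real parser for the intent grammar.

```python
import re
import xml.etree.ElementTree as ET

# Matches $name references inside an intent expression (simplified name syntax).
ARG_REF = re.compile(r"\$([A-Za-z_][\w\-]*)")


def unresolved_refs(mathml: str) -> list:
    """Return (element tag, reference name) pairs that have no matching arg.

    Simplification: any descendant with arg="name" counts as a match.
    """
    root = ET.fromstring(mathml)
    problems = []
    for elem in root.iter():
        intent = elem.get("intent")
        if intent is None:
            continue
        # Collect arg names on descendants (excluding the element itself).
        available = {d.get("arg") for d in elem.iter() if d is not elem and d.get("arg")}
        for name in ARG_REF.findall(intent):
            if name not in available:
                problems.append((elem.tag, name))
    return problems


good = ('<mrow intent="sum($a,$b,$f)"><munderover><mo>∑</mo>'
        '<mi arg="a">a</mi><mi arg="b">b</mi></munderover>'
        '<mrow arg="f"><mi>f</mi></mrow></mrow>')
bad = good.replace(' arg="f"', "")

print(unresolved_refs(good))
print(unresolved_refs(bad))
```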
@davidcarlisle: what you did is not bad. I started to reply with a suggestion and then realized that this is not needed at all: we have `:largeop`. With this property, there is no need for a `sum`, etc., concept name. You just need to mark the `munderover` with `intent=":largeop"` and hopefully good speech is generated by the AT. This will work for any of the 66 characters you found... plus any others that are (rightly or wrongly) tagged with this property.
Doing this is a much simpler approach than the other suggestions in this issue. And the only changes that need to be made to the core concept list are to remove sum, product, and integral and any other large op that we have in the list (from the distant past).
Indeed, one choice here is deciding whether we want "hopefully" (Core list) or "certainly" (Open list) for large operators.
@NSoiffer oh `:largeop` had completely gone out of my mind, yes that changes things...
> Indeed, one choice here is deciding whether we want "hopefully" (Core list) or "certainly" (Open list) for large operators.
Apologies. I'm missing your point other than my use of the word "hopefully" (AT SHOULD use the core list, but it doesn't have to, hence "hopefully"). I don't understand '"certainly" (Open list) for large operators'.
@NSoiffer given we have `:largeop` in core properties we could simply delete `sum` and `product` from core concept list, however there are some advantages in having them in the core concept list to default the property so you can use `sum(...` not `sum:largeop(...`
We could delete the "full" entries and compress to a `largeop` list of characters in the default fixity section at the top, in which case we could consider listing more (or all) of the Unicode n-ary and integral characters.
> Apologies. I'm missing your point other than my use of the word "hopefully" (AT SHOULD use the core list, but it doesn't have to, hence "hopefully"). I don't understand '"certainly" (Open list) for large operators'.
Sure, I can elaborate:
We expanded on this during the meeting yesterday. There is a large variability on how the indexing variable is presented, or omitted. Aspirational language I heard during the meeting described the `:largeop` approach as "hopefully good enough most of the time", as it is likely to not match some of the more advanced cases, or to match at the "presentation markup slots" (e.g. the argument slots for `<munderover>`), and then vocalized with more generic language at that level.
In contrast - and this is generally true in the concept vs property approach - a concept-based intent expression adds more certainty that
(Of course at the cost of more verbose markup, as well as a successful parse of the argument structure already at the annotation stage.)
@dginev I have some sympathy with your functional intent version that is basically identifying $\sum_{i=a}^b$ with $\sum_{i\in[a,b]}$ as that was the approach taken in OpenMath and also in Content MathML which normalises use of `<lowlimit>` and `<uplimit>` with `<domainofapplication>`
But actually we found that most of the time the extra mathematical precision didn't really help (except for mapping to computational systems) you normally want to read the upper and lower limit form just as it appears so
"sum from i = a; to b"
not as the mathematically equivalent
"sum over i in the closed interval a b"
which is what I think you'd get from
intent="index($op, in($index_var, interval($from,$to)))"
@davidcarlisle My example has three mistakes if it was aiming for Core actually:
- `index` should be `indexed-by`
- `in` should be `element-of`
- `interval`

And the baseline readout for `in` would be terrible, so I should have known better than to use it as a function head. But I could have learned that by proof-listening... But the broader point is that a closer look would show us that with certainty.
Another point is that we have freedom to choose concrete readouts when a "concept-based" approach feels too artificial (which is what I think your worry is with "interval" ?). As with:
intent="indexed-by($op, _($index_var, _from, $from, _to, $to))"
@dginev my main worry is that I think it highly unlikely that anyone seeing $\sum_{i=a}^b$ on paper or a blackboard would ever read it as "sum over i in the interval from a to b"; it would almost always be read (as we are suggesting the default largeop reading would be for the upper and lower limit form) as "sum from i equals a ; to b"
This used to come up quite often trying to typeset OpenMath where the underlying markup just had the sum over a set, but to get reasonable readings we had to "spot" common cases such as the set being an interval and express it as an upper and lower limit. So while I think the functional intent that you showed is correct, I think it's hard to generate and produces a less natural reading than the default.
I am trying to follow this thread with an eye to a key purpose of intent: disambiguating how to pronounce ambiguous notation.
Can someone please provide examples that are claimed to be ambiguous? None of these strike me as ambiguous: $\sum_{i=a}^b x_i$ $\sum_i x_i$ $\sum_{i \in [a,b]} x_i$ $\sum_{i \in \mathbb Z} x_i$ $\sum x_i$
I understand that the author may wish to suggest a particular pronunciation among the ways to say the same mathematical content. I am not asking about that (unless it is claimed that the existing ways to specify arbitrary pronunciation do not work adequately in this case).
@davidfarmer I don't think there is any question of ambiguity here, just giving the system a hint to use "from... to" or "over" speech templates rather than reading $\sum_i$ as "sum sub i"
The (current) proposal is that all that is needed is `intent=":largeop"` and for summation that wouldn't be needed usually either as that would be listed as a default property for that character.
@davidfarmer for ambiguity it is probably quickest to reach towards Type theory. An example is the sigma type, where they write
$$ \Sigma_{x: A} B(x) $$
I am not at all certain what is the preferred pronunciation, but I could explain it as "the sigma type from A to B, dependent on x". I would have to reach out to a type theorist to find a preferred readout. I have seen capital sigmas used for various other purposes in arXiv - such as a type name variable, or a group invariant. I'm sure there are others...
$$ \Sigma_{G\prime} (G) $$
Shouldn't AT know that ∑i is not pronounced "summation symbol with subscript i", nor is it pronounced "sum sub i"?
I still don't get the need to put the `:largeop` intent on something which is unambiguously a large operator.
@davidfarmer The need I see is that if I have materials where I want a sigma pronounced as a summation, materials where I want it pronounced as "sigma type" and materials where they are pronounced as simply "sigma" (e.g. for the group invariants), then I should be able to guide AT to produce the correct outcome, by supplying the intended concept name.
> Shouldn't AT know that ∑i is not pronounced "summation symbol with subscript i",
well it won't unless someone specifies it should.
But it's the same as `+` and the `:infix` property: the property is mostly of use on more exotic characters, but clearly it's allowed to use `:infix` on `+`.
The proposal is that we have the `:largeop` property defined as a core property
https://w3c.github.io/mathml-docs/intent-core-properties/#prop-largeop
It is defined by example but the basic idea of the property is "read it like summation" so naturally if you use `:largeop` on a summation it ends up looking a bit tautologous.
> @davidfarmer The need I see is that if I have materials where I want a sigma pronounced as a summation, materials where I want it pronounced as "sigma type" and materials where they are pronounced as simply "sigma" (e.g. for the group invariants), then I should be able to guide AT to produce the correct outcome, by supplying the intended concept name.
I think that the dependent types usually use a summation rather than a sigma (even when called sigma types) the arXiv pdf you linked to seemed to be doing that. Of course you may still want to use intent to force correct speech even when the "wrong" Unicode character is used.
Maybe I understand now:
The proposal is that we can put `:largeop` on anything, which will tell AT to pronounce the `sub` and `sup` elements like it does for summation notation.
But I don't need to put `:largeop` on the MathML version of $\sum_i$, because (assuming the appropriate unicode character is used) it is already unambiguous.
If I have that correct now, it seems reasonable to me.
@davidfarmer
> But I don't need to put :largeop on the MathML version of ∑ i , because (assuming the appropriate unicode character is used) it is already unambiguous.
Disambiguation isn't the only (or even main) use of `intent`. There is nothing that says that AT should know how to generate speech for a summation. The proposal here is that we specify how large operators are spoken by specifying a core property `:largeop` with suitable templates. Also we can hint that in the absence of a specific intent a system might default `:largeop` on a summation sign, but in general how a system infers default intents on MathML without intent is implementation specific.
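To make the "default property" idea concrete, here is a toy sketch of how a system might choose between a large-operator reading and a generic script reading. Everything in it is an assumption for illustration: the `DEFAULT_LARGEOP_NAMES` table, the `speak_scripted` function, and the exact wording are invented here, and real AT is free to infer defaults differently.

```python
from typing import Optional

# Hypothetical table of characters that default to the :largeop reading.
DEFAULT_LARGEOP_NAMES = {"∑": "sum", "∏": "product", "∫": "integral"}


def speak_scripted(base: str, lower: str, upper: Optional[str] = None,
                   intent: Optional[str] = None) -> str:
    """Speak a base with a lower script and an optional upper script.

    Uses the large-operator templates when the author wrote intent=":largeop"
    or when the base character is in the default table; otherwise falls back
    to a generic "sub"/"super" reading.
    """
    name = DEFAULT_LARGEOP_NAMES.get(base)
    spoken = name if name is not None else base
    if intent == ":largeop" or name is not None:
        if upper is not None:
            return f"{spoken} from {lower} to {upper}"
        return f"{spoken} over {lower}"
    # Generic reading for anything not treated as a large operator.
    generic = f"{spoken} sub {lower}"
    if upper is not None:
        generic += f" super {upper}"
    return generic


print(speak_scripted("∑", "a", "b"))
print(speak_scripted("x", "i"))
print(speak_scripted("⊍", "i", intent=":largeop"))
```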
@davidfarmer: this ties in to #433, which we briefly discussed on the call last week as my "remaining big issue". My proposal is that if someone uses the `:common` property (typically on the `math` element), then authors can assume AT will behave according to what is specified for that property. One of those things is that `msub`, `msubsup`, `munder`, and `munderover` have special speech when the base is a large operator.
I think the way to flesh out my proposal is to explicitly say notation X maps to concept or property Y. In this case, it should say those constructs should act as if the `:largeop` property is specified.
There is also `:literal` (uses the old "structure" name in #433), and `:legacy` which is the default. I suspect that you would want to specify `:common` for most of the MathML you generate (assuming of course that the proposal is adopted).
There are about 10 large operators that probably make sense to go into core (maybe integral, double integral, triple integral, contour integral, surface integral, volume integral, sum, product, coproduct, union, intersection). Whatever is decided for core should likely be extended to open for the other large operators (e.g., ⊍).
These are all very similar in structure in that
- `intent` potentially goes on `msub`/`munder` with one argument (typically specifying a domain for the "index") and `msubsup`/`munderover` with two arguments ("... from xxx to yyy"). Or they go on some containing `mrow` with an additional argument (e.g., "... from xxx to yyy of zzz"). If they go on one of the scripting elements, then there is no need for intents for indefinite integration or sums that don't have limits. If they go on an `mrow`, then maybe it makes sense to have an intent for them although Neil felt the speech needs no intent because there is no other sensible speech for "integral", "sum", etc.
- In the Dec 21 meeting, no one stood up for the "dx" being part of the argument for integral as it would be spoken "dx" wherever it was and didn't need help from an intent.
In the meeting, Neil felt that listing these all out both uses up a lot of space (and hence appears complicated) and, more importantly, obscures their similarity, making it harder on both generators and consumers of the spec. His suggestion is to create another list between the "Core Concept Default Fixity properties" and the "Core Concept Templates". Others were not enthusiastic about that idea.
This issue provides a place to discuss the pros and cons of how intents for large operators should be handled.