Open davidcarlisle opened 10 months ago
Suggestion here is we tidy up and pre-load the corrected data in the format?
@josephwright well this issue is just about getting the lists aligned or at least differences documented.
We could then consider working similarly to lccode tables, initialising the format based on unicode data, then over-riding with tex specific corrections and historic choices in a separate phase, but that would be a new issue. Actually I thought pre-loading the table was an old issue, it's certainly been raised before, but a search of issues didn't show anything just now.
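A minimal sketch of that two-phase idea, assuming a XeTeX or LuaTeX based format (for the `\Umathcode` primitive) and using family 0 purely as a placeholder. The DAGGER values come from the table in this issue; the layout of the initialisation code is hypothetical, not how any format currently does it.

```latex
% Sketch only: needs XeTeX or LuaTeX for \Umathcode.
% Syntax: \Umathcode <char> = <class> <family> <slot>
% Class numbers: 0 ord, 1 op, 2 bin, 3 rel, 4 open, 5 close, 6 punct.
% Family 0 is an arbitrary placeholder, not a recommendation.

% Phase 1: default taken from the Unicode data.
% U+2020 DAGGER has Unicode math class R, so it starts as a relation.
\Umathcode"2020 = 3 0 "2020

% Phase 2: TeX-specific correction loaded afterwards.
% \dagger is traditionally a binary operator, so override the class.
\Umathcode"2020 = 2 0 "2020
```

Keeping the second phase in a separate file would make each departure from the Unicode data explicit and easy to review.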
I updated the query to also log (marked !) characters with an entry in unicode-math-table but not in MathClass-15.
The list in the issue description is updated.
The entries marked ??? (where unicode.xml asserts there is a unicode-math definition that is not found) are:
U00332 | \underbar | D | ??? | COMBINING LOW LINE
U003D0 | \m?upvarbeta | A | ??? | GREEK BETA SYMBOL
U003D8 | \m?upoldKoppa | N | ??? | GREEK LETTER ARCHAIC KOPPA
U003D9 | \m?upoldkoppa | N | ??? | GREEK SMALL LETTER ARCHAIC KOPPA
U003DA | \m?upStigma | A | ??? | GREEK LETTER STIGMA
U003DB | \m?upstigma | A | ??? | GREEK SMALL LETTER STIGMA
U003DE | \m?upKoppa | ! | ??? | GREEK LETTER KOPPA
U003DF | \m?upkoppa | ! | ??? | GREEK SMALL LETTER KOPPA
U003E0 | \m?upSampi | A | ??? | GREEK LETTER SAMPI
U003E1 | \m?upsampi | A | ??? | GREEK SMALL LETTER SAMPI
I think the first is just an error: \underbar is obviously defined, but I don't think it's associated with U+0332 in any real way, so I should delete this entry in unicode.xml.
The others are all old Greek. @wspr I must have thought these were defined in unicode-math at some point. Has this changed, or are they defined in some option? I couldn't see them. If not, I should probably just similarly correct unicode.xml and delete these entries?
I think they were defined once upon a time, but I came up absolutely blank when looking for examples of them anyway, so I think I assumed they were examples of glyphs from a font that may have been associated with maths but wasn’t actually used in maths typesetting.
If you have even hints that they should be included in UM I’m happy to bring them back.
(I suspect that quite a number of the geometrical symbols and graphical looking arrows are also in this category.)
@wspr thanks, I'll delete them for now...
done, list in the description updated.
I’m actually a bit torn on this; I like the idea of providing more symbols for people to extend their repertoire with. And these symbols are in STIX Two, so I’d suggest we bring them back in UM rather than keep them deleted… WDYT?
Fine by me, now the list in unicode-math-table matches what unicode.xml says is in unicode-math so if you add them here, I'll add them there and keep in sync...
It's probably worth reviewing the cases where there are different assignments (rather than missing definitions) especially if we might consider pre-initialising the mathcodes from the MathClass-xx file.
I have also omitted here assignments to Unicode combining characters which will necessarily require modified (or probably missing) support in tex.
I plan to raise an issue at Unicode on these left and right delimiters that are marked as infix relations
U023B0 | \lmoustache | R | \mathopen | UPPER LEFT OR LOWER RIGHT CURLY BRACKET SECTION
U023B1 | \rmoustache | R | \mathclose | UPPER RIGHT OR LOWER LEFT CURLY BRACKET SECTION
U027C5 | \lbag | R | \mathopen | LEFT S-SHAPED BAG DELIMITER
U027C6 | \rbag | R | \mathclose | RIGHT S-SHAPED BAG DELIMITER
These are almost always accessed via (U)radical so the mathclass is perhaps not so important
U0221A | \sqrt | L | \mathopen | SQUARE ROOT
U0221B | \cuberoot | L | \mathopen | CUBE ROOT
U0221C | \fourthroot | L | \mathopen | FOURTH ROOT
U027CC | \longdivision | L | \mathopen | LONG DIVISION
mathord or mathop?
U1EEF0 | \arabicmaj | A | \mathop | ARABIC MATHEMATICAL OPERATOR MEEM WITH HAH WITH TATWEEL
U1EEF1 | \arabichad | A | \mathop | ARABIC MATHEMATICAL OPERATOR HAH WITH DAL
It isn't clear if Unicode P really relates to \mathpunct, which would add spacing in 1.5, or should a name such as \mathperiod relate to a sentence full stop rather than decimal point and be spaced that way even if the mathcode of . is mathord not mathpunct?
U00021 | \mathexclam | N | \mathclose | EXCLAMATION MARK
U0002E | \mathperiod | P | \mathord | FULL STOP
U0003F | \mathquestion | P | \mathord | QUESTION MARK
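To make the spacing question concrete, here is a small test document, assuming standard LaTeX math setup where `.` has class ord and `,` has class punct. It shows only the classic behaviour and does not involve unicode-math's `\mathperiod` itself.

```latex
\documentclass{article}
\begin{document}
% "." is mathord by default: no extra space, so this reads as a decimal.
$1.5$

% Forcing the punctuation class adds a thin space after the dot,
% the same spacing the comma gets by default.
$1\mathpunct{.}5$ \quad $1,5$
\end{document}
```

If Unicode class P were mapped blindly to \mathpunct, the first line would be spaced like the second.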
mathrel or mathord?
U021A8 | \updownarrowbar | R | \mathord | UP DOWN ARROW WITH BASE
U021BA | \acwopencirclearrow | R | \mathord | ANTICLOCKWISE OPEN CIRCLE ARROW
U021BB | \cwopencirclearrow | R | \mathord | CLOCKWISE OPEN CIRCLE ARROW
U021DE | \nHuparrow | R | \mathord | UPWARDS ARROW WITH DOUBLE STROKE
U021DF | \nHdownarrow | R | \mathord | DOWNWARDS ARROW WITH DOUBLE STROKE
U021E0 | \leftdasharrow | R | \mathord | LEFTWARDS DASHED ARROW
U021E1 | \updasharrow | R | \mathord | UPWARDS DASHED ARROW
U021E2 | \rightdasharrow | R | \mathord | RIGHTWARDS DASHED ARROW
U021E3 | \downdasharrow | R | \mathord | DOWNWARDS DASHED ARROW
U021E6 | \leftwhitearrow | R | \mathord | LEFTWARDS WHITE ARROW
U021E7 | \upwhitearrow | R | \mathord | UPWARDS WHITE ARROW
U021E8 | \rightwhitearrow | R | \mathord | RIGHTWARDS WHITE ARROW
U021E9 | \downwhitearrow | R | \mathord | DOWNWARDS WHITE ARROW
U021EA | \whitearrowupfrombar | R | \mathord | UPWARDS WHITE ARROW FROM BAR
U02234 | \therefore | R | \mathord | THEREFORE
U02235 | \because | R | \mathord | BECAUSE
U022A5 | \bot | R | \mathord | UP TACK
U022EF | \unicodecdots | R | \mathord | MIDLINE HORIZONTAL ELLIPSIS
U0237C | \rangledownzigzagarrow | R | \mathord | RIGHT ANGLE WITH DOWNWARDS ZIGZAG ARROW
U027C3 | \subsetcirc | R | \mathord | OPEN SUBSET
U027C4 | \supsetcirc | R | \mathord | OPEN SUPERSET
U027C7 | \veedot | R | \mathbin | OR WITH DOT INSIDE
U027CB | \diagup | R | \mathord | MATHEMATICAL RISING DIAGONAL
U027CD | \diagdown | R | \mathord | MATHEMATICAL FALLING DIAGONAL
U0292B | \rdiagovfdiag | R | \mathord | RISING DIAGONAL CROSSING FALLING DIAGONAL
U0292C | \fdiagovrdiag | R | \mathord | FALLING DIAGONAL CROSSING RISING DIAGONAL
U0292D | \seovnearrow | R | \mathord | SOUTH EAST ARROW CROSSING NORTH EAST ARROW
U0292E | \neovsearrow | R | \mathord | NORTH EAST ARROW CROSSING SOUTH EAST ARROW
U0292F | \fdiagovnearrow | R | \mathord | FALLING DIAGONAL CROSSING NORTH EAST ARROW
U02930 | \rdiagovsearrow | R | \mathord | RISING DIAGONAL CROSSING SOUTH EAST ARROW
U02931 | \neovnwarrow | R | \mathord | NORTH EAST ARROW CROSSING NORTH WEST ARROW
U02932 | \nwovnearrow | R | \mathord | NORTH WEST ARROW CROSSING NORTH EAST ARROW
U02934 | \uprightcurvearrow | R | \mathord | ARROW POINTING RIGHTWARDS THEN CURVING UPWARDS
U02935 | \downrightcurvedarrow | R | \mathord | ARROW POINTING RIGHTWARDS THEN CURVING DOWNWARDS
mathbin or mathord ?
U0002F | \mathslash | B | \mathord | SOLIDUS
U0005C | \backslash | B | \mathord | REVERSE SOLIDUS
U022B9 | \hermitmatrix | B | \mathord | HERMITIAN CONJUGATE MATRIX
U025B2 | \bigblacktriangleup | B | \mathord | BLACK UP-POINTING TRIANGLE
U025B4 | \blacktriangle | B | \mathord | BLACK UP-POINTING SMALL TRIANGLE
U025B5 | \vartriangle | B | \mathrel | WHITE UP-POINTING SMALL TRIANGLE
U025B6 | \blacktriangleright | B | \mathord | BLACK RIGHT-POINTING TRIANGLE
U025B8 | \smallblacktriangleright | B | \mathord | BLACK RIGHT-POINTING SMALL TRIANGLE
U025B9 | \smalltriangleright | B | \mathord | WHITE RIGHT-POINTING SMALL TRIANGLE
U025BC | \bigblacktriangledown | B | \mathord | BLACK DOWN-POINTING TRIANGLE
U025BD | \bigtriangledown | B | \mathord | WHITE DOWN-POINTING TRIANGLE
U025BE | \blacktriangledown | B | \mathord | BLACK DOWN-POINTING SMALL TRIANGLE
U025BF | \triangledown | B | \mathord | WHITE DOWN-POINTING SMALL TRIANGLE
U025C0 | \blacktriangleleft | B | \mathord | BLACK LEFT-POINTING TRIANGLE
U025C2 | \smallblacktriangleleft | B | \mathord | BLACK LEFT-POINTING SMALL TRIANGLE
U025C3 | \smalltriangleleft | B | \mathord | WHITE LEFT-POINTING SMALL TRIANGLE
U025C4 | \blackpointerleft | B | \mathord | BLACK LEFT-POINTING POINTER
U025C5 | \whitepointerleft | B | \mathord | WHITE LEFT-POINTING POINTER
U025CA | \mdlgwhtlozenge | B | \mathord | LOZENGE
U025E6 | \smwhtcircle | B | \mathord | WHITE BULLET
U025EC | \trianglecdot | B | \mathord | WHITE UP-POINTING TRIANGLE WITH DOT
U025F8 | \ultriangle | B | \mathord | UPPER LEFT TRIANGLE
U025F9 | \urtriangle | B | \mathord | UPPER RIGHT TRIANGLE
U025FA | \lltriangle | B | \mathord | LOWER LEFT TRIANGLE
U025FB | \mdwhtsquare | B | \mathord | WHITE MEDIUM SQUARE
U025FC | \mdblksquare | B | \mathord | BLACK MEDIUM SQUARE
U025FD | \mdsmwhtsquare | B | \mathord | WHITE MEDIUM SMALL SQUARE
U025FE | \mdsmblksquare | B | \mathord | BLACK MEDIUM SMALL SQUARE
U025FF | \lrtriangle | B | \mathord | LOWER RIGHT TRIANGLE
U02605 | \bigstar | B | \mathord | BLACK STAR
U02606 | \bigwhitestar | B | \mathord | WHITE STAR
bin or rel?
U02020 | \dagger | R | \mathbin | DAGGER
U02021 | \ddagger | R | \mathbin | DOUBLE DAGGER
U0210F | \hslash | N | \mathalpha | PLANCK CONSTANT OVER TWO PI
U02129 | \m?turnediota | N | \mathalpha | TURNED GREEK SMALL LETTER IOTA
U0213C | \Bbbpi | A | \mathord | DOUBLE-STRUCK SMALL PI
U0213E | \BbbGamma | N | \mathalpha | DOUBLE-STRUCK CAPITAL GAMMA
U0214B | \upand | N | \mathbin | TURNED AMPERSAND
U02202 | \partial | N | \mathalpha | PARTIAL DIFFERENTIAL
U02207 | \nabla | U | \mathalpha | NABLA
U023B4 | \overbracket | N | \mathover | TOP SQUARE BRACKET
U023B5 | \underbracket | N | \mathunder | BOTTOM SQUARE BRACKET
U023DC | \overparen | N | \mathover | TOP PARENTHESIS
U023DD | \underparen | N | \mathunder | BOTTOM PARENTHESIS
U023DE | \overbrace | N | \mathover | TOP CURLY BRACKET
U023DF | \underbrace | N | \mathunder | BOTTOM CURLY BRACKET
U029B5 | \circlehbar | N | \mathbin | CIRCLE WITH HORIZONTAL BAR
U029CD | \triangleserifs | N | \mathbin | TRIANGLE WITH SERIFS AT BOTTOM
Arabic operators
mathord or mathop?
U1EEF0 | \arabicmaj | A | \mathop | ARABIC MATHEMATICAL OPERATOR MEEM WITH HAH WITH TATWEEL
U1EEF1 | \arabichad | A | \mathop | ARABIC MATHEMATICAL OPERATOR HAH WITH DAL
U1EEF0 (\arabicmaj) is an alternative to \sum so should have the same math class, which is \mathop.
U1EEF1 (\arabichad) is an alternative to \lim, so should behave the same. \lim is not a single symbol, but it is defined as \mathop{lim}, so I think \mathop is appropriate here.
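For comparison, a short document showing how \sum, \lim and a hand-built \mathop behave in display style. It uses only Latin stand-ins and standard LaTeX commands; the Arabic operators themselves would additionally need unicode-math and a font containing them, which is not shown here.

```latex
\documentclass{article}
\begin{document}
% All three are class \mathop: operator spacing applies and, in
% display style, the subscript is set underneath as a limit.
\[
  \sum_{k=1}^{n} a_k
  \qquad
  \lim_{x\to 0} f(x)
  \qquad
  \mathop{\mathrm{lim}}_{x\to 0} f(x)
\]
\end{document}
```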
Description
unicode.xml at https://github.com/w3c/xml-entities/ has data for unicode-math and MathClass-15 from TR25
I have updated it to track updates here and also added a comparison of the math classes assigned, in the stylesheet comp-um.xsl in that repository. The resulting log is included here. Some of the differences are intentional, but some are probably errors in MathClass-15 or unicode-math-table.
In the list below:
*** denotes a character listed in MathClass-15 with no entry in unicode-math-table.
??? denotes a csname unicode.xml associates with unicode-math that is not in unicode-math-table (all now fixed).
! denotes an entry in unicode-math that is not in MathClass-15.
Add info or delete as appropriate: