fred-wang closed this issue.
I'm not sure why you opened this as a separate issue from #161. Is #161 about what should go in the spec (which I think is a pointer to some program-friendly file format), and this issue about what is in that external file and how it is organized?
Just in case you weren't aware, gperf (GNU) will generate a perfect hash for you. Perfect hashes can sometimes use a fair amount of space, so an alternative is "quasi-perfect" hashing. That allows at most two probes into the hash table and can often significantly reduce the size of the table. There's probably an implementation that generates a table/hash function for doing quasi-perfect hashing, but I didn't see one on the first page of a Google search...
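One well-known way to guarantee at most two probes per lookup is cuckoo hashing. In cuckoo hashing, every key sits in one of exactly two candidate slots. This is a swapped-in technique for illustration, not necessarily the quasi-perfect scheme mentioned above. The helper names (`build_table`, `lookup`), the BLAKE2-based slot hashes, and the 0.5 load factor are all assumptions of this sketch. The sample spacing values come from the compact output later in this thread.

```python
import hashlib


def _slot(key, seed, size):
    """Deterministic seeded hash of `key`, reduced to a table index."""
    data = f"{seed}\x00{key}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little") % size


def _candidates(key, seed, size):
    # The two slots a key may occupy; lookups never probe anywhere else.
    return _slot(key, 2 * seed, size), _slot(key, 2 * seed + 1, size)


def _insert(table, key, value, seed, max_kicks):
    size = len(table)
    item = (key, value)
    slot = _candidates(key, seed, size)[0]
    for _ in range(max_kicks):
        if table[slot] is None:
            table[slot] = item
            return True
        # Evict the occupant and send it to its other candidate slot.
        table[slot], item = item, table[slot]
        first, second = _candidates(item[0], seed, size)
        slot = second if slot == first else first
    return False


def build_table(entries, max_kicks=100, max_seeds=1000):
    """Build a two-probe table; retry with a new seed if insertion cycles."""
    size = max(2, 2 * len(entries))  # load factor <= 0.5
    for seed in range(max_seeds):
        table = [None] * size
        if all(_insert(table, k, v, seed, max_kicks) for k, v in entries.items()):
            return seed, table
    raise RuntimeError("no working seed found; try a larger table")


def lookup(seed, table, key):
    for slot in _candidates(key, seed, len(table)):  # at most two probes
        item = table[slot]
        if item is not None and item[0] == key:
            return item[1]
    return None


# operator -> (form, lspace, rspace)
OPERATORS = {
    "+": ("infix", 4, 4),
    ",": ("infix", 0, 3),
    ":": ("infix", 1, 2),
    "\u2192": ("infix", 5, 5),
    "\u2211": ("prefix", 3, 3),
    "\u2202": ("prefix", 3, 0),
}
seed, table = build_table(OPERATORS)
print(lookup(seed, table, ":"))  # ('infix', 1, 2)
```

The price of the two-probe guarantee here is a table with twice as many slots as keys. A denser layout needs a smarter construction.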
I think @bfgeek's proposal was actually a minimal perfect hash table (?): https://en.wikipedia.org/wiki/Perfect_hash_function#Minimal_perfect_hash_function
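A minimal perfect hash maps n known keys onto exactly n slots with no collisions. This is a sketch of one common construction, hash-and-displace. Keys are split into buckets, and each bucket gets a small displacement seed, chosen so that all its keys land in still-free slots. The bucket count (n/2), the BLAKE2-based hash, and the function names are assumptions for illustration. They are not details of @bfgeek's proposal.

```python
import hashlib


def _h(key, seed, mod):
    """Deterministic seeded hash of `key` reduced modulo `mod`."""
    data = f"{seed}\x00{key}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little") % mod


def build_mphf(keys, max_seed=100_000):
    """Return one displacement seed per bucket (hash-and-displace)."""
    n = len(keys)
    num_buckets = max(1, n // 2)
    buckets = [[] for _ in range(num_buckets)]
    for key in keys:
        buckets[_h(key, 0, num_buckets)].append(key)
    displacements = [0] * num_buckets
    taken = [False] * n
    # Place the largest buckets first, while many slots are still free.
    for b in sorted(range(num_buckets), key=lambda i: len(buckets[i]), reverse=True):
        if not buckets[b]:
            continue
        for seed in range(1, max_seed):
            slots = [_h(key, seed, n) for key in buckets[b]]
            if len(set(slots)) == len(slots) and not any(taken[s] for s in slots):
                break
        else:
            raise RuntimeError("no displacement found for a bucket")
        for s in slots:
            taken[s] = True
        displacements[b] = seed
    return displacements


def mphf_slot(displacements, key, n):
    """Slot in [0, n) for a key that was in the build set."""
    seed = displacements[_h(key, 0, len(displacements))]
    return _h(key, seed, n)


KEYS = ["+", "-", "<", ">", "(", ")", ",", ";", ":", "\u2211", "\u222B",
        "\u2202", "\u2192", "<=", "!=", "->"]
DISP = build_mphf(KEYS)
```

Strings outside the build set also hash to some slot. A real table would therefore store each key next to its value in an n-element array and compare on lookup.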
Consensus from yesterday's meeting: @davidcarlisle will try to check the values to make them more consistent and reduce special cases.
@davidcarlisle How many categories remain after your changes?
@fred-wang The changes are mainly from @NSoiffer; I've just been pushing through the resulting updated files, and I believe Neil is hoping to do at least one more round on this.
I also updated to Unicode 13, but I'm not expecting that to affect MathML.
However, as things stand now, if you ignore priority= (which isn't really a MathML Core thing), there are 17 different combinations of form, lspace and rspace; see the form:... headings at
https://mathml-refresh.github.io/xml-entities/opdict.html
The report including priority and showing differences from Unicode TR25 is below.
The first part shows that the priority values still need a bit of rationalisation, but that's on Neil's radar (and doesn't affect Core). The second part, showing differences from the MathClass-15 file, is probably OK, but we should perhaps coordinate with Murray and Barbara to get the two back in sync at some point.
45 distinct priority values
Priority, (count)
010, (4)
020, (58)
030, (1) <semicolon>
040, (2) <comma> <invisible separator>
070, (2) <therefore> <because>
090, (5)
100, (3)
170, (9)
190, (1) <logical or>
200, (2) <multiple character operator: &&> <logical and>
230, (6)
240, (86)
260, (232)
265, (204)
270, (555)
275, (10)
290, (3)
300, (5)
310, (26)
320, (3)
330, (12)
340, (1) <wreath product>
350, (4)
390, (13)
400, (1) <middle dot>
410, (1) <circled times>
640, (1) <percent sign>
650, (2) <reverse solidus> <set minus>
670, (27)
680, (12)
690, (7)
700, (1) <vector or cross product>
720, (1) <multiple character operator: **>
730, (1) <circled dot operator>
740, (4)
780, (2) <multiple character operator: <>> <circumflex accent>
800, (4)
810, (2) <exclamation mark> <multiple character operator: !!>
820, (1) <multiple character operator: //>
825, (1) <commercial at>
835, (1) <question mark>
845, (3)
850, (1) <function application>
880, (58)
900, (2) <low line> <decimal separator key symbol>
----
Operator dictionary entries for characters not listed in the Unicode TR25 MathClass file:
C0 Controls and Basic Latin
U00022 QUOTATION MARK
U00027 APOSTROPHE
C1 Controls and Latin-1 Supplement
U000B8 CEDILLA
Spacing Modifier Letters
U002C9 MODIFIER LETTER MACRON
U002CA MODIFIER LETTER ACUTE ACCENT
U002CB MODIFIER LETTER GRAVE ACCENT
U002CD MODIFIER LETTER LOW MACRON
U002DD DOUBLE ACUTE ACCENT
U002F7 MODIFIER LETTER LOW TILDE
General Punctuation
U02018 LEFT SINGLE QUOTATION MARK
U02019 RIGHT SINGLE QUOTATION MARK
U0201A SINGLE LOW-9 QUOTATION MARK
U0201B SINGLE HIGH-REVERSED-9 QUOTATION MARK
U0201C LEFT DOUBLE QUOTATION MARK
U0201D RIGHT DOUBLE QUOTATION MARK
U0201E DOUBLE LOW-9 QUOTATION MARK
U0201F DOUBLE HIGH-REVERSED-9 QUOTATION MARK
U0203E OVERLINE
U02043 HYPHEN BULLET
Arrows
U021B4 RIGHTWARDS ARROW WITH CORNER DOWNWARDS
U021B5 DOWNWARDS ARROW WITH CORNER LEFTWARDS
U021B8 NORTH WEST ARROW TO LONG BAR
U021B9 LEFTWARDS ARROW TO BAR OVER RIGHTWARDS ARROW TO BAR
Miscellaneous Technical
U02301 ELECTRIC ARROW
U02329 LEFT-POINTING ANGLE BRACKET
U0232A RIGHT-POINTING ANGLE BRACKET
U0238B BROKEN CIRCLE WITH NORTHWEST ARROW
U02396 DECIMAL SEPARATOR KEY SYMBOL
U023CD SQUARE FOOT
Dingbats
U02758 LIGHT VERTICAL BAR
U02794 HEAVY WIDE-HEADED RIGHTWARDS ARROW
U02795 HEAVY PLUS SIGN
U02795 HEAVY PLUS SIGN
U02796 HEAVY MINUS SIGN
U02796 HEAVY MINUS SIGN
U02797 HEAVY DIVISION SIGN
U02798 HEAVY SOUTH EAST ARROW
U02799 HEAVY RIGHTWARDS ARROW
U0279A HEAVY NORTH EAST ARROW
U0279B DRAFTING POINT RIGHTWARDS ARROW
U0279C HEAVY ROUND-TIPPED RIGHTWARDS ARROW
U0279D TRIANGLE-HEADED RIGHTWARDS ARROW
U0279E HEAVY TRIANGLE-HEADED RIGHTWARDS ARROW
U0279F DASHED TRIANGLE-HEADED RIGHTWARDS ARROW
U027A0 HEAVY DASHED TRIANGLE-HEADED RIGHTWARDS ARROW
U027A1 BLACK RIGHTWARDS ARROW
U027A5 HEAVY BLACK CURVED DOWNWARDS AND RIGHTWARDS ARROW
U027A6 HEAVY BLACK CURVED UPWARDS AND RIGHTWARDS ARROW
U027A7 SQUAT BLACK RIGHTWARDS ARROW
U027A8 HEAVY CONCAVE-POINTED BLACK RIGHTWARDS ARROW
U027A9 RIGHT-SHADED WHITE RIGHTWARDS ARROW
U027AA LEFT-SHADED WHITE RIGHTWARDS ARROW
U027AB BACK-TILTED SHADOWED WHITE RIGHTWARDS ARROW
U027AC FRONT-TILTED SHADOWED WHITE RIGHTWARDS ARROW
U027AD HEAVY LOWER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW
U027AE HEAVY UPPER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW
U027AF NOTCHED LOWER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW
U027B1 NOTCHED UPPER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW
U027B2 CIRCLED HEAVY WHITE RIGHTWARDS ARROW
U027B3 WHITE-FEATHERED RIGHTWARDS ARROW
U027B4 BLACK-FEATHERED SOUTH EAST ARROW
U027B5 BLACK-FEATHERED RIGHTWARDS ARROW
U027B6 BLACK-FEATHERED NORTH EAST ARROW
U027B7 HEAVY BLACK-FEATHERED SOUTH EAST ARROW
U027B8 HEAVY BLACK-FEATHERED RIGHTWARDS ARROW
U027B9 HEAVY BLACK-FEATHERED NORTH EAST ARROW
U027BA TEARDROP-BARBED RIGHTWARDS ARROW
U027BB HEAVY TEARDROP-SHANKED RIGHTWARDS ARROW
U027BC WEDGE-TAILED RIGHTWARDS ARROW
U027BD HEAVY WEDGE-TAILED RIGHTWARDS ARROW
U027BE OPEN-OUTLINED RIGHTWARDS ARROW
Miscellaneous Symbols and Arrows
U02B45 LEFTWARDS QUADRUPLE ARROW
U02B46 RIGHTWARDS QUADRUPLE ARROW
U02B4D DOWNWARDS TRIANGLE-HEADED ZIGZAG ARROW
U02B4E SHORT SLANTED NORTH ARROW
U02B4F SHORT BACKSLANTED SOUTH ARROW
U02B5A SLANTED NORTH ARROW WITH HOOKED HEAD
U02B5B BACKSLANTED SOUTH ARROW WITH HOOKED TAIL
U02B5C SLANTED NORTH ARROW WITH HORIZONTAL TAIL
U02B5D BACKSLANTED SOUTH ARROW WITH HORIZONTAL TAIL
U02B5E BENT ARROW POINTING DOWNWARDS THEN NORTH EAST
U02B5F SHORT BENT ARROW POINTING DOWNWARDS THEN NORTH EAST
U02B60 LEFTWARDS TRIANGLE-HEADED ARROW
U02B61 UPWARDS TRIANGLE-HEADED ARROW
U02B62 RIGHTWARDS TRIANGLE-HEADED ARROW
U02B63 DOWNWARDS TRIANGLE-HEADED ARROW
U02B64 LEFT RIGHT TRIANGLE-HEADED ARROW
U02B65 UP DOWN TRIANGLE-HEADED ARROW
U02B66 NORTH WEST TRIANGLE-HEADED ARROW
U02B67 NORTH EAST TRIANGLE-HEADED ARROW
U02B68 SOUTH EAST TRIANGLE-HEADED ARROW
U02B69 SOUTH WEST TRIANGLE-HEADED ARROW
U02B6A LEFTWARDS TRIANGLE-HEADED DASHED ARROW
U02B6B UPWARDS TRIANGLE-HEADED DASHED ARROW
U02B6C RIGHTWARDS TRIANGLE-HEADED DASHED ARROW
U02B6D DOWNWARDS TRIANGLE-HEADED DASHED ARROW
U02B6E CLOCKWISE TRIANGLE-HEADED OPEN CIRCLE ARROW
U02B6F ANTICLOCKWISE TRIANGLE-HEADED OPEN CIRCLE ARROW
U02B70 LEFTWARDS TRIANGLE-HEADED ARROW TO BAR
U02B71 UPWARDS TRIANGLE-HEADED ARROW TO BAR
U02B72 RIGHTWARDS TRIANGLE-HEADED ARROW TO BAR
U02B73 DOWNWARDS TRIANGLE-HEADED ARROW TO BAR
U02B76 NORTH WEST TRIANGLE-HEADED ARROW TO BAR
U02B77 NORTH EAST TRIANGLE-HEADED ARROW TO BAR
U02B78 SOUTH EAST TRIANGLE-HEADED ARROW TO BAR
U02B79 SOUTH WEST TRIANGLE-HEADED ARROW TO BAR
U02B7A LEFTWARDS TRIANGLE-HEADED ARROW WITH DOUBLE HORIZONTAL STROKE
U02B7B UPWARDS TRIANGLE-HEADED ARROW WITH DOUBLE HORIZONTAL STROKE
U02B7C RIGHTWARDS TRIANGLE-HEADED ARROW WITH DOUBLE HORIZONTAL STROKE
U02B7D DOWNWARDS TRIANGLE-HEADED ARROW WITH DOUBLE HORIZONTAL STROKE
U02B80 LEFTWARDS TRIANGLE-HEADED ARROW OVER RIGHTWARDS TRIANGLE-HEADED ARROW
U02B81 UPWARDS TRIANGLE-HEADED ARROW LEFTWARDS OF DOWNWARDS TRIANGLE-HEADED ARROW
U02B82 RIGHTWARDS TRIANGLE-HEADED ARROW OVER LEFTWARDS TRIANGLE-HEADED ARROW
U02B83 DOWNWARDS TRIANGLE-HEADED ARROW LEFTWARDS OF UPWARDS TRIANGLE-HEADED ARROW
U02B84 LEFTWARDS TRIANGLE-HEADED PAIRED ARROWS
U02B85 UPWARDS TRIANGLE-HEADED PAIRED ARROWS
U02B86 RIGHTWARDS TRIANGLE-HEADED PAIRED ARROWS
U02B87 DOWNWARDS TRIANGLE-HEADED PAIRED ARROWS
U02B88 LEFTWARDS BLACK CIRCLED WHITE ARROW
U02B89 UPWARDS BLACK CIRCLED WHITE ARROW
U02B8A RIGHTWARDS BLACK CIRCLED WHITE ARROW
U02B8B DOWNWARDS BLACK CIRCLED WHITE ARROW
U02B8C ANTICLOCKWISE TRIANGLE-HEADED RIGHT U-SHAPED ARROW
U02B8D ANTICLOCKWISE TRIANGLE-HEADED BOTTOM U-SHAPED ARROW
U02B8E ANTICLOCKWISE TRIANGLE-HEADED LEFT U-SHAPED ARROW
U02B8F ANTICLOCKWISE TRIANGLE-HEADED TOP U-SHAPED ARROW
U02B94 FOUR CORNER ARROWS CIRCLING ANTICLOCKWISE
U02B95 RIGHTWARDS BLACK ARROW
U02BA0 DOWNWARDS TRIANGLE-HEADED ARROW WITH LONG TIP LEFTWARDS
U02BA1 DOWNWARDS TRIANGLE-HEADED ARROW WITH LONG TIP RIGHTWARDS
U02BA2 UPWARDS TRIANGLE-HEADED ARROW WITH LONG TIP LEFTWARDS
U02BA3 UPWARDS TRIANGLE-HEADED ARROW WITH LONG TIP RIGHTWARDS
U02BA4 LEFTWARDS TRIANGLE-HEADED ARROW WITH LONG TIP UPWARDS
U02BA5 RIGHTWARDS TRIANGLE-HEADED ARROW WITH LONG TIP UPWARDS
U02BA6 LEFTWARDS TRIANGLE-HEADED ARROW WITH LONG TIP DOWNWARDS
U02BA7 RIGHTWARDS TRIANGLE-HEADED ARROW WITH LONG TIP DOWNWARDS
U02BA8 BLACK CURVED DOWNWARDS AND LEFTWARDS ARROW
U02BA9 BLACK CURVED DOWNWARDS AND RIGHTWARDS ARROW
U02BAA BLACK CURVED UPWARDS AND LEFTWARDS ARROW
U02BAB BLACK CURVED UPWARDS AND RIGHTWARDS ARROW
U02BAC BLACK CURVED LEFTWARDS AND UPWARDS ARROW
U02BAD BLACK CURVED RIGHTWARDS AND UPWARDS ARROW
U02BAE BLACK CURVED LEFTWARDS AND DOWNWARDS ARROW
U02BAF BLACK CURVED RIGHTWARDS AND DOWNWARDS ARROW
U02BB0 RIBBON ARROW DOWN LEFT
U02BB1 RIBBON ARROW DOWN RIGHT
U02BB2 RIBBON ARROW UP LEFT
U02BB3 RIBBON ARROW UP RIGHT
U02BB4 RIBBON ARROW LEFT UP
U02BB5 RIBBON ARROW RIGHT UP
U02BB6 RIBBON ARROW LEFT DOWN
U02BB7 RIBBON ARROW RIGHT DOWN
U02BB8 UPWARDS WHITE ARROW FROM BAR WITH HORIZONTAL BAR
U02BD1 UNCERTAINTY SIGN
Supplemental Arrows-C
U1F800 LEFTWARDS ARROW WITH SMALL TRIANGLE ARROWHEAD
U1F801 UPWARDS ARROW WITH SMALL TRIANGLE ARROWHEAD
U1F802 RIGHTWARDS ARROW WITH SMALL TRIANGLE ARROWHEAD
U1F803 DOWNWARDS ARROW WITH SMALL TRIANGLE ARROWHEAD
U1F804 LEFTWARDS ARROW WITH MEDIUM TRIANGLE ARROWHEAD
U1F805 UPWARDS ARROW WITH MEDIUM TRIANGLE ARROWHEAD
U1F806 RIGHTWARDS ARROW WITH MEDIUM TRIANGLE ARROWHEAD
U1F807 DOWNWARDS ARROW WITH MEDIUM TRIANGLE ARROWHEAD
U1F808 LEFTWARDS ARROW WITH LARGE TRIANGLE ARROWHEAD
U1F809 UPWARDS ARROW WITH LARGE TRIANGLE ARROWHEAD
U1F80A RIGHTWARDS ARROW WITH LARGE TRIANGLE ARROWHEAD
U1F80B DOWNWARDS ARROW WITH LARGE TRIANGLE ARROWHEAD
U1F810 LEFTWARDS ARROW WITH SMALL EQUILATERAL ARROWHEAD
U1F811 UPWARDS ARROW WITH SMALL EQUILATERAL ARROWHEAD
U1F812 RIGHTWARDS ARROW WITH SMALL EQUILATERAL ARROWHEAD
U1F813 DOWNWARDS ARROW WITH SMALL EQUILATERAL ARROWHEAD
U1F814 LEFTWARDS ARROW WITH EQUILATERAL ARROWHEAD
U1F815 UPWARDS ARROW WITH EQUILATERAL ARROWHEAD
U1F816 RIGHTWARDS ARROW WITH EQUILATERAL ARROWHEAD
U1F817 DOWNWARDS ARROW WITH EQUILATERAL ARROWHEAD
U1F818 HEAVY LEFTWARDS ARROW WITH EQUILATERAL ARROWHEAD
U1F819 HEAVY UPWARDS ARROW WITH EQUILATERAL ARROWHEAD
U1F81A HEAVY RIGHTWARDS ARROW WITH EQUILATERAL ARROWHEAD
U1F81B HEAVY DOWNWARDS ARROW WITH EQUILATERAL ARROWHEAD
U1F81C HEAVY LEFTWARDS ARROW WITH LARGE EQUILATERAL ARROWHEAD
U1F81D HEAVY UPWARDS ARROW WITH LARGE EQUILATERAL ARROWHEAD
U1F81E HEAVY RIGHTWARDS ARROW WITH LARGE EQUILATERAL ARROWHEAD
U1F81F HEAVY DOWNWARDS ARROW WITH LARGE EQUILATERAL ARROWHEAD
U1F820 LEFTWARDS TRIANGLE-HEADED ARROW WITH NARROW SHAFT
U1F821 UPWARDS TRIANGLE-HEADED ARROW WITH NARROW SHAFT
U1F822 RIGHTWARDS TRIANGLE-HEADED ARROW WITH NARROW SHAFT
U1F823 DOWNWARDS TRIANGLE-HEADED ARROW WITH NARROW SHAFT
U1F824 LEFTWARDS TRIANGLE-HEADED ARROW WITH MEDIUM SHAFT
U1F825 UPWARDS TRIANGLE-HEADED ARROW WITH MEDIUM SHAFT
U1F826 RIGHTWARDS TRIANGLE-HEADED ARROW WITH MEDIUM SHAFT
U1F827 DOWNWARDS TRIANGLE-HEADED ARROW WITH MEDIUM SHAFT
U1F828 LEFTWARDS TRIANGLE-HEADED ARROW WITH BOLD SHAFT
U1F829 UPWARDS TRIANGLE-HEADED ARROW WITH BOLD SHAFT
U1F82A RIGHTWARDS TRIANGLE-HEADED ARROW WITH BOLD SHAFT
U1F82B DOWNWARDS TRIANGLE-HEADED ARROW WITH BOLD SHAFT
U1F82C LEFTWARDS TRIANGLE-HEADED ARROW WITH HEAVY SHAFT
U1F82D UPWARDS TRIANGLE-HEADED ARROW WITH HEAVY SHAFT
U1F82E RIGHTWARDS TRIANGLE-HEADED ARROW WITH HEAVY SHAFT
U1F82F DOWNWARDS TRIANGLE-HEADED ARROW WITH HEAVY SHAFT
U1F830 LEFTWARDS TRIANGLE-HEADED ARROW WITH VERY HEAVY SHAFT
U1F831 UPWARDS TRIANGLE-HEADED ARROW WITH VERY HEAVY SHAFT
U1F832 RIGHTWARDS TRIANGLE-HEADED ARROW WITH VERY HEAVY SHAFT
U1F833 DOWNWARDS TRIANGLE-HEADED ARROW WITH VERY HEAVY SHAFT
U1F834 LEFTWARDS FINGER-POST ARROW
U1F835 UPWARDS FINGER-POST ARROW
U1F836 RIGHTWARDS FINGER-POST ARROW
U1F837 DOWNWARDS FINGER-POST ARROW
U1F838 LEFTWARDS SQUARED ARROW
U1F839 UPWARDS SQUARED ARROW
U1F83A RIGHTWARDS SQUARED ARROW
U1F83B DOWNWARDS SQUARED ARROW
U1F83C LEFTWARDS COMPRESSED ARROW
U1F83D UPWARDS COMPRESSED ARROW
U1F83E RIGHTWARDS COMPRESSED ARROW
U1F83F DOWNWARDS COMPRESSED ARROW
U1F840 LEFTWARDS HEAVY COMPRESSED ARROW
U1F841 UPWARDS HEAVY COMPRESSED ARROW
U1F842 RIGHTWARDS HEAVY COMPRESSED ARROW
U1F843 DOWNWARDS HEAVY COMPRESSED ARROW
U1F844 LEFTWARDS HEAVY ARROW
U1F845 UPWARDS HEAVY ARROW
U1F846 RIGHTWARDS HEAVY ARROW
U1F847 DOWNWARDS HEAVY ARROW
U1F850 LEFTWARDS SANS-SERIF ARROW
U1F851 UPWARDS SANS-SERIF ARROW
U1F852 RIGHTWARDS SANS-SERIF ARROW
U1F853 DOWNWARDS SANS-SERIF ARROW
U1F854 NORTH WEST SANS-SERIF ARROW
U1F855 NORTH EAST SANS-SERIF ARROW
U1F856 SOUTH EAST SANS-SERIF ARROW
U1F857 SOUTH WEST SANS-SERIF ARROW
U1F858 LEFT RIGHT SANS-SERIF ARROW
U1F859 UP DOWN SANS-SERIF ARROW
U1F860 WIDE-HEADED LEFTWARDS LIGHT BARB ARROW
U1F861 WIDE-HEADED UPWARDS LIGHT BARB ARROW
U1F862 WIDE-HEADED RIGHTWARDS LIGHT BARB ARROW
U1F863 WIDE-HEADED DOWNWARDS LIGHT BARB ARROW
U1F864 WIDE-HEADED NORTH WEST LIGHT BARB ARROW
U1F865 WIDE-HEADED NORTH EAST LIGHT BARB ARROW
U1F866 WIDE-HEADED SOUTH EAST LIGHT BARB ARROW
U1F867 WIDE-HEADED SOUTH WEST LIGHT BARB ARROW
U1F868 WIDE-HEADED LEFTWARDS BARB ARROW
U1F869 WIDE-HEADED UPWARDS BARB ARROW
U1F86A WIDE-HEADED RIGHTWARDS BARB ARROW
U1F86B WIDE-HEADED DOWNWARDS BARB ARROW
U1F86C WIDE-HEADED NORTH WEST BARB ARROW
U1F86D WIDE-HEADED NORTH EAST BARB ARROW
U1F86E WIDE-HEADED SOUTH EAST BARB ARROW
U1F86F WIDE-HEADED SOUTH WEST BARB ARROW
U1F870 WIDE-HEADED LEFTWARDS MEDIUM BARB ARROW
U1F871 WIDE-HEADED UPWARDS MEDIUM BARB ARROW
U1F872 WIDE-HEADED RIGHTWARDS MEDIUM BARB ARROW
U1F873 WIDE-HEADED DOWNWARDS MEDIUM BARB ARROW
U1F874 WIDE-HEADED NORTH WEST MEDIUM BARB ARROW
U1F875 WIDE-HEADED NORTH EAST MEDIUM BARB ARROW
U1F876 WIDE-HEADED SOUTH EAST MEDIUM BARB ARROW
U1F877 WIDE-HEADED SOUTH WEST MEDIUM BARB ARROW
U1F878 WIDE-HEADED LEFTWARDS HEAVY BARB ARROW
U1F879 WIDE-HEADED UPWARDS HEAVY BARB ARROW
U1F87A WIDE-HEADED RIGHTWARDS HEAVY BARB ARROW
U1F87B WIDE-HEADED DOWNWARDS HEAVY BARB ARROW
U1F87C WIDE-HEADED NORTH WEST HEAVY BARB ARROW
U1F87D WIDE-HEADED NORTH EAST HEAVY BARB ARROW
U1F87E WIDE-HEADED SOUTH EAST HEAVY BARB ARROW
U1F87F WIDE-HEADED SOUTH WEST HEAVY BARB ARROW
U1F880 WIDE-HEADED LEFTWARDS VERY HEAVY BARB ARROW
U1F881 WIDE-HEADED UPWARDS VERY HEAVY BARB ARROW
U1F882 WIDE-HEADED RIGHTWARDS VERY HEAVY BARB ARROW
U1F883 WIDE-HEADED DOWNWARDS VERY HEAVY BARB ARROW
U1F884 WIDE-HEADED NORTH WEST VERY HEAVY BARB ARROW
U1F885 WIDE-HEADED NORTH EAST VERY HEAVY BARB ARROW
U1F886 WIDE-HEADED SOUTH EAST VERY HEAVY BARB ARROW
U1F887 WIDE-HEADED SOUTH WEST VERY HEAVY BARB ARROW
U1F898 LEFTWARDS ARROW WITH NOTCHED TAIL
U1F899 UPWARDS ARROW WITH NOTCHED TAIL
U1F89A RIGHTWARDS ARROW WITH NOTCHED TAIL
U1F89B DOWNWARDS ARROW WITH NOTCHED TAIL
U1F8A0 LEFTWARDS BOTTOM-SHADED WHITE ARROW
U1F8A1 RIGHTWARDS BOTTOM SHADED WHITE ARROW
U1F8A2 LEFTWARDS TOP SHADED WHITE ARROW
U1F8A3 RIGHTWARDS TOP SHADED WHITE ARROW
U1F8A4 LEFTWARDS LEFT-SHADED WHITE ARROW
U1F8A5 RIGHTWARDS RIGHT-SHADED WHITE ARROW
U1F8A6 LEFTWARDS RIGHT-SHADED WHITE ARROW
U1F8A7 RIGHTWARDS LEFT-SHADED WHITE ARROW
U1F8A8 LEFTWARDS BACK-TILTED SHADOWED WHITE ARROW
U1F8A9 RIGHTWARDS BACK-TILTED SHADOWED WHITE ARROW
U1F8AA LEFTWARDS FRONT-TILTED SHADOWED WHITE ARROW
U1F8AB RIGHTWARDS FRONT-TILTED SHADOWED WHITE ARROW
@davidcarlisle Do you plan to merge more? In particular, how important is it to keep specific categories for some isolated values with lspace != rspace?
I would probably suggest merging some of them, but as I say @NSoiffer has more changes planned, so I was planning on waiting for that to end before really reviewing this.
Certainly TeX gets away with fewer space categories, with only three non-zero spaces ever automatically added: thin, medium and thick. These are theoretically user-settable but are nearly always the LaTeX and plain TeX defaults:
\thinmuskip=3mu
\medmuskip=4mu plus 2mu minus 4mu
\thickmuskip=5mu plus 5mu
where 1mu = 1/18 em.
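Converting these defaults to em makes them easier to compare with the operator dictionary. This is a minimal sketch. It assumes the dictionary's integer lspace/rspace values are also in units of 1/18 em, as MathML's named spaces are (thinmathspace = 3/18 em, and so on). Under that assumption, thin, medium and thick correspond to 3, 4 and 5.

```python
from fractions import Fraction

MU_PER_EM = 18  # TeX: 1mu = 1/18 em


def mu_to_em(mu):
    """Exact conversion of a length in mu to em."""
    return Fraction(mu, MU_PER_EM)


# (natural, stretch, shrink) in mu: the plain TeX / LaTeX defaults quoted above.
MUSKIPS = {
    "thinmuskip": (3, 0, 0),
    "medmuskip": (4, 2, 4),
    "thickmuskip": (5, 5, 0),
}

for name, (natural, plus, minus) in MUSKIPS.items():
    print(f"{name}: {mu_to_em(natural)}em plus {mu_to_em(plus)}em minus {mu_to_em(minus)}em")
# thinmuskip: 1/6em plus 0em minus 0em
# medmuskip: 2/9em plus 1/9em minus 2/9em
# thickmuskip: 5/18em plus 5/18em minus 0em
```

TeX's medmuskip can shrink by its full 4mu, so TeX may set it to zero in a tight line. A fixed lspace/rspace value in the dictionary carries no stretch or shrink.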
OK, let's wait for @NSoiffer
Here is a quick analysis on my side:
I think complement "∁" has prefix form and should be moved into an existing prefix category. Then I'm not sure how important it is to keep a separate category just for ":". It seems fine to me to use a default symmetric spacing for this one, since it can be used as a separator or as a binary operator (note that in text, some languages put a space before ":"). So I would merge it into "form:infix lspace:2 rspace:2", for example, or "form:infix lspace:1 rspace:1".
I guess unbalanced spacing for separators is still important, so we can't remove the category "form:infix lspace:0 rspace:3", right?
How important is the exact spacing for postfix "♭", "♮", "♯", "!" and "!!"? They don't seem to have a clear default spacing to me. Can't we merge them into a single category with zero lspace and nonzero rspace? Or even just into "form:postfix lspace:0 rspace:0"?
How important is the category "form:prefix lspace:1 rspace:1"? I don't think people use this square root operator as a single mo; they would instead use the msqrt or mroot element. So I would just drop these from the operator dictionary, or otherwise merge them into another arbitrary existing prefix category.
I still don't quite understand what the distinction is between "form:prefix lspace:1 rspace:2" and "form:prefix lspace:3 rspace:3". Maybe it's integral vs. non-integral, but treating ∑ and ∏ differently seems dubious to me. Can we merge them into a single category?
I guess unbalanced spacing for differential operators is still important, so we can't remove "form:prefix lspace:3 rspace:0", right?
How important is the "form:prefix lspace:2 rspace:1"? Can't we merge it with another existing category with balanced spacing or with lspace > rspace?
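One way to evaluate merge proposals like these is a three-step check:

1. Group entries by their (form, lspace, rspace) key.
2. Apply a mapping from old to new categories.
3. Recount.

This is a minimal sketch. The entry list is a hypothetical excerpt, with spacing values taken from snapshots quoted in this thread. The merge targets are just the options suggested above, e.g. folding ":" into the existing infix 1/1 category.

```python
from collections import Counter

# Hypothetical excerpt of the dictionary: (operator, form, lspace, rspace).
ENTRIES = [
    ("+", "infix", 4, 4),
    ("\u2192", "infix", 5, 5),
    ("?", "infix", 1, 1),
    (":", "infix", 1, 2),
    (",", "infix", 0, 3),
    ("\u2202", "prefix", 3, 0),
    ("\u2211", "prefix", 3, 3),
    ("\u220F", "prefix", 1, 2),
    ("\u221A", "prefix", 1, 1),
    ("!", "postfix", 0, 0),
]

# Proposed merges: old category -> new category, or None to drop the entries.
MERGES = {
    ("infix", 1, 2): ("infix", 1, 1),    # ":" joins the existing symmetric 1/1 group
    ("prefix", 1, 2): ("prefix", 3, 3),  # one category for all large operators
    ("prefix", 1, 1): None,              # lone <mo> square root; msqrt/mroot cover it
}


def categories(entries):
    """Count entries per (form, lspace, rspace) category."""
    return Counter((form, lspace, rspace) for _, form, lspace, rspace in entries)


def apply_merges(entries, merges):
    merged = []
    for op, form, lspace, rspace in entries:
        key = (form, lspace, rspace)
        target = merges.get(key, key)
        if target is None:
            continue  # category dropped from the dictionary
        merged.append((op, *target))
    return merged


print(len(categories(ENTRIES)), "->", len(categories(apply_merges(ENTRIES, MERGES))))
# 10 -> 7
```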
Some characters will likely go away, including the musical notation signs (and hence their spacing category), but I'm spending time on each character trying to find out whether it has a mathematical usage and, if so, what it is. I'm currently sifting through the priority 265 symbols and either removing them or moving them to a more appropriate place. That sometimes involves changing their form and also their spacing. Once I'm done with that, I'm going to review spacing for what remains.
"a:b" in TeX seems to give wide symmetric spacing, probably colon should be in lspace:4 rspace:4 or lspace:3 rspace:3?
What is the logic behind the two categories "form:prefix lspace:1 rspace:2" and "form:prefix lspace:3 rspace:3" for largeop/integral operators? It seems some "sums" are in the former.
This is the current output from the compact form script (https://mathml-refresh.github.io/xml-entities/opdict.html#compressed only gives spacing, not properties):
('infixEntriesWithDefaultValues', 771)
['[U+003C, U+003E]', '[U+219A, U+219B]', 'U+21AE', '[U+21B6, U+21B8]', '[U+21BA, U+21BB]', '[U+21CD, U+21CF]', '[U+21DE, U+21DF]', '[U+21F1, U+21F2]', 'U+21F4', '[U+21F7, U+21FC]', '[U+2208, U+220D]', 'U+2219', 'U+221D', 'U+2223', '[U+2225, U+2226]', '[U+2234, U+2235]', 'U+2237', '[U+2239, U+223E]', '[U+2241, U+228B]', '[U+2290, U+2292]', '[U+229A, U+229C]', '[U+22A2, U+22BA]', 'U+22C8', 'U+22CD', '[U+22D0, U+22D1]', '[U+22D4, U+22ED]', '[U+22F2, U+22FF]', 'U+2301', 'U+237C', 'U+238B', 'U+2758', 'U+2794', '[U+2798, U+27A1]', '[U+27A5, U+27AF]', '[U+27B1, U+27BE]', '[U+27F2, U+27F3]', '[U+2900, U+2909]', 'U+2911', '[U+2914, U+2920]', '[U+2923, U+294D]', '[U+2962, U+296D]', '[U+2970, U+297F]', '[U+29B6, U+29BB]', '[U+29BD, U+29C1]', '[U+29C4, U+29C8]', '[U+29CE, U+29D7]', 'U+29E1', '[U+29E3, U+29E6]', '[U+29F4, U+29F5]', 'U+29F7', 'U+2A3E', '[U+2A64, U+2AD9]', '[U+2ADE, U+2AEB]', '[U+2AEE, U+2AFA]', '[U+2B00, U+2B11]', '[U+2B30, U+2B31]', '[U+2B33, U+2B44]', '[U+2B47, U+2B4F]', '[U+2B5A, U+2B73]', '[U+2B76, U+2B7D]', '[U+2B80, U+2B8F]', '[U+2B94, U+2B95]', '[U+2BA0, U+2BB8]', 'U+2BD1', '[U+1F800, U+1F80B]', '[U+1F810, U+1F847]', '[U+1F850, U+1F859]', '[U+1F860, U+1F887]', '[U+1F898, U+1F89B]', '[U+1F8A0, U+1F8AB]']
('infixEntriesWithSpacing5AndStretchy', 138)
['[U+2190, U+2199]', '[U+219C, U+21AD]', '[U+21AF, U+21B5]', 'U+21B9', '[U+21BC, U+21CC]', '[U+21D0, U+21DD]', '[U+21E0, U+21F0]', 'U+21F3', '[U+21F5, U+21F6]', '[U+21FD, U+21FF]', '[U+27F0, U+27F1]', '[U+27F5, U+27FF]', '[U+290A, U+2910]', '[U+2912, U+2913]', '[U+2921, U+2922]', '[U+294E, U+2961]', '[U+296E, U+296F]', '[U+2B45, U+2B46]']
('infixEntriesWithSpacing4', 100)
['U+002B', 'U+002D', 'U+002F', 'U+00B1', 'U+00F7', '[U+2212, U+2214]', 'U+2216', 'U+2218', 'U+2224', '[U+2227, U+222A]', 'U+2236', 'U+2238', '[U+228C, U+228F]', '[U+2293, U+2296]', 'U+2298', '[U+229D, U+229F]', '[U+22BB, U+22BD]', 'U+22C4', 'U+22C6', '[U+22CE, U+22CF]', '[U+22D2, U+22D3]', '[U+2795, U+2797]', 'U+27F4', 'U+29BC', 'U+29F6', '[U+2A22, U+2A2E]', '[U+2A38, U+2A3A]', '[U+2A40, U+2A4F]', '[U+2A51, U+2A63]', '[U+2ADA, U+2ADB]', 'U+2AFB', 'U+2AFD', 'U+2B32']
('infixEntriesWithSpacing3', 84)
['U+0025', 'U+002A', 'U+002E', 'U+00B7', 'U+00D7', 'U+2022', 'U+2043', 'U+2206', 'U+220E', 'U+2217', '[U+223F, U+2240]', 'U+2297', 'U+2299', '[U+22A0, U+22A1]', 'U+22C5', 'U+22C7', '[U+22C9, U+22CC]', '[U+2305, U+2306]', '[U+25A0, U+25A1]', '[U+25AA, U+25AB]', '[U+25AD, U+25B1]', '[U+2981, U+2982]', '[U+2999, U+299A]', 'U+29B5', '[U+29C2, U+29C3]', '[U+29C9, U+29CD]', '[U+29D8, U+29D9]', 'U+29DB', '[U+29DF, U+29E0]', 'U+29E2', '[U+29E7, U+29ED]', '[U+29F8, U+29FB]', '[U+2A1D, U+2A21]', '[U+2A2F, U+2A37]', '[U+2A3B, U+2A3D]', 'U+2A3F', 'U+2A50', '[U+2ADC, U+2ADD]', 'U+2AFE']
('prefixEntriesWithLspace0Rspace0', 49)
['U+0021', 'U+002B', 'U+002D', 'U+00AC', 'U+00B1', '[U+2200, U+2201]', '[U+2203, U+2204]', 'U+2207', '[U+2212, U+2213]', '[U+221B, U+221C]', '[U+221F, U+2222]', 'U+223C', '[U+22BE, U+22BF]', 'U+2310', 'U+2319', '[U+2795, U+2796]', 'U+27C0', '[U+299B, U+29AF]', '[U+2AEC, U+2AED]']
('postfixEntriesWithLspace0Rspace0', 33)
['[U+0021, U+0022]', '[U+0026, U+0027]', 'U+0060', 'U+00A8', 'U+00B0', '[U+00B2, U+00B4]', '[U+00B8, U+00B9]', '[U+02CA, U+02CB]', '[U+02D8, U+02DA]', 'U+02DD', 'U+0311', '[U+201A, U+201B]', '[U+201E, U+201F]', '[U+2032, U+2037]', 'U+2057', '[U+20DB, U+20DC]', 'U+23CD']
('prefixEntriesWithSpacing0AndStretchySymmetricFence', 25)
['U+0028', 'U+005B', '[U+007B, U+007C]', 'U+2308', 'U+230A', 'U+2329', 'U+2772', 'U+27E6', 'U+27E8', 'U+27EA', 'U+27EC', 'U+27EE', 'U+2983', 'U+2985', 'U+2987', 'U+2989', 'U+298B', 'U+298D', 'U+298F', 'U+2991', 'U+2993', 'U+2995', 'U+2997', 'U+29FC']
('postfixEntriesWithSpacing0AndStretchySymmetricFence', 25)
['U+0029', 'U+005D', '[U+007C, U+007D]', 'U+2309', 'U+230B', 'U+232A', 'U+2773', 'U+27E7', 'U+27E9', 'U+27EB', 'U+27ED', 'U+27EF', 'U+2984', 'U+2986', 'U+2988', 'U+298A', 'U+298C', 'U+298E', 'U+2990', 'U+2992', 'U+2994', 'U+2996', 'U+2998', 'U+29FD']
('postfixEntriesWithLspace0Rspace0AndStretchy', 24)
['[U+005E, U+005F]', 'U+007E', 'U+00AF', '[U+02C6, U+02C7]', 'U+02C9', 'U+02CD', 'U+02DC', 'U+02F7', 'U+0302', 'U+203E', '[U+2322, U+2323]', '[U+23B4, U+23B5]', '[U+23DC, U+23E1]', '[U+1EEF0, U+1EEF1]']
('prefixEntriesWithLspace3Rspace3AndSymmetricLargeop', 22)
['[U+222B, U+2233]', '[U+2A0B, U+2A0F]', '[U+2A15, U+2A1C]']
('prefixEntriesWithLspace1Rspace2AndSymmetricMovablelimitsLargeop', 18)
['[U+220F, U+2210]', '[U+22C0, U+22C3]', '[U+2A00, U+2A09]', 'U+2AFC', 'U+2AFF']
('otherEntries', 35)
* {'lspace': 3, 'rspace': 3, 'form': 'prefix', 'properties': {'symmetric': True, 'movablelimits': True, 'largeop': True}}: 7
['U+2211', 'U+2A0A', 'U+2A10', 'U+2A11', 'U+2A12', 'U+2A13', 'U+2A14']
* {'lspace': 0, 'rspace': 0, 'form': 'infix'}: 5
['U+005C', 'U+2061', 'U+2062', 'U+2064', 'U+2396']
* {'lspace': 1, 'rspace': 1, 'form': 'infix'}: 4
['U+003F', 'U+0040', 'U+005E', 'U+005F']
* {'lspace': 3, 'rspace': 0, 'form': 'prefix'}: 3
['U+2145', 'U+2146', 'U+2202']
* {'lspace': 0, 'rspace': 0, 'form': 'prefix', 'properties': {'fence': True}}: 2
['U+2018', 'U+201C']
* {'lspace': 0, 'rspace': 0, 'form': 'postfix', 'properties': {'fence': True}}: 2
['U+2019', 'U+201D']
* {'lspace': 0, 'rspace': 3, 'form': 'infix', 'properties': {'separator': True}}: 2
['U+002C', 'U+003B']
* {'lspace': 0, 'rspace': 0, 'form': 'prefix', 'properties': {'stretchy': True, 'fence': True}}: 2
['U+2016', 'U+2980']
* {'lspace': 0, 'rspace': 0, 'form': 'postfix', 'properties': {'stretchy': True, 'fence': True}}: 2
['U+2016', 'U+2980']
* {'lspace': 4, 'rspace': 4, 'form': 'infix', 'properties': {'stretchy': True}}: 2
['U+2044', 'U+2215']
* {'lspace': 3, 'rspace': 3, 'form': 'infix', 'properties': {'symmetric': True, 'stretchy': True, 'fence': True}}: 1
['U+007C']
* {'lspace': 0, 'rspace': 0, 'form': 'prefix', 'properties': {'stretchy': True}}: 1
['U+221A']
* {'lspace': 0, 'rspace': 0, 'form': 'infix', 'properties': {'separator': True}}: 1
['U+2063']
* {'lspace': 1, 'rspace': 2, 'form': 'infix'}: 1
['U+003A']
('entriesWithMultipleCharacters', 46)
* -= infix: {'lspace': 4, 'rspace': 4, 'form': 'infix'}
* ||| infix: {'lspace': 3, 'rspace': 3, 'form': 'infix', 'properties': {'symmetric': True, 'stretchy': True, 'fence': True}}
* /= infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* := infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* || postfix: {'lspace': 0, 'rspace': 0, 'form': 'postfix', 'properties': {'symmetric': True, 'stretchy': True, 'fence': True}}
* ⪰̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* <= infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* ||| postfix: {'lspace': 0, 'rspace': 0, 'form': 'postfix', 'properties': {'symmetric': True, 'stretchy': True, 'fence': True}}
* ≂̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* ⊐̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* ⩾̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* *= infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* ⊏̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* -> infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* ≦̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* ⧏̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* ||| prefix: {'lspace': 0, 'rspace': 0, 'form': 'prefix', 'properties': {'symmetric': True, 'stretchy': True, 'fence': True}}
* ≿̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* ∽̱ infix: {'lspace': 3, 'rspace': 3, 'form': 'infix'}
* ⧐̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* <> infix: {'lspace': 1, 'rspace': 1, 'form': 'infix'}
* += infix: {'lspace': 4, 'rspace': 4, 'form': 'infix'}
* != infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* ⩽̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* // infix: {'lspace': 1, 'rspace': 1, 'form': 'infix'}
* !! postfix: {'lspace': 0, 'rspace': 0, 'form': 'postfix'}
* >= infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* || prefix: {'lspace': 0, 'rspace': 0, 'form': 'prefix', 'properties': {'symmetric': True, 'stretchy': True, 'fence': True}}
* ⪡̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* ≎̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* .. postfix: {'lspace': 0, 'rspace': 0, 'form': 'postfix'}
* ≏̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* ** infix: {'lspace': 1, 'rspace': 1, 'form': 'infix'}
* ... postfix: {'lspace': 0, 'rspace': 0, 'form': 'postfix'}
* ≫̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* ⊃⃒ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* ⪯̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* ++ postfix: {'lspace': 0, 'rspace': 0, 'form': 'postfix'}
* -- postfix: {'lspace': 0, 'rspace': 0, 'form': 'postfix'}
* ≪̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* ⊂⃒ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* || infix: {'lspace': 3, 'rspace': 3, 'form': 'infix', 'properties': {'symmetric': True, 'stretchy': True, 'fence': True}}
* == infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* ⪢̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* && infix: {'lspace': 4, 'rspace': 4, 'form': 'infix'}
* ⫝̸ infix: {'lspace': 3, 'rspace': 3, 'form': 'infix'}
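The compact output lists each spacing category as single code points and inclusive ranges. The sketch below shows how such a list could be produced from a set of code points and then queried with a binary search. It is not the actual compact form script, whose source isn't reproduced here. The names `to_runs`, `describe` and `RangeSet` are hypothetical.

```python
import bisect


def fmt(cp):
    return f"U+{cp:04X}"


def to_runs(code_points):
    """Merge code points into sorted, inclusive (start, end) runs."""
    runs = []
    for cp in sorted(set(code_points)):
        if runs and cp == runs[-1][1] + 1:
            runs[-1][1] = cp
        else:
            runs.append([cp, cp])
    return [(start, end) for start, end in runs]


def describe(runs):
    """Render runs in the same notation as the compact output above."""
    return [fmt(s) if s == e else f"[{fmt(s)}, {fmt(e)}]" for s, e in runs]


class RangeSet:
    """Membership test over runs via binary search on the run starts."""

    def __init__(self, runs):
        self.starts = [s for s, _ in runs]
        self.ends = [e for _, e in runs]

    def __contains__(self, cp):
        i = bisect.bisect_right(self.starts, cp) - 1
        return i >= 0 and cp <= self.ends[i]


# Code points of the "infix, lspace 0, rspace 0" group listed above.
runs = to_runs([0x005C, 0x2061, 0x2062, 0x2064, 0x2396])
print(describe(runs))  # ['U+005C', '[U+2061, U+2062]', 'U+2064', 'U+2396']
```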
"a:b" in TeX seems to give wide symmetric spacing, probably colon should be in lspace:4 rspace:4 or lspace:3 rspace:3?
In standard TeX, ":" is \mathrel (so left and right space of 5mu by default).
\colon is the same glyph but \mathpunct (so left 0mu and right 3mu).
amsmath changes \colon to be a spacier version with left 2mu and right 6mu:
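```latex
\documentclass{article}
\usepackage{amsmath}
\showoutput % write the typeset boxes, including math spacing, to the log
\begin{document}
$a{:}b$     % braced colon is an ordinary symbol: no extra space
$a:b$       % ":" is \mathrel: thick space (5mu) on both sides
$a\colon b$ % amsmath \colon: 2mu before, 6mu after
\end{document}
```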
MathML doesn't really distinguish between ":" and \colon, so you need to pick one to use as the default.
The current entry of
<operator-dictionary priority="100" form="infix" lspace="1" rspace="2"/>
is asymmetric, so it follows the same interpretation as \colon, but is less spacy. Neil?
You want the symmetric spacing for ratios such as 50 : 50, but the asymmetric spacing (which is more common in more technical math) in f: x → y.
@fred-wang: why break out the multichar entries in the table into their own category? I thought the goal was to minimize the size of the operator dictionary in the core spec. Most of the entries would belong to existing groupings.
I've been trying to decide what to do about colon, which is why I haven't changed its values yet. I've written down what I found in https://github.com/mathml-refresh/mathml/issues/87#issuecomment-612544574, which is where this discussion properly belongs.
> @fred-wang: why break out the multichar entries in the table into their own category? I thought the goal was to minimize the size of the operator dictionary in the core spec. Most of the entries would belong to existing groupings.
The final script still depends on what the possible values will be. The general rule of thumb is still to reduce the number of possible values as much as possible, independently of how the keys will be handled.
Regarding keys, strings in browsers are heavy objects, see [1] [2]. So to minimize space, it seems optimal to use single UTF-16 code units (only 2 bytes, less than any generic 16-bit string object). These cover most of the operators, except the non-BMP ones (only two of them, so they can easily be handled separately) and the multiple-char ones (for which we can maybe find a clever handling, e.g. the non-ASCII strings are nearly always 'lspace': 5, 'rspace': 5).
[1] https://source.chromium.org/chromium/chromium/src/+/master:third_party/blink/renderer/platform/wtf/text/README.md (WebKit is similar)
[2] https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/Guide/Internal_strings
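A minimal sketch of that key check, in Python since the output quoted in this thread comes from a Python script. The function name utf16_key is hypothetical; it only illustrates which operators fit in a single 2-byte UTF-16 code unit and which would need separate handling:

```python
def utf16_key(op: str):
    """Return op's single UTF-16 code unit, or None if op needs separate handling."""
    units = op.encode("utf-16-le")
    if len(units) == 2:
        # Exactly one BMP code point: it fits in a 2-byte key.
        return int.from_bytes(units, "little")
    # Non-BMP characters (surrogate pairs) and multi-character operators.
    return None

print(hex(utf16_key("+")))         # 0x2b
print(utf16_key("\U0001EEF0"))    # None: non-BMP, encoded as a surrogate pair
print(utf16_key("!="))            # None: multi-character operator
```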
@NSoiffer There are still inconsistencies with largeop. Some of them are symmetric+largeop+movablelimits (e.g. anticlockwise integration), others are just symmetric+largeop (e.g. integral). Why can't we just make all of them symmetric+largeop+movablelimits?
I'm not sure people should use radicals as mo; can we please remove them? Or at least make square root not stretchy, so that we don't have a special case:
√ √ square root prefix 845 1 1 stretchy
∛ ∛ cube root prefix 845 1 1
∜ ∜ fourth root prefix 845 1 1
Can we move U+007C to any other existing category? Do we really need to make fraction slash and division slash stretchy by default?
Script updated a bit (fences/separators are now handled in separate tables). This is the current output. I feel like largeop could still be made more consistent and that we could reduce special cases (cf. otherEntries table):
('infixEntriesWithDefaultValues', 772)
['[U+003C, U+003E]', '[U+219A, U+219B]', 'U+21AE', '[U+21B6, U+21B8]', '[U+21BA, U+21BB]', '[U+21CD, U+21CF]', '[U+21DE, U+21DF]', '[U+21F1, U+21F2]', 'U+21F4', '[U+21F7, U+21FC]', '[U+2208, U+220D]', 'U+2219', 'U+221D', 'U+2223', '[U+2225, U+2226]', '[U+2234, U+2235]', 'U+2237', '[U+2239, U+223E]', '[U+2241, U+228B]', '[U+2290, U+2292]', '[U+229A, U+229C]', '[U+22A2, U+22BA]', 'U+22C8', 'U+22CD', '[U+22D0, U+22D1]', '[U+22D4, U+22ED]', '[U+22F2, U+22FF]', 'U+2301', 'U+237C', 'U+238B', 'U+2758', 'U+2794', '[U+2798, U+27A1]', '[U+27A5, U+27AF]', '[U+27B1, U+27BE]', 'U+27DF', '[U+27F2, U+27F3]', '[U+2900, U+2909]', 'U+2911', '[U+2914, U+2920]', '[U+2923, U+294D]', '[U+2962, U+296D]', '[U+2970, U+297F]', '[U+29B6, U+29BB]', '[U+29BD, U+29C1]', '[U+29C4, U+29C8]', '[U+29CE, U+29D7]', 'U+29E1', '[U+29E3, U+29E6]', '[U+29F4, U+29F5]', 'U+29F7', 'U+2A3E', '[U+2A64, U+2AD9]', '[U+2ADE, U+2AEB]', '[U+2AEE, U+2AFA]', '[U+2B00, U+2B11]', '[U+2B30, U+2B31]', '[U+2B33, U+2B44]', '[U+2B47, U+2B4F]', '[U+2B5A, U+2B73]', '[U+2B76, U+2B7D]', '[U+2B80, U+2B8F]', '[U+2B94, U+2B95]', '[U+2BA0, U+2BB8]', 'U+2BD1', '[U+1F800, U+1F80B]', '[U+1F810, U+1F847]', '[U+1F850, U+1F859]', '[U+1F860, U+1F887]', '[U+1F898, U+1F89B]', '[U+1F8A0, U+1F8AB]']
('infixEntriesWithSpacing5AndStretchy', 138)
['[U+2190, U+2199]', '[U+219C, U+21AD]', '[U+21AF, U+21B5]', 'U+21B9', '[U+21BC, U+21CC]', '[U+21D0, U+21DD]', '[U+21E0, U+21F0]', 'U+21F3', '[U+21F5, U+21F6]', '[U+21FD, U+21FF]', '[U+27F0, U+27F1]', '[U+27F5, U+27FF]', '[U+290A, U+2910]', '[U+2912, U+2913]', '[U+2921, U+2922]', '[U+294E, U+2961]', '[U+296E, U+296F]', '[U+2B45, U+2B46]']
('infixEntriesWithSpacing4', 100)
['U+002B', 'U+002D', 'U+002F', 'U+00B1', 'U+00F7', '[U+2212, U+2214]', 'U+2216', 'U+2218', 'U+2224', '[U+2227, U+222A]', 'U+2236', 'U+2238', '[U+228C, U+228F]', '[U+2293, U+2296]', 'U+2298', '[U+229D, U+229F]', '[U+22BB, U+22BD]', 'U+22C4', 'U+22C6', '[U+22CE, U+22CF]', '[U+22D2, U+22D3]', '[U+2795, U+2797]', 'U+27F4', 'U+29BC', 'U+29F6', '[U+2A22, U+2A2E]', '[U+2A38, U+2A3A]', '[U+2A40, U+2A4F]', '[U+2A51, U+2A63]', '[U+2ADA, U+2ADB]', 'U+2AFB', 'U+2AFD', 'U+2B32']
('infixEntriesWithSpacing3', 85)
['U+0025', 'U+002A', 'U+002E', 'U+0040', 'U+00B7', 'U+00D7', 'U+2022', 'U+2043', 'U+2206', 'U+220E', 'U+2217', '[U+223F, U+2240]', 'U+2297', 'U+2299', '[U+22A0, U+22A1]', 'U+22C5', 'U+22C7', '[U+22C9, U+22CC]', '[U+2305, U+2306]', '[U+25A0, U+25A1]', '[U+25AA, U+25AB]', '[U+25AD, U+25B1]', '[U+2981, U+2982]', '[U+2999, U+299A]', 'U+29B5', '[U+29C2, U+29C3]', '[U+29C9, U+29CD]', '[U+29D8, U+29D9]', 'U+29DB', '[U+29DF, U+29E0]', 'U+29E2', '[U+29E7, U+29ED]', '[U+29F8, U+29FB]', '[U+2A1D, U+2A21]', '[U+2A2F, U+2A37]', '[U+2A3B, U+2A3D]', 'U+2A3F', 'U+2A50', '[U+2ADC, U+2ADD]', 'U+2AFE']
('prefixEntriesWithLspace0Rspace0', 51)
['U+0021', 'U+002B', 'U+002D', 'U+00AC', 'U+00B1', 'U+2018', 'U+201C', '[U+2200, U+2201]', '[U+2203, U+2204]', 'U+2207', '[U+2212, U+2213]', '[U+221B, U+221C]', '[U+221F, U+2222]', 'U+223C', '[U+22BE, U+22BF]', 'U+2310', 'U+2319', '[U+2795, U+2796]', 'U+27C0', '[U+299B, U+29AF]', '[U+2AEC, U+2AED]']
('postfixEntriesWithLspace0Rspace0', 35)
['[U+0021, U+0022]', '[U+0026, U+0027]', 'U+0060', 'U+00A8', 'U+00B0', '[U+00B2, U+00B4]', '[U+00B8, U+00B9]', '[U+02CA, U+02CB]', '[U+02D8, U+02DA]', 'U+02DD', 'U+0311', '[U+2019, U+201B]', '[U+201D, U+201F]', '[U+2032, U+2037]', 'U+2057', '[U+20DB, U+20DC]', 'U+23CD']
('postfixEntriesWithLspace0Rspace0AndStretchy', 26)
['[U+005E, U+005F]', 'U+007E', 'U+00AF', '[U+02C6, U+02C7]', 'U+02C9', 'U+02CD', 'U+02DC', 'U+02F7', 'U+0302', 'U+2016', 'U+203E', '[U+2322, U+2323]', '[U+23B4, U+23B5]', '[U+23DC, U+23E1]', 'U+2980', '[U+1EEF0, U+1EEF1]']
('prefixEntriesWithSpacing0AndStretchySymmetric', 25)
['U+0028', 'U+005B', '[U+007B, U+007C]', 'U+2308', 'U+230A', 'U+2329', 'U+2772', 'U+27E6', 'U+27E8', 'U+27EA', 'U+27EC', 'U+27EE', 'U+2983', 'U+2985', 'U+2987', 'U+2989', 'U+298B', 'U+298D', 'U+298F', 'U+2991', 'U+2993', 'U+2995', 'U+2997', 'U+29FC']
('postfixEntriesWithSpacing0AndStretchySymmetric', 25)
['U+0029', 'U+005D', '[U+007C, U+007D]', 'U+2309', 'U+230B', 'U+232A', 'U+2773', 'U+27E7', 'U+27E9', 'U+27EB', 'U+27ED', 'U+27EF', 'U+2984', 'U+2986', 'U+2988', 'U+298A', 'U+298C', 'U+298E', 'U+2990', 'U+2992', 'U+2994', 'U+2996', 'U+2998', 'U+29FD']
('prefixEntriesWithLspace3Rspace3AndSymmetricLargeop', 22)
['[U+222B, U+2233]', '[U+2A0B, U+2A0F]', '[U+2A15, U+2A1C]']
('prefixEntriesWithLspace1Rspace2AndSymmetricMovablelimitsLargeop', 18)
['[U+220F, U+2210]', '[U+22C0, U+22C3]', '[U+2A00, U+2A09]', 'U+2AFC', 'U+2AFF']
('prefixEntriesWithLspace3Rspace3AndSymmetricMovablelimitsLargeop', 7)
['U+2211', 'U+2A0A', '[U+2A10, U+2A14]']
('otherEntries', 21)
* {'lspace': 0, 'rspace': 0, 'form': 'infix'}: 6
['U+005C', 'U+2061', 'U+2062', 'U+2063', 'U+2064', 'U+2396']
* {'lspace': 3, 'rspace': 0, 'form': 'prefix'}: 3
['U+2145', 'U+2146', 'U+2202']
* {'lspace': 1, 'rspace': 1, 'form': 'infix'}: 3
['U+003F', 'U+005E', 'U+005F']
* {'lspace': 0, 'rspace': 0, 'form': 'prefix', 'properties': {'stretchy': True}}: 3
['U+2016', 'U+221A', 'U+2980']
* {'lspace': 0, 'rspace': 3, 'form': 'infix'}: 3
['U+002C', 'U+003A', 'U+003B']
* {'lspace': 4, 'rspace': 4, 'form': 'infix', 'properties': {'stretchy': True}}: 2
['U+2044', 'U+2215']
* {'lspace': 3, 'rspace': 3, 'form': 'infix', 'properties': {'symmetric': True, 'stretchy': True}}: 1
['U+007C']
Separate table for multiple characters:
('entriesWithMultipleCharacters', 46)
* -= infix: {'lspace': 4, 'rspace': 4, 'form': 'infix'}
* ||| infix: {'lspace': 3, 'rspace': 3, 'form': 'infix', 'properties': {'fence': True}}
* /= infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* := infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* || postfix: {'lspace': 0, 'rspace': 0, 'form': 'postfix', 'properties': {'fence': True}}
* ⪰̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* <= infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* ||| postfix: {'lspace': 0, 'rspace': 0, 'form': 'postfix', 'properties': {'fence': True}}
* ≂̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* ⊐̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* ⩾̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* *= infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* ⊏̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* -> infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* ≦̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* ⧏̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* ||| prefix: {'lspace': 0, 'rspace': 0, 'form': 'prefix', 'properties': {'fence': True}}
* ≿̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* ∽̱ infix: {'lspace': 3, 'rspace': 3, 'form': 'infix'}
* ⧐̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* <> infix: {'lspace': 1, 'rspace': 1, 'form': 'infix'}
* += infix: {'lspace': 4, 'rspace': 4, 'form': 'infix'}
* != infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* ⩽̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* // infix: {'lspace': 1, 'rspace': 1, 'form': 'infix'}
* !! postfix: {'lspace': 0, 'rspace': 0, 'form': 'postfix'}
* >= infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* || prefix: {'lspace': 0, 'rspace': 0, 'form': 'prefix', 'properties': {'fence': True}}
* ⪡̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* ≎̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* .. postfix: {'lspace': 0, 'rspace': 0, 'form': 'postfix'}
* ≏̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* ** infix: {'lspace': 1, 'rspace': 1, 'form': 'infix'}
* ... postfix: {'lspace': 0, 'rspace': 0, 'form': 'postfix'}
* ≫̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* ⊃⃒ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* ⪯̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* ++ postfix: {'lspace': 0, 'rspace': 0, 'form': 'postfix'}
* -- postfix: {'lspace': 0, 'rspace': 0, 'form': 'postfix'}
* ≪̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* ⊂⃒ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* || infix: {'lspace': 3, 'rspace': 3, 'form': 'infix', 'properties': {'fence': True}}
* == infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* ⪢̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
* && infix: {'lspace': 4, 'rspace': 4, 'form': 'infix'}
* ⫝̸ infix: {'lspace': 3, 'rspace': 3, 'form': 'infix'}
Separate tables for fences and separators:
('fences', 59)
['[U+0028, U+0029]', 'U+005B', 'U+005D', '[U+007B, U+007C]', 'U+007C', '[U+007C, U+007D]', 'U+2016', 'U+2016', '[U+2018, U+2019]', '[U+201C, U+201D]', '[U+2308, U+230B]', '[U+2329, U+232A]', '[U+2772, U+2773]', '[U+27E6, U+27EF]', 'U+2980', 'U+2980', '[U+2983, U+2998]', '[U+29FC, U+29FD]']
('separators', 3)
['U+002C', 'U+003B', 'U+2063']
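The script itself is not shown in this thread. A hypothetical Python sketch of the kind of grouping that produces the (category, count) listing above, assuming entries are keyed by (operator, form) with lspace/rspace/properties values, could look like this (the sample data is the otherEntries subset quoted above):

```python
from collections import defaultdict

def group_by_values(entries):
    """Bucket operator entries that share form, spacing and boolean properties.

    entries maps (operator, form) -> {'lspace': int, 'rspace': int,
    optional 'properties': {name: bool}}. Returns (key, operators) pairs,
    largest group first.
    """
    groups = defaultdict(list)
    for (op, form), attrs in entries.items():
        # Sort properties so that equal property sets produce the same key.
        props = tuple(sorted(attrs.get("properties", {}).items()))
        groups[(form, attrs["lspace"], attrs["rspace"], props)].append(op)
    return sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True)

sample = {
    (",", "infix"): {"lspace": 0, "rspace": 3},
    (":", "infix"): {"lspace": 0, "rspace": 3},
    (";", "infix"): {"lspace": 0, "rspace": 3},
    ("\u2044", "infix"): {"lspace": 4, "rspace": 4, "properties": {"stretchy": True}},
    ("\u2215", "infix"): {"lspace": 4, "rspace": 4, "properties": {"stretchy": True}},
    ("|", "infix"): {"lspace": 3, "rspace": 3,
                     "properties": {"symmetric": True, "stretchy": True}},
}

for key, ops in group_by_values(sample):
    print(key, len(ops), [f"U+{ord(c):04X}" for c in ops])
```

Groups of size 1 or 2 are the candidates for merging into a larger category.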
Trying to put multichars into existing categories:
infixEntriesWithDefaultValues
singleChar (772): ['[U+003C, U+003E]', '[U+219A, U+219B]', 'U+21AE', '[U+21B6, U+21B8]', '[U+21BA, U+21BB]', '[U+21CD, U+21CF]', '[U+21DE, U+21DF]', '[U+21F1, U+21F2]', 'U+21F4', '[U+21F7, U+21FC]', '[U+2208, U+220D]', 'U+2219', 'U+221D', 'U+2223', '[U+2225, U+2226]', '[U+2234, U+2235]', 'U+2237', '[U+2239, U+223E]', '[U+2241, U+228B]', '[U+2290, U+2292]', '[U+229A, U+229C]', '[U+22A2, U+22BA]', 'U+22C8', 'U+22CD', '[U+22D0, U+22D1]', '[U+22D4, U+22ED]', '[U+22F2, U+22FF]', 'U+2301', 'U+237C', 'U+238B', 'U+2758', 'U+2794', '[U+2798, U+27A1]', '[U+27A5, U+27AF]', '[U+27B1, U+27BE]', 'U+27DF', '[U+27F2, U+27F3]', '[U+2900, U+2909]', 'U+2911', '[U+2914, U+2920]', '[U+2923, U+294D]', '[U+2962, U+296D]', '[U+2970, U+297F]', '[U+29B6, U+29BB]', '[U+29BD, U+29C1]', '[U+29C4, U+29C8]', '[U+29CE, U+29D7]', 'U+29E1', '[U+29E3, U+29E6]', '[U+29F4, U+29F5]', 'U+29F7', 'U+2A3E', '[U+2A64, U+2AD9]', '[U+2ADE, U+2AEB]', '[U+2AEE, U+2AFA]', '[U+2B00, U+2B11]', '[U+2B30, U+2B31]', '[U+2B33, U+2B44]', '[U+2B47, U+2B4F]', '[U+2B5A, U+2B73]', '[U+2B76, U+2B7D]', '[U+2B80, U+2B8F]', '[U+2B94, U+2B95]', '[U+2BA0, U+2BB8]', 'U+2BD1', '[U+1F800, U+1F80B]', '[U+1F810, U+1F847]', '[U+1F850, U+1F859]', '[U+1F860, U+1F887]', '[U+1F898, U+1F89B]', '[U+1F8A0, U+1F8AB]']
multipleChar (27): '!=' '*=' '->' '/=' ':=' '<=' '==' '>=' '≂̸' '≎̸' '≏̸' '≦̸' '≪̸' '≫̸' '≿̸' '⊂⃒' '⊃⃒' '⊏̸' '⊐̸' '⧏̸' '⧐̸' '⩽̸' '⩾̸' '⪡̸' '⪢̸' '⪯̸' '⪰̸'
infixEntriesWithSpacing5AndStretchy
singleChar (138): ['[U+2190, U+2199]', '[U+219C, U+21AD]', '[U+21AF, U+21B5]', 'U+21B9', '[U+21BC, U+21CC]', '[U+21D0, U+21DD]', '[U+21E0, U+21F0]', 'U+21F3', '[U+21F5, U+21F6]', '[U+21FD, U+21FF]', '[U+27F0, U+27F1]', '[U+27F5, U+27FF]', '[U+290A, U+2910]', '[U+2912, U+2913]', '[U+2921, U+2922]', '[U+294E, U+2961]', '[U+296E, U+296F]', '[U+2B45, U+2B46]']
infixEntriesWithSpacing4
singleChar (100): ['U+002B', 'U+002D', 'U+002F', 'U+00B1', 'U+00F7', '[U+2212, U+2214]', 'U+2216', 'U+2218', 'U+2224', '[U+2227, U+222A]', 'U+2236', 'U+2238', '[U+228C, U+228F]', '[U+2293, U+2296]', 'U+2298', '[U+229D, U+229F]', '[U+22BB, U+22BD]', 'U+22C4', 'U+22C6', '[U+22CE, U+22CF]', '[U+22D2, U+22D3]', '[U+2795, U+2797]', 'U+27F4', 'U+29BC', 'U+29F6', '[U+2A22, U+2A2E]', '[U+2A38, U+2A3A]', '[U+2A40, U+2A4F]', '[U+2A51, U+2A63]', '[U+2ADA, U+2ADB]', 'U+2AFB', 'U+2AFD', 'U+2B32']
multipleChar (3): '&&' '+=' '-='
infixEntriesWithSpacing3
singleChar (85): ['U+0025', 'U+002A', 'U+002E', 'U+0040', 'U+00B7', 'U+00D7', 'U+2022', 'U+2043', 'U+2206', 'U+220E', 'U+2217', '[U+223F, U+2240]', 'U+2297', 'U+2299', '[U+22A0, U+22A1]', 'U+22C5', 'U+22C7', '[U+22C9, U+22CC]', '[U+2305, U+2306]', '[U+25A0, U+25A1]', '[U+25AA, U+25AB]', '[U+25AD, U+25B1]', '[U+2981, U+2982]', '[U+2999, U+299A]', 'U+29B5', '[U+29C2, U+29C3]', '[U+29C9, U+29CD]', '[U+29D8, U+29D9]', 'U+29DB', '[U+29DF, U+29E0]', 'U+29E2', '[U+29E7, U+29ED]', '[U+29F8, U+29FB]', '[U+2A1D, U+2A21]', '[U+2A2F, U+2A37]', '[U+2A3B, U+2A3D]', 'U+2A3F', 'U+2A50', '[U+2ADC, U+2ADD]', 'U+2AFE']
multipleChar (4): '||' '|||' '∽̱' '⫝̸'
prefixEntriesWithLspace0Rspace0
singleChar (51): ['U+0021', 'U+002B', 'U+002D', 'U+00AC', 'U+00B1', 'U+2018', 'U+201C', '[U+2200, U+2201]', '[U+2203, U+2204]', 'U+2207', '[U+2212, U+2213]', '[U+221B, U+221C]', '[U+221F, U+2222]', 'U+223C', '[U+22BE, U+22BF]', 'U+2310', 'U+2319', '[U+2795, U+2796]', 'U+27C0', '[U+299B, U+29AF]', '[U+2AEC, U+2AED]']
multipleChar (2): '||' '|||'
postfixEntriesWithLspace0Rspace0
singleChar (35): ['[U+0021, U+0022]', '[U+0026, U+0027]', 'U+0060', 'U+00A8', 'U+00B0', '[U+00B2, U+00B4]', '[U+00B8, U+00B9]', '[U+02CA, U+02CB]', '[U+02D8, U+02DA]', 'U+02DD', 'U+0311', '[U+2019, U+201B]', '[U+201D, U+201F]', '[U+2032, U+2037]', 'U+2057', '[U+20DB, U+20DC]', 'U+23CD']
multipleChar (7): '!!' '++' '--' '..' '...' '||' '|||'
postfixEntriesWithLspace0Rspace0AndStretchy
singleChar (26): ['[U+005E, U+005F]', 'U+007E', 'U+00AF', '[U+02C6, U+02C7]', 'U+02C9', 'U+02CD', 'U+02DC', 'U+02F7', 'U+0302', 'U+2016', 'U+203E', '[U+2322, U+2323]', '[U+23B4, U+23B5]', '[U+23DC, U+23E1]', 'U+2980', '[U+1EEF0, U+1EEF1]']
prefixEntriesWithSpacing0AndStretchySymmetric
singleChar (25): ['U+0028', 'U+005B', '[U+007B, U+007C]', 'U+2308', 'U+230A', 'U+2329', 'U+2772', 'U+27E6', 'U+27E8', 'U+27EA', 'U+27EC', 'U+27EE', 'U+2983', 'U+2985', 'U+2987', 'U+2989', 'U+298B', 'U+298D', 'U+298F', 'U+2991', 'U+2993', 'U+2995', 'U+2997', 'U+29FC']
postfixEntriesWithSpacing0AndStretchySymmetric
singleChar (25): ['U+0029', 'U+005D', '[U+007C, U+007D]', 'U+2309', 'U+230B', 'U+232A', 'U+2773', 'U+27E7', 'U+27E9', 'U+27EB', 'U+27ED', 'U+27EF', 'U+2984', 'U+2986', 'U+2988', 'U+298A', 'U+298C', 'U+298E', 'U+2990', 'U+2992', 'U+2994', 'U+2996', 'U+2998', 'U+29FD']
prefixEntriesWithLspace3Rspace3AndSymmetricLargeop
singleChar (22): ['[U+222B, U+2233]', '[U+2A0B, U+2A0F]', '[U+2A15, U+2A1C]']
prefixEntriesWithLspace1Rspace2AndSymmetricMovablelimitsLargeop
singleChar (18): ['[U+220F, U+2210]', '[U+22C0, U+22C3]', '[U+2A00, U+2A09]', 'U+2AFC', 'U+2AFF']
prefixEntriesWithLspace3Rspace3AndSymmetricMovablelimitsLargeop
singleChar (7): ['U+2211', 'U+2A0A', '[U+2A10, U+2A14]']
otherEntries 21
* {'form': 'infix', 'lspace': 0, 'rspace': 0}: 6
['U+005C', 'U+2061', 'U+2062', 'U+2063', 'U+2064', 'U+2396']
* {'form': 'infix', 'lspace': 0, 'rspace': 3}: 3
['U+002C', 'U+003A', 'U+003B']
* {'form': 'infix', 'lspace': 1, 'rspace': 1}: 3
['U+003F', 'U+005E', 'U+005F']
* {'form': 'prefix', 'lspace': 0, 'rspace': 0, 'properties': {'stretchy': True}}: 3
['U+2016', 'U+221A', 'U+2980']
* {'form': 'prefix', 'lspace': 3, 'rspace': 0}: 3
['U+2145', 'U+2146', 'U+2202']
* {'form': 'infix', 'lspace': 4, 'rspace': 4, 'properties': {'stretchy': True}}: 2
['U+2044', 'U+2215']
* {'form': 'infix', 'lspace': 3, 'rspace': 3, 'properties': {'stretchy': True, 'symmetric': True}}: 1
['U+007C']
otherEntriesWithMultipleCharacters 3
* ** infix: {'form': 'infix', 'lspace': 1, 'rspace': 1}
* // infix: {'form': 'infix', 'lspace': 1, 'rspace': 1}
* <> infix: {'form': 'infix', 'lspace': 1, 'rspace': 1}
Separate tables for fences and separators:
fences
singleChar (59): ['[U+0028, U+0029]', 'U+005B', 'U+005D', '[U+007B, U+007C]', 'U+007C', '[U+007C, U+007D]', 'U+2016', 'U+2016', '[U+2018, U+2019]', '[U+201C, U+201D]', '[U+2308, U+230B]', '[U+2329, U+232A]', '[U+2772, U+2773]', '[U+27E6, U+27EF]', 'U+2980', 'U+2980', '[U+2983, U+2998]', '[U+29FC, U+29FD]']
multipleChar (6): '||' '||' '||' '|||' '|||' '|||'
separators
singleChar (3): ['U+002C', 'U+003B', 'U+2063']
Can these three be moved into existing categories?
otherEntriesWithMultipleCharacters 3
Also, do you plan to review the multi-char operators too? I thought we agreed some of them probably don't make sense...
First attempt to make the dictionary more compact:
https://mathml-refresh.github.io/mathml-core/#operator-dictionary
* priority is not listed since it is not used at all for MathML Core.
* The ~799 entries from infixEntriesWithDefaultValues are no longer listed explicitly, since they use the default values anyway.
* fences/separators are listed separately. I'll open a separate issue to decide what to do with these properties.
* The 24 entries that don't fit in any pre-existing category are currently struck out until it becomes clear what we really want to do with them.
* The dictionary is sorted by category, and categories are sorted by size.
* There is no "table per category" listing yet.
Second attempt: there are now two forms of the operator dictionary:
The mo section now refers to the compact one.
The multi-char entries are mapped to the BMP Private Use Area (PUA, range U+E000–U+F8FF) so that they can be treated as single characters.
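A minimal sketch of such a mapping, assuming the multi-char operators are simply assigned consecutive code points from U+E000 in sorted order (the actual assignment order used for the spec tables is not stated here; map_multichar_to_pua and normalize are hypothetical names):

```python
PUA_START, PUA_END = 0xE000, 0xF8FF  # BMP Private Use Area

def map_multichar_to_pua(operators):
    """Assign each multi-character operator a private BMP code point."""
    multi = sorted(op for op in set(operators) if len(op) > 1)
    if len(multi) > PUA_END - PUA_START + 1:
        raise ValueError("too many multi-character operators for the BMP PUA")
    return {op: chr(PUA_START + i) for i, op in enumerate(multi)}

def normalize(op, pua_map):
    """Replace a multi-character operator by its PUA stand-in, if it has one."""
    return pua_map.get(op, op)

pua_map = map_multichar_to_pua(["+", "!=", "->", "++", "=="])
print(f"U+{ord(normalize('!=', pua_map)):04X}")  # U+E000
print(normalize("+", pua_map))                   # +
```

After normalization every key is a single BMP character, so one table format covers both cases; the cost is that the mapping has to be applied to the operator text before lookup.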
The original dictionary used at least 1387 × (16 + 2 + 2 × 8 + 6) / 8 = 6935 bytes (counting single chars only). The estimated size of the compact dictionary (including multiple chars) is currently 1540 bytes (-78%). We can maybe do better, but the final results are still blocked on the pending changes to the operator dictionary.
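The arithmetic of that estimate, reproduced as a quick check. The comment does not say which field each term of 16 + 2 + 2 × 8 + 6 bits stands for, so this sketch only uses the total of 40 bits per entry:

```python
def table_size_bytes(entries: int, bits_per_entry: int) -> float:
    """Size in bytes of a table of `entries` fixed-width records."""
    return entries * bits_per_entry / 8

bits = 16 + 2 + 2 * 8 + 6            # 40 bits per entry, as in the estimate
original = table_size_bytes(1387, bits)
compact = 1540                       # estimated compact size, in bytes
print(original)                                   # 6935.0
print(f"{(compact - original) / original:.0%}")   # -78%
```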
@NSoiffer @davidcarlisle I submitted a couple of PRs to make things more consistent:
* https://github.com/mathml-refresh/xml-entities/pull/24
* https://github.com/mathml-refresh/xml-entities/pull/23
* https://github.com/mathml-refresh/xml-entities/pull/22
* https://github.com/mathml-refresh/xml-entities/pull/21
* https://github.com/mathml-refresh/xml-entities/pull/20
* https://github.com/mathml-refresh/xml-entities/pull/19
* https://github.com/mathml-refresh/xml-entities/pull/18
After these changes, I believe the remaining entries could be classified as:
* infix lspace=0 rspace=0 (invisible operators)
* infix lspace=0 rspace=0.16666666666666666em (comma-like punctuation)
* prefix lspace=0.16666666666666666em rspace=0 (derivation-like operators)
These seem to deserve their own categories indeed.
(I haven't tried to run the script with all the changes merged, but I'm willing to do it and check again after this is done)
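The odd-looking 0.16666666666666666em is simply 3/18 em printed as a double. A small sketch, assuming (as in the MathML 3 dictionary) that the integer spacing values in the tables above are multiples of 1/18 em:

```python
from fractions import Fraction

def mu_to_em(n: int) -> Fraction:
    """Convert an integer dictionary spacing value (in 1/18 em) to em."""
    return Fraction(n, 18)

print(f"{float(mu_to_em(3))}em")  # 0.16666666666666666em
print(f"{float(mu_to_em(5))}em")  # thick space, the infix default
```

Keeping the integer value in the table and converting at layout time avoids storing (and printing) long decimal expansions.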
This is done:
https://mathml-refresh.github.io/mathml-core/#operator-dictionary-compact
We still need special handling for a few edge cases, but the main subset (553 entries) is now treated uniformly. That subset can be encoded as a 560-byte table and searched with a binary search over 224 elements (at most 8 comparisons). Alternatively, this main subset can be encoded as a perfect hash function with a table using 16 bits per entry, but I'm not sure whether the extra overhead (memory and complexity) is worth it. A note gives suggestions to implementers.
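A sketch of the binary-search option, assuming a table of (first, last, category) ranges sorted by first code point, with any code point not found falling back to the infix default. The excerpt reuses a few ranges from the tables above; the category labels and the lookup name are hypothetical:

```python
import bisect

# Excerpt of (first, last, category) ranges, sorted by `first`.
RANGES = [
    (0x002B, 0x002B, "infix-spacing4"),
    (0x002D, 0x002D, "infix-spacing4"),
    (0x2190, 0x2199, "infix-spacing5-stretchy"),
    (0x219C, 0x21AD, "infix-spacing5-stretchy"),
    (0x2212, 0x2214, "infix-spacing4"),
]
STARTS = [first for first, _, _ in RANGES]

def lookup(code_point: int, default: str = "infix-default") -> str:
    """Binary search for the last range starting at or before code_point."""
    i = bisect.bisect_right(STARTS, code_point) - 1
    if i >= 0:
        first, last, category = RANGES[i]
        if code_point <= last:
            return category
    # Not in any listed range: the entry uses the default values.
    return default

print(lookup(0x2192))  # infix-spacing5-stretchy
print(lookup(0x219A))  # infix-default
```

With 224 ranges, bisect_right needs at most 8 comparisons, matching the estimate above.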
As discussed in #161, we can remove the priority property, ignore the infix entries that just use the default values, and try to make entries a bit more consistent. Then we should be able to describe the operator dictionary in a more compact way. This would help implementers calculate default values without relying on a huge table (more than 1100 entries).
As a reminder, we agreed in #143 to keep entries with multiple characters, so they would need to be handled separately. @davidcarlisle Can you please check the operators from the otherEntries table and indicate whether we could actually integrate them in one of the existing larger tables? Fixing #6 and #151 would probably help here.
I was also discussing with @bfgeek during BlinkOn 11, and he suggested we could even try to describe it as a perfect hash table, for example by relying on the fact that many entries are in contiguous Unicode ranges. If the hash is simple enough, that could make lookup faster than binary search. I modified my script to dump these Unicode ranges. Below is what I obtain with the current state of the operator dictionary.
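A hypothetical sketch of how contiguous code points can be collapsed into ranges in the same '[U+XXXX, U+YYYY]' format as the dumps above (the actual script is not included in this thread):

```python
def collapse_ranges(code_points):
    """Collapse code points into contiguous runs, formatted like the dumps above."""
    points = sorted(set(code_points))
    out = []
    i = 0
    while i < len(points):
        # Extend j while the next code point is exactly one more.
        j = i
        while j + 1 < len(points) and points[j + 1] == points[j] + 1:
            j += 1
        if i == j:
            out.append(f"U+{points[i]:04X}")
        else:
            out.append(f"[U+{points[i]:04X}, U+{points[j]:04X}]")
        i = j + 1
    return out

print(collapse_ranges([0x2211, 0x2A0A, 0x2A10, 0x2A11, 0x2A12, 0x2A13, 0x2A14]))
# ['U+2211', 'U+2A0A', '[U+2A10, U+2A14]']
```

The number of runs, rather than the number of entries, determines the size of a range table and the depth of a binary search over it.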