w3c / mathml

MathML4 editors draft
https://w3c.github.io/mathml/

Operator dictionary: Provide a compact form in MathML Core #176

Closed fred-wang closed 4 years ago

fred-wang commented 4 years ago

As discussed in #161, we can remove the priority property, ignore the infix entries that just use the default values, and try to make the entries a bit more consistent. Then we should be able to describe the operator dictionary in a more compact way. This would help implementers calculate default values without relying on a huge table (more than 1100 entries).

As a reminder, we agreed in #143 to keep entries with multiple characters, so they would need to be handled separately. @davidcarlisle, can you please check the operators in the otherEntries table and indicate whether we could actually integrate them into one of the existing larger tables? Fixing #6 and #151 would probably help here.
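To make the idea concrete, here is a minimal sketch of how an implementer might consume such a compact form. The layout is an assumption, not a proposed format: a small table of categories, a sorted list of (first, last, category) Unicode ranges per form searched with a binary search (`bisect`), a separate dict for the multi-character entries, and a fallback to a default when nothing matches, which is what makes it safe to drop the infix entries that just use the default values. The category names, the spacing unit (1/18 em) and the default of lspace = rspace = 5 for every form are hypothetical; the ranges are a small excerpt of the dump below.

```python
from bisect import bisect_right
from typing import NamedTuple


class Category(NamedTuple):
    lspace: int  # assumed unit: 1/18 em
    rspace: int
    properties: frozenset = frozenset()


# Hypothetical default returned when no entry matches (same for every form here).
DEFAULT = Category(5, 5)

# Hypothetical category names; values taken from the dump below.
CATEGORIES = {
    "infixSpacing4": Category(4, 4),
    "infixSpacing5AccentStretchy": Category(5, 5, frozenset({"accent", "stretchy"})),
    "prefixFence": Category(0, 0, frozenset({"stretchy", "symmetric", "fence"})),
}

# Per form: sorted, non-overlapping (first, last, category) ranges (excerpt only).
RANGES = {
    "infix": [
        (0x002B, 0x002B, "infixSpacing4"),
        (0x002D, 0x002D, "infixSpacing4"),
        (0x2190, 0x2190, "infixSpacing5AccentStretchy"),
        (0x2212, 0x2214, "infixSpacing4"),
    ],
    "prefix": [
        (0x0028, 0x0028, "prefixFence"),
        (0x005B, 0x005B, "prefixFence"),
        (0x007B, 0x007C, "prefixFence"),
    ],
}

# Multi-character operators are kept in a separate table (see #143).
MULTI = {("+=", "infix"): Category(4, 4), ("->", "infix"): Category(5, 5)}

# Start points of each range, precomputed for the binary search.
STARTS = {form: [r[0] for r in ranges] for form, ranges in RANGES.items()}


def lookup(content: str, form: str) -> Category:
    """Return the dictionary values for an operator, or DEFAULT."""
    if len(content) != 1:
        return MULTI.get((content, form), DEFAULT)
    cp = ord(content)
    ranges = RANGES.get(form, [])
    # Index of the last range starting at or before cp.
    i = bisect_right(STARTS.get(form, []), cp) - 1
    if i >= 0 and ranges[i][0] <= cp <= ranges[i][1]:
        return CATEGORIES[ranges[i][2]]
    return DEFAULT
```

For example, `lookup("\u2213", "infix")` falls in the `[U+2212, U+2214]` range and yields spacing 4, while `lookup("=", "infix")` is not listed at all and simply gets the default.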

I was also discussing this with @bfgeek during BlinkOn 11, and he suggested we could even try to describe it as a perfect hash table, for example by relying on the fact that many entries are in contiguous Unicode ranges. If the hash is simple enough, lookup could be faster than with a binary search. I modified my script to dump these Unicode ranges. Below is what I obtain with the current state of the operator dictionary.

('infixEntriesWithDefaultValues', 379)
['[U+003C, U+003E]', 'U+03F6', 'U+21B8', '[U+21BA, U+21BB]', '[U+21DE, U+21DF]', '[U+21F1, U+21F2]', '[U+2208, U+220D]', 'U+221D', 'U+221F', '[U+2223, U+2226]', '[U+2234, U+2237]', 'U+2239', '[U+223B, U+223E]', '[U+2241, U+228B]', '[U+228F, U+2292]', '[U+22A2, U+22B9]', 'U+22C8', 'U+22CD', '[U+22D0, U+22D1]', '[U+22D4, U+22EE]', '[U+22F0, U+22FF]', 'U+2758', '[U+2908, U+2909]', '[U+2923, U+2932]', '[U+2934, U+2939]', '[U+293E, U+2941]', 'U+2949', '[U+294C, U+294D]', 'U+2963', 'U+2965', '[U+297E, U+297F]', '[U+29C0, U+29C1]', '[U+29CE, U+29D5]', 'U+29DE', 'U+29E1', '[U+29E3, U+29E6]', 'U+29F4', 'U+2A59', '[U+2A66, U+2A70]', '[U+2A73, U+2ADB]', '[U+2ADD, U+2AF3]', '[U+2AF7, U+2AFA]']

('infixEntriesWithSpacing4', 182)
['U+002B', 'U+002D', 'U+00B1', 'U+00B7', 'U+00D7', 'U+00F7', 'U+2022', 'U+2043', '[U+2212, U+2214]', '[U+2216, U+2219]', '[U+2227, U+222A]', 'U+2238', 'U+223A', 'U+2240', '[U+228C, U+228E]', '[U+2293, U+22A1]', '[U+22BA, U+22BD]', '[U+22C4, U+22C7]', '[U+22C9, U+22CC]', '[U+22CE, U+22CF]', '[U+22D2, U+22D3]', '[U+25B2, U+25B9]', '[U+25BC, U+25C9]', '[U+25CC, U+25CF]', '[U+25D6, U+25D7]', 'U+25E6', '[U+29B6, U+29BF]', '[U+29C4, U+29C8]', '[U+29D6, U+29D7]', 'U+29E2', '[U+29F5, U+29F7]', '[U+29FE, U+29FF]', '[U+2A22, U+2A58]', '[U+2A5A, U+2A65]', '[U+2A71, U+2A72]', '[U+2AF4, U+2AF6]', 'U+2AFB', 'U+2AFD']

('infixEntriesWithSpacing3', 85)
['U+0025', 'U+002A', 'U+002E', 'U+2206', 'U+220E', 'U+223F', '[U+22BE, U+22BF]', '[U+25A0, U+25A1]', '[U+25AA, U+25AB]', '[U+25AD, U+25B1]', '[U+2981, U+2982]', '[U+2999, U+29B5]', '[U+29C2, U+29C3]', '[U+29C9, U+29CD]', '[U+29D8, U+29D9]', '[U+29DB, U+29DD]', '[U+29DF, U+29E0]', '[U+29E7, U+29F3]', '[U+29F8, U+29FB]', '[U+2A1D, U+2A21]', 'U+2AFE']

('infixEntriesWithSpacing5AndAccent', 75)
['[U+219A, U+219B]', 'U+21AE', '[U+21B6, U+21B7]', '[U+21CD, U+21CF]', 'U+21F4', '[U+21F7, U+21FC]', '[U+2900, U+2907]', 'U+2911', '[U+2914, U+2920]', 'U+2933', '[U+293A, U+293D]', '[U+2942, U+2948]', '[U+294A, U+294B]', 'U+2962', 'U+2964', '[U+2966, U+296D]', '[U+2970, U+297D]']

('infixEntriesWithSpacing5AndAccentStretchy', 70)
['U+2190', 'U+2192', 'U+2194', '[U+219C, U+21A0]', '[U+21A2, U+21A4]', 'U+21A6', '[U+21A9, U+21AD]', 'U+21B9', '[U+21BC, U+21BD]', '[U+21C0, U+21C1]', 'U+21C4', '[U+21C6, U+21C7]', 'U+21C9', '[U+21CB, U+21CC]', 'U+21D0', 'U+21D2', 'U+21D4', '[U+21DA, U+21DD]', 'U+21E0', 'U+21E2', '[U+21E4, U+21E6]', 'U+21E8', 'U+21F0', 'U+21F6', '[U+21FD, U+21FF]', '[U+27F5, U+27FF]', '[U+290C, U+2910]', 'U+294E', 'U+2950', '[U+2952, U+2953]', '[U+295A, U+295B]', '[U+295E, U+295F]']

('infixEntriesWithSpacing5AndStretchy', 68)
['U+2191', 'U+2193', '[U+2195, U+2199]', 'U+21A1', 'U+21A5', '[U+21A7, U+21A8]', '[U+21AF, U+21B5]', '[U+21BE, U+21BF]', '[U+21C2, U+21C3]', 'U+21C5', 'U+21C8', 'U+21CA', 'U+21D1', 'U+21D3', '[U+21D5, U+21D9]', 'U+21E1', 'U+21E3', 'U+21E7', '[U+21E9, U+21EF]', 'U+21F3', 'U+21F5', '[U+27F0, U+27F1]', '[U+290A, U+290B]', '[U+2912, U+2913]', '[U+2921, U+2922]', 'U+294F', 'U+2951', '[U+2954, U+2959]', '[U+295C, U+295D]', '[U+2960, U+2961]', '[U+296E, U+296F]', '[U+2B45, U+2B46]']

('postfixEntriesWithSpacing0AndAccent', 30)
['U+0022', 'U+0027', 'U+0060', 'U+00A8', 'U+00AA', '[U+00B2, U+00B4]', '[U+00B8, U+00BA]', '[U+02CA, U+02CB]', '[U+02D8, U+02DA]', 'U+02DD', 'U+0311', '[U+201A, U+201B]', '[U+201E, U+201F]', '[U+2033, U+2037]', 'U+2057', '[U+20DB, U+20DC]']

('prefixEntriesWithSpacing0AndStretchySymmetricFence', 25)
['U+0028', 'U+005B', '[U+007B, U+007C]', 'U+2308', 'U+230A', 'U+2329', 'U+2772', 'U+27E6', 'U+27E8', 'U+27EA', 'U+27EC', 'U+27EE', 'U+2983', 'U+2985', 'U+2987', 'U+2989', 'U+298B', 'U+298D', 'U+298F', 'U+2991', 'U+2993', 'U+2995', 'U+2997', 'U+29FC']

('prefixEntriesWithLspace1Rspace2AndSymmetricMovablelimitsLargeop', 25)
['[U+220F, U+2211]', '[U+22C0, U+22C3]', '[U+2A00, U+2A0A]', '[U+2A10, U+2A14]', 'U+2AFC', 'U+2AFF']

('postfixEntriesWithSpacing0AndStretchySymmetricFence', 25)
['U+0029', 'U+005D', '[U+007C, U+007D]', 'U+2309', 'U+230B', 'U+232A', 'U+2773', 'U+27E7', 'U+27E9', 'U+27EB', 'U+27ED', 'U+27EF', 'U+2984', 'U+2986', 'U+2988', 'U+298A', 'U+298C', 'U+298E', 'U+2990', 'U+2992', 'U+2994', 'U+2996', 'U+2998', 'U+29FD']

('postfixEntriesWithSpacing0AndAccentStretchy', 20)
['[U+005E, U+005F]', 'U+007E', 'U+00AF', '[U+02C6, U+02C7]', 'U+02C9', 'U+02CD', 'U+02DC', 'U+02F7', 'U+0302', 'U+203E', '[U+23B4, U+23B5]', '[U+23DC, U+23E1]']

('prefixEntriesWithLspace1Rspace2AndSymmetricLargeop', 12)
['U+2A0B', '[U+2A0D, U+2A0F]', '[U+2A15, U+2A1C]']

('prefixEntriesWithLspace0Rspace1AndSymmetricLargeop', 10)
['[U+222B, U+2233]', 'U+2A0C']

('otherEntries', 55)
  * {'lspace': 2, 'rspace': 1, 'form': 'prefix'}: 7
    ['U+00AC', 'U+2145', 'U+2200', 'U+2202', 'U+2203', 'U+2204', 'U+2207']

  * {'lspace': 0, 'rspace': 0, 'form': 'infix'}: 6
    ['U+005C', 'U+2026', 'U+2061', 'U+2062', 'U+2064', 'U+22EF']

  * {'lspace': 0, 'rspace': 1, 'form': 'prefix'}: 5
    ['U+002B', 'U+002D', 'U+00B1', 'U+2212', 'U+2213']

  * {'lspace': 1, 'rspace': 1, 'form': 'infix'}: 5
    ['U+002F', 'U+003F', 'U+0040', 'U+005E', 'U+005F']

  * {'lspace': 0, 'rspace': 0, 'form': 'prefix'}: 3
    ['U+2220', 'U+2221', 'U+2222']

  * {'lspace': 0, 'rspace': 2, 'form': 'postfix'}: 3
    ['U+266D', 'U+266E', 'U+266F']

  * {'lspace': 0, 'rspace': 0, 'form': 'postfix'}: 3
    ['U+0026', 'U+00B0', 'U+2032']

  * {'lspace': 0, 'rspace': 0, 'form': 'prefix', 'properties': {'stretchy': True, 'fence': True}}: 2
    ['U+2016', 'U+2980']

  * {'lspace': 0, 'rspace': 0, 'form': 'prefix', 'properties': {'fence': True}}: 2
    ['U+2018', 'U+201C']

  * {'lspace': 0, 'rspace': 0, 'form': 'postfix', 'properties': {'fence': True}}: 2
    ['U+2019', 'U+201D']

  * {'lspace': 0, 'rspace': 3, 'form': 'infix', 'properties': {'separator': True}}: 2
    ['U+002C', 'U+003B']

  * {'lspace': 1, 'rspace': 2, 'form': 'infix'}: 2
    ['U+003A', 'U+2201']

  * {'lspace': 4, 'rspace': 4, 'form': 'infix', 'properties': {'stretchy': True}}: 2
    ['U+2044', 'U+2215']

  * {'lspace': 1, 'rspace': 1, 'form': 'prefix'}: 2
    ['U+221B', 'U+221C']

  * {'lspace': 0, 'rspace': 0, 'form': 'postfix', 'properties': {'stretchy': True, 'fence': True}}: 2
    ['U+2016', 'U+2980']

  * {'lspace': 5, 'rspace': 5, 'form': 'prefix', 'properties': {'stretchy': True}}: 2
    ['U+1EEF0', 'U+1EEF1']

  * {'lspace': 1, 'rspace': 1, 'form': 'prefix', 'properties': {'stretchy': True}}: 1
    ['U+221A']

  * {'lspace': 2, 'rspace': 0, 'form': 'prefix'}: 1
    ['U+2146']

  * {'lspace': 2, 'rspace': 2, 'form': 'infix', 'properties': {'symmetric': True, 'stretchy': True, 'fence': True}}: 1
    ['U+007C']

  * {'lspace': 0, 'rspace': 0, 'form': 'infix', 'properties': {'separator': True}}: 1
    ['U+2063']

  * {'lspace': 1, 'rspace': 0, 'form': 'postfix'}: 1
    ['U+0021']

('entriesWithMultipleCharacters', 46)
  * -= infix: {'lspace': 4, 'rspace': 4, 'form': 'infix'}
  * ||| infix: {'lspace': 2, 'rspace': 2, 'form': 'infix', 'properties': {'symmetric': True, 'stretchy': True, 'fence': True}}
  * /= infix: {'lspace': 4, 'rspace': 4, 'form': 'infix'}
  * := infix: {'lspace': 4, 'rspace': 4, 'form': 'infix'}
  * || postfix: {'lspace': 0, 'rspace': 0, 'form': 'postfix', 'properties': {'symmetric': True, 'stretchy': True, 'fence': True}}
  * ⪰̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * <= infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ||| postfix: {'lspace': 0, 'rspace': 0, 'form': 'postfix', 'properties': {'symmetric': True, 'stretchy': True, 'fence': True}}
  * ≂̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ⊐̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ⩾̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * *= infix: {'lspace': 4, 'rspace': 4, 'form': 'infix'}
  * ⊏̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * -> infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ≦̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ⧏̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ||| prefix: {'lspace': 0, 'rspace': 0, 'form': 'prefix', 'properties': {'symmetric': True, 'stretchy': True, 'fence': True}}
  * ≿̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ∽̱ infix: {'lspace': 3, 'rspace': 3, 'form': 'infix'}
  * ⧐̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * <> infix: {'lspace': 1, 'rspace': 1, 'form': 'infix'}
  * += infix: {'lspace': 4, 'rspace': 4, 'form': 'infix'}
  * != infix: {'lspace': 4, 'rspace': 4, 'form': 'infix'}
  * ⩽̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * // infix: {'lspace': 1, 'rspace': 1, 'form': 'infix'}
  * !! postfix: {'lspace': 1, 'rspace': 0, 'form': 'postfix'}
  * >= infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * || prefix: {'lspace': 0, 'rspace': 0, 'form': 'prefix', 'properties': {'symmetric': True, 'stretchy': True, 'fence': True}}
  * ⪡̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ≎̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * .. postfix: {'lspace': 0, 'rspace': 0, 'form': 'postfix'}
  * ≏̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ** infix: {'lspace': 1, 'rspace': 1, 'form': 'infix'}
  * ... postfix: {'lspace': 0, 'rspace': 0, 'form': 'postfix'}
  * ≫̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ⊃⃒ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ⪯̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ++ postfix: {'lspace': 0, 'rspace': 0, 'form': 'postfix'}
  * -- postfix: {'lspace': 0, 'rspace': 0, 'form': 'postfix'}
  * ≪̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ⊂⃒ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * || infix: {'lspace': 2, 'rspace': 2, 'form': 'infix', 'properties': {'symmetric': True, 'stretchy': True, 'fence': True}}
  * == infix: {'lspace': 4, 'rspace': 4, 'form': 'infix'}
  * ⪢̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * && infix: {'lspace': 4, 'rspace': 4, 'form': 'infix'}
  * ⫝̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
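The script that produced the dump above is not shown in this issue. As a minimal sketch, assuming the input for each category is just an iterable of integer code points, the contiguous ranges could be computed and printed in the same `'U+XXXX'` / `'[U+XXXX, U+YYYY]'` notation like this:

```python
from typing import Iterable, List


def fmt(cp: int) -> str:
    # At least four hex digits (U+003C); more for supplementary planes (U+1EEF0).
    return f"U+{cp:04X}"


def compact_ranges(code_points: Iterable[int]) -> List[str]:
    """Collapse code points into runs of consecutive values."""
    points = sorted(set(code_points))
    result: List[str] = []
    i = 0
    while i < len(points):
        j = i
        # Extend the run while the next code point is exactly one higher.
        while j + 1 < len(points) and points[j + 1] == points[j] + 1:
            j += 1
        if i == j:
            result.append(fmt(points[i]))
        else:
            result.append(f"[{fmt(points[i])}, {fmt(points[j])}]")
        i = j + 1
    return result
```

For instance, `compact_ranges([0x3C, 0x3D, 0x3E, 0x3F6])` gives `['[U+003C, U+003E]', 'U+03F6']`, the first two items of the infixEntriesWithDefaultValues line.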

NSoiffer commented 4 years ago

I'm not sure why you opened this as a separate issue from #161. Is #161 about what should go in the spec (which I think is a pointer to some program-friendly file format), and this issue about what is in that external file and how it is organized?

Just in case you weren't aware, GNU gperf will generate a perfect hash for you. Perfect hashes can sometimes use a fair amount of space, so an alternative is "quasi-perfect" hashing. That allows at most two probes into the hash table and can often significantly reduce the size of the table. There's probably an implementation that generates a table/hash function for quasi-perfect hashing, but I didn't see one on the first page of a Google search...
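The exact "quasi-perfect" scheme is not spelled out here, so as a stand-in, here is a minimal sketch of cuckoo hashing, a different but well-known technique that also guarantees at most two probes per lookup. The integer mixer, the seeds and the load factor are arbitrary assumptions; this is neither gperf output nor any browser's implementation.

```python
from typing import Dict, List, Optional, Tuple


def _h(key: int, seed: int, size: int) -> int:
    # Small integer mixer; the odd multipliers are arbitrary (an assumption).
    x = (key * 0x9E3779B1 + seed * 0x85EBCA6B) & 0xFFFFFFFF
    x ^= x >> 15
    x = (x * 0x2C1B3C6D) & 0xFFFFFFFF
    x ^= x >> 12
    return x % size


class TwoProbeTable:
    """Cuckoo hash table: each key sits in one of its two candidate slots,
    so a lookup never needs more than two probes."""

    def __init__(self, items: Dict[int, str], load: float = 0.4) -> None:
        self.size = max(2, int(len(items) / load) + 1)
        for seed in range(1, 1000):
            # Retry with fresh hash seeds until every key finds a place.
            self.seeds = (seed, seed + 1000)
            self.slots: List[Optional[Tuple[int, str]]] = [None] * self.size
            if all(self._insert(k, v) for k, v in items.items()):
                return
        raise RuntimeError("could not build the table; try a lower load factor")

    def _insert(self, key: int, value: str, max_kicks: int = 100) -> bool:
        entry: Tuple[int, str] = (key, value)
        pos = _h(key, self.seeds[0], self.size)
        for _ in range(max_kicks):
            occupant = self.slots[pos]
            self.slots[pos] = entry
            if occupant is None:
                return True
            # Evict the occupant and send it to its other candidate slot.
            entry = occupant
            first = _h(entry[0], self.seeds[0], self.size)
            second = _h(entry[0], self.seeds[1], self.size)
            pos = second if pos == first else first
        return False

    def get(self, key: int) -> Optional[str]:
        # At most two probes: one per seed.
        for seed in self.seeds:
            slot = self.slots[_h(key, seed, self.size)]
            if slot is not None and slot[0] == key:
                return slot[1]
        return None
```

The trade-off is the one mentioned above: the table has spare slots (here a 0.4 load factor), in exchange for a simple build and a bounded lookup cost.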

fred-wang commented 4 years ago

> I'm not sure why you opened this as a separate issue from #161. Is #161 about what should go in the spec (which I think is a pointer to some program-friendly file format), and this issue about what is in that external file and how it is organized?

#161 was only about removing the priority property, but as I said there, I would open a separate, more general issue later, which is what I'm doing now. The other issue has already gone off-topic.

fred-wang commented 4 years ago

> Just in case you weren't aware, GNU gperf will generate a perfect hash for you. Perfect hashes can sometimes use a fair amount of space, so an alternative is "quasi-perfect" hashing. That allows at most two probes into the hash table and can often significantly reduce the size of the table. There's probably an implementation that generates a table/hash function for quasi-perfect hashing, but I didn't see one on the first page of a Google search...

I think @bfgeek's proposal was actually for a minimal perfect hash table (?): https://en.wikipedia.org/wiki/Perfect_hash_function#Minimal_perfect_hash_function
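For reference, a minimal sketch of one way to build a minimal perfect hash over code points: a simplified "hash and displace" construction. Keys are first spread into n buckets; each multi-key bucket gets a displacement seed `d` that sends all its keys to distinct free slots, and single-key buckets store their slot directly as a negative value. The FNV-1a-based hash and all function names are assumptions for illustration; the resulting index would select a row in a parallel array of operator properties.

```python
from typing import List, Optional, Tuple


def _h(seed: int, key: int) -> int:
    # FNV-1a over the 4 little-endian bytes of the code point, with the seed
    # folded into the offset basis (an assumed, not prescribed, choice).
    x = 0x811C9DC5 ^ seed
    for byte in key.to_bytes(4, "little"):
        x = ((x ^ byte) * 0x01000193) & 0xFFFFFFFF
    return x


def build_mph(keys: List[int]) -> Tuple[List[int], List[Optional[int]]]:
    keys = sorted(set(keys))
    n = len(keys)
    buckets: List[List[int]] = [[] for _ in range(n)]
    for k in keys:
        buckets[_h(0, k) % n].append(k)
    g = [0] * n                                # per-bucket displacement
    slot_of: List[Optional[int]] = [None] * n  # slot -> key stored there
    # Place the largest buckets first, while most slots are still free.
    order = sorted(range(n), key=lambda b: len(buckets[b]), reverse=True)
    for b in order:
        bucket = buckets[b]
        if len(bucket) <= 1:
            break
        d = 1
        while True:
            slots = [_h(d, k) % n for k in bucket]
            if len(set(slots)) == len(bucket) and all(slot_of[s] is None for s in slots):
                break
            d += 1
        g[b] = d
        for k, s in zip(bucket, slots):
            slot_of[s] = k
    # Single-key buckets take the remaining free slots directly.
    free = [s for s in range(n) if slot_of[s] is None]
    for b in order:
        if len(buckets[b]) == 1:
            s = free.pop()
            g[b] = -s - 1                      # negative: encodes the slot itself
            slot_of[s] = buckets[b][0]
    return g, slot_of


def mph_index(g: List[int], slot_of: List[Optional[int]], key: int) -> Optional[int]:
    n = len(g)
    d = g[_h(0, key) % n]
    slot = -d - 1 if d < 0 else _h(d, key) % n
    # Keys outside the set still land on some slot, so confirm the match.
    return slot if slot_of[slot] == key else None
```

Lookup costs two hash evaluations and one comparison, and the table has exactly one slot per key; whether that beats a binary search over a few hundred ranges would need measuring.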

fred-wang commented 4 years ago

Consensus from yesterday's meeting: @davidcarlisle will try to check the values to make them more consistent and reduce special cases.

fred-wang commented 4 years ago

@davidcarlisle How many categories remain after your changes?

davidcarlisle commented 4 years ago

@fred-wang The changes are mainly from @NSoiffer; I've just been pushing through the resulting updated files, and I believe Neil is hoping to do at least one more round on this.

I also updated to Unicode 13, but I'm not expecting that to affect MathML.

However, as things stand now, if you ignore priority= (which isn't really a MathML Core thing), there are 17 different combinations of form, lspace and rspace; see the form:... headings at

https://mathml-refresh.github.io/xml-entities/opdict.html

The report including priority and showing differences from Unicode TR25 is below.

The first part shows that the priority values still need a bit of rationalisation, but that's on Neil's radar (and doesn't affect Core). The second part, showing differences from the MathClass-15 file, is probably OK, but we should perhaps coordinate with Murray and Barbara to get the two back in sync at some point.


 45 distinct priority values

Priority, (count)
  010, (4)
  020, (58)
  030, (1) <semicolon>
  040, (2) <comma> <invisible separator>
  070, (2) <therefore> <because>
  090, (5)
  100, (3)
  170, (9)
  190, (1) <logical or>
  200, (2) <multiple character operator: &&> <logical and>
  230, (6)
  240, (86)
  260, (232)
  265, (204)
  270, (555)
  275, (10)
  290, (3)
  300, (5)
  310, (26)
  320, (3)
  330, (12)
  340, (1) <wreath product>
  350, (4)
  390, (13)
  400, (1) <middle dot>
  410, (1) <circled times>
  640, (1) <percent sign>
  650, (2) <reverse solidus> <set minus>
  670, (27)
  680, (12)
  690, (7)
  700, (1) <vector or cross product>
  720, (1) <multiple character operator: **>
  730, (1) <circled dot operator>
  740, (4)
  780, (2) <multiple character operator: <>> <circumflex accent>
  800, (4)
  810, (2) <exclamation mark> <multiple character operator: !!>
  820, (1) <multiple character operator: //>
  825, (1) <commercial at>
  835, (1) <question mark>
  845, (3)
  850, (1) <function application>
  880, (58)
  900, (2) <low line> <decimal separator key symbol>

----

Operator dictionary entries
for characters not listed in the Unicode TR25 MathClass file.

 C0 Controls and Basic Latin 
U00022 QUOTATION MARK 
U00027 APOSTROPHE 

 C1 Controls and Latin-1 Supplement 
U000B8 CEDILLA 

 Spacing Modifier Letters 
U002C9 MODIFIER LETTER MACRON 
U002CA MODIFIER LETTER ACUTE ACCENT 
U002CB MODIFIER LETTER GRAVE ACCENT 
U002CD MODIFIER LETTER LOW MACRON 
U002DD DOUBLE ACUTE ACCENT 
U002F7 MODIFIER LETTER LOW TILDE 

 General Punctuation 
U02018 LEFT SINGLE QUOTATION MARK 
U02019 RIGHT SINGLE QUOTATION MARK 
U0201A SINGLE LOW-9 QUOTATION MARK 
U0201B SINGLE HIGH-REVERSED-9 QUOTATION MARK 
U0201C LEFT DOUBLE QUOTATION MARK 
U0201D RIGHT DOUBLE QUOTATION MARK 
U0201E DOUBLE LOW-9 QUOTATION MARK 
U0201F DOUBLE HIGH-REVERSED-9 QUOTATION MARK 
U0203E OVERLINE 
U02043 HYPHEN BULLET 

 Arrows 
U021B4 RIGHTWARDS ARROW WITH CORNER DOWNWARDS 
U021B5 DOWNWARDS ARROW WITH CORNER LEFTWARDS 
U021B8 NORTH WEST ARROW TO LONG BAR 
U021B9 LEFTWARDS ARROW TO BAR OVER RIGHTWARDS ARROW TO BAR 

 Miscellaneous Technical 
U02301 ELECTRIC ARROW 
U02329 LEFT-POINTING ANGLE BRACKET 
U0232A RIGHT-POINTING ANGLE BRACKET 
U0238B BROKEN CIRCLE WITH NORTHWEST ARROW 
U02396 DECIMAL SEPARATOR KEY SYMBOL 
U023CD SQUARE FOOT 

 Dingbats 
U02758 LIGHT VERTICAL BAR 
U02794 HEAVY WIDE-HEADED RIGHTWARDS ARROW 
U02795 HEAVY PLUS SIGN 
U02795 HEAVY PLUS SIGN 
U02796 HEAVY MINUS SIGN 
U02796 HEAVY MINUS SIGN 
U02797 HEAVY DIVISION SIGN 
U02798 HEAVY SOUTH EAST ARROW 
U02799 HEAVY RIGHTWARDS ARROW 
U0279A HEAVY NORTH EAST ARROW 
U0279B DRAFTING POINT RIGHTWARDS ARROW 
U0279C HEAVY ROUND-TIPPED RIGHTWARDS ARROW 
U0279D TRIANGLE-HEADED RIGHTWARDS ARROW 
U0279E HEAVY TRIANGLE-HEADED RIGHTWARDS ARROW 
U0279F DASHED TRIANGLE-HEADED RIGHTWARDS ARROW 
U027A0 HEAVY DASHED TRIANGLE-HEADED RIGHTWARDS ARROW 
U027A1 BLACK RIGHTWARDS ARROW 
U027A5 HEAVY BLACK CURVED DOWNWARDS AND RIGHTWARDS ARROW 
U027A6 HEAVY BLACK CURVED UPWARDS AND RIGHTWARDS ARROW 
U027A7 SQUAT BLACK RIGHTWARDS ARROW 
U027A8 HEAVY CONCAVE-POINTED BLACK RIGHTWARDS ARROW 
U027A9 RIGHT-SHADED WHITE RIGHTWARDS ARROW 
U027AA LEFT-SHADED WHITE RIGHTWARDS ARROW 
U027AB BACK-TILTED SHADOWED WHITE RIGHTWARDS ARROW 
U027AC FRONT-TILTED SHADOWED WHITE RIGHTWARDS ARROW 
U027AD HEAVY LOWER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW 
U027AE HEAVY UPPER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW 
U027AF NOTCHED LOWER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW 
U027B1 NOTCHED UPPER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW 
U027B2 CIRCLED HEAVY WHITE RIGHTWARDS ARROW 
U027B3 WHITE-FEATHERED RIGHTWARDS ARROW 
U027B4 BLACK-FEATHERED SOUTH EAST ARROW 
U027B5 BLACK-FEATHERED RIGHTWARDS ARROW 
U027B6 BLACK-FEATHERED NORTH EAST ARROW 
U027B7 HEAVY BLACK-FEATHERED SOUTH EAST ARROW 
U027B8 HEAVY BLACK-FEATHERED RIGHTWARDS ARROW 
U027B9 HEAVY BLACK-FEATHERED NORTH EAST ARROW 
U027BA TEARDROP-BARBED RIGHTWARDS ARROW 
U027BB HEAVY TEARDROP-SHANKED RIGHTWARDS ARROW 
U027BC WEDGE-TAILED RIGHTWARDS ARROW 
U027BD HEAVY WEDGE-TAILED RIGHTWARDS ARROW 
U027BE OPEN-OUTLINED RIGHTWARDS ARROW 

 Miscellaneous Symbols and Arrows 
U02B45 LEFTWARDS QUADRUPLE ARROW 
U02B46 RIGHTWARDS QUADRUPLE ARROW 
U02B4D DOWNWARDS TRIANGLE-HEADED ZIGZAG ARROW 
U02B4E SHORT SLANTED NORTH ARROW 
U02B4F SHORT BACKSLANTED SOUTH ARROW 
U02B5A SLANTED NORTH ARROW WITH HOOKED HEAD 
U02B5B BACKSLANTED SOUTH ARROW WITH HOOKED TAIL 
U02B5C SLANTED NORTH ARROW WITH HORIZONTAL TAIL 
U02B5D BACKSLANTED SOUTH ARROW WITH HORIZONTAL TAIL 
U02B5E BENT ARROW POINTING DOWNWARDS THEN NORTH EAST 
U02B5F SHORT BENT ARROW POINTING DOWNWARDS THEN NORTH EAST 
U02B60 LEFTWARDS TRIANGLE-HEADED ARROW 
U02B61 UPWARDS TRIANGLE-HEADED ARROW 
U02B62 RIGHTWARDS TRIANGLE-HEADED ARROW 
U02B63 DOWNWARDS TRIANGLE-HEADED ARROW 
U02B64 LEFT RIGHT TRIANGLE-HEADED ARROW 
U02B65 UP DOWN TRIANGLE-HEADED ARROW 
U02B66 NORTH WEST TRIANGLE-HEADED ARROW 
U02B67 NORTH EAST TRIANGLE-HEADED ARROW 
U02B68 SOUTH EAST TRIANGLE-HEADED ARROW 
U02B69 SOUTH WEST TRIANGLE-HEADED ARROW 
U02B6A LEFTWARDS TRIANGLE-HEADED DASHED ARROW 
U02B6B UPWARDS TRIANGLE-HEADED DASHED ARROW 
U02B6C RIGHTWARDS TRIANGLE-HEADED DASHED ARROW 
U02B6D DOWNWARDS TRIANGLE-HEADED DASHED ARROW 
U02B6E CLOCKWISE TRIANGLE-HEADED OPEN CIRCLE ARROW 
U02B6F ANTICLOCKWISE TRIANGLE-HEADED OPEN CIRCLE ARROW 
U02B70 LEFTWARDS TRIANGLE-HEADED ARROW TO BAR 
U02B71 UPWARDS TRIANGLE-HEADED ARROW TO BAR 
U02B72 RIGHTWARDS TRIANGLE-HEADED ARROW TO BAR 
U02B73 DOWNWARDS TRIANGLE-HEADED ARROW TO BAR 
U02B76 NORTH WEST TRIANGLE-HEADED ARROW TO BAR 
U02B77 NORTH EAST TRIANGLE-HEADED ARROW TO BAR 
U02B78 SOUTH EAST TRIANGLE-HEADED ARROW TO BAR 
U02B79 SOUTH WEST TRIANGLE-HEADED ARROW TO BAR 
U02B7A LEFTWARDS TRIANGLE-HEADED ARROW WITH DOUBLE HORIZONTAL STROKE 
U02B7B UPWARDS TRIANGLE-HEADED ARROW WITH DOUBLE HORIZONTAL STROKE 
U02B7C RIGHTWARDS TRIANGLE-HEADED ARROW WITH DOUBLE HORIZONTAL STROKE 
U02B7D DOWNWARDS TRIANGLE-HEADED ARROW WITH DOUBLE HORIZONTAL STROKE 
U02B80 LEFTWARDS TRIANGLE-HEADED ARROW OVER RIGHTWARDS TRIANGLE-HEADED ARROW 
U02B81 UPWARDS TRIANGLE-HEADED ARROW LEFTWARDS OF DOWNWARDS TRIANGLE-HEADED ARROW 
U02B82 RIGHTWARDS TRIANGLE-HEADED ARROW OVER LEFTWARDS TRIANGLE-HEADED ARROW 
U02B83 DOWNWARDS TRIANGLE-HEADED ARROW LEFTWARDS OF UPWARDS TRIANGLE-HEADED ARROW 
U02B84 LEFTWARDS TRIANGLE-HEADED PAIRED ARROWS 
U02B85 UPWARDS TRIANGLE-HEADED PAIRED ARROWS 
U02B86 RIGHTWARDS TRIANGLE-HEADED PAIRED ARROWS 
U02B87 DOWNWARDS TRIANGLE-HEADED PAIRED ARROWS 
U02B88 LEFTWARDS BLACK CIRCLED WHITE ARROW 
U02B89 UPWARDS BLACK CIRCLED WHITE ARROW 
U02B8A RIGHTWARDS BLACK CIRCLED WHITE ARROW 
U02B8B DOWNWARDS BLACK CIRCLED WHITE ARROW 
U02B8C ANTICLOCKWISE TRIANGLE-HEADED RIGHT U-SHAPED ARROW 
U02B8D ANTICLOCKWISE TRIANGLE-HEADED BOTTOM U-SHAPED ARROW 
U02B8E ANTICLOCKWISE TRIANGLE-HEADED LEFT U-SHAPED ARROW 
U02B8F ANTICLOCKWISE TRIANGLE-HEADED TOP U-SHAPED ARROW 
U02B94 FOUR CORNER ARROWS CIRCLING ANTICLOCKWISE 
U02B95 RIGHTWARDS BLACK ARROW 
U02BA0 DOWNWARDS TRIANGLE-HEADED ARROW WITH LONG TIP LEFTWARDS 
U02BA1 DOWNWARDS TRIANGLE-HEADED ARROW WITH LONG TIP RIGHTWARDS 
U02BA2 UPWARDS TRIANGLE-HEADED ARROW WITH LONG TIP LEFTWARDS 
U02BA3 UPWARDS TRIANGLE-HEADED ARROW WITH LONG TIP RIGHTWARDS 
U02BA4 LEFTWARDS TRIANGLE-HEADED ARROW WITH LONG TIP UPWARDS 
U02BA5 RIGHTWARDS TRIANGLE-HEADED ARROW WITH LONG TIP UPWARDS 
U02BA6 LEFTWARDS TRIANGLE-HEADED ARROW WITH LONG TIP DOWNWARDS 
U02BA7 RIGHTWARDS TRIANGLE-HEADED ARROW WITH LONG TIP DOWNWARDS 
U02BA8 BLACK CURVED DOWNWARDS AND LEFTWARDS ARROW 
U02BA9 BLACK CURVED DOWNWARDS AND RIGHTWARDS ARROW 
U02BAA BLACK CURVED UPWARDS AND LEFTWARDS ARROW 
U02BAB BLACK CURVED UPWARDS AND RIGHTWARDS ARROW 
U02BAC BLACK CURVED LEFTWARDS AND UPWARDS ARROW 
U02BAD BLACK CURVED RIGHTWARDS AND UPWARDS ARROW 
U02BAE BLACK CURVED LEFTWARDS AND DOWNWARDS ARROW 
U02BAF BLACK CURVED RIGHTWARDS AND DOWNWARDS ARROW 
U02BB0 RIBBON ARROW DOWN LEFT 
U02BB1 RIBBON ARROW DOWN RIGHT 
U02BB2 RIBBON ARROW UP LEFT 
U02BB3 RIBBON ARROW UP RIGHT 
U02BB4 RIBBON ARROW LEFT UP 
U02BB5 RIBBON ARROW RIGHT UP 
U02BB6 RIBBON ARROW LEFT DOWN 
U02BB7 RIBBON ARROW RIGHT DOWN 
U02BB8 UPWARDS WHITE ARROW FROM BAR WITH HORIZONTAL BAR 
U02BD1 UNCERTAINTY SIGN 

 Supplemental Arrows-C 
U1F800 LEFTWARDS ARROW WITH SMALL TRIANGLE ARROWHEAD 
U1F801 UPWARDS ARROW WITH SMALL TRIANGLE ARROWHEAD 
U1F802 RIGHTWARDS ARROW WITH SMALL TRIANGLE ARROWHEAD 
U1F803 DOWNWARDS ARROW WITH SMALL TRIANGLE ARROWHEAD 
U1F804 LEFTWARDS ARROW WITH MEDIUM TRIANGLE ARROWHEAD 
U1F805 UPWARDS ARROW WITH MEDIUM TRIANGLE ARROWHEAD 
U1F806 RIGHTWARDS ARROW WITH MEDIUM TRIANGLE ARROWHEAD 
U1F807 DOWNWARDS ARROW WITH MEDIUM TRIANGLE ARROWHEAD 
U1F808 LEFTWARDS ARROW WITH LARGE TRIANGLE ARROWHEAD 
U1F809 UPWARDS ARROW WITH LARGE TRIANGLE ARROWHEAD 
U1F80A RIGHTWARDS ARROW WITH LARGE TRIANGLE ARROWHEAD 
U1F80B DOWNWARDS ARROW WITH LARGE TRIANGLE ARROWHEAD 
U1F810 LEFTWARDS ARROW WITH SMALL EQUILATERAL ARROWHEAD 
U1F811 UPWARDS ARROW WITH SMALL EQUILATERAL ARROWHEAD 
U1F812 RIGHTWARDS ARROW WITH SMALL EQUILATERAL ARROWHEAD 
U1F813 DOWNWARDS ARROW WITH SMALL EQUILATERAL ARROWHEAD 
U1F814 LEFTWARDS ARROW WITH EQUILATERAL ARROWHEAD 
U1F815 UPWARDS ARROW WITH EQUILATERAL ARROWHEAD 
U1F816 RIGHTWARDS ARROW WITH EQUILATERAL ARROWHEAD 
U1F817 DOWNWARDS ARROW WITH EQUILATERAL ARROWHEAD 
U1F818 HEAVY LEFTWARDS ARROW WITH EQUILATERAL ARROWHEAD 
U1F819 HEAVY UPWARDS ARROW WITH EQUILATERAL ARROWHEAD 
U1F81A HEAVY RIGHTWARDS ARROW WITH EQUILATERAL ARROWHEAD 
U1F81B HEAVY DOWNWARDS ARROW WITH EQUILATERAL ARROWHEAD 
U1F81C HEAVY LEFTWARDS ARROW WITH LARGE EQUILATERAL ARROWHEAD 
U1F81D HEAVY UPWARDS ARROW WITH LARGE EQUILATERAL ARROWHEAD 
U1F81E HEAVY RIGHTWARDS ARROW WITH LARGE EQUILATERAL ARROWHEAD 
U1F81F HEAVY DOWNWARDS ARROW WITH LARGE EQUILATERAL ARROWHEAD 
U1F820 LEFTWARDS TRIANGLE-HEADED ARROW WITH NARROW SHAFT 
U1F821 UPWARDS TRIANGLE-HEADED ARROW WITH NARROW SHAFT 
U1F822 RIGHTWARDS TRIANGLE-HEADED ARROW WITH NARROW SHAFT 
U1F823 DOWNWARDS TRIANGLE-HEADED ARROW WITH NARROW SHAFT 
U1F824 LEFTWARDS TRIANGLE-HEADED ARROW WITH MEDIUM SHAFT 
U1F825 UPWARDS TRIANGLE-HEADED ARROW WITH MEDIUM SHAFT 
U1F826 RIGHTWARDS TRIANGLE-HEADED ARROW WITH MEDIUM SHAFT 
U1F827 DOWNWARDS TRIANGLE-HEADED ARROW WITH MEDIUM SHAFT 
U1F828 LEFTWARDS TRIANGLE-HEADED ARROW WITH BOLD SHAFT 
U1F829 UPWARDS TRIANGLE-HEADED ARROW WITH BOLD SHAFT 
U1F82A RIGHTWARDS TRIANGLE-HEADED ARROW WITH BOLD SHAFT 
U1F82B DOWNWARDS TRIANGLE-HEADED ARROW WITH BOLD SHAFT 
U1F82C LEFTWARDS TRIANGLE-HEADED ARROW WITH HEAVY SHAFT 
U1F82D UPWARDS TRIANGLE-HEADED ARROW WITH HEAVY SHAFT 
U1F82E RIGHTWARDS TRIANGLE-HEADED ARROW WITH HEAVY SHAFT 
U1F82F DOWNWARDS TRIANGLE-HEADED ARROW WITH HEAVY SHAFT 
U1F830 LEFTWARDS TRIANGLE-HEADED ARROW WITH VERY HEAVY SHAFT 
U1F831 UPWARDS TRIANGLE-HEADED ARROW WITH VERY HEAVY SHAFT 
U1F832 RIGHTWARDS TRIANGLE-HEADED ARROW WITH VERY HEAVY SHAFT 
U1F833 DOWNWARDS TRIANGLE-HEADED ARROW WITH VERY HEAVY SHAFT 
U1F834 LEFTWARDS FINGER-POST ARROW 
U1F835 UPWARDS FINGER-POST ARROW 
U1F836 RIGHTWARDS FINGER-POST ARROW 
U1F837 DOWNWARDS FINGER-POST ARROW 
U1F838 LEFTWARDS SQUARED ARROW 
U1F839 UPWARDS SQUARED ARROW 
U1F83A RIGHTWARDS SQUARED ARROW 
U1F83B DOWNWARDS SQUARED ARROW 
U1F83C LEFTWARDS COMPRESSED ARROW 
U1F83D UPWARDS COMPRESSED ARROW 
U1F83E RIGHTWARDS COMPRESSED ARROW 
U1F83F DOWNWARDS COMPRESSED ARROW 
U1F840 LEFTWARDS HEAVY COMPRESSED ARROW 
U1F841 UPWARDS HEAVY COMPRESSED ARROW 
U1F842 RIGHTWARDS HEAVY COMPRESSED ARROW 
U1F843 DOWNWARDS HEAVY COMPRESSED ARROW 
U1F844 LEFTWARDS HEAVY ARROW 
U1F845 UPWARDS HEAVY ARROW 
U1F846 RIGHTWARDS HEAVY ARROW 
U1F847 DOWNWARDS HEAVY ARROW 
U1F850 LEFTWARDS SANS-SERIF ARROW 
U1F851 UPWARDS SANS-SERIF ARROW 
U1F852 RIGHTWARDS SANS-SERIF ARROW 
U1F853 DOWNWARDS SANS-SERIF ARROW 
U1F854 NORTH WEST SANS-SERIF ARROW 
U1F855 NORTH EAST SANS-SERIF ARROW 
U1F856 SOUTH EAST SANS-SERIF ARROW 
U1F857 SOUTH WEST SANS-SERIF ARROW 
U1F858 LEFT RIGHT SANS-SERIF ARROW 
U1F859 UP DOWN SANS-SERIF ARROW 
U1F860 WIDE-HEADED LEFTWARDS LIGHT BARB ARROW 
U1F861 WIDE-HEADED UPWARDS LIGHT BARB ARROW 
U1F862 WIDE-HEADED RIGHTWARDS LIGHT BARB ARROW 
U1F863 WIDE-HEADED DOWNWARDS LIGHT BARB ARROW 
U1F864 WIDE-HEADED NORTH WEST LIGHT BARB ARROW 
U1F865 WIDE-HEADED NORTH EAST LIGHT BARB ARROW 
U1F866 WIDE-HEADED SOUTH EAST LIGHT BARB ARROW 
U1F867 WIDE-HEADED SOUTH WEST LIGHT BARB ARROW 
U1F868 WIDE-HEADED LEFTWARDS BARB ARROW 
U1F869 WIDE-HEADED UPWARDS BARB ARROW 
U1F86A WIDE-HEADED RIGHTWARDS BARB ARROW 
U1F86B WIDE-HEADED DOWNWARDS BARB ARROW 
U1F86C WIDE-HEADED NORTH WEST BARB ARROW 
U1F86D WIDE-HEADED NORTH EAST BARB ARROW 
U1F86E WIDE-HEADED SOUTH EAST BARB ARROW 
U1F86F WIDE-HEADED SOUTH WEST BARB ARROW 
U1F870 WIDE-HEADED LEFTWARDS MEDIUM BARB ARROW 
U1F871 WIDE-HEADED UPWARDS MEDIUM BARB ARROW 
U1F872 WIDE-HEADED RIGHTWARDS MEDIUM BARB ARROW 
U1F873 WIDE-HEADED DOWNWARDS MEDIUM BARB ARROW 
U1F874 WIDE-HEADED NORTH WEST MEDIUM BARB ARROW 
U1F875 WIDE-HEADED NORTH EAST MEDIUM BARB ARROW 
U1F876 WIDE-HEADED SOUTH EAST MEDIUM BARB ARROW 
U1F877 WIDE-HEADED SOUTH WEST MEDIUM BARB ARROW 
U1F878 WIDE-HEADED LEFTWARDS HEAVY BARB ARROW 
U1F879 WIDE-HEADED UPWARDS HEAVY BARB ARROW 
U1F87A WIDE-HEADED RIGHTWARDS HEAVY BARB ARROW 
U1F87B WIDE-HEADED DOWNWARDS HEAVY BARB ARROW 
U1F87C WIDE-HEADED NORTH WEST HEAVY BARB ARROW 
U1F87D WIDE-HEADED NORTH EAST HEAVY BARB ARROW 
U1F87E WIDE-HEADED SOUTH EAST HEAVY BARB ARROW 
U1F87F WIDE-HEADED SOUTH WEST HEAVY BARB ARROW 
U1F880 WIDE-HEADED LEFTWARDS VERY HEAVY BARB ARROW 
U1F881 WIDE-HEADED UPWARDS VERY HEAVY BARB ARROW 
U1F882 WIDE-HEADED RIGHTWARDS VERY HEAVY BARB ARROW 
U1F883 WIDE-HEADED DOWNWARDS VERY HEAVY BARB ARROW 
U1F884 WIDE-HEADED NORTH WEST VERY HEAVY BARB ARROW 
U1F885 WIDE-HEADED NORTH EAST VERY HEAVY BARB ARROW 
U1F886 WIDE-HEADED SOUTH EAST VERY HEAVY BARB ARROW 
U1F887 WIDE-HEADED SOUTH WEST VERY HEAVY BARB ARROW 
U1F898 LEFTWARDS ARROW WITH NOTCHED TAIL 
U1F899 UPWARDS ARROW WITH NOTCHED TAIL 
U1F89A RIGHTWARDS ARROW WITH NOTCHED TAIL 
U1F89B DOWNWARDS ARROW WITH NOTCHED TAIL 
U1F8A0 LEFTWARDS BOTTOM-SHADED WHITE ARROW 
U1F8A1 RIGHTWARDS BOTTOM SHADED WHITE ARROW 
U1F8A2 LEFTWARDS TOP SHADED WHITE ARROW 
U1F8A3 RIGHTWARDS TOP SHADED WHITE ARROW 
U1F8A4 LEFTWARDS LEFT-SHADED WHITE ARROW 
U1F8A5 RIGHTWARDS RIGHT-SHADED WHITE ARROW 
U1F8A6 LEFTWARDS RIGHT-SHADED WHITE ARROW 
U1F8A7 RIGHTWARDS LEFT-SHADED WHITE ARROW 
U1F8A8 LEFTWARDS BACK-TILTED SHADOWED WHITE ARROW 
U1F8A9 RIGHTWARDS BACK-TILTED SHADOWED WHITE ARROW 
U1F8AA LEFTWARDS FRONT-TILTED SHADOWED WHITE ARROW 
U1F8AB RIGHTWARDS FRONT-TILTED SHADOWED WHITE ARROW 
fred-wang commented 4 years ago

@davidcarlisle Do you plan to merge more? In particular, how important is it to keep specific categories for some isolated values with lspace != rspace?

davidcarlisle commented 4 years ago

I would probably suggest merging some of them, but as I said, @NSoiffer has more changes planned, so I was planning on waiting for those to be finished before really reviewing this.

Certainly TeX gets away with fewer space categories, with only three non-zero spaces ever added automatically: thin, medium and thick, which are theoretically user-settable but are nearly always the LaTeX and plain TeX defaults

\thinmuskip=3mu
\medmuskip=4mu plus 2mu minus 4mu
\thickmuskip=5mu plus 5mu

where 1mu = 1/18 em
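To compare these with the integer spacing values in the dumps above, here is a small sketch converting the natural widths of TeX's default skips to em. It assumes the dictionary values are also multiples of 1/18 em (as with MathML's named spaces, where thinmathspace is 3/18 em and thickmathspace is 5/18 em); the stretch and shrink components are ignored.

```python
from fractions import Fraction

MU = Fraction(1, 18)  # 1mu = 1/18 em

# Natural widths of the default \thinmuskip, \medmuskip, \thickmuskip.
TEX_SKIPS_MU = {"thin": 3, "med": 4, "thick": 5}


def mu_to_em(mu: int) -> Fraction:
    """Convert a whole number of mu to an exact fraction of an em."""
    return mu * MU


for name, mu in TEX_SKIPS_MU.items():
    em = mu_to_em(mu)
    print(f"\\{name}muskip: {mu}mu = {em} em ~ {float(em):.4f} em")
```

Under that assumption, the dictionary spacings 3, 4 and 5 line up with TeX's thin, medium and thick spaces, while the values 1 and 2 have no automatic TeX counterpart.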

fred-wang commented 4 years ago

OK, let's wait for @NSoiffer

Here is a quick analysis on my side:

I think complement "∁" has prefix form and should be moved into an existing prefix category. Then I'm not sure how important it is to keep a dedicated category just for ":". It seems fine to me to use a default symmetric spacing for this one, since it can be used as a separator or as a binary operator (note that in running text, some languages put a space before ":"). So I would merge it into "form:infix lspace:2 rspace:2", for example, or "form:infix lspace:1 rspace:1".

I guess unbalanced spacing for separators is still important, so we can't remove the category "form:infix lspace:0 rspace:3", right?

How important is the exact spacing for postfix "♭", "♮", "♯", "!" and "!!"? They don't seem to have a clear default spacing to me. Can't we merge them into a single category with zero lspace and nonzero rspace? Or even just into "form:postfix lspace:0 rspace:0"?

How important is the "form:prefix lspace:1 rspace:1" category? I don't think people use these root operators as a single mo; they would instead use the msqrt or mroot element. So I would just drop them from the operator dictionary, or otherwise merge them into another arbitrary existing prefix category.

I still don't quite understand the distinction between "form:prefix lspace:1 rspace:2" and "form:prefix lspace:3 rspace:3". Maybe it's integral vs. non-integral, but treating ∑ and ∏ differently seems dubious to me. Can we merge them into a single category?

I guess unbalanced spacing for differential operators is still important, so we can't remove "form:prefix lspace:3 rspace:0", right?

How important is the "form:prefix lspace:2 rspace:1" category? Can't we merge it with another existing category with balanced spacing or with lspace > rspace?
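To see how merges like these reduce the number of categories, here is a minimal sketch that applies per-character overrides to (form, lspace, rspace) keys and recounts the distinct combinations. The entries are a hand-picked excerpt of the otherEntries dump (properties ignored), and the overrides encode only some of the suggestions above; the prefix category chosen for "∁" is arbitrary. This is a hypothetical illustration, not an agreed change.

```python
from typing import Dict, Set, Tuple

Key = Tuple[str, int, int]  # (form, lspace, rspace)

# Excerpt of the otherEntries dump; properties are ignored.
ENTRIES: Dict[str, Key] = {
    "\u00AC": ("prefix", 2, 1),   # ¬
    "\u2200": ("prefix", 2, 1),   # ∀
    ":": ("infix", 1, 2),
    "\u2201": ("infix", 1, 2),    # ∁
    "/": ("infix", 1, 1),
    "\u266D": ("postfix", 0, 2),  # ♭
    "\u266E": ("postfix", 0, 2),  # ♮
    "\u266F": ("postfix", 0, 2),  # ♯
    "!": ("postfix", 1, 0),
    "&": ("postfix", 0, 0),
    "\u00B0": ("postfix", 0, 0),  # °
}

# Hypothetical overrides reflecting some of the suggestions above.
OVERRIDES: Dict[str, Key] = {
    "\u2201": ("prefix", 2, 1),   # ∁ moved to a prefix category (arbitrary pick)
    ":": ("infix", 2, 2),         # symmetric spacing for ":"
    "\u266D": ("postfix", 0, 0),
    "\u266E": ("postfix", 0, 0),
    "\u266F": ("postfix", 0, 0),
    "!": ("postfix", 0, 0),
}


def categories(entries: Dict[str, Key]) -> Set[Key]:
    """Distinct (form, lspace, rspace) combinations in use."""
    return set(entries.values())


merged = {ch: OVERRIDES.get(ch, key) for ch, key in ENTRIES.items()}
print(len(categories(ENTRIES)), "->", len(categories(merged)))
```

On this excerpt the count drops from 6 to 4; "infix 2/2" is new to the excerpt but already exists in the full dictionary (it is the infix "|" category).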

NSoiffer commented 4 years ago

Some characters will likely go away, including the musical notation signs (and hence their spacing category), but I'm spending time on each character trying to find whether it has a mathematical usage and, if so, what it is. I'm currently sifting through the priority 265 symbols and either removing them or moving them to a more appropriate place. That sometimes involves changing their form and also their spacing. Once I'm done with that, I'm going to review the spacing for what remains.

(Email reply to Frédéric Wang's comment of Thu, Mar 19, 2020, 5:05 AM, above.)

fred-wang commented 4 years ago

"a:b" in TeX seems to give wide symmetric spacing, so colon should probably be in lspace:4 rspace:4 or lspace:3 rspace:3?

What is the logic behind the two categories "form:prefix lspace:1 rspace:2" and "form:prefix lspace:3 rspace:3" for largeop/integral ? It seems some "sums" are in the former.

fred-wang commented 4 years ago

This is the current output of the compact-form script (https://mathml-refresh.github.io/xml-entities/opdict.html#compressed, which only gives spacing, not properties):

('infixEntriesWithDefaultValues', 771)
['[U+003C, U+003E]', '[U+219A, U+219B]', 'U+21AE', '[U+21B6, U+21B8]', '[U+21BA, U+21BB]', '[U+21CD, U+21CF]', '[U+21DE, U+21DF]', '[U+21F1, U+21F2]', 'U+21F4', '[U+21F7, U+21FC]', '[U+2208, U+220D]', 'U+2219', 'U+221D', 'U+2223', '[U+2225, U+2226]', '[U+2234, U+2235]', 'U+2237', '[U+2239, U+223E]', '[U+2241, U+228B]', '[U+2290, U+2292]', '[U+229A, U+229C]', '[U+22A2, U+22BA]', 'U+22C8', 'U+22CD', '[U+22D0, U+22D1]', '[U+22D4, U+22ED]', '[U+22F2, U+22FF]', 'U+2301', 'U+237C', 'U+238B', 'U+2758', 'U+2794', '[U+2798, U+27A1]', '[U+27A5, U+27AF]', '[U+27B1, U+27BE]', '[U+27F2, U+27F3]', '[U+2900, U+2909]', 'U+2911', '[U+2914, U+2920]', '[U+2923, U+294D]', '[U+2962, U+296D]', '[U+2970, U+297F]', '[U+29B6, U+29BB]', '[U+29BD, U+29C1]', '[U+29C4, U+29C8]', '[U+29CE, U+29D7]', 'U+29E1', '[U+29E3, U+29E6]', '[U+29F4, U+29F5]', 'U+29F7', 'U+2A3E', '[U+2A64, U+2AD9]', '[U+2ADE, U+2AEB]', '[U+2AEE, U+2AFA]', '[U+2B00, U+2B11]', '[U+2B30, U+2B31]', '[U+2B33, U+2B44]', '[U+2B47, U+2B4F]', '[U+2B5A, U+2B73]', '[U+2B76, U+2B7D]', '[U+2B80, U+2B8F]', '[U+2B94, U+2B95]', '[U+2BA0, U+2BB8]', 'U+2BD1', '[U+1F800, U+1F80B]', '[U+1F810, U+1F847]', '[U+1F850, U+1F859]', '[U+1F860, U+1F887]', '[U+1F898, U+1F89B]', '[U+1F8A0, U+1F8AB]']

('infixEntriesWithSpacing5AndStretchy', 138)
['[U+2190, U+2199]', '[U+219C, U+21AD]', '[U+21AF, U+21B5]', 'U+21B9', '[U+21BC, U+21CC]', '[U+21D0, U+21DD]', '[U+21E0, U+21F0]', 'U+21F3', '[U+21F5, U+21F6]', '[U+21FD, U+21FF]', '[U+27F0, U+27F1]', '[U+27F5, U+27FF]', '[U+290A, U+2910]', '[U+2912, U+2913]', '[U+2921, U+2922]', '[U+294E, U+2961]', '[U+296E, U+296F]', '[U+2B45, U+2B46]']

('infixEntriesWithSpacing4', 100)
['U+002B', 'U+002D', 'U+002F', 'U+00B1', 'U+00F7', '[U+2212, U+2214]', 'U+2216', 'U+2218', 'U+2224', '[U+2227, U+222A]', 'U+2236', 'U+2238', '[U+228C, U+228F]', '[U+2293, U+2296]', 'U+2298', '[U+229D, U+229F]', '[U+22BB, U+22BD]', 'U+22C4', 'U+22C6', '[U+22CE, U+22CF]', '[U+22D2, U+22D3]', '[U+2795, U+2797]', 'U+27F4', 'U+29BC', 'U+29F6', '[U+2A22, U+2A2E]', '[U+2A38, U+2A3A]', '[U+2A40, U+2A4F]', '[U+2A51, U+2A63]', '[U+2ADA, U+2ADB]', 'U+2AFB', 'U+2AFD', 'U+2B32']

('infixEntriesWithSpacing3', 84)
['U+0025', 'U+002A', 'U+002E', 'U+00B7', 'U+00D7', 'U+2022', 'U+2043', 'U+2206', 'U+220E', 'U+2217', '[U+223F, U+2240]', 'U+2297', 'U+2299', '[U+22A0, U+22A1]', 'U+22C5', 'U+22C7', '[U+22C9, U+22CC]', '[U+2305, U+2306]', '[U+25A0, U+25A1]', '[U+25AA, U+25AB]', '[U+25AD, U+25B1]', '[U+2981, U+2982]', '[U+2999, U+299A]', 'U+29B5', '[U+29C2, U+29C3]', '[U+29C9, U+29CD]', '[U+29D8, U+29D9]', 'U+29DB', '[U+29DF, U+29E0]', 'U+29E2', '[U+29E7, U+29ED]', '[U+29F8, U+29FB]', '[U+2A1D, U+2A21]', '[U+2A2F, U+2A37]', '[U+2A3B, U+2A3D]', 'U+2A3F', 'U+2A50', '[U+2ADC, U+2ADD]', 'U+2AFE']

('prefixEntriesWithLspace0Rspace0', 49)
['U+0021', 'U+002B', 'U+002D', 'U+00AC', 'U+00B1', '[U+2200, U+2201]', '[U+2203, U+2204]', 'U+2207', '[U+2212, U+2213]', '[U+221B, U+221C]', '[U+221F, U+2222]', 'U+223C', '[U+22BE, U+22BF]', 'U+2310', 'U+2319', '[U+2795, U+2796]', 'U+27C0', '[U+299B, U+29AF]', '[U+2AEC, U+2AED]']

('postfixEntriesWithLspace0Rspace0', 33)
['[U+0021, U+0022]', '[U+0026, U+0027]', 'U+0060', 'U+00A8', 'U+00B0', '[U+00B2, U+00B4]', '[U+00B8, U+00B9]', '[U+02CA, U+02CB]', '[U+02D8, U+02DA]', 'U+02DD', 'U+0311', '[U+201A, U+201B]', '[U+201E, U+201F]', '[U+2032, U+2037]', 'U+2057', '[U+20DB, U+20DC]', 'U+23CD']

('prefixEntriesWithSpacing0AndStretchySymmetricFence', 25)
['U+0028', 'U+005B', '[U+007B, U+007C]', 'U+2308', 'U+230A', 'U+2329', 'U+2772', 'U+27E6', 'U+27E8', 'U+27EA', 'U+27EC', 'U+27EE', 'U+2983', 'U+2985', 'U+2987', 'U+2989', 'U+298B', 'U+298D', 'U+298F', 'U+2991', 'U+2993', 'U+2995', 'U+2997', 'U+29FC']

('postfixEntriesWithSpacing0AndStretchySymmetricFence', 25)
['U+0029', 'U+005D', '[U+007C, U+007D]', 'U+2309', 'U+230B', 'U+232A', 'U+2773', 'U+27E7', 'U+27E9', 'U+27EB', 'U+27ED', 'U+27EF', 'U+2984', 'U+2986', 'U+2988', 'U+298A', 'U+298C', 'U+298E', 'U+2990', 'U+2992', 'U+2994', 'U+2996', 'U+2998', 'U+29FD']

('postfixEntriesWithLspace0Rspace0AndStretchy', 24)
['[U+005E, U+005F]', 'U+007E', 'U+00AF', '[U+02C6, U+02C7]', 'U+02C9', 'U+02CD', 'U+02DC', 'U+02F7', 'U+0302', 'U+203E', '[U+2322, U+2323]', '[U+23B4, U+23B5]', '[U+23DC, U+23E1]', '[U+1EEF0, U+1EEF1]']

('prefixEntriesWithLspace3Rspace3AndSymmetricLargeop', 22)
['[U+222B, U+2233]', '[U+2A0B, U+2A0F]', '[U+2A15, U+2A1C]']

('prefixEntriesWithLspace1Rspace2AndSymmetricMovablelimitsLargeop', 18)
['[U+220F, U+2210]', '[U+22C0, U+22C3]', '[U+2A00, U+2A09]', 'U+2AFC', 'U+2AFF']

('otherEntries', 35)
  * {'lspace': 3, 'rspace': 3, 'form': 'prefix', 'properties': {'symmetric': True, 'movablelimits': True, 'largeop': True}}: 7
    ['U+2211', 'U+2A0A', 'U+2A10', 'U+2A11', 'U+2A12', 'U+2A13', 'U+2A14']

  * {'lspace': 0, 'rspace': 0, 'form': 'infix'}: 5
    ['U+005C', 'U+2061', 'U+2062', 'U+2064', 'U+2396']

  * {'lspace': 1, 'rspace': 1, 'form': 'infix'}: 4
    ['U+003F', 'U+0040', 'U+005E', 'U+005F']

  * {'lspace': 3, 'rspace': 0, 'form': 'prefix'}: 3
    ['U+2145', 'U+2146', 'U+2202']

  * {'lspace': 0, 'rspace': 0, 'form': 'prefix', 'properties': {'fence': True}}: 2
    ['U+2018', 'U+201C']

  * {'lspace': 0, 'rspace': 0, 'form': 'postfix', 'properties': {'fence': True}}: 2
    ['U+2019', 'U+201D']

  * {'lspace': 0, 'rspace': 3, 'form': 'infix', 'properties': {'separator': True}}: 2
    ['U+002C', 'U+003B']

  * {'lspace': 0, 'rspace': 0, 'form': 'prefix', 'properties': {'stretchy': True, 'fence': True}}: 2
    ['U+2016', 'U+2980']

  * {'lspace': 0, 'rspace': 0, 'form': 'postfix', 'properties': {'stretchy': True, 'fence': True}}: 2
    ['U+2016', 'U+2980']

  * {'lspace': 4, 'rspace': 4, 'form': 'infix', 'properties': {'stretchy': True}}: 2
    ['U+2044', 'U+2215']

  * {'lspace': 3, 'rspace': 3, 'form': 'infix', 'properties': {'symmetric': True, 'stretchy': True, 'fence': True}}: 1
    ['U+007C']

  * {'lspace': 0, 'rspace': 0, 'form': 'prefix', 'properties': {'stretchy': True}}: 1
    ['U+221A']

  * {'lspace': 0, 'rspace': 0, 'form': 'infix', 'properties': {'separator': True}}: 1
    ['U+2063']

  * {'lspace': 1, 'rspace': 2, 'form': 'infix'}: 1
    ['U+003A']

('entriesWithMultipleCharacters', 46)
  * -= infix: {'lspace': 4, 'rspace': 4, 'form': 'infix'}
  * ||| infix: {'lspace': 3, 'rspace': 3, 'form': 'infix', 'properties': {'symmetric': True, 'stretchy': True, 'fence': True}}
  * /= infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * := infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * || postfix: {'lspace': 0, 'rspace': 0, 'form': 'postfix', 'properties': {'symmetric': True, 'stretchy': True, 'fence': True}}
  * ⪰̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * <= infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ||| postfix: {'lspace': 0, 'rspace': 0, 'form': 'postfix', 'properties': {'symmetric': True, 'stretchy': True, 'fence': True}}
  * ≂̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ⊐̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ⩾̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * *= infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ⊏̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * -> infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ≦̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ⧏̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ||| prefix: {'lspace': 0, 'rspace': 0, 'form': 'prefix', 'properties': {'symmetric': True, 'stretchy': True, 'fence': True}}
  * ≿̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ∽̱ infix: {'lspace': 3, 'rspace': 3, 'form': 'infix'}
  * ⧐̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * <> infix: {'lspace': 1, 'rspace': 1, 'form': 'infix'}
  * += infix: {'lspace': 4, 'rspace': 4, 'form': 'infix'}
  * != infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ⩽̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * // infix: {'lspace': 1, 'rspace': 1, 'form': 'infix'}
  * !! postfix: {'lspace': 0, 'rspace': 0, 'form': 'postfix'}
  * >= infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * || prefix: {'lspace': 0, 'rspace': 0, 'form': 'prefix', 'properties': {'symmetric': True, 'stretchy': True, 'fence': True}}
  * ⪡̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ≎̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * .. postfix: {'lspace': 0, 'rspace': 0, 'form': 'postfix'}
  * ≏̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ** infix: {'lspace': 1, 'rspace': 1, 'form': 'infix'}
  * ... postfix: {'lspace': 0, 'rspace': 0, 'form': 'postfix'}
  * ≫̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ⊃⃒ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ⪯̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ++ postfix: {'lspace': 0, 'rspace': 0, 'form': 'postfix'}
  * -- postfix: {'lspace': 0, 'rspace': 0, 'form': 'postfix'}
  * ≪̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ⊂⃒ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * || infix: {'lspace': 3, 'rspace': 3, 'form': 'infix', 'properties': {'symmetric': True, 'stretchy': True, 'fence': True}}
  * == infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ⪢̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * && infix: {'lspace': 4, 'rspace': 4, 'form': 'infix'}
  * ⫝̸ infix: {'lspace': 3, 'rspace': 3, 'form': 'infix'}
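The compact-form script itself is not included in this thread. A minimal sketch of the two steps its output suggests (grouping entries by form, spacing and properties, then collapsing contiguous code points into the '[U+XXXX, U+YYYY]' notation) could look like this; the tuple layout of the input and the function names are assumptions, not the actual script:

```python
from collections import defaultdict


def format_ranges(code_points):
    """Collapse code points into 'U+XXXX' / '[U+XXXX, U+YYYY]' strings."""
    points = sorted(set(code_points))
    ranges = []
    i = 0
    while i < len(points):
        start = end = points[i]
        # Extend the run while the next code point is contiguous.
        while i + 1 < len(points) and points[i + 1] == end + 1:
            i += 1
            end = points[i]
        if start == end:
            ranges.append(f"U+{start:04X}")
        else:
            ranges.append(f"[U+{start:04X}, U+{end:04X}]")
        i += 1
    return ranges


def group_entries(entries):
    """Group (char, form, lspace, rspace, properties) tuples by everything but char."""
    groups = defaultdict(list)
    for char, form, lspace, rspace, properties in entries:
        key = (form, lspace, rspace, tuple(sorted(properties)))
        groups[key].append(ord(char))
    return {key: format_ranges(cps) for key, cps in groups.items()}


sample = [
    ("<", "infix", 5, 5, ()),
    ("=", "infix", 5, 5, ()),
    (">", "infix", 5, 5, ()),
    ("+", "infix", 4, 4, ()),
    ("\u2212", "infix", 4, 4, ()),  # MINUS SIGN
    ("\u2213", "infix", 4, 4, ()),  # MINUS-OR-PLUS SIGN
]
for key, ranges in group_entries(sample).items():
    print(key, ranges)
```

Fewer distinct keys means fewer categories, which is why merging near-identical spacing values shrinks the table.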
davidcarlisle commented 4 years ago

> "a:b" in TeX seems to give wide symmetric spacing, probably colon should be in lspace:4 rspace:4 or lspace:3 rspace:3?

In standard TeX, : is \mathrel (so left and right space of 5mu by default), while \colon is the same glyph but \mathpunct (so left 0mu and right 3mu).

amsmath changes \colon to be a spacier version with left 2mu and right 6mu.

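```latex
\documentclass{article}

\usepackage{amsmath}  % amsmath redefines \colon (2mu before, 6mu after)
\showoutput           % write the typeset boxes to the log so the spacing can be inspected
\begin{document}

$a{:}b$      % braced colon is an ordinary atom: no space around it

$a:b$        % bare colon is \mathrel: thick space on both sides

$a\colon b$  % \colon: punctuation-style, asymmetric spacing
\end{document}
```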

MathML doesn't really have the distinction between : and \colon, so you need to pick one use as the default.

The current entry of

<operator-dictionary priority="100" form="infix" lspace="1" rspace="2"/>

is asymmetric, so it follows the same interpretation as \colon, but is less spacy. Neil?

You want the symmetric spacing for ratios such as 50 : 50, but the asymmetric spacing (which is more common in technical math) in f: x → y.
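To compare the colon variants above on one scale, here is a small conversion sketch. It assumes, as the 0.16666666666666666em value later in this thread suggests, that the dictionary's integer spacing values are in units of 1/18 em (TeX's mu); the TeX figures are natural widths, ignoring stretch and shrink:

```python
from fractions import Fraction

MU_PER_EM = 18  # assumed unit: 1 mu = 1/18 em


def spacing_em(lspace_mu, rspace_mu):
    """Convert an (lspace, rspace) pair from mu to exact fractions of an em."""
    return Fraction(lspace_mu, MU_PER_EM), Fraction(rspace_mu, MU_PER_EM)


# (lspace, rspace) in mu for the colon variants discussed above.
COLON_VARIANTS = {
    "TeX ':' (mathrel)": (5, 5),
    "TeX \\colon (mathpunct)": (0, 3),
    "amsmath \\colon": (2, 6),
    "current dictionary entry": (1, 2),
}

for name, (lspace, rspace) in COLON_VARIANTS.items():
    left, right = spacing_em(lspace, rspace)
    print(f"{name}: lspace={float(left):.3f}em rspace={float(right):.3f}em")
```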

NSoiffer commented 4 years ago

@fred-wang: why break out the multi-character entries in the table into their own category? I thought the goal was to minimize the size of the operator dictionary in the core spec. Most of the entries would belong to existing groupings.

NSoiffer commented 4 years ago

I've been trying to decide what to do about colon, which is why I haven't changed its values yet. I've written down what I found in https://github.com/mathml-refresh/mathml/issues/87#issuecomment-612544574, which is where this discussion properly belongs.

fred-wang commented 4 years ago

> @fred-wang: why break out the multi-character entries in the table into their own category? I thought the goal was to minimize the size of the operator dictionary in the core spec. Most of the entries would belong to existing groupings.

The final script still depends on what the possible values will be. The general rule of thumb is still to reduce the number of possible values as much as possible, independently of how the keys will be handled.

Regarding keys, strings in browsers are heavy objects, see [1] [2]. So to minimize space, it seems optimal to use single UTF-16 characters (only 2 bytes, less than any generic 16-bit string object). These cover all operators except the non-BMP ones (only two of them, so they can easily be handled separately) and the multi-character ones (for which we can maybe find a clever handling, e.g. the non-ASCII multi-character strings are almost all 'lspace': 5, 'rspace': 5, the exceptions being ∽̱ and ⫝̸).

[1] https://source.chromium.org/chromium/chromium/src/+/master:third_party/blink/renderer/platform/wtf/text/README.md (WebKit is similar)
[2] https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/Guide/Internal_strings
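The keying scheme described above boils down to sorting operators into three buckets. A minimal sketch (function names are hypothetical; Python strings count code points, so a non-BMP character has length 1 but needs two UTF-16 code units):

```python
def classify_key(op):
    """Classify an operator string by how it could be keyed in a compact table."""
    if len(op) == 1 and ord(op) <= 0xFFFF:
        return "bmp"         # one UTF-16 code unit: a 2-byte key
    if len(op) == 1:
        return "non-bmp"     # needs a surrogate pair; handled separately
    return "multi-char"      # e.g. '!=', '||', or a base char plus combining mark


def utf16_units(op):
    """Number of UTF-16 code units needed to store op."""
    return len(op.encode("utf-16-le")) // 2


for op in ["+", "\u2211", "\U0001EEF0", "!=", "\u2A7D\u0338"]:
    print(repr(op), classify_key(op), utf16_units(op))
```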

fred-wang commented 4 years ago

@NSoiffer There are still inconsistencies with largeop. Some of them are symmetric+largeop+movablelimits (e.g. anticlockwise integration), while others are just symmetric+largeop (e.g. integral). Why can't we just make all of them symmetric+largeop+movablelimits?

fred-wang commented 4 years ago

I'm not sure people should use radicals as mo; can we please remove them? Or at least make square root not stretchy so that we don't have a special case:

  Char  Description   Form    Priority  lspace  rspace  Properties
  √     square root   prefix  845       1       1       stretchy
  ∛     cube root     prefix  845       1       1
  ∜     fourth root   prefix  845       1       1

fred-wang commented 4 years ago

Can we move U+007C to any other existing category? Do we really need to make fraction slash (U+2044) and division slash (U+2215) stretchy by default?

fred-wang commented 4 years ago

Script updated a bit (fences and separators are now handled in separate tables). This is the current output. I feel like largeop could still be made more consistent and that we could reduce the special cases (cf. the otherEntries table):

('infixEntriesWithDefaultValues', 772)
['[U+003C, U+003E]', '[U+219A, U+219B]', 'U+21AE', '[U+21B6, U+21B8]', '[U+21BA, U+21BB]', '[U+21CD, U+21CF]', '[U+21DE, U+21DF]', '[U+21F1, U+21F2]', 'U+21F4', '[U+21F7, U+21FC]', '[U+2208, U+220D]', 'U+2219', 'U+221D', 'U+2223', '[U+2225, U+2226]', '[U+2234, U+2235]', 'U+2237', '[U+2239, U+223E]', '[U+2241, U+228B]', '[U+2290, U+2292]', '[U+229A, U+229C]', '[U+22A2, U+22BA]', 'U+22C8', 'U+22CD', '[U+22D0, U+22D1]', '[U+22D4, U+22ED]', '[U+22F2, U+22FF]', 'U+2301', 'U+237C', 'U+238B', 'U+2758', 'U+2794', '[U+2798, U+27A1]', '[U+27A5, U+27AF]', '[U+27B1, U+27BE]', 'U+27DF', '[U+27F2, U+27F3]', '[U+2900, U+2909]', 'U+2911', '[U+2914, U+2920]', '[U+2923, U+294D]', '[U+2962, U+296D]', '[U+2970, U+297F]', '[U+29B6, U+29BB]', '[U+29BD, U+29C1]', '[U+29C4, U+29C8]', '[U+29CE, U+29D7]', 'U+29E1', '[U+29E3, U+29E6]', '[U+29F4, U+29F5]', 'U+29F7', 'U+2A3E', '[U+2A64, U+2AD9]', '[U+2ADE, U+2AEB]', '[U+2AEE, U+2AFA]', '[U+2B00, U+2B11]', '[U+2B30, U+2B31]', '[U+2B33, U+2B44]', '[U+2B47, U+2B4F]', '[U+2B5A, U+2B73]', '[U+2B76, U+2B7D]', '[U+2B80, U+2B8F]', '[U+2B94, U+2B95]', '[U+2BA0, U+2BB8]', 'U+2BD1', '[U+1F800, U+1F80B]', '[U+1F810, U+1F847]', '[U+1F850, U+1F859]', '[U+1F860, U+1F887]', '[U+1F898, U+1F89B]', '[U+1F8A0, U+1F8AB]']

('infixEntriesWithSpacing5AndStretchy', 138)
['[U+2190, U+2199]', '[U+219C, U+21AD]', '[U+21AF, U+21B5]', 'U+21B9', '[U+21BC, U+21CC]', '[U+21D0, U+21DD]', '[U+21E0, U+21F0]', 'U+21F3', '[U+21F5, U+21F6]', '[U+21FD, U+21FF]', '[U+27F0, U+27F1]', '[U+27F5, U+27FF]', '[U+290A, U+2910]', '[U+2912, U+2913]', '[U+2921, U+2922]', '[U+294E, U+2961]', '[U+296E, U+296F]', '[U+2B45, U+2B46]']

('infixEntriesWithSpacing4', 100)
['U+002B', 'U+002D', 'U+002F', 'U+00B1', 'U+00F7', '[U+2212, U+2214]', 'U+2216', 'U+2218', 'U+2224', '[U+2227, U+222A]', 'U+2236', 'U+2238', '[U+228C, U+228F]', '[U+2293, U+2296]', 'U+2298', '[U+229D, U+229F]', '[U+22BB, U+22BD]', 'U+22C4', 'U+22C6', '[U+22CE, U+22CF]', '[U+22D2, U+22D3]', '[U+2795, U+2797]', 'U+27F4', 'U+29BC', 'U+29F6', '[U+2A22, U+2A2E]', '[U+2A38, U+2A3A]', '[U+2A40, U+2A4F]', '[U+2A51, U+2A63]', '[U+2ADA, U+2ADB]', 'U+2AFB', 'U+2AFD', 'U+2B32']

('infixEntriesWithSpacing3', 85)
['U+0025', 'U+002A', 'U+002E', 'U+0040', 'U+00B7', 'U+00D7', 'U+2022', 'U+2043', 'U+2206', 'U+220E', 'U+2217', '[U+223F, U+2240]', 'U+2297', 'U+2299', '[U+22A0, U+22A1]', 'U+22C5', 'U+22C7', '[U+22C9, U+22CC]', '[U+2305, U+2306]', '[U+25A0, U+25A1]', '[U+25AA, U+25AB]', '[U+25AD, U+25B1]', '[U+2981, U+2982]', '[U+2999, U+299A]', 'U+29B5', '[U+29C2, U+29C3]', '[U+29C9, U+29CD]', '[U+29D8, U+29D9]', 'U+29DB', '[U+29DF, U+29E0]', 'U+29E2', '[U+29E7, U+29ED]', '[U+29F8, U+29FB]', '[U+2A1D, U+2A21]', '[U+2A2F, U+2A37]', '[U+2A3B, U+2A3D]', 'U+2A3F', 'U+2A50', '[U+2ADC, U+2ADD]', 'U+2AFE']

('prefixEntriesWithLspace0Rspace0', 51)
['U+0021', 'U+002B', 'U+002D', 'U+00AC', 'U+00B1', 'U+2018', 'U+201C', '[U+2200, U+2201]', '[U+2203, U+2204]', 'U+2207', '[U+2212, U+2213]', '[U+221B, U+221C]', '[U+221F, U+2222]', 'U+223C', '[U+22BE, U+22BF]', 'U+2310', 'U+2319', '[U+2795, U+2796]', 'U+27C0', '[U+299B, U+29AF]', '[U+2AEC, U+2AED]']

('postfixEntriesWithLspace0Rspace0', 35)
['[U+0021, U+0022]', '[U+0026, U+0027]', 'U+0060', 'U+00A8', 'U+00B0', '[U+00B2, U+00B4]', '[U+00B8, U+00B9]', '[U+02CA, U+02CB]', '[U+02D8, U+02DA]', 'U+02DD', 'U+0311', '[U+2019, U+201B]', '[U+201D, U+201F]', '[U+2032, U+2037]', 'U+2057', '[U+20DB, U+20DC]', 'U+23CD']

('postfixEntriesWithLspace0Rspace0AndStretchy', 26)
['[U+005E, U+005F]', 'U+007E', 'U+00AF', '[U+02C6, U+02C7]', 'U+02C9', 'U+02CD', 'U+02DC', 'U+02F7', 'U+0302', 'U+2016', 'U+203E', '[U+2322, U+2323]', '[U+23B4, U+23B5]', '[U+23DC, U+23E1]', 'U+2980', '[U+1EEF0, U+1EEF1]']

('prefixEntriesWithSpacing0AndStretchySymmetric', 25)
['U+0028', 'U+005B', '[U+007B, U+007C]', 'U+2308', 'U+230A', 'U+2329', 'U+2772', 'U+27E6', 'U+27E8', 'U+27EA', 'U+27EC', 'U+27EE', 'U+2983', 'U+2985', 'U+2987', 'U+2989', 'U+298B', 'U+298D', 'U+298F', 'U+2991', 'U+2993', 'U+2995', 'U+2997', 'U+29FC']

('postfixEntriesWithSpacing0AndStretchySymmetric', 25)
['U+0029', 'U+005D', '[U+007C, U+007D]', 'U+2309', 'U+230B', 'U+232A', 'U+2773', 'U+27E7', 'U+27E9', 'U+27EB', 'U+27ED', 'U+27EF', 'U+2984', 'U+2986', 'U+2988', 'U+298A', 'U+298C', 'U+298E', 'U+2990', 'U+2992', 'U+2994', 'U+2996', 'U+2998', 'U+29FD']

('prefixEntriesWithLspace3Rspace3AndSymmetricLargeop', 22)
['[U+222B, U+2233]', '[U+2A0B, U+2A0F]', '[U+2A15, U+2A1C]']

('prefixEntriesWithLspace1Rspace2AndSymmetricMovablelimitsLargeop', 18)
['[U+220F, U+2210]', '[U+22C0, U+22C3]', '[U+2A00, U+2A09]', 'U+2AFC', 'U+2AFF']

('prefixEntriesWithLspace3Rspace3AndSymmetricMovablelimitsLargeop', 7)
['U+2211', 'U+2A0A', '[U+2A10, U+2A14]']

('otherEntries', 21)
  * {'lspace': 0, 'rspace': 0, 'form': 'infix'}: 6
    ['U+005C', 'U+2061', 'U+2062', 'U+2063', 'U+2064', 'U+2396']

  * {'lspace': 3, 'rspace': 0, 'form': 'prefix'}: 3
    ['U+2145', 'U+2146', 'U+2202']

  * {'lspace': 1, 'rspace': 1, 'form': 'infix'}: 3
    ['U+003F', 'U+005E', 'U+005F']

  * {'lspace': 0, 'rspace': 0, 'form': 'prefix', 'properties': {'stretchy': True}}: 3
    ['U+2016', 'U+221A', 'U+2980']

  * {'lspace': 0, 'rspace': 3, 'form': 'infix'}: 3
    ['U+002C', 'U+003A', 'U+003B']

  * {'lspace': 4, 'rspace': 4, 'form': 'infix', 'properties': {'stretchy': True}}: 2
    ['U+2044', 'U+2215']

  * {'lspace': 3, 'rspace': 3, 'form': 'infix', 'properties': {'symmetric': True, 'stretchy': True}}: 1
    ['U+007C']

Separate table for multiple characters:

('entriesWithMultipleCharacters', 46)
  * -= infix: {'lspace': 4, 'rspace': 4, 'form': 'infix'}
  * ||| infix: {'lspace': 3, 'rspace': 3, 'form': 'infix', 'properties': {'fence': True}}
  * /= infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * := infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * || postfix: {'lspace': 0, 'rspace': 0, 'form': 'postfix', 'properties': {'fence': True}}
  * ⪰̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * <= infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ||| postfix: {'lspace': 0, 'rspace': 0, 'form': 'postfix', 'properties': {'fence': True}}
  * ≂̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ⊐̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ⩾̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * *= infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ⊏̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * -> infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ≦̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ⧏̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ||| prefix: {'lspace': 0, 'rspace': 0, 'form': 'prefix', 'properties': {'fence': True}}
  * ≿̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ∽̱ infix: {'lspace': 3, 'rspace': 3, 'form': 'infix'}
  * ⧐̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * <> infix: {'lspace': 1, 'rspace': 1, 'form': 'infix'}
  * += infix: {'lspace': 4, 'rspace': 4, 'form': 'infix'}
  * != infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ⩽̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * // infix: {'lspace': 1, 'rspace': 1, 'form': 'infix'}
  * !! postfix: {'lspace': 0, 'rspace': 0, 'form': 'postfix'}
  * >= infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * || prefix: {'lspace': 0, 'rspace': 0, 'form': 'prefix', 'properties': {'fence': True}}
  * ⪡̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ≎̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * .. postfix: {'lspace': 0, 'rspace': 0, 'form': 'postfix'}
  * ≏̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ** infix: {'lspace': 1, 'rspace': 1, 'form': 'infix'}
  * ... postfix: {'lspace': 0, 'rspace': 0, 'form': 'postfix'}
  * ≫̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ⊃⃒ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ⪯̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ++ postfix: {'lspace': 0, 'rspace': 0, 'form': 'postfix'}
  * -- postfix: {'lspace': 0, 'rspace': 0, 'form': 'postfix'}
  * ≪̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ⊂⃒ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * || infix: {'lspace': 3, 'rspace': 3, 'form': 'infix', 'properties': {'fence': True}}
  * == infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * ⪢̸ infix: {'lspace': 5, 'rspace': 5, 'form': 'infix'}
  * && infix: {'lspace': 4, 'rspace': 4, 'form': 'infix'}
  * ⫝̸ infix: {'lspace': 3, 'rspace': 3, 'form': 'infix'}

Separate tables for fences and separators:

('fences', 59)
['[U+0028, U+0029]', 'U+005B', 'U+005D', '[U+007B, U+007C]', 'U+007C', '[U+007C, U+007D]', 'U+2016', 'U+2016', '[U+2018, U+2019]', '[U+201C, U+201D]', '[U+2308, U+230B]', '[U+2329, U+232A]', '[U+2772, U+2773]', '[U+27E6, U+27EF]', 'U+2980', 'U+2980', '[U+2983, U+2998]', '[U+29FC, U+29FD]']

('separators', 3)
['U+002C', 'U+003B', 'U+2063']
fred-wang commented 4 years ago

Trying to put multi-character operators into existing categories:

infixEntriesWithDefaultValues
  singleChar (772):  ['[U+003C, U+003E]', '[U+219A, U+219B]', 'U+21AE', '[U+21B6, U+21B8]', '[U+21BA, U+21BB]', '[U+21CD, U+21CF]', '[U+21DE, U+21DF]', '[U+21F1, U+21F2]', 'U+21F4', '[U+21F7, U+21FC]', '[U+2208, U+220D]', 'U+2219', 'U+221D', 'U+2223', '[U+2225, U+2226]', '[U+2234, U+2235]', 'U+2237', '[U+2239, U+223E]', '[U+2241, U+228B]', '[U+2290, U+2292]', '[U+229A, U+229C]', '[U+22A2, U+22BA]', 'U+22C8', 'U+22CD', '[U+22D0, U+22D1]', '[U+22D4, U+22ED]', '[U+22F2, U+22FF]', 'U+2301', 'U+237C', 'U+238B', 'U+2758', 'U+2794', '[U+2798, U+27A1]', '[U+27A5, U+27AF]', '[U+27B1, U+27BE]', 'U+27DF', '[U+27F2, U+27F3]', '[U+2900, U+2909]', 'U+2911', '[U+2914, U+2920]', '[U+2923, U+294D]', '[U+2962, U+296D]', '[U+2970, U+297F]', '[U+29B6, U+29BB]', '[U+29BD, U+29C1]', '[U+29C4, U+29C8]', '[U+29CE, U+29D7]', 'U+29E1', '[U+29E3, U+29E6]', '[U+29F4, U+29F5]', 'U+29F7', 'U+2A3E', '[U+2A64, U+2AD9]', '[U+2ADE, U+2AEB]', '[U+2AEE, U+2AFA]', '[U+2B00, U+2B11]', '[U+2B30, U+2B31]', '[U+2B33, U+2B44]', '[U+2B47, U+2B4F]', '[U+2B5A, U+2B73]', '[U+2B76, U+2B7D]', '[U+2B80, U+2B8F]', '[U+2B94, U+2B95]', '[U+2BA0, U+2BB8]', 'U+2BD1', '[U+1F800, U+1F80B]', '[U+1F810, U+1F847]', '[U+1F850, U+1F859]', '[U+1F860, U+1F887]', '[U+1F898, U+1F89B]', '[U+1F8A0, U+1F8AB]']
  multipleChar (27): '!=' '*=' '->' '/=' ':=' '<=' '==' '>=' '≂̸' '≎̸' '≏̸' '≦̸' '≪̸' '≫̸' '≿̸' '⊂⃒' '⊃⃒' '⊏̸' '⊐̸' '⧏̸' '⧐̸' '⩽̸' '⩾̸' '⪡̸' '⪢̸' '⪯̸' '⪰̸' 

infixEntriesWithSpacing5AndStretchy
  singleChar (138):  ['[U+2190, U+2199]', '[U+219C, U+21AD]', '[U+21AF, U+21B5]', 'U+21B9', '[U+21BC, U+21CC]', '[U+21D0, U+21DD]', '[U+21E0, U+21F0]', 'U+21F3', '[U+21F5, U+21F6]', '[U+21FD, U+21FF]', '[U+27F0, U+27F1]', '[U+27F5, U+27FF]', '[U+290A, U+2910]', '[U+2912, U+2913]', '[U+2921, U+2922]', '[U+294E, U+2961]', '[U+296E, U+296F]', '[U+2B45, U+2B46]']

infixEntriesWithSpacing4
  singleChar (100):  ['U+002B', 'U+002D', 'U+002F', 'U+00B1', 'U+00F7', '[U+2212, U+2214]', 'U+2216', 'U+2218', 'U+2224', '[U+2227, U+222A]', 'U+2236', 'U+2238', '[U+228C, U+228F]', '[U+2293, U+2296]', 'U+2298', '[U+229D, U+229F]', '[U+22BB, U+22BD]', 'U+22C4', 'U+22C6', '[U+22CE, U+22CF]', '[U+22D2, U+22D3]', '[U+2795, U+2797]', 'U+27F4', 'U+29BC', 'U+29F6', '[U+2A22, U+2A2E]', '[U+2A38, U+2A3A]', '[U+2A40, U+2A4F]', '[U+2A51, U+2A63]', '[U+2ADA, U+2ADB]', 'U+2AFB', 'U+2AFD', 'U+2B32']
  multipleChar (3): '&&' '+=' '-=' 

infixEntriesWithSpacing3
  singleChar (85):  ['U+0025', 'U+002A', 'U+002E', 'U+0040', 'U+00B7', 'U+00D7', 'U+2022', 'U+2043', 'U+2206', 'U+220E', 'U+2217', '[U+223F, U+2240]', 'U+2297', 'U+2299', '[U+22A0, U+22A1]', 'U+22C5', 'U+22C7', '[U+22C9, U+22CC]', '[U+2305, U+2306]', '[U+25A0, U+25A1]', '[U+25AA, U+25AB]', '[U+25AD, U+25B1]', '[U+2981, U+2982]', '[U+2999, U+299A]', 'U+29B5', '[U+29C2, U+29C3]', '[U+29C9, U+29CD]', '[U+29D8, U+29D9]', 'U+29DB', '[U+29DF, U+29E0]', 'U+29E2', '[U+29E7, U+29ED]', '[U+29F8, U+29FB]', '[U+2A1D, U+2A21]', '[U+2A2F, U+2A37]', '[U+2A3B, U+2A3D]', 'U+2A3F', 'U+2A50', '[U+2ADC, U+2ADD]', 'U+2AFE']
  multipleChar (4): '||' '|||' '∽̱' '⫝̸' 

prefixEntriesWithLspace0Rspace0
  singleChar (51):  ['U+0021', 'U+002B', 'U+002D', 'U+00AC', 'U+00B1', 'U+2018', 'U+201C', '[U+2200, U+2201]', '[U+2203, U+2204]', 'U+2207', '[U+2212, U+2213]', '[U+221B, U+221C]', '[U+221F, U+2222]', 'U+223C', '[U+22BE, U+22BF]', 'U+2310', 'U+2319', '[U+2795, U+2796]', 'U+27C0', '[U+299B, U+29AF]', '[U+2AEC, U+2AED]']
  multipleChar (2): '||' '|||' 

postfixEntriesWithLspace0Rspace0
  singleChar (35):  ['[U+0021, U+0022]', '[U+0026, U+0027]', 'U+0060', 'U+00A8', 'U+00B0', '[U+00B2, U+00B4]', '[U+00B8, U+00B9]', '[U+02CA, U+02CB]', '[U+02D8, U+02DA]', 'U+02DD', 'U+0311', '[U+2019, U+201B]', '[U+201D, U+201F]', '[U+2032, U+2037]', 'U+2057', '[U+20DB, U+20DC]', 'U+23CD']
  multipleChar (7): '!!' '++' '--' '..' '...' '||' '|||' 

postfixEntriesWithLspace0Rspace0AndStretchy
  singleChar (26):  ['[U+005E, U+005F]', 'U+007E', 'U+00AF', '[U+02C6, U+02C7]', 'U+02C9', 'U+02CD', 'U+02DC', 'U+02F7', 'U+0302', 'U+2016', 'U+203E', '[U+2322, U+2323]', '[U+23B4, U+23B5]', '[U+23DC, U+23E1]', 'U+2980', '[U+1EEF0, U+1EEF1]']

prefixEntriesWithSpacing0AndStretchySymmetric
  singleChar (25):  ['U+0028', 'U+005B', '[U+007B, U+007C]', 'U+2308', 'U+230A', 'U+2329', 'U+2772', 'U+27E6', 'U+27E8', 'U+27EA', 'U+27EC', 'U+27EE', 'U+2983', 'U+2985', 'U+2987', 'U+2989', 'U+298B', 'U+298D', 'U+298F', 'U+2991', 'U+2993', 'U+2995', 'U+2997', 'U+29FC']

postfixEntriesWithSpacing0AndStretchySymmetric
  singleChar (25):  ['U+0029', 'U+005D', '[U+007C, U+007D]', 'U+2309', 'U+230B', 'U+232A', 'U+2773', 'U+27E7', 'U+27E9', 'U+27EB', 'U+27ED', 'U+27EF', 'U+2984', 'U+2986', 'U+2988', 'U+298A', 'U+298C', 'U+298E', 'U+2990', 'U+2992', 'U+2994', 'U+2996', 'U+2998', 'U+29FD']

prefixEntriesWithLspace3Rspace3AndSymmetricLargeop
  singleChar (22):  ['[U+222B, U+2233]', '[U+2A0B, U+2A0F]', '[U+2A15, U+2A1C]']

prefixEntriesWithLspace1Rspace2AndSymmetricMovablelimitsLargeop
  singleChar (18):  ['[U+220F, U+2210]', '[U+22C0, U+22C3]', '[U+2A00, U+2A09]', 'U+2AFC', 'U+2AFF']

prefixEntriesWithLspace3Rspace3AndSymmetricMovablelimitsLargeop
  singleChar (7):  ['U+2211', 'U+2A0A', '[U+2A10, U+2A14]']

otherEntries 21
  * {'form': 'infix', 'lspace': 0, 'rspace': 0}: 6
    ['U+005C', 'U+2061', 'U+2062', 'U+2063', 'U+2064', 'U+2396']

  * {'form': 'infix', 'lspace': 0, 'rspace': 3}: 3
    ['U+002C', 'U+003A', 'U+003B']

  * {'form': 'infix', 'lspace': 1, 'rspace': 1}: 3
    ['U+003F', 'U+005E', 'U+005F']

  * {'form': 'prefix', 'lspace': 0, 'rspace': 0, 'properties': {'stretchy': True}}: 3
    ['U+2016', 'U+221A', 'U+2980']

  * {'form': 'prefix', 'lspace': 3, 'rspace': 0}: 3
    ['U+2145', 'U+2146', 'U+2202']

  * {'form': 'infix', 'lspace': 4, 'rspace': 4, 'properties': {'stretchy': True}}: 2
    ['U+2044', 'U+2215']

  * {'form': 'infix', 'lspace': 3, 'rspace': 3, 'properties': {'stretchy': True, 'symmetric': True}}: 1
    ['U+007C']

otherEntriesWithMultipleCharacters 3
  * ** infix: {'form': 'infix', 'lspace': 1, 'rspace': 1}
  * // infix: {'form': 'infix', 'lspace': 1, 'rspace': 1}
  * <> infix: {'form': 'infix', 'lspace': 1, 'rspace': 1}

Separate tables for fences and separators:

fences
  singleChar (59):  ['[U+0028, U+0029]', 'U+005B', 'U+005D', '[U+007B, U+007C]', 'U+007C', '[U+007C, U+007D]', 'U+2016', 'U+2016', '[U+2018, U+2019]', '[U+201C, U+201D]', '[U+2308, U+230B]', '[U+2329, U+232A]', '[U+2772, U+2773]', '[U+27E6, U+27EF]', 'U+2980', 'U+2980', '[U+2983, U+2998]', '[U+29FC, U+29FD]']
  multipleChar (6): '||' '||' '||' '|||' '|||' '|||' 

separators
  singleChar (3):  ['U+002C', 'U+003B', 'U+2063']
fred-wang commented 4 years ago

Can these three entries from otherEntriesWithMultipleCharacters (**, // and <>) be moved into existing categories?

Also, do you plan to review the multi-character operators too? I thought we agreed that some of them probably don't make sense.

fred-wang commented 4 years ago

First attempt to make the dictionary more compact:

https://mathml-refresh.github.io/mathml-core/#operator-dictionary

fred-wang commented 4 years ago

Second attempt: there are now two forms of the operator dictionary:

The mo section now refers to the compact one.

The multi-character operators are mapped to the BMP Private Use Area (range U+E000–U+F8FF) so that they can be treated as single characters.
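The thread doesn't say how multi-character operators are assigned to Private Use Area code points. Assuming a simple sequential assignment in sorted order, the idea can be sketched as follows (`assign_pua` and `single_char_key` are hypothetical names):

```python
PUA_START, PUA_END = 0xE000, 0xF8FF  # BMP Private Use Area


def assign_pua(multi_char_ops):
    """Hypothetical scheme: give each multi-char operator its own PUA code point."""
    ops = sorted(set(multi_char_ops))  # deterministic order
    if PUA_START + len(ops) - 1 > PUA_END:
        raise ValueError("too many operators for the BMP Private Use Area")
    return {op: chr(PUA_START + i) for i, op in enumerate(ops)}


def single_char_key(op, pua_map):
    """Single characters are their own key; multi-char operators use their PUA alias."""
    return op if len(op) == 1 else pua_map[op]


pua = assign_pua(["!=", "<=", ">=", "||", "|||"])
print({op: f"U+{ord(ch):04X}" for op, ch in pua.items()})
```

With this aliasing, the single-character lookup path handles every entry, and only the string-to-alias step is specific to multi-character operators.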

The original dictionary used at least 1387 × (16 + 2 + 2 × 8 + 6) / 8 = 6935 bytes (counting single-character entries only). The estimated size of the compact dictionary (including multi-character entries) is currently 1540 bytes (-78%). We can maybe do better, but the final result is still blocked on the pending changes to the operator dictionary.
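Reading the size estimate above as 1387 × (16 + 2 + 2 × 8 + 6) / 8 bytes, the figures can be double-checked as below. The per-field interpretation in the comments (a 16-bit character, a 2-bit form, two 8-bit spacing values, 6 property bits) is an assumption; the thread only gives the numbers:

```python
ENTRIES = 1387                        # single-character entries in the original dictionary
BITS_PER_ENTRY = 16 + 2 + 2 * 8 + 6   # assumed: char, form, lspace + rspace, property flags

original_bytes = ENTRIES * BITS_PER_ENTRY // 8
compact_bytes = 1540                  # estimate quoted above

reduction = 1 - compact_bytes / original_bytes
print(original_bytes, f"{reduction:.0%}")
```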

fred-wang commented 4 years ago

@NSoiffer @davidcarlisle I submitted several PRs to make things more consistent:

https://github.com/mathml-refresh/xml-entities/pull/24
https://github.com/mathml-refresh/xml-entities/pull/23
https://github.com/mathml-refresh/xml-entities/pull/22
https://github.com/mathml-refresh/xml-entities/pull/21
https://github.com/mathml-refresh/xml-entities/pull/20
https://github.com/mathml-refresh/xml-entities/pull/19
https://github.com/mathml-refresh/xml-entities/pull/18

After these changes, I believe the remaining entries could be classified as:

  * infix, lspace=0, rspace=0 (invisible operators)
  * infix, lspace=0, rspace=0.16666666666666666em (comma-like punctuation)
  * prefix, lspace=0.16666666666666666em, rspace=0 (derivation-like operators)

each of which does seem to deserve its own category.

(I haven't tried to run the script with all the changes merged, but I'm willing to do it and check again after this is done)

fred-wang commented 4 years ago

This is done:

https://mathml-refresh.github.io/mathml-core/#operator-dictionary-compact

We still need special handling for a few edge cases, but the main subset (553 entries) is now treated uniformly. That subset can be encoded as a 560-byte table, with lookup by binary search on 224 elements (at most 8 comparisons). Alternatively, this main subset can be encoded as a perfect hash function with a table using 16 bits per entry, but I'm not sure whether the extra overhead (memory and complexity) is worth it. A note gives suggestions to implementers.
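The actual 560-byte layout isn't described here, so the sketch below only illustrates the lookup side. It assumes the table is a sorted list of non-overlapping code point ranges, each tagged with a category; the category names and the four sample ranges (taken from the script output earlier in this thread) are illustrative. A binary search over n sorted keys needs at most ⌈log2(n + 1)⌉ comparisons, which gives 8 for 224 elements:

```python
import bisect
import math

# Hypothetical range table: (first, last, category), sorted by `first`, non-overlapping.
RANGES = [
    (0x002B, 0x002B, "infix-spacing4"),
    (0x003C, 0x003E, "infix-default"),
    (0x2190, 0x2199, "infix-spacing5-stretchy"),
    (0x2212, 0x2214, "infix-spacing4"),
]
STARTS = [first for first, _, _ in RANGES]


def lookup(code_point):
    """Return the category of code_point, or None if no range contains it."""
    i = bisect.bisect_right(STARTS, code_point) - 1  # last range starting at or before it
    if i >= 0 and code_point <= RANGES[i][1]:
        return RANGES[i][2]
    return None  # not in the table: the caller would presumably apply default values


def max_comparisons(n):
    """Worst-case comparisons for a binary search over n sorted keys."""
    return math.ceil(math.log2(n + 1))


print(lookup(ord("=")), lookup(0x2213), lookup(ord("a")), max_comparisons(224))
```

Storing ranges instead of individual code points is what lets 553 entries fit in 224 table elements.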