w3c / mathml

MathML4 editors draft
https://w3c.github.io/mathml/

Collect usage statistics for current MathML elements #55

Closed physikerwelt closed 2 years ago

physikerwelt commented 5 years ago

@physikerwelt I think in general it would be good to gather usage metrics of elements/attributes that are proposed for deprecation/removal. Maybe you can do that for wikipedia.

Originally posted by @fred-wang in https://github.com/mathml-refresh/mathml/issues/1#issuecomment-465093357

fred-wang commented 5 years ago

I think we should probably send an email to the Math WG mailing list to see if MathML users or developers of MathML authoring tools can provide more data.

NSoiffer commented 5 years ago

I suspect that we need to ask specific questions when we ask for data. For example:

fred-wang commented 5 years ago

@dginev @kohlhase @brucemiller would you be able to provide info for LaTeXML / Arxiv? @physikerwelt would you be able to provide info for Mathoid / Wikipedia?

dginev commented 5 years ago

Sure. As @NSoiffer suggests, you could ask what specific statistics may interest you, and I could generate a report. We have been publishing our recent arXiv HTML5 datasets (1.2 million papers with ~500 million math elements) and it is easy to extract some information on frequencies of math elements and attributes. E.g. just counting the math elements and their attributes is somewhat direct.

That said, since the arXMLiv resource is generated via latexml, it may have better behaved MathML than an arbitrary web page, and Bruce can already directly tell you which of the removal suggestions would require latexml changes.

Edit: I've started a stats collection job, should take a couple of days to finish and report back.
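
For anyone who wants to run the same kind of count on their own files, here is a minimal sketch of "counting the math elements and their attributes". It is not the script used for the arXiv job. The `count_usage` helper, the `element@attribute` key format and the sample markup are illustrative assumptions, and it expects well-formed XML (XHTML or standalone MathML) rather than arbitrary HTML.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Tuple

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag or attribute name."""
    return tag.rsplit("}", 1)[-1]


def count_usage(xml_source: str) -> Tuple[Counter, Counter]:
    """Count MathML elements and 'element@attribute' pairs in one document."""
    elements: Counter = Counter()
    attributes: Counter = Counter()
    for node in ET.fromstring(xml_source).iter():
        # Skip anything outside the MathML namespace (e.g. embedded SVG).
        if not node.tag.startswith("{" + MATHML_NS + "}"):
            continue
        name = local_name(node.tag)
        elements[name] += 1
        for attr in node.attrib:
            attributes[f"{name}@{local_name(attr)}"] += 1
    return elements, attributes


# Hypothetical sample document, only for demonstration.
sample = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">'
    '<msub><mi mathvariant="normal">x</mi><mn>1</mn></msub>'
    "</math>"
)
elements, attributes = count_usage(sample)
print(elements.most_common())
print(attributes.most_common())
```

Summing the two counters over every file in a corpus gives the raw frequencies; dividing by the number of `math` elements turns them into ratios.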

fred-wang commented 5 years ago

Thanks. I'm going to prepare a set of questions. I think both are interesting as it's possible that tools can generate some specific MathML element/attribute but that they are not really used in practice.

fred-wang commented 5 years ago

This survey intends to track usage statistics of MathML in order to get a better idea of what should belong to MathML Core, what to MathML 4, and what should be deprecated. Please answer the following questions as accurately as you can:

  1. Description. Please describe the MathML database / authoring tool (e.g. Wiki, digital library, latex-to-mathml converter, WYSIWYG MathML editor, computer algebra system, etc):

  2. Native MathML. Does your database / tool serve MathML content to native web engines (e.g. Firefox, iOS WebView, ...)?

  3. MathML elements. Please provide usage percentage for MathML elements in your database / list of generated MathML elements by your tool. Does your database / tool rely on the following elements? munder, mover, msub, msup, msubsup, mlabeledtr, merror, mphantom, maction, mglyph, mfenced, mstyle, ms.

  4. MathML attributes. Please provide usage percentage for MathML attributes in your database / list of generated MathML attributes by your tool. Does your database / tool rely on the following attributes? mathvariant, numalign, denomalign, align (on munderover/munder/mover), bevelled, subscriptshift, superscriptshift, other, macros, mode, index, fontfamily, fontweight, fontstyle, fontsize, color, background, veryverythinmathspace, verythinmathspace, thinmathspace, mediummathspace, thickmathspace, verythickmathspace, veryverythickmathspace

  5. Attributes on the mstyle element. Does your database / tool use attributes on the mstyle element other than the following ones? displaystyle, dir, mathsize, mathbackground, mathcolor, mathvariant, scriptlevel

  6. Attribute values. Does your database / tool use any of the following attribute values?

    • linethickness attribute with value "thin", "thick" or "medium"
    • mathsize attribute with value "small", "normal" or "big"
    • attribute defined as a length whose value is a nonzero number without unit (e.g. "4"); this excludes mglyph@index, scriptlevel, mtd@rowspan, mtd@columnspan, maction@selection, msgroup@position, msgroup@shift, msrow@position, mscarries@position, msline@position and msline@length
    • attribute with value "veryverythinmathspace", "verythinmathspace", "thinmathspace", "mediummathspace", "thickmathspace", "verythickmathspace" or "veryverythickmathspace".
    • notation attribute containing the value "radical" (e.g. notation="radical circle")
    • attribute with leading or trailing white space characters (U+0020, U+0009, U+000A, U+000D or U+000C). For example width=" 5em ".
  7. Trailing/leading whitespace in token elements. Does your database / tool use any token elements (mi, mn, mo, mtext, ms) whose text content has leading or trailing white space characters (U+0020, U+0009, U+000A, U+000D or U+000C)? For example <mi> x </mi>.
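
Several of the checks in questions 6 and 7 can be scripted. The following is a rough sketch under stated assumptions: the input is well-formed XML, and the `survey_flags` function, its finding labels and the sample markup are all made up for illustration. It covers the named-space values, the "radical" notation value, and leading/trailing white space in attributes and token elements.

```python
import xml.etree.ElementTree as ET

NAMED_SPACES = {
    "veryverythinmathspace", "verythinmathspace", "thinmathspace",
    "mediummathspace", "thickmathspace", "verythickmathspace",
    "veryverythickmathspace",
}
TOKEN_ELEMENTS = {"mi", "mn", "mo", "mtext", "ms"}
# White space characters listed in the survey: U+0020, U+0009, U+000A, U+000D, U+000C.
WS = " \t\n\r\f"


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree name."""
    return tag.rsplit("}", 1)[-1]


def survey_flags(xml_source):
    """Return a sorted list of (check, element, detail) findings."""
    findings = set()
    for node in ET.fromstring(xml_source).iter():
        name = local_name(node.tag)
        for attr, value in node.attrib.items():
            attr = local_name(attr)
            if value.strip(WS) in NAMED_SPACES:
                findings.add(("named-space", name, attr))
            if value != value.strip(WS):
                findings.add(("attribute-whitespace", name, attr))
            if attr == "notation" and "radical" in value.split():
                findings.add(("notation-radical", name, attr))
        text = node.text or ""
        if name in TOKEN_ELEMENTS and text != text.strip(WS):
            findings.add(("token-whitespace", name, text))
    return sorted(findings)


# Hypothetical sample that triggers each check once.
sample = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    "<mi> x </mi>"
    '<mspace width=" 5em "/>'
    '<mo lspace="thinmathspace">+</mo>'
    '<menclose notation="radical circle"><mn>2</mn></menclose>'
    "</math>"
)
findings = survey_flags(sample)
for finding in findings:
    print(finding)
```

An XML parser normalizes tabs and newlines inside attribute values to spaces, so the attribute check only tells you that some white space was present, not which character it was.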

fred-wang commented 5 years ago

I wrote a basic survey in https://github.com/mathml-refresh/mathml/issues/55#issuecomment-474768228; the data can be provided by basic search features and does not require actual knowledge of MathML semantics.

fred-wang commented 5 years ago

  1. TeXZilla, LALR JavaScript Unicode LaTeX-to-MathML converter
  2. Yes, it has a web page https://fred-wang.github.io/TeXZilla/ and a Firefox add-on.
  3. annotation, maction, math, menclose, merror, mfrac, mi, mmultiscripts, mn, mo, mover, mpadded, mphantom, mprescripts, mroot, mrow, ms, mspace, msqrt, mstyle, msub, msubsup, msup, mtable, mtd, mtext, mtr, munder, munderover, none, semantics. It does not use mglyph, mfenced or mlabeledtr.
  4. actiontype, align, colspan, columnalign, columnlines, depth, dir, display, displaystyle, equalcolumns, equalrows, frame, height, linethickness, lspace, mathbackground, mathcolor, mathvariant, maxsize, minsize, notation, rowlines, rowspacing, rowspan, rspace, scriptlevel, stretchy, voffset, width, xmlns. No, except mathvariant.
  5. It only uses displaystyle, scriptlevel, mathcolor, mathbackground; dir/mathsize are used on the math element; mathvariant is used on mstyle in some exceptional situations (always used prior to version 1.0.1).
  6. No (named spaces on mo and mspace prior to version 1.0.0).
  7. No.

davidcarlisle commented 5 years ago

Results of the survey for the NAG manual (internal draft, but basically https://www.nag.co.uk/numeric/fl/nagdoc_fl26.2/html/frontmatter/manconts.html):

  1. essentially hand authored (with some XSLT post processing) Mostly using emacs nxml-mode

  2. mathml in HTML5 by default served as-is to firefox, via mathjax to other browsers.

  3. full detail at end, no use of mlabeledtr, merror, maction, mglyph

  4. full detail at end, uses mathvariant but not the others you list other than a few (removable) uses of other

  5. only displaystyle and mathcolor

  6. No, other than some use of lspace="thinmathspace"

  7. no

Details


436,262 math expressions
2,623,875 mathml elements

elements used

436262 instances
<math
 display="block"
 displaystyle="true"
>

58 instances
<menclose
  notation="bottom"
>

133002 instances
<mfenced
separators=","
separators=""
open="|"
open="'"
open=""
open="("
open="["
open="{"
open="&#x2016;"
open="&#x2308;"
open="&#x230a;"
open="&#x2329;"
close="|"
close="'"
close=""
close=")"
close="["
close="]"
close="}"
close="&#x2016;"
close="&#x2309;"
close="&#x230b;"
close="&#x232a;"
close="&#xa0;"
>

8653 instances
<mfrac
 other="display"
 other="small"
>

818436 instances
<mi
 href=< URL >
 mathcolor=< #hex >
 mathvariant= bold|bold-italic|italic|monospace|normal|script
>

244 instances
<mmultiscripts>

281635 instances
<mn
 href=< URL >
 mathcolor=< #hex >
 mathvariant= bold|bold-italic|italic|monospace|normal|script
>

469293 instances
<mo
 lspace="0pt"
 rspace="0pt"
 lspace="thinmathspace"
 mathvariant="bold|normal"
 minsize="< length >em"
 other="big"
>

9561 instances
<mover>

2026 instances
<mpadded
  width="< length >em"
  height="< length >em"
  depth="< length >em"
  voffset="< length >em"
>

3914 instances
<mphantom>

244 instances
<mprescripts>

36 instances
<mroot>

157210 instances
<mrow>

127 instances
<ms>

12604 instances
<mspace
 linebreak="newline"
 width="< length >em"
 >

2637 instances
<msqrt>

940 instances
<mstyle
  displaystyle="true"
  mathcolor="#003399"
>

93239 instances
<msub>

6350 instances
<msubsup>

28049 instances
<msup>

5715 instances
<mtable
  rowlines="none none none solid none"
  columnlines="none none none solid none"
>

61578 instances
<mtd
columnalign="center|left|right"
>

68458 instances
<mtext
 mathvariant="italic"
>

18725 instances
<mtr
columnalign="center|left|right"
>

1260 instances
<munder>

3156 instances
<munderover
columnalign="center|left|right"
>

462 instances
<none>
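
Question 3 asks for usage percentages, and these raw counts convert directly. A small worked example, assuming percentages are taken relative to the 2,623,875 MathML elements reported above; the `usage_percentages` helper is hypothetical and only a handful of the counts are copied in.

```python
# Counts copied from the NAG manual report above.
TOTAL_ELEMENTS = 2_623_875
counts = {
    "mi": 818_436,
    "mo": 469_293,
    "mfenced": 133_002,
    "mphantom": 3_914,
    "ms": 127,
}


def usage_percentages(counts, total):
    """Return each count as a percentage of the total number of elements."""
    return {name: 100.0 * n / total for name, n in counts.items()}


percentages = usage_percentages(counts, TOTAL_ELEMENTS)
for name, pct in sorted(percentages.items(), key=lambda item: -item[1]):
    print(f"{name:10s} {pct:8.4f}%")
```

On these numbers mfenced comes out at roughly 5% of all elements in this corpus, which is far from negligible for an element under discussion for removal.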

sideshowbarker commented 5 years ago

I could add use counters to the W3C HTML checker to collect statistics for this

emilio commented 5 years ago

Also let me know if you want use counters for some of these in Gecko, I can let you know how to add them or add them myself.

physikerwelt commented 5 years ago

Sure. However the MathML is generated via MathJax. @andreg-p did recently analyse the arxiv dataset. Can you share your results here? However this was also generated (by LaTeXML). Maybe the MathML in PubMed Central is more diverse. I am travelling and will look into the Wikipedia dataset next week.

sideshowbarker commented 5 years ago

> the MathML is generated via MathJax

Do you mean it’s generated on the client side (from JavaScript running in a browser)?

I guess I should note that for the case of the W3C HTML Checker, I won’t be able to collect use counters for any MathML markup that’s dynamically generated by JavaScript running on the client side in a browser. The HTML Checker sees only the source of the document, not the DOM.
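
To illustrate the source-versus-DOM limitation, here is a small sketch of a source-only use counter. It is not the HTML Checker's actual code: the `MathUseCounter` class and the sample page are assumptions for illustration, built on Python's standard `html.parser`.

```python
from collections import Counter
from html.parser import HTMLParser


class MathUseCounter(HTMLParser):
    """Count start tags seen inside <math> in the static HTML source."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # nesting depth of currently open <math> elements
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag == "math":
            self.depth += 1
        if self.depth:
            self.counts[tag] += 1

    def handle_endtag(self, tag):
        if tag == "math" and self.depth:
            self.depth -= 1


# Hypothetical page: one static formula, one formula injected by a script.
page = """
<p>Static: <math><msup><mi>x</mi><mn>2</mn></msup></math></p>
<script>
  document.body.insertAdjacentHTML(
    "beforeend", "<math><mfrac><mn>1</mn><mn>2</mn></mfrac></math>");
</script>
"""
counter = MathUseCounter()
counter.feed(page)
print(counter.counts)
```

The parser treats the content of `script` as raw text, so the `mfrac` that only exists after the script runs is never counted. Any source-level counter has the same blind spot.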

fred-wang commented 5 years ago

> > the MathML is generated via MathJax
>
> Do you mean it’s generated on the client side (from JavaScript running in a browser)?

It's server-side: https://github.com/wikimedia/mathoid

fred-wang commented 5 years ago

> Sure. However the MathML is generated via MathJax. @AndreG-P did recently analyse the arxiv dataset. Can you share your results here? However this was also generated (by LaTeXML). Maybe the MathML in PubMed Central is more diverse. I am travelling and will look into the Wikipedia dataset next week.

I think it's not a problem to have several replies relying on an analysis of a converter's source code and real content generated by the same converter.

AndreG-P commented 5 years ago

I currently only have a sneak peek of 1001 mathematical arXiv documents (367,236 MathML expressions). We extracted these expressions from the arXMLiv dataset 08.2018 that @dginev mentioned.

I'm sorry it's not a list for the entire arXiv. We are currently working on a minimized MathML dataset to save resources, so our distributions wouldn't be representative for your questions. However, we didn't apply any filters or changes to these 1001 documents.

Here is a list of the elements:

```xml
mo ci mi apply mrow annotation csymbol math semantics annotation-xml mn cn times msub msup eq minus plus divide mtd interval mfrac msubsup in list share mover mtext and leq abs mpadded geq mstyle mtr set sum lt matrixrow vector gt infinity munder partialdiff subset munderover intersect int mtable union root cerror msqrt neq matrix setdiff log equivalent factorial g emptyset ln compose path floor min limit max sin notin mspace cos approx span ceiling none exp a mmultiscripts or gcd circle determinant real tan mprescripts sinh cot cosh arg mroot degree tanh prsubset exists imaginary svg sec arctan not img implies arccos arcsin cite exponentiale menclose csc
```

And the list of the attributes:

```xml
id xref encoding cd stretchy class alttext display kmcs-r type mathvariant closure columnalign href accent rspace largeop symmetric displaystyle width mathsize movablelimits lspace maxsize minsize rowspacing fence linethickness columnspacing separator stroke stroke-width fill d transform major-collection minor-collection fine-collection accentunder height depth mathcolor style title voffset r cx cy version fragid viewbox overflow scriptlevel align columnspan src alt notation
```

Here is the list of the 1001 document IDs. It's probably not helpful, but you can check the documents manually if you wish.

``` 0705.0012 0705.0175 0705.0179 0705.0194 0705.0457 0705.0528 0705.0698 0705.0768 0705.0908 0705.1220 0705.1273 0705.1732 0705.1806 0705.2109 0705.2182 0705.2422 0705.2578 0705.3171 0705.3241 0705.3273 0705.3310 0705.3443 0705.3457 0705.3673 0705.3693 0705.3715 0705.3929 0705.3953 0705.4015 0705.4111 0705.4123 0705.4178 0705.4483 0705.4536 0705.4573 0706.2433 0707.0035 0707.0111 0707.0229 0707.0491 0707.0518 0707.0699 0707.0907 0707.1102 0707.1108 0707.1111 0707.1177 0707.1790 0707.2121 0707.2122 0707.2123 0707.2124 0707.2221 0707.2259 0707.2563 0707.2591 0707.2870 0707.2995 0707.3052 0707.3364 0707.3371 0707.3373 0707.3394 0707.3426 0707.3450 0707.3590 0707.3615 0707.3903 0707.4034 0707.4112 0707.4261 0707.4328 0707.4499 0710.0143 0710.0144 0710.0163 0710.0193 0710.0234 0710.0464 0710.0813 0710.0886 0710.0943 0710.0967 0710.0989 0710.1019 0710.1147 0710.1234 0710.1295 0710.1360 0710.1468 0710.1521 0710.1886 0710.1911 0710.1929 0710.1981 0710.2088 0710.2123 0710.2216 0710.2296 0710.2304 0710.2310 0710.2379 0710.2388 0710.2625 0710.2627 0710.2645 0710.2685 0710.2973 0710.3001 0710.3177 0710.3188 0710.3389 0710.3409 0710.3413 0710.3451 0710.3531 0710.3595 0710.3718 0710.3857 0710.3882 0710.3928 0710.3947 0710.3956 0710.3964 0710.3997 0710.4347 0710.4437 0710.4586 0710.4605 0710.4991 0710.5148 0710.5328 0710.5478 0710.5518 0710.5648 0710.5683 0710.5799 0710.5863 0710.5894 0711.0071 0711.0111 0711.0225 0711.0417 0711.0445 0711.0560 0711.0717 0711.0915 0711.0947 0711.1132 0711.1153 0711.1185 0711.1333 0711.1417 0711.1479 0711.1753 0711.1943 0711.1956 0711.2054 0711.2223 0711.2269 0711.2443 0711.2502 0711.2673 0711.2876 0711.2938 0711.3221 0711.3269 0711.3485 0711.3488 0711.3512 0711.3656 0711.3678 0711.3711 0711.3940 0711.3974 0711.4074 0711.4322 0711.4357 0711.4394 0711.4412 0711.4426 0711.4456 0711.4480 0711.4595 0711.4648 0711.4949 0711.4985 0711.4986 0711.4999 0711.5004 0909.0083 0909.0106 0909.0113 0909.0240 0909.0301 0909.0303 0909.0335 0909.0339 0909.0362 
0909.0471 0909.0684 0909.0710 0909.0783 0909.1050 0909.1162 0909.1437 0909.1452 0909.1616 0909.1620 0909.1665 0909.1900 0909.1965 0909.1994 0909.2101 0909.2304 0909.2497 0909.2640 0909.2696 0909.2744 0909.2817 0909.2983 0909.3354 0909.3453 0909.3459 0909.3566 0909.3653 0909.3682 0909.3763 0909.3928 0909.3968 0909.3972 0909.4111 0909.4246 0909.4329 0909.4396 0909.4591 0909.4718 0909.4760 0909.4774 0909.4865 0909.4913 0909.4960 0909.5071 0909.5072 0909.5199 0909.5512 0909.5623 0909.5652 0909.5664 1004.0033 1004.0154 1004.0167 1004.0197 1004.0200 1004.0253 1004.0290 1004.0394 1004.0582 1004.0674 1004.0713 1004.0723 1004.0759 1004.0904 1004.1068 1004.1084 1004.1244 1004.1326 1004.1661 1004.1883 1004.1934 1004.2214 1004.2285 1004.2511 1004.2639 1004.2759 1004.2946 1004.2983 1004.3038 1004.3259 1004.3358 1004.3376 1004.3552 1004.3799 1004.3826 1004.3866 1004.3904 1004.3938 1004.4194 1004.4293 1004.4374 1004.4539 1004.4832 1004.5183 1004.5273 1004.5434 1004.5510 1007.0115 1007.0157 1007.0225 1007.0257 1007.0259 1007.0316 1007.0353 1007.0567 1007.0568 1007.0677 1007.0688 1007.0713 1007.0804 1007.1027 1007.1175 1007.1441 1007.1553 1007.1615 1007.1734 1007.1786 1007.1839 1007.2054 1007.2239 1007.2295 1007.2521 1007.2822 1007.2959 1007.3072 1007.3399 1007.3401 1007.3406 1007.3460 1007.3467 1007.3659 1007.4022 1007.4030 1007.4283 1007.4285 1007.4757 1007.4811 1007.5197 1007.5273 1007.5335 1007.5350 1007.5426 1009.0065 1009.0098 1009.0285 1009.0392 1009.0468 1009.0487 1009.0568 1009.0575 1009.0793 1009.0821 1009.1160 1009.1219 1009.1419 1009.1429 1009.1439 1009.1467 1009.1500 1009.1670 1009.2152 1009.2199 1009.2644 1009.2973 1009.2984 1009.3061 1009.3383 1009.3608 1009.3973 1009.4059 1009.4322 1009.4440 1009.4454 1009.4750 1009.4814 1009.4995 1009.5245 1009.5296 1009.5366 1009.5783 1009.5835 1009.5842 1009.5893 1009.5912 1009.5970 1009.6023 1009.6138 1009.6225 1103.0255 1103.0324 1103.0533 1103.0868 1103.1041 1103.1152 1103.1272 1103.1295 1103.1310 1103.1354 1103.1418 1103.1776 
1103.1801 1103.1906 1103.1920 1103.2043 1103.2087 1103.2202 1103.2470 1103.2513 1103.2576 1103.2600 1103.2629 1103.2657 1103.2825 1103.2959 1103.3136 1103.3365 1103.3428 1103.3533 1103.3576 1103.3803 1103.3810 1103.3858 1103.3945 1103.4068 1103.4508 1103.4514 1103.4518 1103.4725 1103.4752 1103.4796 1103.4994 1103.5137 1103.5227 1103.5406 1103.5473 1103.5505 1103.5728 1103.5826 1103.5960 1204.0109 1204.0287 1204.0362 1204.0530 1204.0609 1204.0620 1204.0705 1204.0712 1204.0930 1204.0994 1204.1090 1204.1351 1204.1600 1204.1841 1204.2001 1204.2057 1204.2568 1204.2595 1204.2709 1204.2963 1204.3112 1204.3193 1204.3215 1204.3222 1204.3313 1204.3387 1204.3549 1204.3937 1204.3947 1204.4516 1204.4641 1204.4648 1204.4953 1204.4963 1204.5014 1204.5134 1204.5141 1204.5160 1204.5166 1204.5192 1204.5490 1204.5494 1204.5510 1204.5565 1204.5956 1204.6131 1204.6443 1204.6457 1204.6520 1204.6569 1204.6589 1204.6681 1204.6731 1206.0098 1206.0128 1206.0320 1206.0407 1206.0455 1206.0779 1206.0860 1206.0892 1206.1107 1206.1136 1206.1167 1206.1170 1206.1175 1206.1342 1206.1474 1206.1535 1206.1613 1206.1761 1206.1811 1206.1823 1206.1941 1206.1945 1206.2023 1206.2259 1206.2376 1206.2409 1206.2576 1206.2815 1206.2849 1206.2880 1206.2955 1206.3011 1206.3020 1206.3057 1206.3082 1206.3139 1206.3396 1206.3409 1206.3544 1206.3652 1206.3703 1206.3744 1206.3947 1206.4177 1206.4186 1206.4227 1206.4353 1206.4530 1206.4731 1206.4740 1206.4950 1206.5012 1206.5167 1206.5449 1206.5523 1206.5867 1206.5868 1206.6143 1206.6174 1206.6212 1206.6327 1206.6340 1206.6638 1206.6690 1206.6708 1206.6731 1206.6743 1206.6904 1206.7001 1206.7074 1302.0044 1302.0048 1302.0078 1302.0125 1302.0144 1302.0276 1302.0348 1302.0472 1302.0571 1302.0778 1302.0872 1302.0917 1302.1038 1302.1058 1302.1167 1302.1218 1302.1244 1302.1247 1302.1384 1302.1439 1302.1454 1302.2039 1302.2100 1302.2294 1302.2315 1302.2329 1302.2338 1302.2405 1302.2639 1302.2784 1302.2789 1302.3149 1302.3192 1302.3207 1302.3212 1302.3531 1302.3678 1302.3811 
1302.3840 1302.3899 1302.4042 1302.4192 1302.4396 1302.4401 1302.4434 1302.4513 1302.4626 1302.4825 1302.4902 1302.5020 1302.5038 1302.5210 1302.5304 1302.5588 1302.5591 1302.5719 1302.5976 1302.5987 1302.6042 1302.6046 1302.6097 1302.6116 1302.6375 1302.6583 1302.6950 1302.6954 1302.7066 1302.7249 1306.0033 1306.0107 1306.0136 1306.0143 1306.0167 1306.0204 1306.0280 1306.0403 1306.0819 1306.0822 1306.0943 1306.0988 1306.1113 1306.1114 1306.1117 1306.1138 1306.1172 1306.1174 1306.1376 1306.1477 1306.1524 1306.1558 1306.1715 1306.1728 1306.1900 1306.2012 1306.2032 1306.2254 1306.2382 1306.2383 1306.2741 1306.3073 1306.3103 1306.3508 1306.3513 1306.3648 1306.4006 1306.4046 1306.4179 1306.4290 1306.4299 1306.4344 1306.4386 1306.4387 1306.4416 1306.4481 1306.4504 1306.4559 1306.4573 1306.4850 1306.4891 1306.4943 1306.5225 1306.5283 1306.5403 1306.5497 1306.5635 1306.5645 1306.5656 1306.5732 1306.5872 1306.5952 1306.5956 1306.6391 1306.6398 1306.6409 1306.6786 1306.6821 1306.6902 1307.0259 1307.0554 1307.0625 1307.0630 1307.0900 1307.0960 1307.1036 1307.1047 1307.1054 1307.1065 1307.1455 1307.1521 1307.1600 1307.1664 1307.1768 1307.1801 1307.1981 1307.2069 1307.2127 1307.2131 1307.2163 1307.2527 1307.2604 1307.2666 1307.2770 1307.2833 1307.2895 1307.2976 1307.3042 1307.3047 1307.3096 1307.3215 1307.3287 1307.3462 1307.3693 1307.3716 1307.3809 1307.3815 1307.3971 1307.3983 1307.4006 1307.4047 1307.4111 1307.4203 1307.4245 1307.4320 1307.4328 1307.4387 1307.4393 1307.4439 1307.4679 1307.4884 1307.4936 1307.5033 1307.5088 1307.5115 1307.5401 1307.5407 1307.5413 1307.5417 1307.5453 1307.5509 1307.5836 1307.5927 1307.6029 1307.6054 1307.6076 1307.6443 1307.6502 1307.6693 1307.6944 1307.7363 1307.7431 1307.7455 1307.7778 1307.7794 1307.7797 1307.8030 1307.8135 1307.8161 1307.8236 1307.8321 1307.8347 1307.8370 1402.2703 1402.4005 1611.07204 1702.03425 1703.06195 1704.00273 1704.00487 1704.00600 1704.00657 1704.00779 1704.00851 1704.01109 1704.01156 1704.01303 1704.01418 
1704.01658 1704.01726 1704.01892 1704.01907 1704.01951 1704.02459 1704.02480 1704.02611 1704.02634 1704.02871 1704.03066 1704.03378 1704.03434 1704.03510 1704.03637 1704.03771 1704.03842 1704.04143 1704.04150 1704.04262 1704.04318 1704.04388 1704.04540 1704.04640 1704.04665 1704.05535 1704.05666 1704.05994 1704.06068 1704.06132 1704.06401 1704.06585 1704.06667 1704.07022 1704.07090 1704.07159 1704.07200 1704.07209 1704.07264 1704.07311 1704.07328 1704.07634 1704.07902 1704.08037 1704.08060 1704.08184 1704.08417 1704.08474 1704.08483 1704.08952 1704.08959 1704.09016 1802.00339 1802.00556 1802.00558 1802.01099 1802.01260 1802.01324 1802.01330 1802.01608 1802.01711 1802.01944 1802.02027 1802.02321 1802.02478 1802.02533 1802.02630 1802.02726 1802.03073 1802.03078 1802.03087 1802.03382 1802.03387 1802.03443 1802.03444 1802.03552 1802.03553 1802.03579 1802.03618 1802.03754 1802.03846 1802.03947 1802.04022 1802.04481 1802.04531 1802.04677 1802.04689 1802.04921 1802.04984 1802.05026 1802.05061 1802.05158 1802.05222 1802.05331 1802.05468 1802.05582 1802.05704 1802.05724 1802.05770 1802.05953 1802.06031 1802.06097 1802.06170 1802.06200 1802.06298 1802.06499 1802.06696 1802.06985 1802.07046 1802.07519 1802.07609 1802.07646 1802.08001 1802.08015 1802.08443 1802.08556 1802.09003 1802.09039 1802.09250 1802.09309 1802.09521 1802.09858 1802.09969 1802.10075 1802.10239 1802.10486 math0008028 math0008029 math0008039 math0008044 math0008045 math0008052 math0008078 math0008096 math0008107 math0008146 math0008148 math0008152 math0008167 math0008172 math0008180 math0008186 math0008187 math0008210 math0008240 math0109106 math0109162 math0109166 math0109167 math0109168 math0109191 math0109196 math0109197 math0109220 math0109222 math0110028 math0110057 math0110062 math0110066 math0110078 math0110123 math0110157 math0110160 math0110174 math0110197 math0110218 math0111006 math0111065 math0111091 math0111128 math0111168 math0111173 math0111257 math0111282 math0208025 math0208038 math0208043 
math0208057 math0208075 math0208077 math0208120 math0208125 math0208158 math0208210 math0208219 math0208221 math0208236 math0304062 math0304090 math0304125 math0304136 math0304137 math0304142 math0304149 math0304160 math0304252 math0304381 math0304399 math0304410 math0304433 math0304434 math0304458 math0304496 math9708216 ```

brucemiller commented 5 years ago

Here are some data for LaTeXML as a converter. No statistics on usage yet, as that depends on the converted documents; those will probably follow.

  1. LaTeXML: authoring tool that converts TeX/LaTeX (full documents or fragments) to various forms of XML and HTML, including MathML.
  2. Native MathML is intended, but users can configure polyfills such as MathJax when desired.
  3. Used (presentation) elements: annotation, annotation-xml, math, menclose, merror, mfrac, mi, mmultiscripts, mn, mo, mover, mpadded, mphantom, mprescripts, mroot, mrow, mspace, msqrt, mstyle, msub, msubsup, msup, mtable, mtd, mtext, mtr, munder, munderover, none, semantics. Does NOT use: mlabeledtr, maction, mglyph, ms, mfenced (by default).
  4. Many MathML attributes are used. Of the explicitly listed attributes, only mathvariant is used (but LaTeXML generally tries to map to Unicode).
  5. mstyle uses displaystyle, scriptlevel, mathcolor (potentially, but rarely, href).
  6. The listed named values for attributes are not used [but see note below].
  7. No leading/trailing whitespace.

Note: currently there are a couple of stray "mathspace" values used that were overlooked. These will be replaced by explicit lengths in the next software update, so consider them as not used.

fred-wang commented 5 years ago

> Note: currently there are a couple of stray "mathspace" values used that were overlooked. These will be replaced by explicit lengths in the next software update, so consider them as not used.

I've just released a new version of TeXZilla that replaces named mathspace values with explicit lengths, and updated my reply accordingly.
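
For other tool authors facing the same change: the named spaces are defined as multiples of 1/18em, so the replacement is a table lookup. This is only a sketch, not the code used in TeXZilla or LaTeXML; the `explicit_length` helper and its four-significant-digit output format are assumptions.

```python
from fractions import Fraction

# Named spaces as multiples of 1/18 em.
NAMED_SPACE_EM = {
    "veryverythinmathspace": Fraction(1, 18),
    "verythinmathspace": Fraction(2, 18),
    "thinmathspace": Fraction(3, 18),
    "mediummathspace": Fraction(4, 18),
    "thickmathspace": Fraction(5, 18),
    "verythickmathspace": Fraction(6, 18),
    "veryverythickmathspace": Fraction(7, 18),
}


def explicit_length(value):
    """Replace a named space with an em length; leave other values unchanged."""
    fraction = NAMED_SPACE_EM.get(value)
    if fraction is None:
        return value
    return f"{float(fraction):.4g}em"


for value in ("thinmathspace", "veryverythickmathspace", "2em"):
    print(value, "->", explicit_length(value))
```

Values that are already explicit lengths, such as "2em", pass through untouched, so the function can be applied to every lspace, rspace or width attribute without further checks.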

dginev commented 5 years ago

Following up on @brucemiller's comment, here is the data footprint of the Digital Library of Mathematical Functions:

DLMF v1.0.20

  1. Description: DLMF v1.0.20 is a collection of 1828 HTML5 pages, converted from semantically-enriched LaTeX via LaTeXML 0.8.3. It contains 108,952 <math> elements.

  2. The DLMF is served at https://dlmf.nist.gov . It uses (metadata-enhanced) Presentation MathML for capable browser engines, and a MathJax polyfill for others (with client-side MathML rendering).

  3. MathML elements. A full report over the data can be seen here. It was generated by the llamapun toolkit. Comparing to the shortlist:

    • in use: munder, mover, msub, msup, msubsup, mphantom, mstyle.
    • not in use: mlabeledtr, merror, maction, mglyph, mfenced, ms.
  4. MathML attributes:

    • in use: mathvariant, align (on mtable)
    • not in use: numalign, denomalign, align (on munderover/munder/mover), bevelled, subscriptshift, superscriptshift, other, macros, mode, index, fontfamily, fontweight, fontstyle, fontsize, color, background, veryverythinmathspace, verythinmathspace, thinmathspace, mediummathspace, thickmathspace, verythickmathspace, veryverythickmathspace
  5. <mstyle> attributes:

    • in use (expected): displaystyle, scriptlevel
  6. Attribute values:

    • [ ] linethickness -- numeric pt value only, none of named keywords
    • [ ] mathsize -- numeric % values only, none of named keywords
    • [ ] N/A attribute with value a nonzero number without unit (e.g. "4") other than scriptlevel
    • [x] one use of "veryverythickmathspace", as Bruce mentioned
    • [ ] notation with value "radical" - none. Only notation attribute value used is updiagonalstrike
    • [ ] attribute with leading or trailing white space characters - none
  7. No trailing/leading whitespace in token elements.

Notes: I find a couple of the reported <mtable> "align" attribute values curious -- unsure if the MathML 4 effort would like to simplify the syntax here. The data for align[1] come from align="baseline 1", as my report splits attributes by whitespace. (e.g. in DLMF 16.17.E1 ). The other curious entry (e.g. in DLMF 10.61.E3 ) is for an align="bottom1". Just reporting these as curious syntax to my untrained eye, I'm by no means an mtable expert.
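
To make the two curious values concrete, here is a small sketch that parses an mtable align value. The grammar is an assumption on my part (one of the keywords top, bottom, center, baseline or axis, optionally followed by white space and a signed integer row number), so please check it against the spec before relying on it. `parse_mtable_align` is a hypothetical helper.

```python
import re

# Assumed grammar: keyword, then optional white space plus a signed row number.
ALIGN_RE = re.compile(
    r"^\s*(top|bottom|center|baseline|axis)(?:\s+(-?[0-9]+))?\s*$"
)


def parse_mtable_align(value):
    """Return (keyword, row_number_or_None), or None if the value does not match."""
    match = ALIGN_RE.match(value)
    if match is None:
        return None
    keyword, row = match.groups()
    return keyword, (int(row) if row is not None else None)


for value in ("axis", "baseline 1", "bottom1"):
    print(repr(value), "->", parse_mtable_align(value))
```

Under that assumed grammar "baseline 1" is a keyword plus row number, while "bottom1" does not match at all because the white space is missing. That would make the second entry a markup error rather than a second syntax.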

P.S. Expect a similar report on the full arXiv data later today; walking the corpus for data collection ended up closer to 3 days than 2.

Edit: thanks for the clarification Frédéric! Definitely worth removing the confusion.

fred-wang commented 5 years ago

@dginev Thanks for the detailed report, looking forward to the arXiv one. Two quick comments:

dginev commented 5 years ago

arXMLiv 08.2018

  1. Description: arXMLiv 08.2018 is an HTML5 dataset of 1.2 million scientific articles from arXiv.org, created by me as part of our work at the KWARC research group. The data is converted from LaTeX via LaTeXML 0.8.3 and the CorTeX build system. The collection contains ~550 million <math> elements, with parallel Content MathML annotations.

  2. The dataset can be both downloaded and explored online. The CorTeX preview uses Presentation+Content MathML for capable browser engines, and a MathJax polyfill for others (with client-side MathML rendering).

  3. MathML elements. A de-noised, but otherwise exhaustive, report over the data can be seen here for presentation MathML, as well as here for content MathML.

    • Worth mentioning is that since the arXMLiv dataset is not curated in any form, and includes documents with known latexml errors, there are documents where the MathML is wrongly polluted with elements from external namespaces. I have tried my best to remove all of these cases before reporting here, and included the script, so that there is transparency in what data got discarded, for anyone interested.
    • As Frédéric initially requested, I have included pre-computed ratios for each report row, compared to the total <math> elements in arXiv. It's a curious report to study (again, generated via the llamapun toolkit).
      • Comparing to the shortlist:
        • in use: munder, mover, msub, msup, msubsup, mphantom, mstyle, merror,
        • not in use: mlabeledtr, maction, mglyph, mfenced, ms.
  4. MathML attributes:

    • in use: mathvariant, align (on mtable), mathbackground
    • not in use: numalign, denomalign, align (on munderover/munder/mover), bevelled, subscriptshift, superscriptshift, other, macros, mode, index, fontfamily, fontweight, fontstyle, fontsize, color, background, veryverythinmathspace, verythinmathspace, thinmathspace, mediummathspace, thickmathspace, verythickmathspace, veryverythickmathspace
  5. <mstyle> attributes:

    • in use: displaystyle, scriptlevel, id, xref, mathcolor, class, style
  6. Attribute values:

    • [ ] linethickness -- numeric pt value only, none of named keywords. (some errors for us to fix, but no intentional other use)
    • [ ] mathsize -- numeric % values only, none of named keywords
    • [x] attribute with value a nonzero number without unit. Yes, in what I see are two cases: e.g. mtd@rowspan[2], mtd@columnspan[8].
    • [x] one use of "veryverythickmathspace", as Bruce mentioned
    • [ ] notation with value "radical" - none. Notation attribute values are:
      • box, downdiagonalstrike, updiagonalarrow, updiagonalstrike
    • [ ] attribute with leading or trailing white space characters - none
  7. No trailing/leading whitespace in token elements.

Thanks for the patience with the report, a big part of the delay was the slowdown brought by the incredibly noisy error subset of the articles. I've left more details at the Gist, for anyone curious.

fred-wang commented 5 years ago

@dginev Thank you so much for this report, it's really cool to have such a big database of concrete MathML.

Regarding "attribute with value a nonzero number without unit", the survey should really say "length attribute with value a nonzero number without unit". However, I tried to make it understandable by anyone without detailed knowledge of the spec, and so that one could easily write a script to extract the data. mtd@columnspan and mtd@rowspan are defined as "positive-integer" ( https://mathml-refresh.github.io/mathml/chapter3.html#presm.mtdatts ), so they are not included in #24; I'll try updating the survey.
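
A sketch of the scripted version of that check, which flags any attribute whose value is a nonzero unitless number unless it is on the survey's exclusion list. The `unitless_nonzero` helper, the regular expression and the sample are illustrative assumptions, and the input must be well-formed XML.

```python
import re
import xml.etree.ElementTree as ET

# Attributes that take plain numbers rather than lengths, as listed in the
# survey (either 'element@attribute' or a bare attribute name).
NOT_LENGTHS = {
    "mglyph@index", "scriptlevel", "mtd@rowspan", "mtd@columnspan",
    "maction@selection", "msgroup@position", "msgroup@shift",
    "msrow@position", "mscarries@position", "msline@position",
    "msline@length",
}
UNITLESS_NUMBER = re.compile(r"^[+-]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)$")


def unitless_nonzero(xml_source):
    """List 'element@attribute=value' for nonzero unitless numeric values."""
    hits = []
    for node in ET.fromstring(xml_source).iter():
        name = node.tag.rsplit("}", 1)[-1]
        for attr, value in node.attrib.items():
            attr = attr.rsplit("}", 1)[-1]
            if attr in NOT_LENGTHS or f"{name}@{attr}" in NOT_LENGTHS:
                continue  # integer-valued attribute, not a length
            value = value.strip()
            if UNITLESS_NUMBER.match(value) and float(value) != 0:
                hits.append(f"{name}@{attr}={value}")
    return hits


# Hypothetical sample: rowspan is excluded, "0" is allowed, width="4" is flagged.
sample = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    '<mtable><mtr><mtd rowspan="2"><mspace width="4"/></mtd></mtr></mtable>'
    '<mfrac linethickness="0"><mn>1</mn><mn>2</mn></mfrac>'
    "</math>"
)
hits = unitless_nonzero(sample)
print(hits)
```

With the exclusion list in place, mtd@rowspan and mtd@columnspan no longer show up as false positives.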

sideshowbarker commented 5 years ago

I added use counters to the W3C HTML checker. You can view the current results here:

https://validator.w3.org/nu/stats.html

(Scroll down and look at the rows that start with Math.)

sideshowbarker commented 5 years ago

For the record here, the following is the relevant use-counter data collected so far from 2,316,780 documents checked by the W3C HTML checker:

Use-counter data for 2,316,780 documents

Counter | Occurrences* | Proportion
-- | --: | --
element `` | 82 | 0.000035
element `` | 2 | 0.000001
element `` | 0 | 0.000000
element `<math>` | 208 | 0.000090
element `` | 10 | 0.000004
element `` | 0 | 0.000000
element `` | 28 | 0.000012
element `` | 110 | 0.000047
element `` | 0 | 0.000000
element `` | 197 | 0.000085
element `` | 0 | 0.000000
element `` | 3 | 0.000001
element `` | 197 | 0.000085
element `` | 165 | 0.000071
element `` | 13 | 0.000006
element `` | 4 | 0.000002
element `` | 2 | 0.000001
element `` | 2 | 0.000001
element `` | 32 | 0.000014
element `` | 157 | 0.000068
element `` | 0 | 0.000000
element `` | 16 | 0.000007
element `` | 76 | 0.000033
element `` | 112 | 0.000048
element `` | 112 | 0.000048
element `` | 12 | 0.000005
element `` | 100 | 0.000043
element `` | 42 | 0.000018
element `` | 42 | 0.000018
element `` | 53 | 0.000023
element `` | 42 | 0.000018
element `` | 5 | 0.000002
element `` | 9 | 0.000004
element `` | 3 | 0.000001
element `` | 82 | 0.000035
attribute "actiontype" | 0 | 0.000000
attribute "background" | 0 | 0.000000
attribute "bevelled" | 0 | 0.000000
attribute "color" | 0 | 0.000000
attribute "colspan" | 0 | 0.000000
attribute "columnalign" | 35 | 0.000015
attribute "columnlines" | 0 | 0.000000
attribute "denomalign" | 0 | 0.000000
attribute "depth" | 0 | 0.000000
attribute "dir" | 0 | 0.000000
attribute "display" | 32 | 0.000014
attribute "displaystyle" | 104 | 0.000045
attribute "equalcolumns" | 0 | 0.000000
attribute "equalrows" | 0 | 0.000000
attribute "fontfamily" | 0 | 0.000000
attribute "fontsize" | 0 | 0.000000
attribute "fontstyle" | 0 | 0.000000
attribute "fontweight" | 0 | 0.000000
attribute "frame" | 0 | 0.000000
attribute "height" | 7 | 0.000003
attribute "index" | 0 | 0.000000
attribute "linethickness" | 6 | 0.000003
attribute "lspace" | 5 | 0.000002
attribute "macros" | 0 | 0.000000
attribute "mathbackground" | 1 | 0.000000
attribute "mathcolor" | 15 | 0.000006
attribute "mathvariant" | 26 | 0.000011
attribute "maxsize" | 2 | 0.000001
attribute "mediummathspace" | 0 | 0.000000
attribute "minsize" | 2 | 0.000001
attribute "mode" | 0 | 0.000000
attribute "notation" | 10 | 0.000004
attribute "numalign" | 0 | 0.000000
attribute "other" | 0 | 0.000000
attribute "rowlines" | 0 | 0.000000
attribute "rowspacing" | 1 | 0.000000
attribute "rowspan" | 0 | 0.000000
attribute "rspace" | 5 | 0.000002
attribute "scriptlevel" | 83 | 0.000036
attribute "stretchy" | 55 | 0.000024
attribute "subscripshift" | 0 | 0.000000
attribute "superscriptshift" | 2 | 0.000001
attribute "thickmathspace" | 0 | 0.000000
attribute "thinmathspace" | 0 | 0.000000
attribute "verythickmathspace" | 0 | 0.000000
attribute "verythinmathspace" | 0 | 0.000000
attribute "veryverythickmathspace" | 0 | 0.000000
attribute "veryverythinmathspace" | 0 | 0.000000
attribute "voffset" | 0 | 0.000000
attribute "width" | 16 | 0.000007
attribute "xmlns" | 0 | 0.000000
element `` with attributes other than "dir", etc. | 19 | 0.000008
attribute "linethickness" with value "thin", "thick" or "medium" | 0 | 0.000000
attribute "mathsize" with value "small", "normal" or "big" | 0 | 0.000000
attribute with unitless-length value | 0 | 0.000000
attribute with "named space" value: "verythinmathspace", etc. | 4 | 0.000002
attribute "notation" with "radical" in value | 4 | 0.000002
attribute with leading/trailing whitespace in value | 0 | 0.000000
element with leading/trailing whitespace in contents | 6 | 0.000003

\* out of 2,316,780 documents total

The final column is a proportion where 1.0 would mean 100%. So the 0.000090 number for the <math>-element counter means that 0.009% of documents checked had a math element.

And so, assuming all the MathML content checked had a math element, the numbers for the other counters can be considered relative to 208.

So the “element with leading/trailing whitespace in contents” means 6 out of 208 instances of math content — ~2.9% — had at least one element with leading/trailing whitespace in its text content.
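The two conversions above can be reproduced with a few lines of Python. This is only a worked restatement of the arithmetic, using the counts quoted in this comment; the variable and function names are made up for the example.

```python
TOTAL_DOCS = 2_316_780   # documents checked by the HTML checker
MATH_DOCS = 208          # documents counted for the math element
WHITESPACE_DOCS = 6      # "element with leading/trailing whitespace in contents"


def proportion(count, total):
    """Fraction of `total` represented by `count` (1.0 would mean 100%)."""
    return count / total


# Proportion column for the math-element counter, as shown in the table.
print(f"{proportion(MATH_DOCS, TOTAL_DOCS):.6f}")

# The same counter rebased on math-containing documents, as a percentage.
print(f"{100 * proportion(WHITESPACE_DOCS, MATH_DOCS):.1f}%")
```

Rebasing any other counter on 208 works the same way: divide its occurrence count by `MATH_DOCS` instead of `TOTAL_DOCS`.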

NSoiffer commented 5 years ago

It seems like mover/munder/munderover are big potential problems wrt the accent rule. If we end up deciding that automatic determination of the value of the accent attr won't be part of core (can't use an ssty-like font attr or whatever), then it is important to get some usage stats as to how often the attr is specified and, if it isn't given, how often it should be an accent vs a limit. To do that, we need to know what the second (and third) arguments are, or at least those that are mo and a count of the other cases.

@dginev's detailed data does provide us with those numbers (minus the characters that are accents) because the generator always uses the accent attrs. The numbers for the really big arXMLiv dataset are:

  • mover -- when it generates accent, it is always true. That's most of the time: 11.64% out of 11.89%
  • munder -- when it generates accentunder, it is always true. That's a smaller amount of time: .46% out of 2.24%
  • munderover -- when it generates accent and accentunder, it is always true. When one was true, the other was always true. Being true was rare: .03% out of 1.22%

For the smaller (but still substantial) DLMF dataset, there's a similar pattern (same generator):

  • mover -- 1050/1052 accent=true
  • munder -- 8/430 accentunder=true
  • munderover -- 0/2042 values true

So for this generator, we have a good indication that defaults for mover and munderover will work well. For munder, it will be wrong 20% of the time for arXiv, but only 2% of the time for DLMF. The first number is not great, but it's not awful.
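As a quick check of the 20% and 2% figures, here is a small Python sketch that computes, per element, the share of instances where the accent attribute(s) were true. The pairs are the numbers quoted above (percentages of all elements for arXMLiv, raw counts for DLMF); the dictionary and function names are only illustrative.

```python
# (accent-true amount, total amount) per element, as quoted above.
ARXMLIV = {"mover": (11.64, 11.89), "munder": (0.46, 2.24), "munderover": (0.03, 1.22)}
DLMF = {"mover": (1050, 1052), "munder": (8, 430), "munderover": (0, 2042)}


def accent_share(dataset):
    """Percentage of each element's instances that carry accent=true."""
    return {name: round(100 * true / total, 1) for name, (true, total) in dataset.items()}


print("arXMLiv:", accent_share(ARXMLIV))
print("DLMF:   ", accent_share(DLMF))
```

A non-accent default for munder is wrong exactly as often as accentunder is true, which is where the roughly 20% (arXiv) and 2% (DLMF) come from.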

These numbers are a great indication, but they come from a single generator. Having data from a different generator would add a lot more validity to them.

dani31415 commented 5 years ago

Some statistics extracted from the MathType Web / WIRIS services. Note that some attributes are invalid as MathML but that's what the users tried to use with MathType Web services.

1905001 Expressions

12082445 instances
<mo
  xmlns = "http://www.w3.org/1998/Math/MathML"
  lspace = "mediummathspace" | "thinmathspace" | "? em" | "? pt" | "? px"
  form = "postfix" | "prefix" | "infix"
  stretchy = "false" | "true"
  linebreak = "newline" | "nobreak" | "goodbreak" | "badbreak"
  mathsize = "? px" | "? em" | "big" | "? %" | "? pt"
  separator = "true"
  mathvariant = "bold" | "bold-italic" | "italic" | "normal" | "double-struck" | "fraktur" | "\"italic\""
  linebreakstyle = "before" | "after"
  indentshift = "? em"
  mathcolor = "#??????"
  symmetric = "true"
  fence = "true" | "false"
  accent = "false" | "true"
  class = ...
  movablelimits = "true" | "false"
  mathbackground = "#??????"
  id = ...
  dir = "rtl"
  style = ...
  minsize = "? em" | "? %"
  background = "violet"
  rspace = "mediummathspace" | "? em" | "? pt" | "? px"
  largeop = "true"
  fontstyle = "normal"
  maxsize = "? em" | "? %" | "1"
>

10973191 instances
<mi
  mathbackground = "#??????"
  style = ...
  background = "violet"
  id = ...
  dir = "rtl"
  class = ...
  xmlns = "http://www.w3.org/1998/Math/MathML"
  fontstyle = "normal" | "italic"
  mathsize = "? px" | "? %"
  title = ...
  mathcolor = "#??????"
  mathvariant = ...
>

6552976 instances
<mn
  style = "color:#ff0000" | "line-height: 22.28px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;" | "color:#000000" | "font-size: 80%"
  mathbackground = "#??????"
  id = ...
  dir = "rtl"
  class = ...
  xmlns = "http://www.w3.org/1998/Math/MathML" | "http://www.w3.org/1999/xhtml"
  fontsize = "? px" | "20"
  bold-italic = ""
  mathsize = "? px" | "0.5" | "? %"
  title = ...
  wrs:positionable = "true"
  mathcolor = "#??????" | "red"
  mathvariant = "bold" | "italic" | "bold-italic" | "normal" | "double-struck" | "bold>1</mn> </mrow><mrow> <mi mathvariant="
>

2026141 instances
<mrow
  wrs:positionable = "true" | "false"
  dir = "rtl"
  style = "line-height: 22.28px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;" | "color:#ff0000" | "color:#c83740"
  class = ...
  mathcolor = "#??????"
  xmlns = "http://www.w3.org/1998/Math/MathML"
  id = ...
>

1777653 instances
<math
  width = "444"
  linebreak = "auto"
  xmls = "http://www.w3.org/1998/Math/MathML"
  mode = "inline" | "display"
  indentalign = "left" | "id" | "right"
  mathvariant = "italic"
  border = "1"
  class = ...
  displaystyle = "true"
  xmlns = ...
  indenttarget = "aaa1" | "aaa2"
  altimg = ...
  tex = "\Omega" | "{}^{2}" | "\boldsymbol{\mathsf{G_{max}}}"
  mathsize = "? em" | "? px" | "16px;" | "15px;" | "? pt" | "medium" | "17px;"
  xml:id = ...
  display = "block" | "inline" | "" | "block;" | "blockquote" | "inline-block"
  text = "Omega" | "^2" | "G _ max"
  mathcolor = "#??????" | "white" | "blue"
  http: = ""
  times = ""
  indentshiftfirst = "? em"
  title = ...
  displaystye = "true"
  id = ...
  float = "left"
  wrs:positionable = "false"
  alttext = ...
  overflow = "scroll" | "scale"
  style = ...
  scriptlevel = "-1"
  baseline = "-2.5"
  align = "center" | "left"
  indentshift = "? em"
  roman = ""
  dir = "rtl" | "\"rtl\""
>

1130582 instances
<mfrac
  style = ...
  id = ...
  linethickness = "0" | "? px" | "1" | "? pt"
  mpadded = "0"
  dir = "rtl"
  denomalign = "center"
  class = ...
  xmlns = "http://www.w3.org/1998/Math/MathML"
  bevelled = "true" | "\"true\""
  title = ...
  numalign = "center"
  mathcolor = "#??????"
  mathvariant = "bold"
>

955930 instances
<msup
  mathsize = "? em"
  dir = "rtl"
  style = "line-height: 22.28px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"
  class = ...
  mathcolor = "#??????" | "blue"
  title = ...
  xmlns = "http://www.w3.org/1998/Math/MathML"
  id = ...
>

685983 instances
<mspace
  id = ...
  width = "? em" | "- ? em" | "thickmathspace" | "negativethinmathspace" | "? px" | "? pt" | "thinmathspace" | "50" | "mediummathspace" | "? cm" | "? ex" | "3"
  class = ...
  xmlns = "http://www.w3.org/1998/Math/MathML"
  depth = "? ex" | "? em"
  mathsize = "? px"
  linebreak = "newline" | "\"newline\"" | "\"newline\"/" | "././newline" | ""newline"" | "nobreak"
  height = "? em" | "? ex" | "? pt"
  mathcolor = "#??????"
  mathvariant = "bold" | "italic"
>

596815 instances
<msub
  class = ...
  mathcolor = "#??????"
  mathbackground = "#??????"
  xmlns = "http://www.w3.org/1998/Math/MathML"
  id = ...
>

527555 instances
<mfenced
  style = ...
  id = ...
  class = ...
  xmlns = "http://www.w3.org/1998/Math/MathML"
  separators = "|" | "" | "?" | "|,"
  open = "[" | "{" | "|" | "" | "(" | "||" | "?" | "<" | "?" | "?" | "?" | "c" | " " | "¨{¨" | "¨|¨" | "?" | "a" | "]" | "open" | "{{lessthan}}" | "?" | "\"{\"" | ")" | "&#060;" | "¨||¨"
  openclosebrackets = ""
  columnspacing = "200 px;"
  wrs:valign = "middle-baseline" | "middle"
  close = "]" | "}" | "" | "|" | ">" | "||" | "?" | ")" | "?" | "?" | "?" | " " | "¨¨" | "¨|¨" | "?" | "[" | "{{greaterthan}}" | "?" | "\"}\"" | "&#062;" | "¨}¨" | "¨||¨"
  mathcolor = "#??????"
  mathvariant = "bold" | "normal" | "bold-italic"
>

438269 instances
<mtd
  columnalign = "left" | "center" | "right"
  class = ...
  columnspan = "1" | "3"
  id = ...
>

242575 instances
<msqrt
  dir = "rtl"
  mathcolor = "#??????"
  xmlns = "http://www.w3.org/1998/Math/MathML"
  id = ...
>

229436 instances
<mtr
  mathsize = "small"
  columnalign = "left" | "right"
  class = ...
  mathbackground = "#??????"
  id = ...
>

209278 instances
<mstyle
  xmlns = "http://www.w3.org/1998/Math/MathML"
  fontweight = "bold"
  indentalign = "left" | "center" | "right"
  mathsize = "? px" | "? pt" | "? em" | "normal" | "? %" | "\"18px\"" | "24" | "18" | "14" | "38" | "8"
  encoding = "LaTeX"
  displaystyle = "true" | "false" | "\"false\"" | "Ã?"falseÃ?"Ã?" | "false''" | ""true"" | "font-family:'Times New Roman' true" | "" | "¨false¨"
  mathvariant = "italic" | "bold" | "normal" | "bold-italic" | "sans-serif" | "fraktur" | "script"
  mathcolor = "#??????" | "red" | "green" | "blue" | "black" | "Black" | "Green" | "DarkGreen"
  denomalign = "center"
  numalign = "center"
  class = ...
  mathbackground = "#??????"
  id = ...
  rowspacing = "? ex"
  scriptsizemultiplier = ".85"
  style = "font-family: 'Euclid Fraktur';font-weight: normal;font-style: normal;" | "line-height: 22.28px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"
  scriptlevel = "0" | "+1" | "-1"
  fontfamily = "Palatino, serif" | "Palatino, serif;" | "serif"
  lineleading = "? ex"
>

107766 instances
<mover
  accent = "true" | "false"
  wrs:positionable = "false"
  class = ...
  mathcolor = "#??????"
  mathbackground = "#??????"
  id = ...
  align = "center"
>

98094 instances
<mtext
  style = "border-color: black" | "font-size: larger;"
  mathbackground = "#??????"
  id = ...
  dir = "rtl"
  class = ...
  xmlns = "http://www.w3.org/1998/Math/MathML"
  mathsize = "? pt" | "? px"
  xml:lang = "es"
  label = "unit"
  mathcolor = "#??????" | "0d87c5"
  matcholor = "#??????"
  mathvariant = "bold" | "bold-italic" | "double-struck" | "italic" | "normal" | "script"
>

96791 instances
<mtable
  columnalign = ...
  mathsize = "? px"
  displaystyle = "true" | "false"
  wrs:columnalign = "relation" | "center center relation" | "relation center left" | "relation relation relation" | "center relation center" | "relation center relation relation" | "relation center relation" | "center relation"
  mathcolor = "#??????"
  columnspacing = ...
  columnlines = ...
  class = ...
  frame = "solid" | "none" | "dashed"
  equalcolumns = "true" | "false"
  rowalign = ...
  id = ...
  width = "? %"
  rowspacing = ...
  align = "center" | "axis" | "right" | "axis 3"
  style = "text-align:axis;" | "" | "text-align: axis;" | "display: block; margin-top: 1.0em; margin-bottom: 2.0em" | "text-align:axis"
  equalrows = "true" | "false"
  fontsize = "? px"
  rowlines = ...
  columnwidth = "auto fit"
>

55664 instances
<menclose
  border = "1"
  notation = ...
  class = ...
  mathcolor = "#??????" | "#ff000"
  xmlns = "http://www.w3.org/1998/Math/MathML"
  align = "center"
>

43909 instances
<msubsup
  class = ...
  mathcolor = "#??????"
  id = ...
>

36164 instances
<mroot
  mathcolor = "#??????"
  xmlns = "http://www.w3.org/1998/Math/MathML"
  id = ...
>

32125 instances
<munder
  wrs:positionable = "false"
  underaccent = "false"
  accentunder = "false" | "true"
  class = ...
  mathcolor = "#??????"
  id = ...
>

26913 instances
<semantics
  style = "line-height: 22.28px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"
  id = ...
>

13441 instances
<mmultiscripts
  mathcolor = "#??????"
>

13273 instances
<mprescripts>

12801 instances
<munderover
  accent = "false"
  accentunder = "false"
  mathcolor = "#??????"
>

7094 instances
<msrow>

2781 instances
<maction
  actiontype = "argument" | "\"argument\"" | "argumentvalue"
  mathcolor = "#??????"
>

2344 instances
<msline
  position = "2" | "1" | "3" | "4" | "6"
  length = "2" | "3" | "6" | "5" | "4" | "1" | "14"
  mathcolor = "#??????"
>

2169 instances
<mstack
  charspacing = "? px"
  mathcolor = "#??????"
  stackalign = "right"
  charalign = "center"
>

1446 instances
<mlongdiv
  longdivstyle = "shortstackedrightright"
  charspacing = "? px"
  mathcolor = "#??????"
  stackalign = "left"
  charalign = "center"
>

1423 instances
<msgroup>

883 instances
<mpadded
  height = "? pt"
  lspace = "- ? px" | "+ ? px"
  voffset = "+ ? px" | "- ? px"
  width = "+ ? pt" | "0"
  voffsett = "- ? em"
  mathcolor = "#??????"
  depth = "? pt"
>

252 instances
<maligngroup
  class = ...
>

197 instances
<mphantom
  font-style = "normal"
>

174 instances
<malignmark>

88 instances
<mlabeledtr>

49 instances
<ms
  mathcolor = "#??????"
>

15 instances
<merror
  class = ...
>

4 instances
<mscarries
  location = "nw" | "s"
>

3 instances
<matrixrow>
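
For readers who want to produce a comparable summary from their own corpus, the following Python sketch shows one way such a listing could be built. It is not the script used for the numbers above: the masking rules (numbers become `?`, six-digit hex colors become `#??????`) are inferred from the listing, and the function names are hypothetical. A strict XML parser is assumed, so malformed markup like some of the values shown above would need a more tolerant parser.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

HEX_COLOR = re.compile(r"^#[0-9a-fA-F]{6}$")
# Optional sign, a number, then one of a few assumed units.
LENGTH = re.compile(r"^([+-]?)\s*(?:\d+\.?\d*|\.\d+)\s*(em|ex|px|pt|cm|%)$")


def mask(value):
    """Collapse concrete numbers and colors so that similar values group together."""
    if HEX_COLOR.match(value):
        return "#??????"
    m = LENGTH.match(value)
    if m:
        sign, unit = m.groups()
        return f"{sign} ? {unit}".strip()
    return value


def summarize(expressions):
    """Count element instances and collect the masked attribute values seen on each."""
    instances = Counter()
    values = defaultdict(lambda: defaultdict(set))
    for xml_text in expressions:
        for el in ET.fromstring(xml_text).iter():
            tag = el.tag.rsplit("}", 1)[-1]  # drop any namespace part
            instances[tag] += 1
            for name, value in el.attrib.items():
                values[tag][name.rsplit("}", 1)[-1]].add(mask(value))
    return instances, values


docs = ['<math><mspace width="2em"/><mspace width="-0.5em"/>'
        '<mi mathcolor="#ff0000">x</mi></math>']
instances, values = summarize(docs)
for tag, count in instances.most_common():
    print(count, "instances", tag, dict(values[tag]))
```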

NSoiffer commented 5 years ago

Dani,

Is it possible to find out how many (if any) of the mfracs in your data have bevelled=true?

The other stat that I would like to know is how often accent and accentunder get used, implicitly and explicitly. By implicitly, I mean how often the 2nd/3rd arg is one of the chars in the operator dictionary that are marked as accents (typically horizontal arrows, brackets/braces, and ASCII and punctuation chars); you could use ranges for the arrows and pick out the ASCII and a few others to get a good approximation. Is it possible to get that data? We have some from the LaTeX side, but your data is probably from a different population.
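To make the request concrete, here is a rough Python sketch of the kind of count I mean. The accent character set is only an approximation chosen for illustration (the arrows block plus a few ASCII and bracket characters), not the operator dictionary itself, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: a crude stand-in for the operator dictionary's accent entries.
ASCII_ACCENTS = set("^~_`-.")
OTHER_ACCENTS = set("\u00af\u23b4\u23b5\u23de\u23df")  # macron, brackets, braces


def looks_like_accent(text):
    """True if the mo content is a single character from the approximate accent set."""
    text = (text or "").strip()
    if len(text) != 1:
        return False
    return (text in ASCII_ACCENTS or text in OTHER_ACCENTS
            or 0x2190 <= ord(text) <= 0x21FF)  # arrows block


def local(tag):
    return tag.rsplit("}", 1)[-1]  # drop any namespace part


def classify_scripts(xml_text):
    """Count mover/munder/munderover as explicit, implicit, or other."""
    counts = Counter()
    for el in ET.fromstring(xml_text).iter():
        name = local(el.tag)
        if name not in ("mover", "munder", "munderover"):
            continue
        scripts = list(el)[1:]  # the 2nd (and 3rd) arguments
        if "accent" in el.attrib or "accentunder" in el.attrib:
            kind = "explicit"
        elif any(local(c.tag) == "mo" and looks_like_accent(c.text) for c in scripts):
            kind = "implicit"
        else:
            kind = "other"
        counts[(name, kind)] += 1
    return counts


sample = ('<math><mover><mi>x</mi><mo>\u2192</mo></mover>'
          '<munder><mo>lim</mo><mi>n</mi></munder>'
          '<mover accent="true"><mi>y</mi><mo>^</mo></mover></math>')
print(classify_scripts(sample))
```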

NSoiffer commented 5 years ago

Hi Neil,

  • 17077 bevelled="true" for a total of 1130582
  • 4254 accent="true" for a total of 107766 (with explicit accent attribute but perhaps unnecessary according to symbol dictionary).

There are no instances of munder and munderover with either accent or accentunder. I've not checked the mo with accent=true that are immediate children of a mover, munder or munderover.
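Expressed as percentages, a short Python calculation using only the counts above (the names are illustrative):

```python
# (matching instances, total instances) from the counts above.
COUNTS = {
    'mfrac bevelled="true"': (17077, 1130582),
    'mover accent="true"': (4254, 107766),
}


def percentages(counts):
    """Share of each total, as a percentage rounded to two decimals."""
    return {label: round(100 * hit / total, 2) for label, (hit, total) in counts.items()}


for label, pct in percentages(COUNTS).items():
    print(f"{label}: {pct}%")
```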

Dani


96791 instances

<mtable

columnalign = ...

mathsize = "? px"

displaystyle = "true" | "false"

wrs:columnalign = "relation" | "center center relation" | "relation center left" | "relation relation relation" | "center relation center" | "relation center relation relation" | "relation center relation" | "center relation"

mathcolor = "#??????"

columnspacing = ...

columnlines = ...

class = ...

frame = "solid" | "none" | "dashed"

equalcolumns = "true" | "false"

rowalign = ...

id = ...

width = "? %"

rowspacing = ...

align = "center" | "axis" | "right" | "axis 3"

style = "text-align:axis;" | "" | "text-align: axis;" | "display: block; margin-top: 1.0em; margin-bottom: 2.0em" | "text-align:axis"

equalrows = "true" | "false"

fontsize = "? px"

rowlines = ...

columnwidth = "auto fit"

55664 instances

<menclose

border = "1"

notation = ...

class = ...

mathcolor = "#??????" | "#ff000"

xmlns = "http://www.w3.org/1998/Math/MathML"

align = "center"

43909 instances

<msubsup

class = ...

mathcolor = "#??????"

id = ...

36164 instances

<mroot

mathcolor = "#??????"

xmlns = "http://www.w3.org/1998/Math/MathML"

id = ...

32125 instances

<munder

wrs:positionable = "false"

underaccent = "false"

accentunder = "false" | "true"

class = ...

mathcolor = "#??????"

id = ...

26913 instances

<semantics

style = "line-height: 22.28px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"

id = ...

13441 instances

<mmultiscripts

mathcolor = "#??????"

13273 instances

12801 instances 7094 instances 2781 instances 2344 instances 2169 instances 1446 instances 1423 instances 883 instances 252 instances 197 instances 174 instances 88 instances 49 instances 15 instances 4 instances 3 instances

NSoiffer commented 5 years ago

Hi Neil,

I have updated the statistics and am correcting my previous email:

bevelled=true for mfrac count: 17077 over 1130582

accent=true for mover count: 4254 over 107766

accent=false for mover count: 30 over 107766

accent=true for munderover count: 0 over 32125

accent=false for munderover count: 4 over 32125

accentunder=true for munder count: 380 over 12801

accentunder=true for munderover count: 0 over 12801

accentunder=false for munder count: 7 over 12801

accentunder=false for munderover count: 4 over 12801
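To put these counts in proportion, the shares can be worked out directly. The snippet below is only a small arithmetic sketch: the numbers are copied from the figures above, and the `share` helper and `COUNTS` list are hypothetical names introduced for illustration.

```python
# Counts copied from the figures quoted above: (label, matches, total).
COUNTS = [
    ("mfrac bevelled=true", 17077, 1130582),
    ("mover accent=true", 4254, 107766),
    ("mover accent=false", 30, 107766),
    ("munder accentunder=true", 380, 12801),
    ("munder accentunder=false", 7, 12801),
]


def share(matches: int, total: int) -> float:
    """Return matches as a percentage of total."""
    return 100.0 * matches / total


for label, matches, total in COUNTS:
    print(f"{label}: {share(matches, total):.2f}%")
```

On these figures, roughly 1.5% of the mfrac instances are bevelled and just under 4% of the mover instances carry an explicit accent=true.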

Dani

On Mon, May 6, 2019 at 7:57 PM Daniel Marques dani@wiris.com wrote:

Hi Neil,

17077 bevelled="true" for a total of 1130582

4254 accent="true" for a total of 107766 (with an explicit accent attribute, though it may be unnecessary according to the symbol dictionary).

There are no instances of munder and munderover with either accent or accentunder. I've not checked the mo with accent=true that are immediate children of a mover, munder or munderover.
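The unchecked case mentioned here, an mo with accent=true sitting directly inside mover, munder or munderover, could be counted along the following lines. This is a minimal sketch using Python's standard `xml.etree.ElementTree`, not the tooling actually used for these statistics; `count_accented_mo_children` and `SAMPLE` are hypothetical names, and the input is assumed to be well-formed XML.

```python
import xml.etree.ElementTree as ET
from collections import Counter

SCRIPT_PARENTS = {"mover", "munder", "munderover"}


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def count_accented_mo_children(mathml: str) -> Counter:
    """Count <mo accent="true"> immediate children, keyed by parent element."""
    counts = Counter()
    for parent in ET.fromstring(mathml).iter():
        name = local_name(parent.tag)
        if name not in SCRIPT_PARENTS:
            continue
        for child in parent:
            if local_name(child.tag) == "mo" and child.get("accent") == "true":
                counts[name] += 1
    return counts


SAMPLE = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    '<mover><mi>x</mi><mo accent="true">^</mo></mover>'
    '<munder><mi>y</mi><mo>_</mo></munder>'
    '</math>'
)
print(count_accented_mo_children(SAMPLE))
```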

Dani

On Tue, Apr 30, 2019 at 8:12 PM Neil Soiffer soiffer@alum.mit.edu wrote:

Dani,

Is it possible to find out how many (if any) of the mfracs in your data have bevelled=true?

The other stat that I would like to know is how often accent and accentunder get used, implicitly and explicitly. By implicitly, I mean how often the 2nd/3rd arg is one of the chars in the operator dictionary that are marked as accents (typically horizontal arrows, brackets/braces, and ASCII and punctuation chars); you could use ranges for the arrows and pick out the ASCII and a few others to get a good approximation. Is it possible to get that data? We have some from the LaTeX side, but your data is probably from a different population.
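The approximation suggested here (a range for the arrows plus a handful of ASCII and punctuation characters) might look like the sketch below. The character set is a hypothetical stand-in and is not the real operator dictionary; the Arrows block also contains non-horizontal arrows, so the count is only a rough estimate. The names `ARROWS`, `OTHER_ACCENTS` and `count_implicit_accents` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed approximation of accent-like characters, NOT the operator dictionary:
# the Unicode Arrows block plus a few ASCII/punctuation marks and curly brackets.
ARROWS = range(0x2190, 0x2200)
OTHER_ACCENTS = set("^~`_-\u00af\u00a8\u23de\u23df")
SCRIPT_ELEMENTS = {"mover", "munder", "munderover"}


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def looks_like_accent(element: ET.Element) -> bool:
    """True for an <mo> holding a single accent-like character."""
    if local_name(element.tag) != "mo":
        return False
    text = (element.text or "").strip()
    return len(text) == 1 and (ord(text) in ARROWS or text in OTHER_ACCENTS)


def count_implicit_accents(mathml: str) -> int:
    """Count 2nd/3rd children that look like accents where no attribute is set."""
    total = 0
    for element in ET.fromstring(mathml).iter():
        if local_name(element.tag) not in SCRIPT_ELEMENTS:
            continue
        if "accent" in element.attrib or "accentunder" in element.attrib:
            continue  # explicit usage is counted separately
        total += sum(looks_like_accent(child) for child in list(element)[1:])
    return total


SAMPLE = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    '<mover><mi>v</mi><mo>&#x2192;</mo></mover>'
    '<mover accent="true"><mi>x</mi><mo>^</mo></mover>'
    '<munder><mi>y</mi><mi>n</mi></munder>'
    '</math>'
)
print(count_implicit_accents(SAMPLE))
```

In the sample, only the first mover counts: the second sets accent explicitly, and the munder's script is an mi rather than an accent-like mo.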

On Mon, Apr 29, 2019 at 10:45 AM Daniel Marques notifications@github.com wrote:

Some statistics extracted from the MathType Web / WIRIS services. Note that some attributes are invalid as MathML but that's what the users tried to use with MathType Web services.
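For readers who want to produce a comparable report from their own corpus, here is a minimal sketch of the idea. It assumes the masking visible in the listing (hex colours shown as "#??????", numeric lengths as "? em", "? px" and so on) was applied per value; that is an inference from the output, not a description of the WIRIS tooling. `normalise` and `collect` are hypothetical names, and `xml.etree.ElementTree` only accepts well-formed input, so the malformed attributes seen in real data would need a more tolerant parser.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag or attribute name."""
    return tag.rsplit("}", 1)[-1]


def normalise(value: str) -> str:
    """Assumed masking: hex colours -> '#??????', numeric lengths -> '? unit'."""
    if re.fullmatch(r"#[0-9a-fA-F]{6}", value):
        return "#??????"
    match = re.fullmatch(r"(-?)\s*\d*\.?\d+\s*(em|ex|px|pt|cm|%)", value)
    if match:
        sign = "- " if match.group(1) else ""
        return f"{sign}? {match.group(2)}"
    return value  # anything else is reported verbatim


def collect(expressions):
    """Return (instances per element, distinct values per (element, attribute))."""
    instances = Counter()
    values = defaultdict(set)
    for expression in expressions:
        for element in ET.fromstring(expression).iter():
            name = local_name(element.tag)
            instances[name] += 1
            for attribute, value in element.attrib.items():
                values[(name, local_name(attribute))].add(normalise(value))
    return instances, values


SAMPLE = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    '<mfrac linethickness="2px" mathcolor="#FF0000"><mn>1</mn><mn>2</mn></mfrac>'
    '<mspace width="-0.5em"/>'
    '</math>'
)
instances, values = collect([SAMPLE])
for name, count in instances.most_common():
    print(f"{count} instances <{name}")
```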

1905001 Expressions


fred-wang commented 5 years ago

I had put this on GitHub, but I believe it would be really nice to have a better process for collecting the survey replies, a page that presents the results in a consistent way, and a way to update the questions.

runarberg commented 4 years ago

I’m a bit late to the game here, but I’m the author of Mathup (npm; GitHub), an authoring library that transforms an AsciiMath-like syntax into MathML. I had pretty much abandoned the project, but a few users still use it (mostly in custom browser-based notebooks where they take math notes on the fly). I revisited the project last month and am planning a complete rewrite. Below are my answers to the survey:

  1. Description: Ascii2MathML—an AsciiMath-like to MathML converter
  2. Native MathML: Yes. See website. Future versions plan to offer .toString(), .toDOM(), and .toVirtualDOM() options, all in MathML.
  3. MathML Elements: annotation, math, menclose, mfenced, mfrac, mi, mn, mo, mover, mroot, mrow, msqrt, msub, msubsup, msup, mtable, mtd, mtr, munder, munderover, and semantics.
  4. MathML Attributes: mathvariant, bevelled, veryverythickmathspace.
  5. Attributes on the mstyle Element: None. The tool does not use the mstyle element at all.
  6. Attribute Values: lspace and rspace with a value of veryverythickmathspace.
  7. Trailing/Leading Whitespace in Token Elements: No.

Note that in my current rewrite I plan to drop deprecated elements and attributes (such as mfenced).
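As one illustration of what dropping deprecated features can involve, the named space value mentioned in answer 6 can be rewritten as an explicit length. MathML 3 defines the named spaces as multiples of 1/18 em, so veryverythickmathspace corresponds to 7/18 em. The sketch below is not taken from Mathup; `NAMED_SPACES` and `to_em` are hypothetical names, and the negative variants are left out for brevity.

```python
# Named space -> number of eighteenths of an em, per the MathML 3 definitions.
NAMED_SPACES = {
    "veryverythinmathspace": 1,
    "verythinmathspace": 2,
    "thinmathspace": 3,
    "mediummathspace": 4,
    "thickmathspace": 5,
    "verythickmathspace": 6,
    "veryverythickmathspace": 7,
}


def to_em(value: str) -> str:
    """Replace a named space with an explicit em length; pass other values through."""
    eighteenths = NAMED_SPACES.get(value)
    if eighteenths is None:
        return value
    return f"{eighteenths / 18:.4f}em"


print(to_em("veryverythickmathspace"))  # 0.3889em
```

Values that are already explicit lengths, such as "1em", are returned unchanged, so the helper can be applied to every lspace, rspace or width value without first checking its form.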

davidcarlisle commented 2 years ago

This survey proved useful but is now complete; closing.