w3c / mathml-core

MathML Core draft
https://w3c.github.io/mathml-core

TeX-like spacing rules #141

Open fred-wang opened 5 years ago

fred-wang commented 5 years ago

"Follow-up of w3c/mathml#30 (closed), this is an entry to discuss spacing rules:

For now, the implementation note says that only operator lspace/rspace and italic correction add spacing in row-like elements: http://www.mathml-association.org/MathMLinHTML5/S3.html#SS3.SSS1

TeX spacing is mentioned here: http://www.mathml-association.org/MathMLinHTML5/S2.html#SS2.SSS2

Gecko's non-standard rules: http://hg.mozilla.org/mozilla-central/file/tip/layout/mathml/nsMathMLContainerFrame.cpp#l1058 "

Original report: https://gitlab.com/mathml/MathMLinHTML5/issues/32

fred-wang commented 5 years ago

I plan to remove the LaTeX section and move the “cramped” description into a new embellished operator section.

The rest of the section is about LaTeX spacing; I'm going to remove it. Just copying it here for the record:

To implement math spacing, the TeXBook defines eight basic types (Ord for ordinary atoms, Op for large operators, Bin for binary operations, Rel for relations, Open for opening fences, Close for closing fences, Punct for punctuation and Inner for a delimited subformula) and defines an inter-atom space for each pair of such types. In the present document, we only follow the spacing algorithm of MathML 3: by default the inter-atom space is always zero, and spacing is produced by spacing elements like mspace, mphantom or mpadded, or by the leading and trailing space around embellished operators.
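
For reference, the pairwise table that the removed text alludes to can be sketched in a few lines of Python. The rows transcribe the math_spacing string from tex.web (the same table appears in Chapter 18 of The TeXbook). The function and variable names are hypothetical and chosen only for illustration. The widths are the natural values of plain TeX's \thinmuskip (3mu), \medmuskip (4mu) and \thickmuskip (5mu), with stretch and shrink ignored. Entries marked conditional apply only in display and text styles, which is why most spacing disappears in scripts.

```python
# Atom types, in the order used by TeX's math_spacing table.
ATOM_TYPES = ("Ord", "Op", "Bin", "Rel", "Open", "Close", "Punct", "Inner")

# One row per left atom, one character per right atom:
#   0 = no space
#   1 = thin space, only outside script styles (conditional)
#   2 = thin space, in every style
#   3 = medium space, only outside script styles (conditional)
#   4 = thick space, only outside script styles (conditional)
#   * = pair that cannot occur (TeX turns such Bin atoms into Ord first)
MATH_SPACING = (
    "02340001",  # Ord
    "22*40001",  # Op
    "33**3**3",  # Bin
    "44*04004",  # Rel
    "00*00000",  # Open
    "02340001",  # Close
    "11*11111",  # Punct
    "12341011",  # Inner
)

# Natural widths in mu (18mu = 1em) for plain TeX's default muskips.
SPACE_MU = {"1": 3, "2": 3, "3": 4, "4": 5}
CONDITIONAL = {"1", "3", "4"}


def inter_atom_space_mu(left, right, in_script):
    """Return the space in mu inserted between two adjacent atoms."""
    code = MATH_SPACING[ATOM_TYPES.index(left)][ATOM_TYPES.index(right)]
    if code == "*":
        raise ValueError(f"{left} followed by {right} cannot occur")
    if code == "0" or (in_script and code in CONDITIONAL):
        return 0
    return SPACE_MU[code]


# 'n + 1' on the baseline versus inside a superscript.
print(inter_atom_space_mu("Ord", "Bin", in_script=False))  # 4
print(inter_atom_space_mu("Ord", "Bin", in_script=True))   # 0
```

Under MathML 3 rules there is no such table: the space comes from the lspace and rspace of each operator, regardless of what sits next to it.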

NSoiffer commented 4 years ago

Perhaps a separate issue, but since this fits under the TeX-like spacing rules, I'm putting it here. The spec currently uses the word may, which is something we should get rid of for consistency among renderers:

Some renderers may wish to use no spacing for most operators appearing in scripts (i.e. when scriptlevel is greater than 0; see Section 3.3.4 Style Change <mstyle>), as is the case in TeX.

Firefox makes a compromise that for scriptlevel>0, lspace and rspace are divided by 2.
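
As a rough illustration of that halving rule (a sketch of the behaviour as described here, not a transcription of the Gecko source), the snippet below applies it to the MathML 3 operator dictionary defaults for '+' (mediummathspace, 4/18 em) and '=' (thickmathspace, 5/18 em). The function names are hypothetical.

```python
from fractions import Fraction

MU_PER_EM = 18  # TeX's math unit: 18mu = 1em

# MathML 3 operator dictionary defaults for lspace/rspace, in em.
MEDIUM = Fraction(4, 18)  # mediummathspace, used for '+'
THICK = Fraction(5, 18)   # thickmathspace, used for '='


def halved_in_scripts(space_em, scriptlevel):
    """Halve lspace or rspace when scriptlevel > 0, else keep it unchanged."""
    return space_em / 2 if scriptlevel > 0 else space_em


def em_to_mu(space_em):
    """Convert a length in em to TeX math units."""
    return space_em * MU_PER_EM


for name, space in (("+", MEDIUM), ("=", THICK)):
    for level in (0, 1):
        mu = em_to_mu(halved_in_scripts(space, level))
        print(f"'{name}' at scriptlevel {level}: {float(mu)}mu on each side")
```

With these defaults the rule gives 2mu around '+' and 2.5mu around '=' inside scripts, compared with 4mu and 5mu under plain MathML rules and no space at all under TeX rules.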

fred-wang commented 4 years ago

> Perhaps a separate issue, but since this fits under the TeX-like spacing rules, I'm putting it here. The spec currently uses the word may, which is something we should get rid of for consistency among renderers:
>
> Some renderers may wish to use no spacing for most operators appearing in scripts (i.e. when scriptlevel is greater than 0; see Section 3.3.4 Style Change <mstyle>), as is the case in TeX.

where is it in the spec?

> Firefox makes a compromise that for scriptlevel>0, lspace and rspace are divided by 2.

I'm not aware of that, do you have a link to the code?

emilio commented 4 years ago

https://searchfox.org/mozilla-central/rev/6866d3a650c826f1cabd123663e84b95ee743701/layout/mathml/nsMathMLmoFrame.cpp#349

https://searchfox.org/mozilla-central/rev/6866d3a650c826f1cabd123663e84b95ee743701/layout/mathml/nsMathMLmfencedFrame.cpp#450

NSoiffer commented 4 years ago

> where is it in the spec?

Apparently I was looking at an older version. I can't find it now.

In any case, we should make sure we all agree the current behavior is what we want (I do) because it differs from TeX (and Firefox).

fred-wang commented 4 years ago

@NSoiffer What is the proposal here? I don't think the spec/tests have anything like what Gecko does.

NSoiffer commented 4 years ago

My proposal is that we follow MathML's spacing rules, not TeX's and not the Gecko "compromise". Here's my rationale:

  1. Spacing rules are meant to visually show grouping -- relational operators have the greatest spacing, whereas higher precedence operators have no spacing.
  2. Competing with that, sub/superscripts add horizontal space, thus making it a little harder to see the grouping intended in the outer context (the original baseline). Other notations such as fractions contribute to the same problem and they don't use TeX's "no spacing" rule, so I don't find this argument very compelling.
  3. Scripts tend to be short, so the difference between TeX and MathML is not usually much of a difference. However, when scripts are longer, I don't see a reason why visual grouping should be thrown out. Also note that because scripts are in a smaller font, the spacing is reduced proportionally. Thus there is only a small additional amount of horizontal space introduced in scripts in MathML relative to TeX.
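
To put rough numbers on point 3: operator spacing is expressed in em, so it shrinks with the script font size. The sketch below assumes a 10pt base font, the MathML 3 default scriptsizemultiplier of 0.71, and the default medium space (4/18 em) on each side of '+'. The function name is hypothetical, and real fonts may specify a different script scale.

```python
def operator_space_pt(base_font_pt, space_em, scriptlevel, multiplier=0.71):
    """Return an em-based operator space in points at a given scriptlevel."""
    font_pt = base_font_pt * multiplier ** scriptlevel
    return font_pt * space_em


MEDIUM_EM = 4 / 18  # default lspace/rspace of '+' in MathML 3

on_baseline = operator_space_pt(10, MEDIUM_EM, scriptlevel=0)
in_script = operator_space_pt(10, MEDIUM_EM, scriptlevel=1)

print(f"space on each side of '+' on the baseline: {on_baseline:.2f}pt")
print(f"space on each side of '+' in a script:     {in_script:.2f}pt")
# TeX adds no space around '+' in scripts, so the extra width MathML
# introduces for something like n+1 in a superscript is twice that value.
print(f"extra width relative to TeX:               {2 * in_script:.2f}pt")
```

Under these assumptions the space on each side drops from about 2.2pt on the baseline to about 1.6pt in a first-level script, so a single '+' in a superscript is roughly 3.2pt wider than in TeX.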

I tend not to like arguments based on "it's easier to implement", but since that argument has come up a lot in MathML core discussions, not making scripts special means a small reduction in the spec and implementation.

If it is helpful to this discussion, I can generate some examples that compare TeX's rules to MathML's rules specifically for scripts.

fred-wang commented 4 years ago

> I tend not to like arguments based on "it's easier to implement", but since that argument has come up a lot in MathML core discussions, not making scripts special means a small reduction in the spec and implementation.

The argument is not whether "it's easier to implement" but "is the feature important enough to justify the extra spec, test, implementation & maintenance cost". Note that MathJax intentionally follows TeX spacing by default, not MathML spacing, so that might be something important.

Regarding this particular case, with a CSS scriptlevel property and the removal of mfenced, this is only one extra line in the code. It is also one extra sentence in the spec and one specific test (I guess we will need a test for spacing in scripts anyway). So this specific rule is not too intrusive compared to other MathML3 features we removed. That said, if nobody has a strong opinion on keeping it, I'm happy to remove it. Explicit lspace/rspace can always be used to tune operator spacing.

NSoiffer commented 4 years ago

Here are a few examples.

TeX input:

\displaystyle x^{(n+1)(n-1)}=x^{n^2-1}  \neq x^{n^2}-1
\vskip 1ex
\displaystyle\sum_{i=0}^{\infty} e^{i+1} \hskip 1em
\displaystyle\int _0^{2\pi}e^{e^{it} - it}dt \hskip 1em
\displaystyle\int _0^{2\pi}e^{ - it+e^{it}}dt \hskip 1em
\vskip 1ex
\displaystyle \left( \frac{-1}{p} \right) = (-1)^{\frac{p-1}2}=\ldots

Simulated MathML input to TeX:

\displaystyle x^{(n\>+\>1)(n\>-\>1)}=x^{n^2\>-\>1} \neq x^{n^2}-1
\vskip 1ex
\displaystyle\sum_{i\;=\;0}^{\infty} e^{i\>+\>1} \hskip 1em
\displaystyle\int _0^{2\pi}e^{e^{it}\> - \>it}dt \hskip 1em
\displaystyle\int _0^{2\pi}e^{ - it\>+\>e^{it}}dt \hskip 1em
\vskip 1ex
\displaystyle \left( \frac{-1}{p} \right) = (-1)^{\frac{p\>-\>1}2}=\ldots 

Simulated Gecko input to TeX:

\[\displaystyle x^{(n\hskip 2mu +\hskip 2mu 1)(n\hskip 2mu -\hskip 2mu 1)}=x^{n^2\hskip 2mu -\hskip 2mu 1} \neq x^{n^2}-1
\]\[
\displaystyle\sum_{i\hskip 2.5mu =\hskip 2.5mu 0}^{\infty} e^{i\hskip 2mu +\hskip 2mu 1} \hskip 1em
\displaystyle\int _0^{2\pi}e^{e^{it}\hskip 2mu  - \hskip 2mu it}dt \hskip 1em
\displaystyle\int _0^{2\pi}e^{ - it\hskip 2mu +\hskip 2mu e^{it}}dt \hskip 1em
\]\[
\displaystyle \left( \frac{-1}{p} \right) = (-1)^{\frac{p\hskip 2mu -\hskip 2mu 1}2}=\ldots 
\]

The following are rendered by quicklatex.com (couldn't render the Gecko Version)

TeX rules (18pt): image

MathML rules (18pt): image

Rendered using MathJax 2.7 in Firefox (100% size): image

NSoiffer commented 4 years ago

Much to my surprise, my favorite of the three rendering rules is the Gecko version. My least favorite is the TeX version although I don't like the spacing around the '=' in the MathML rules.

fred-wang commented 4 years ago

Consensus from 2019/11/11: postpone so that people can check

fred-wang commented 4 years ago

@NSoiffer What is the MathJax config you use? As I said above MathJax does not follow MathML spacing by default.

See "mathmlSpacing" in https://docs.mathjax.org/en/latest/options/output/

NSoiffer commented 4 years ago

Here's the header I used: <script type="text/javascript" src=" https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js?config=TeX-AMS-MML_HTMLorMML">

V3.0 has a bug https://github.com/mathjax/MathJax/issues/2241 with its interpretation of spacing in scripts, so I used the older version.

MathJax uses TeX rules. My comment shows the modifications I made to the TeX to manually add spacing.


NSoiffer commented 4 years ago

From the Nov 18 meeting minutes, here is what @davidcarlisle said/found:

I looked at TeX. In classic TeX the script font (cmr7) is noticeably wider than the 10pt font scaled to 70%, which means that the effects of TeX not adding space are less pronounced. With xetex/luatex and a scaled font (Stix Two here) it is perhaps more noticeable, and having the renderer add some space might be better. Here are details:

Cmr: cmr

Stix2: stix2

Images made with pdflatex and xelatex from

\documentclass{article}
\usepackage{graphics}
\ifx\Umathchardef\undefined\else
 \usepackage{unicode-math}
 \setmathfont{STIX2Math.otf}
\fi
\showoutput

\begin{document}

\setbox0\hbox{$\scriptstyle +$}
\setbox2\hbox{\scalebox{.7}{$+$}}

\showthe\wd0
\showthe\wd2

\[ 1+2 \quad  x^{1+2} \]

\end{document}

fred-wang commented 4 years ago

Source: https://people.igalia.com/fwang/mathml-spacing-in-scripts.html

Gecko: gecko

WebKit: webkit

Blink: blink

XeLaTeX: xelatex

LuaLaTeX: lualatex

fred-wang commented 4 years ago

I think relying on TeX's space commands to "simulate input" and using quicklatex.com or MathJax with TeX spacing rules to render them is a bit unreliable. I attached screenshots for all browsers and modern TeX engines, using their own default spacing rules.

davidcarlisle commented 4 years ago

@fred-wang which fonts were you using with xetex and luatex? I note you are using plain syntax and plain has no standard way to use the opentype math table layout rules.

fred-wang commented 4 years ago

@davidcarlisle What do you mean by plain syntax? All browsers and tex engines are using Latin Modern Fonts AFAIK.

davidcarlisle commented 4 years ago

$$ isn't LaTeX. By default, the plain and latex formats on both luatex and xetex use 8-bit math fonts and classic TeX layout rules in math, even if they are using OpenType fonts for text. With lualatex and xelatex you can use the unicode-math package to enable layout rules based on OpenType math fonts with tables, as used in the document showing Stix2 use in https://github.com/mathml-refresh/mathml/issues/38#issuecomment-555676502

fred-wang commented 4 years ago

I'm not sure, but I can't build my document with luatex and xetex. I had to use lualatex and xelatex.

davidcarlisle commented 4 years ago

@fred-wang you only showed the math fragments, not the full TeX document, so I couldn't really tell very easily (it is hard to distinguish 8-bit Type 1 Latin Modern from the OpenType version just from a PNG image). If you used LaTeX and didn't use the unicode-math package, then it's unlikely to be using OpenType math (of course it is possible, but you would need to completely overwrite the LaTeX math support, which is what unicode-math does).

fred-wang commented 4 years ago

@davidcarlisle I updated the screenshot, using unicode-math.

NSoiffer commented 4 years ago

@fred-wang: for completeness, can you provide the full files you used to do the layout? The link you provide to the igalia site only has the math, so no one can replicate what you did and try out other similar variations.

fred-wang commented 4 years ago

@NSoiffer page updated.

fred-wang commented 4 years ago

With scriptlevel merged into font-size ( https://github.com/mathml-refresh/mathml/issues/174 ) it's no longer possible to know when we are in a script and apply rules from https://github.com/mathml-refresh/mathml/issues/38#issuecomment-541630188

At least I think Chromium can access the scriptlevel internally ( https://chromium-review.googlesource.com/c/chromium/src/+/2184131/13/third_party/blink/renderer/core/style/computed_style_extra_fields.json5 ) but I'm not sure whether it is legit to specify it that way or if we should instead expose a separate new boolean CSS property.

I would avoid introducing TeX-like spacing rules contradicting MathML3 in the first version of core anyway as that probably needs more discussion / thoughts.