w3c / mathml-core

MathML Core draft
https://w3c.github.io/mathml-core

Linebreaking support #127

Open fred-wang opened 5 years ago

fred-wang commented 5 years ago

cc @bfgeek

The MathML Core spec now defines all the min-content / max-content values; however, these two are equal and linebreaking is supposed to never happen.

I believe linebreaking could potentially happen in:

MathML 3 rules for linebreaking are quite complex; maybe we should have a simple version first and refine it later, or have it improved by polyfills when the CSS Layout API is ready.

Just opening this so that it can be referenced from the spec.

fred-wang commented 5 years ago

We need to investigate this a bit, but I imagine we could introduce a math-wrap property (or rely on an existing CSS one) in the future, which would default to nowrap if we are concerned about backward compatibility.

fred-wang commented 5 years ago

Gecko disables linebreaking in table cells: https://dxr.mozilla.org/mozilla-central/source/layout/mathml/mathml.css#137

WebKit disables linebreaking in foreign content: https://trac.webkit.org/browser/webkit/trunk/Source/WebCore/css/mathml.css#L134

I guess we could set white-space to nowrap by default on MathML elements to prevent any backward-compatibility issue.

fred-wang commented 5 years ago

It would probably be interesting to start experimenting with line breaking using the CSS Layout API ( https://drafts.css-houdini.org/css-layout-api/ ), maybe starting with basic tests and then trying to write an mrow-like layout.

In any case, I think w3c/mathml-core#123 should be resolved first.

ronkok commented 4 years ago

Line breaks are more important in the mobile-screen world than they were years ago. I think this should get a high priority.

The TeXbook, page 173, states that "A formula will be broken only after a relation symbol like $=$ or $<$ or $\rightarrow$, or after a binary operation symbol like $+$ or $-$ or $\times$, where the relation or binary operation is on the ``outer level'' of the formula (i.e., not enclosed in {...} and not part of an \over construction)."

KaTeX does its best to emulate the TeXbook rule. For my own work, this issue is the single thing that will cause me to use KaTeX HTML rather than MathML.

NSoiffer commented 4 years ago

The rules that TeX has for linebreaking only apply to inline math. TeX categorizes symbols into a small number of categories (to fit in four bits because efficiency was critical at the time of its design). This means that there are broad generalizations that work a lot of the time but not always. E.g., (a+b)⋅(a-b) might break at the + or -, which would be a poor place to break.

Generally, in linebreaking you want to look at the expression tree and break as close to the root as possible while still filling as much of the line as possible. Typically, relational operators will be at the root of a tree, then lower precedence operators like + and -, then higher precedence operators like ⋅, ⨯, and /. Well-structured MathML (which is unfortunately not very common) has mrows that align with the expression tree so knowing where is a good spot to break is easier with well structured MathML. MathML's operator dictionary gives priorities of operators that can be used for parsing and linebreaking, and serve as a guide for spacing also (higher priorities have less space around them in general). MathML 3 lists a potential linebreaking algorithm that takes time proportional to the number of token elements times the number of lines, so it is relatively quick and does a pretty good job. A more complicated version that looks at the whole expression vs a single line would mimic TeX's paragraph linebreaking rules. I would like to see the simpler algorithm become part of the core spec, but I understand we need to have priorities and this might add several weeks of implementation and spec time if cleanly doable at all with the current state of CSS.
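As a rough illustration of the "break as close to the root as possible while still filling the line" idea, here is a simplified sketch. It is not the MathML 3 algorithm and is not taken from any implementation. It assumes each breakable operator has been tagged with its depth in the expression tree and that widths have already been measured; for each line it picks the shallowest operator that still fits, preferring the rightmost on ties. The `Item` shape, the `breakDepth` field, and the `chooseBreaks` name are all hypothetical.

```typescript
// Hypothetical flattened view of an expression; not a real MathML API.
interface Item {
  width: number;       // measured width, in any consistent unit
  breakDepth?: number; // set only on operators after which a break is allowed;
                       // 0 means the operator sits at the root of the expression tree
}

// Returns the indices of the items after which a line should end.
function chooseBreaks(items: Item[], lineWidth: number): number[] {
  const breaks: number[] = [];
  let start = 0;
  while (start < items.length) {
    let used = 0;
    let best = -1;
    let bestDepth = Infinity;
    let i = start;
    // Scan the items that fit, remembering the shallowest break candidate
    // (ties go to the later one, so the line is filled as far as possible).
    for (; i < items.length; i++) {
      used += items[i].width;
      if (used > lineWidth) break;
      const depth = items[i].breakDepth;
      if (depth !== undefined && depth <= bestDepth) {
        best = i;
        bestDepth = depth;
      }
    }
    if (i === items.length) break; // the remainder fits on one line
    if (best === -1) {
      // Nothing breakable fits: let the line overflow up to the next candidate.
      let j = i;
      while (j < items.length && items[j].breakDepth === undefined) j++;
      best = j;
    }
    if (best >= items.length - 1) break; // no useful break remains
    breaks.push(best);
    start = best + 1;
  }
  return breaks;
}
```

With (a+b)⋅(a-b) flattened into eleven unit-width items, the + and - at depth 1 and the ⋅ at depth 0, a line width of 8 gives a single break after the ⋅ rather than after the -, which a purely greedy fill would choose. A penalty-based scheme that trades depth against line fill would be closer to what MathML 3 describes.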

In response to earlier comments:

  1. Linebreaking inside of tables/matrices is complicated if you want to do a good job. That's because if the width of the table cell is computed automatically, you really want to do that knowing how good/bad that width is for linebreaking the expression. If there is a fixed column width, then it is no more complicated than normal linebreaking. In my former job, we had a publisher that had a two-column layout and put math inside of tables in each column, so they were very concerned about good linebreaking. The point of this comment is that allowing linebreaks in math in tables will address real needs of publishers. On phones, those issues will be there for any math in a table.

  2. In addition to linebreaks, for display math, one needs to deal with indentation. As with linebreaks, indentation levels reflect the expression tree and help readers understand what is grouped with what.

  3. Linebreaking does require another pass over the layout once sizes have been determined, but it is not rocket science to figure out good linebreaks, at least conceptually. Whether it fits in with the CSS layout model is (I think) the main question. The CSS Layout API mentioned earlier seems promising with indentation maybe done by left-padding each line.

  4. Any polyfill that did linebreaking would cause reflow, which would be bad. It would have to come after layout is done. Potentially a polyfill could be written that creates well-structured mrows so that the browser implementation has an easier time doing linebreaking.

  5. MathML 3 has a number of manual linebreaking and indenting options that can be set on mo. Maybe the first step is for the core spec to specify those. That would at least allow an author to get some linebreaking/indentation to happen so that (for example) an expression with multiple = signs can be broken at the =s and aligned.

fred-wang commented 4 years ago

I think there are two important points in Neil's reply:

ronkok commented 4 years ago

> This must be incremental

Agreed and acknowledged. The work being done is excellent.

> MathML 3 has a number of manual linebreaking and indenting options that can be set on mo.

Yes, MathML 3 contemplates an attribute of linebreakstyle on a <mo>. It would be great if this were to be specified and actually implemented, unlike in current Firefox.

> The idea with MathML Core is that you can just use normal CSS/JS technologies as for other HTML elements.

If I understand that statement correctly, then one could apply an inline style of display: inline-block to a top-level <mo> and it would act just like a <mo> with a linebreakstyle="before" attribute. That would also be terrific and would be all that I ask.

Do I understand that statement correctly?

ronkok commented 4 years ago

> So if KaTeX or other polyfills are able to do linebreaking with HTML elements they could just follow similar approach with MathML elements.

A similar approach would break the top level into multiple mrows, with each break occurring at a binary or relational operator. That would map a + b = d into:

<mrow><mi>a</mi><mo>+</mo></mrow>
<mrow><mi>b</mi><mo>=</mo></mrow>
<mrow><mi>d</mi></mrow>

This approach would create automatic line breaks in the TeXbook locations. It works, at some cost to the semantics.
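A minimal sketch of that grouping step, assuming the top level has already been flattened into simple tokens. The `Token` type, the `BREAK_AFTER` set, and the function names are hypothetical and are not KaTeX internals. The operator set is only illustrative, and the sketch does not distinguish a unary minus from a binary one.

```typescript
// Hypothetical token model for a flat, top-level sequence.
type Token = { tag: "mi" | "mn" | "mo"; text: string };

// Illustrative set only; a real tool would consult an operator dictionary.
const BREAK_AFTER = new Set<string>(["+", "-", "−", "×", "=", "<", ">", "→"]);

// Escape the characters that are special in XML text content.
function escapeXml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Close the current row after every operator found in BREAK_AFTER.
function groupIntoRows(tokens: Token[]): Token[][] {
  const rows: Token[][] = [];
  let current: Token[] = [];
  for (const tok of tokens) {
    current.push(tok);
    if (tok.tag === "mo" && BREAK_AFTER.has(tok.text)) {
      rows.push(current);
      current = [];
    }
  }
  if (current.length > 0) rows.push(current);
  return rows;
}

// Emit one <mrow> per group, one group per output line.
function serializeRows(rows: Token[][]): string {
  return rows
    .map((row) =>
      "<mrow>" + row.map((t) => `<${t.tag}>${escapeXml(t.text)}</${t.tag}>`).join("") + "</mrow>")
    .join("\n");
}

const example: Token[] = [
  { tag: "mi", text: "a" }, { tag: "mo", text: "+" },
  { tag: "mi", text: "b" }, { tag: "mo", text: "=" },
  { tag: "mi", text: "d" },
];
console.log(serializeRows(groupIntoRows(example)));
```

For a + b = d this prints the same three mrow lines as above. Because it needs no layout information, the same code can run client-side or server-side.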

As suggested, KaTeX could implement this method. It is very similar to what is now done in HTML. If the method in the previous statement will not work, the method in this comment is probably what we will do. Let me know where we stand.

fred-wang commented 4 years ago

> So if KaTeX or other polyfills are able to do linebreaking with HTML elements they could just follow similar approach with MathML elements.
>
> A similar approach would break the top level into multiple mrows, with each break occurring at a binary or relational operator. That would map a + b = d into:

This is the short-term approach I was thinking about. You can use getBoundingClientRect() to get positions and sizes after layout in order to apply line breaking depending on the screen size. If semantics is a problem, note that you can put these split MathML pieces into a shadow tree so that the original MathML DOM is still available.

ronkok commented 4 years ago

@fred-wang Thank you for the quick response. That clears up the picture considerably.

fred-wang commented 4 years ago

@ronkok No problem. Additionally, note that you can use https://developer.mozilla.org/en-US/docs/Web/API/ResizeObserver to watch when the width of the container of the <math> tag changes, in order to update linebreaking (e.g. when the user resizes the window).
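A small sketch of the measuring step, under the assumption that the formula has already been split into top-level pieces. The `Measurable` interface and the `assignLines` function are hypothetical names. The interface is only a structural stand-in so the logic can be exercised without a browser; real DOM elements satisfy it through getBoundingClientRect().

```typescript
// Structural stand-in for a DOM element (hypothetical; real Elements satisfy it).
interface Measurable {
  getBoundingClientRect(): { width: number };
}

// Greedy fill: returns, for each piece, the zero-based line it lands on.
function assignLines(pieces: Measurable[], containerWidth: number): number[] {
  const lines: number[] = [];
  let line = 0;
  let used = 0;
  for (const piece of pieces) {
    const w = piece.getBoundingClientRect().width;
    // Start a new line only if this line already holds something;
    // a single piece wider than the container is left to overflow.
    if (used > 0 && used + w > containerWidth) {
      line += 1;
      used = 0;
    }
    used += w;
    lines.push(line);
  }
  return lines;
}
```

In a browser, this could be called from inside a ResizeObserver callback with the top-level children of the <math> element and the container's current width. How the resulting line numbers are turned into actual breaks (for example by regrouping rows) is left open here.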

ronkok commented 4 years ago

KaTeX avoids, so far, any reliance on the browser for runtime information. It generates code that works whether generated client-side or server-side. So I think we're stuck with the multiple <mrow> approach.

fred-wang commented 4 years ago

consensus from 2020/06/23: postpone to a future version

NSoiffer commented 2 years ago

Since it is not in this issue and might prove useful to a future core implementation... I implemented a linebreaking polyfill back in 2020 (seems like a lifetime ago...). You can see it in action on github.io. Click on Apply Transform to see it work (if you have Chrome/Edge, the MathML display needs to be on).

This transform makes use of one-column mtables as its target because there is currently no other way to get multiple lines to show up in the implementations. If core supported a manual linebreak (i.e., if <mo linebreak='newline'> were supported and forced the start of a new line), then this polyfill could take advantage of that and would be much less intrusive than it currently is, since it would no longer need to add an mtable to the MathML.

Note: indentation is done using mspace and that same idea would carry forward to a version of core that supported a manual linebreak.

dginev commented 1 year ago

Since we have a prolonged gap period here, are there any current recommendations for pure CSS solutions for reflow?

I took a stab at switching the display of a simple equation to inline-flex with a corresponding @media query for small viewports, and it seemed to behave quite reasonably for a failsafe (in FF and Chrome).

Here is an example of that (with flex always on). The demo should be able to render 6 different arrangements as the screen shrinks: https://codepen.io/dginev/pen/rNQjdzR

It would take some more fine-tuning to control the finer details of reflow, but this could already be a healthy upgrade for common equation markup.

ronkok commented 1 year ago

I can confirm that a flex-based solution works pretty well. You can see it already in action if you navigate to Temml.org and turn display mode off.

In default mode, Temml writes MathML with <mrow> elements that each end in a binary operator or relation operator (per The TeXbook, p. 173). Then the <math> element carries the following CSS:

/* flex-wrap for line-breaking in Chromium */
math {
  display: inline-flex;
  flex-wrap: wrap;
  align-items: baseline;
}
math > mrow {
  padding: 0.5ex 0ex;
}

/* Avoid flex-wrap in Firefox */
@supports (-moz-appearance:meterbar) and (display:flex) {
  math { display: inline; }
  math > mrow { padding: 0 }
}

I don’t apply flex-wrap in Firefox. In Firefox, the separation by <mrow> elements already works without a flexbox.

Temml has a rendering option which allows a website administrator to select breaks before = signs instead of after binary operators. The CSS remains the same, but Temml generates MathML with differently grouped <mrow> elements.

In the comments above, I was one of those asking for line-breaking action in the MathML Core specification. At this time, I do not make that request. In Chromium and Firefox, I think line-breaking is largely a solved problem. Chromium and WebKit have much bigger rendering problems to solve.

Sadly, flex-based line-breaking does not work in WebKit. Maybe someday.

dginev commented 1 year ago

@ronkok This is great, thank you for the extra context.

But I wouldn't go beyond calling this a "stopgap solution", since you've enumerated some serious problems. WebKit lacking support is one, another is ending up with two non-standard solutions for Chrome and Firefox.

Not having a standard way to manually force (or softly suggest) a linebreak in the usual MathML markup is a third. Flexbox allows a variety of techniques to support what Neil referred to as <mo linebreak='newline'>, but none seems to work on an element with pre-set content. Instead, with flexbox we seem to need a dedicated empty element to indicate the forced break (similar to the empty <br> in HTML). Here is an example of the best I could come up with (since it had to be empty, I used <mspace>): https://codepen.io/dginev/pen/zYMNjgd

In summary:

<mrow> ...LHS... </mrow>
<mo>=</mo>
<mspace class="linebreak"></mspace>
<mrow> ...RHS... </mrow>

mspace.linebreak {
  flex-basis: 100%;
  height: 0;
  width: 0;
  overflow: hidden;
}

Update: In cases where we have a <semantics> wrapper to also hold Content MathML under the root math element, the inline-flex approach doesn't appear to be possible in Firefox. So it's a Chrome-only trick at this point.

As an alternative (and maybe even worse) trick, one could rearrange the <mrow> structure by tucking the equals sign into the left-hand-side mrow, which will have them treated as a single flex item. But that would break the default spacing support available via the operator dictionary. So I think inline-flex "mostly working" is a nice surprise, but it still appears to be a crutch to duct-tape some reflow together until we have a proper mechanism.

bkardell commented 1 year ago

> In Chromium and Firefox, I think line-breaking is largely a solved problem. Chromium and WebKit have much bigger rendering problems to solve.

One of these is a typo, I suppose? The second one?

ronkok commented 1 year ago

To clarify: Temml's <mrow> trick works pretty well at providing non-display mode line-breaking in both Chromium and Firefox. It does not work in WebKit.

On the more general question, I have begun compiling a list of browser issues. Firefox is the best, by a large margin. Chromium has some serious issues, especially if system fonts are used. WebKit is the worst. It cannot even render an accent at the correct vertical alignment.

I put a lot of work into the Temml library. It is my sincere hope that someday, MathML will get widespread use. But I think that day is not yet. I suspect most web site administrators will avoid MathML until browser rendering is more reliable.

NSoiffer commented 1 year ago

@ronkok: I'm a little late to the game, but I finally tried temml.org in Chrome and it is great to see that the flex-based solution works. But putting aside that it currently only works in Chrome/Edge, I would add that it is only half the solution. Not only does an expression need to wrap long lines, it needs to indent them appropriately. Is that something that can be done with flexbox?

The polyfill I mentioned illustrates why that's important. But it is hacky in that it had to create an mtable. I hope core level 2 will have at least the hook mentioned.

ronkok commented 1 year ago

@NSoiffer I'd like to look at your polyfill in action, but I'm getting a 404 error when I click on that link.

I'm still thinking about how to get an indent. I have a version working in a web app I call Hurmet.app. It renders math with Temml's version of MathML and I have it set to wrap before top-level = characters. Here is a screenshot of how that looks: [image: Math wrap]

To get that indent, I have appended the following CSS.

/* Create a hanging indent on calculations that wrap to a second line. */
.hurmet-calc > math > mrow:not(:first-child) { margin-left: 2em }
.hurmet-calc > math > mrow:not(:last-child) { margin-right: -2em }

That's pretty hacky and I have applied it only to my own site, Hurmet, not to the library Temml. I'd like better control over the width of the indent. I'd like something less odd. But it will serve as a temporary line breaking solution. I agree with both you and @dginev that this is a temporary fix and would benefit from better support in MathML-Core someday.

But there are several other rendering issues that I think should get higher priority.

davidcarlisle commented 1 year ago

On Thu, 3 Aug 2023 at 16:55, Ron Kok wrote:

> @NSoiffer I'd like to look at your polyfill in action, but I'm getting a 404 error when I click on that link.
>
> I'm still thinking about how to get an indent. I have a version working in a web app I call Hurmet.app (https://hurmet.app/sample). It renders math with Temml's version of MathML and I have it set to wrap before top-level = characters. Here is a screenshot of how that looks: [image: Math wrap] https://user-images.githubusercontent.com/16403058/258165385-eb5e6b13-4268-49b1-acd8-8e2afb5cb063.png
>
> To get that indent, I have appended the following CSS.
>
> /* Create a hanging indent on calculations that wrap to a second line. */
> .hurmet-calc > math > mrow:not(:first-child) { margin-left: 2em }
> .hurmet-calc > math > mrow:not(:last-child) { margin-right: -2em }
>
> That's pretty hacky and I have applied it only to my own site, Hurmet, not to the library Temml. I'd like better control over the width of the indent. I'd like something less odd. But it will serve as a temporary line breaking solution. I agree with both you and @dginev that this is a temporary fix and would benefit from better support in MathML-Core someday.
>
> But there are several other rendering issues that I think should get higher priority.

The polyfills moved from mathml-refresh to w3c github organisations

https://w3c.github.io/mathml-polyfills/

but the HTML linking seems broken in the move (all links off the above page are 404). I'll look...

Just github pages is broken, the source is

https://github.com/w3c/mathml-polyfills/blob/main/acid-test.html


davidcarlisle commented 1 year ago

@NSoiffer @ronkok fixed at https://w3c.github.io/mathml-polyfills/acid-test.html

ronkok commented 1 year ago

Thank you for the link to the polyfill. It's nice work.

Temml is written to run either client-side or server-side. It therefore does not have access to document.getElementById() and cannot use the techniques in the polyfill. Temml line-breaking is a CSS solution.

There is room in the world for both CSS solutions and JavaScript solutions. Hopefully, one day the browser will have a native solution and neither will be necessary.