mathjax / MathJax

Beautiful and accessible math in all browsers
http://www.mathjax.org/
Apache License 2.0

Linebreaking: Custom goodbreaks with customizable penalties #1916

Open DorianLamaro opened 6 years ago

DorianLamaro commented 6 years ago

Hello. This is a question or a feature request about linebreaking.

We are using MathML to define equations. Is there a way to define multiple levels of goodbreak in mathjax?

For example, we want to define something like this (not exact syntax):

goodbreakweak.penalty = -200
goodbreak.penalty = -600
goodbreakstrong.penalty = -800

We also want to set penalties for parentheses ( elements)

A bit about the background: we have a platform that is used by elementary and high school students as well as their teachers, most of whom have no knowledge of HTML or TeX, so manual editing of MathML is not an option. Instead, we automatically set goodbreaks on '=', '+', and '-' elements. However, we have found that in most cases, + and - should be 'weaker' goodbreaks than '='. Is it possible to do this? Additionally, is it possible to assign goodbreaks or something similar to elements? If not, we might have to resort to putting or other elements after parentheses that can be 'goodbreak'd in order to 'discourage' breaking in the middle of the parentheses.

I'm aware that it's impossible to linebreak perfectly every time, and we are not trying to achieve that, but some guidelines on what to do would be helpful.

Thank you in advance

dpvc commented 6 years ago

Well, as you know, line breaking is a complicated issue.

MathML's approach to this is to provide three levels of optional break points (goodbreak, auto, and badbreak), so it would be possible to consider these to be the three that you are asking for. Of course, that would mean you would have to mark all mo as badbreak, leave the pluses and minuses with no line break attribute, and use goodbreak for equal signs. So that's probably not what you want to do.
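For a platform that generates its MathML automatically, this three-level approach amounts to choosing a `linebreak` value for each operator as it is written out. A minimal sketch in plain JavaScript, assuming a string-building generator; `BREAK_LEVEL`, `moElement`, and `escapeXml` are hypothetical names, not part of MathJax:

```javascript
// Hypothetical generator helper: pick one of MathML's three optional
// break levels for an operator. The table is an assumption that follows
// the scheme described above: "=" is a goodbreak, "+" and "-" keep the
// default (auto), and every other operator becomes a badbreak.
const BREAK_LEVEL = {
  "=": "goodbreak",
  "+": "auto",
  "-": "auto",
};

// Escape the characters that are special in XML text and attributes.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build an <mo> element string with the chosen linebreak attribute.
// "auto" is MathML's default, so the attribute is simply omitted.
function moElement(op) {
  const level = Object.prototype.hasOwnProperty.call(BREAK_LEVEL, op)
    ? BREAK_LEVEL[op]
    : "badbreak";
  const attr = level === "auto" ? "" : ` linebreak="${level}"`;
  return `<mo${attr}>${escapeXml(op)}</mo>`;
}

for (const op of ["=", "+", "(", "<"]) {
  console.log(moElement(op));
}
```

The drawback mentioned above is visible in the fallback branch: every operator not listed in the table has to be marked `badbreak` for `=` to stand out from `+` and `-`.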

The other option MathML provides to discourage line breaks is to use mrows to group elements. MathJax's line-breaking algorithm discourages breaks within an mrow, and the deeper the nesting, the more discouraged the line break. So if you want an equal sign to be more breakable than the plus signs in an equation that follows, you would wrap the right-hand side of the equal sign in an mrow.

Here's an example, using the following code:

<div style="display:inline-block; width:120px; border:1px solid black">
<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
  <mi>x</mi>
  <mo>+</mo>
  <mi>x</mi>
  <mo>+</mo>
  <mi>x</mi>
  <mo>=</mo>
  <mi>a</mi>
  <mo>+</mo>
  <mi>a</mi>
  <mo>+</mo>
  <mi>a</mi>
</math>
</div>

<div style="display:inline-block; width:120px; border:1px solid black; margin-left: 2em">
<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
  <mi>x</mi>
  <mo>+</mo>
  <mi>x</mi>
  <mo>+</mo>
  <mi>x</mi>
  <mo>=</mo>
  <mrow>
    <mi>a</mi>
    <mo>+</mo>
    <mi>a</mi>
    <mo>+</mo>
    <mi>a</mi>
  </mrow>
</math>
</div>
[Image: linebreak1 (rendered output of the examples above)]

Here, the first example is broken at one of the plus signs, while the second is broken at the equal sign, because its plus signs appear within an mrow, which makes them less desirable breakpoints.
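Since the MathML here is generated automatically, the generator itself can add this grouping. A minimal JavaScript sketch, assuming the generator starts from a plain-text expression; the node model and the names `tokenize`, `wrapRightHandSide`, and `toMathML` are hypothetical, and the tokenizer is deliberately naive (one letter per identifier):

```javascript
// Hypothetical node model for this sketch (not a MathJax structure):
//   token: { tag: "mi" | "mn" | "mo", text: "x" }
//   group: { tag: "mrow", children: [...] }

// Split a plain-text expression into numbers, single letters,
// and any other non-space character (treated as an operator).
function tokenize(src) {
  const parts = src.match(/\d+(?:\.\d+)?|[A-Za-z]|\S/g) || [];
  return parts.map((text) =>
    /^\d/.test(text) ? { tag: "mn", text }
      : /^[A-Za-z]$/.test(text) ? { tag: "mi", text }
      : { tag: "mo", text });
}

// Wrap everything after the first "=" in an mrow, so the operators on the
// right-hand side sit one nesting level deeper than the "=" itself.
function wrapRightHandSide(tokens) {
  const eq = tokens.findIndex((t) => t.tag === "mo" && t.text === "=");
  if (eq < 0) return tokens.slice();          // no "=": leave unchanged
  const rhs = tokens.slice(eq + 1);
  if (rhs.length < 2) return tokens.slice();  // a single token needs no mrow
  return [...tokens.slice(0, eq + 1), { tag: "mrow", children: rhs }];
}

function escapeXml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Serialize one node, indenting nested mrows by two spaces per level.
function serialize(node, indent) {
  if (node.tag === "mrow") {
    const inner = node.children.map((c) => serialize(c, indent + "  "));
    return [`${indent}<mrow>`, ...inner, `${indent}</mrow>`].join("\n");
  }
  return `${indent}<${node.tag}>${escapeXml(node.text)}</${node.tag}>`;
}

function toMathML(nodes) {
  const body = nodes.map((n) => serialize(n, "  ")).join("\n");
  return `<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">\n${body}\n</math>`;
}

console.log(toMathML(wrapRightHandSide(tokenize("x+x+x=a+a+a"))));
```

The printed MathML has the same structure as the second box above. Expressions with several relations (such as `a = b = c`) would need each side grouped separately, which this sketch does not attempt.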

The more deeply nested, the less desirable the breakpoint. Here is an example where nesting in an mrow twice can change the line breaking:

<div style="display:inline-block; width:130px; border:1px solid black">
<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
  <mi>I</mi>
  <mo>=</mo>
  <mi>a</mi>
  <mo>+</mo>
  <mi>a</mi>
  <mo>+</mo>
  <mi>a</mi>
  <mo>+</mo>
  <mi>a</mi>
</math>
</div>

<div style="display:inline-block; width:130px; border:1px solid black; margin-left:2em">
<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
  <mi>I</mi>
  <mo>=</mo>
  <mrow>
    <mi>a</mi>
    <mo>+</mo>
    <mi>a</mi>
    <mo>+</mo>
    <mi>a</mi>
    <mo>+</mo>
    <mi>a</mi>
  </mrow>
</math>
</div>

<div style="display:inline-block; width:130px; border:1px solid black; margin-left:2em">
<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
  <mi>I</mi>
  <mo>=</mo>
  <mrow>
    <mrow>
      <mi>a</mi>
      <mo>+</mo>
      <mi>a</mi>
      <mo>+</mo>
      <mi>a</mi>
      <mo>+</mo>
      <mi>a</mi>
    </mrow>
  </mrow>
</math>
</div>
[Image: linebreak3 (rendered output of the examples above)]

Here, the first two are broken at plus signs, but in the third example, the double nesting makes that less desirable than the break at the equal sign.
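To see which breakpoints the nesting rule penalizes, it can help to list each operator together with its depth. A small JavaScript sketch under a hypothetical object model (`operatorDepths`, `mi`, `mo`, and `mrow` are illustrative names); it counts only explicit `mrow` levels and does not model the extra level MathJax adds for parentheses:

```javascript
// Hypothetical tree model: { tag: "mi" | "mo", text } or { tag: "mrow", children }.
// Report the nesting depth of every operator: the top-level row counts as
// depth 0, and each enclosing mrow adds one level.
function operatorDepths(nodes, depth = 0, out = []) {
  for (const node of nodes) {
    if (node.tag === "mrow") {
      operatorDepths(node.children, depth + 1, out);
    } else if (node.tag === "mo") {
      out.push({ op: node.text, depth });
    }
  }
  return out;
}

const mi = (text) => ({ tag: "mi", text });
const mo = (text) => ({ tag: "mo", text });
const mrow = (...children) => ({ tag: "mrow", children });

// The third box above: I = mrow(mrow(a + a + a + a))
const third = [
  mi("I"), mo("="),
  mrow(mrow(mi("a"), mo("+"), mi("a"), mo("+"), mi("a"), mo("+"), mi("a"))),
];

// "=" is reported at depth 0 and each "+" at depth 2.
console.log(operatorDepths(third));
```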

MathJax automatically increases the nesting level for parentheses (and other grouping symbols). In the first box below, the parentheses inhibit the breakpoint at the plus sign between the b's; in the second box, the parentheses are replaced by # signs (which are not considered grouping symbols), and the breakpoint is allowed. If you enclose the two # signs and their contents in an mrow, as in the third box, the breakpoint is again inhibited.

<div style="display:inline-block; width:120px; border:1px solid black">
<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
  <mi>I</mi>
  <mo>=</mo>
  <mi>a</mi>
  <mo>+</mo>
  <mi>a</mi>
  <mo>+</mo>
  <mo>(</mo>
  <mi>b</mi>
  <mo>+</mo>
  <mi>b</mi>
  <mo>)</mo>
  <mo>+</mo>
  <mi>a</mi>
</math>
</div>

<div style="display:inline-block; width:120px; border:1px solid black; margin-left: 2em">
<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
  <mi>I</mi>
  <mo>=</mo>
  <mi>a</mi>
  <mo>+</mo>
  <mi>a</mi>
  <mo>+</mo>
  <mo>#</mo>
  <mi>b</mi>
  <mo>+</mo>
  <mi>b</mi>
  <mo>#</mo>
  <mo>+</mo>
  <mi>a</mi>
</math>
</div>

<div style="display:inline-block; width:120px; border:1px solid black; margin-left: 2em">
<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
  <mi>I</mi>
  <mo>=</mo>
  <mi>a</mi>
  <mo>+</mo>
  <mi>a</mi>
  <mo>+</mo>
  <mrow>
     <mo>#</mo>
     <mi>b</mi>
     <mo>+</mo>
     <mi>b</mi>
     <mo>#</mo>
  </mrow>
  <mo>+</mo>
  <mi>a</mi>
</math>
</div>
[Image: linebreak2 (rendered output of the examples above)]

MathJax can line-break properly nested MathML far better than a flat, linear sequence of symbols. You didn't say how your MathML is generated or give an example, so it is hard to tell whether you are doing this already. Putting mrows around parentheses is recommended for several reasons (for example, it prevents them from stretching larger than they should), and if you put mrows around multiplication, that will help MathJax break at the nearby addition and subtraction instead.
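Both recommendations (mrows around parentheses and around multiplication) can be applied by the generator before it writes any MathML. A minimal JavaScript sketch, assuming the generator starts from a flat token list; the node model and the functions `tokenize`, `groupFences`, `groupProducts`, and `show` are hypothetical, and the tokenizer is deliberately naive (multi-letter names such as `sin` are split into letters):

```javascript
// Hypothetical node model: { tag: "mi" | "mn" | "mo", text } or { tag: "mrow", children }.
const OPEN = new Set(["(", "[", "{"]);
const CLOSE = new Set([")", "]", "}"]);
// Explicit multiplication signs: "*", "\u00D7" (times) and "\u22C5" (dot operator).
const TIMES = new Set(["*", "\u00D7", "\u22C5"]);

const isOp = (node, set) => node.tag === "mo" && set.has(node.text);

function tokenize(src) {
  const parts = src.match(/\d+(?:\.\d+)?|[A-Za-z]|\S/g) || [];
  return parts.map((text) =>
    /^\d/.test(text) ? { tag: "mn", text }
      : /^[A-Za-z]$/.test(text) ? { tag: "mi", text }
      : { tag: "mo", text });
}

// Put each matched pair of fences, with its contents, into its own mrow.
// Bracket types are not checked against each other; unmatched opening
// fences are left ungrouped.
function groupFences(tokens) {
  const stack = [[]];
  for (const t of tokens) {
    if (isOp(t, OPEN)) {
      stack.push([t]);
    } else if (isOp(t, CLOSE) && stack.length > 1) {
      const group = stack.pop();
      group.push(t);
      stack[stack.length - 1].push({ tag: "mrow", children: group });
    } else {
      stack[stack.length - 1].push(t);
    }
  }
  while (stack.length > 1) {
    const unclosed = stack.pop();
    stack[stack.length - 1].push(...unclosed);
  }
  return stack[0];
}

// Wrap each run of operands joined by multiplication (explicit, or implied
// by juxtaposition) in an mrow, so "+" and "-" stay at the shallower level.
function groupProducts(row) {
  const out = [];
  let run = [];
  const flush = () => {
    if (run.length > 1) out.push({ tag: "mrow", children: run });
    else out.push(...run);
    run = [];
  };
  for (let node of row) {
    if (node.tag === "mrow") {
      node = { tag: "mrow", children: groupProducts(node.children) };
    }
    if (node.tag === "mo" && !TIMES.has(node.text)) {
      flush();        // any other operator ends the current product
      out.push(node);
    } else {
      run.push(node); // operand, grouped mrow, or multiplication sign
    }
  }
  flush();
  return out;
}

// Compact display: each mrow is shown in braces.
const show = (n) => (n.children ? "{" + n.children.map(show).join(" ") + "}" : n.text);

const tree = groupProducts(groupFences(tokenize("I=a+2*b+(b+b)+a")));
console.log(tree.map(show).join(" "));
```

Grouping fences first means a product such as `2x(a+b)` ends up as a single mrow, which, per the explanation above, makes breaks inside it less attractive than breaks at the surrounding `+` and `-`.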

MathJax contains an internal table that controls the penalties associated with different features (like goodbreak and badbreak, as well as how to modify the penalty based on nesting level, and so on). If you want nesting to be stronger at preventing line breaks, you can adjust the nesting value with code like the following:

<script type="text/x-mathjax-config">
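// Turn on automatic line breaking for the CommonHTML output.
MathJax.Hub.Config({
  CommonHTML: {linebreaks: {automatic: true}}
});
// Once the CommonHTML line-breaking code has loaded, adjust its penalty table.
MathJax.Hub.Register.StartupHook("CommonHTML multiline Ready", function () {
  var MML = MathJax.ElementJax.mml;
  var PENALTY = MML.mbase.prototype.CHTMLlinebreakPenalty;
  PENALTY.nestfactor *= 2;   // double the penalty added per nesting level
});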
</script>

Here, we make the nesting penalty twice its original value (this assumes you are using the CommonHTML output; there are similar tables for the other output renderers).
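The effect of scaling the nesting penalty can be illustrated with a toy cost model. This JavaScript sketch is not MathJax's algorithm: `chooseBreak`, the `fit` numbers, and the starting `nestfactor` are made-up values chosen only to show how doubling the per-level penalty can move the chosen break from a nested `+` to a top-level `=`:

```javascript
// Toy model only: the real MathJax algorithm also weighs line widths,
// spacing, and operator-specific penalties. All numbers are hypothetical.
// cost = fit + nestfactor * depth; the lowest-cost breakpoint wins.
function chooseBreak(candidates, nestfactor) {
  let best = null;
  for (const c of candidates) {
    const cost = c.fit + nestfactor * c.depth;
    if (best === null || cost < best.cost) best = { ...c, cost };
  }
  return best;
}

// "=" is at depth 0 but leaves a badly filled line (high fit penalty);
// "+" sits inside one mrow but fits better.
const candidates = [
  { op: "=", depth: 0, fit: 900 },
  { op: "+", depth: 1, fit: 200 },
];

const nestfactor = 400; // hypothetical starting value
console.log(chooseBreak(candidates, nestfactor).op);     // "+" (cost 600 vs 900)
console.log(chooseBreak(candidates, nestfactor * 2).op); // "=" (cost 900 vs 1000)
```

In MathJax itself the nesting term is only one of several, so how large a change is needed in practice is something to find by experiment.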


While nesting is the MathML-recommended mechanism for the kind of control you are asking for, there are other possible approaches. One is to have MathJax use a different penalty for each mo based on its content, so that the "default" penalty for = differs from that for + or -. You can do that as in the following example:

<!DOCTYPE html>
<html>
<head>
<title>CHTML: Additional goodbreak penalties</title>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
  CommonHTML: {linebreaks: {automatic: true}}
});
MathJax.Hub.Register.StartupHook("CommonHTML multiline Ready",function () {
  var MML = MathJax.ElementJax.mml;
  var PENALTY = MML.mbase.prototype.CHTMLlinebreakPenalty;
  var BETTERBREAK = MML.mo.prototype.CHTMLbetterBreak;
  //
  // Use this table to set the default penalties for the operators you
  //   want to change.  The usual value is 0, for regular breakpoints.
  //   A goodbreak is -200.
  //
  var MOPENALTY = {
    '-': -200
  };
  MML.mo.Augment({
    CHTMLbetterBreak: function (info, state) {
      var mo = this.data.join("");
      if (MOPENALTY.hasOwnProperty(mo)) {
        PENALTY.auto[0] = MOPENALTY[mo];
      }
      var result = BETTERBREAK.apply(this, arguments);
      PENALTY.auto[0] = 0;
      return result;
    }
  });
});
</script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/MathJax.js?config=MML_CHTML"></script>
</head>
<body>

<div style="display:inline-block; width:120px; border:1px solid black">
<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
  <mi>I</mi>
  <mo>=</mo>
  <mi>a</mi>
  <mo>+</mo>
  <mi>a</mi>
  <mo>+</mo>
  <mi>a</mi>
  <mo>+</mo>
  <mi>a</mi>
  <mo>+</mo>
  <mi>a</mi>
</math>
</div>

<div style="display:inline-block; width:120px; border:1px solid black; margin-left:2em">
<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
  <mi>I</mi>
  <mo>=</mo>
  <mi>a</mi>
  <mo>+</mo>
  <mi>a</mi>
  <mo>-</mo>
  <mi>a</mi>
  <mo>+</mo>
  <mi>a</mi>
  <mo>+</mo>
  <mi>a</mi>
</math>
</div>

</body>
</html>

This produces:

[Image: linebreak4 (rendered output of the page above)]

The first block shows the usual breakpoints, but the second has a minus sign, which is set to have an automatic penalty equivalent to a goodbreak, so the breakpoint occurs there. You can add whatever additional penalties you want to the MOPENALTY table above. That way, you don't have to include explicit breaks in the MathML itself.
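The `CHTMLbetterBreak` override above follows a save, set, restore pattern around the original method. The sketch below isolates that pattern in plain JavaScript so it can be tested outside the browser; `penaltyTable` and `withOperatorPenalty` are hypothetical names, and the `=` entry is only an example value. One possible refinement, shown here, is restoring the previous value in a `finally` block rather than resetting it to `0`, so the table is left as it was even if another customization had already changed `auto[0]` or the wrapped call throws:

```javascript
// Hypothetical stand-in for the renderer's penalty table (not the real
// MathJax object): "auto" holds the penalty used for ordinary breakpoints.
const penaltyTable = { auto: [0], goodbreak: [-200] };

// Per-operator overrides, in the spirit of the MOPENALTY table above.
// The "=" value is an example only.
const MOPENALTY = { "-": -200, "=": -400 };

// Temporarily replace the auto penalty for one operator while `evaluate`
// runs, then restore whatever value was there before, even if it throws.
function withOperatorPenalty(table, op, evaluate) {
  const saved = table.auto[0];
  if (Object.prototype.hasOwnProperty.call(MOPENALTY, op)) {
    table.auto[0] = MOPENALTY[op];
  }
  try {
    return evaluate();
  } finally {
    table.auto[0] = saved;
  }
}

console.log(withOperatorPenalty(penaltyTable, "-", () => penaltyTable.auto[0])); // -200
console.log(withOperatorPenalty(penaltyTable, "+", () => penaltyTable.auto[0])); // 0
console.log(penaltyTable.auto[0]); // 0 (restored)
```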

It would also be possible to use the TeX class (like REL and BIN) to determine the default penalties, but I haven't made an example of that.
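A rough sketch of the lookup such a class-based variant would need, in plain JavaScript. The tables `TEX_CLASS` and `CLASS_PENALTY`, their values, and `classPenalty` are all hypothetical; a real version would take the class from the mo elements MathJax has already classified rather than from a hand-written table, and would plug the result into an override like the one above:

```javascript
// Hypothetical classification of operators into TeX-style classes.
// A real version would use the class MathJax assigns to each mo
// instead of this hand-written table.
const TEX_CLASS = {
  "=": "REL", "<": "REL", ">": "REL", "\u2264": "REL", "\u2265": "REL",
  "+": "BIN", "-": "BIN", "\u00D7": "BIN",
};

// Hypothetical default penalties per class (negative = preferred break).
const CLASS_PENALTY = { REL: -200, BIN: 0, ORD: 100 };

// Look up an operator's class (ORD if unknown) and return its penalty.
function classPenalty(op) {
  const cls = Object.prototype.hasOwnProperty.call(TEX_CLASS, op) ? TEX_CLASS[op] : "ORD";
  return CLASS_PENALTY[cls];
}

for (const op of ["=", "\u2264", "+", "!"]) {
  console.log(op, classPenalty(op));
}
```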


Finally, if you really want to have linebreak="greatbreak", etc., then it is possible to do that as well, though that is a hack that will make your MathML invalid. Here is an example:

<!DOCTYPE html>
<html>
<head>
<title>CHTML: Additional goodbreak penalties</title>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
  CommonHTML: {linebreaks: {automatic: true}}
});
MathJax.Hub.Register.StartupHook("CommonHTML multiline Ready",function () {
  var PENALTY = MathJax.ElementJax.mml.mbase.prototype.CHTMLlinebreakPenalty;
  // Add a non-standard linebreak value (not valid MathML) with penalty -500.
  PENALTY.greatbreak = [-500];
});
</script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/MathJax.js?config=MML_CHTML"></script>
</head>
<body>

<div style="display:inline-block; width:120px; border:1px solid black">
<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
  <mi>I</mi>
  <mo>=</mo>
  <mi>a</mi>
  <mo>+</mo>
  <mi>a</mi>
  <mo>+</mo>
  <mi>a</mi>
  <mo>+</mo>
  <mi>a</mi>
  <mo>+</mo>
  <mi>a</mi>
</math>
</div>

<div style="display:inline-block; width:120px; border:1px solid black; margin-left:2em">
<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
  <mi>I</mi>
  <mo>=</mo>
  <mi>a</mi>
  <mo>+</mo>
  <mi>a</mi>
  <mo linebreak="greatbreak">+</mo>
  <mi>a</mi>
  <mo>+</mo>
  <mi>a</mi>
  <mo>+</mo>
  <mi>a</mi>
</math>
</div>

</body>
</html>

which produces:

[Image: linebreak5 (rendered output of the page above)]

You could add whatever additional linebreak values you want. However, I don't recommend this approach, since it makes the MathML invalid.

DorianLamaro commented 6 years ago

Thank you very much for the help, and this detailed reply.

We decided to set different penalties for individual characters such as '+', '=', '(', ')', etc., which, after some experimenting, produced the desired effect in most cases.

One question: the penalties set for parentheses also seem to affect mfenced elements. Is this expected?

Also, is there somewhere, such as the documentation, where similar information about modifications is available, so we don't have to pester the devs directly?

Lastly, and unrelated to the rest of the thread: I will try to lobby for a donation, since you saved us a good number of man-hours. Pledgie is closing after today; will there be an alternative option for individual donations?

dpvc commented 6 years ago

> the penalties set for parentheses also seem to affect mfenced elements. Is this expected?

Yes. The MathML specification says that mfenced should act identically to the equivalent mrow-with-mo construct (i.e., <mfenced open="(" close=")">...</mfenced> should act exactly like <mrow><mo>(</mo>...<mo>)</mo></mrow>). Indeed, MathJax's output renderers use internal mo objects to lay out mfenced, so the output should be identical, which means the two forms have the same line-break properties.
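The equivalence can be made concrete with a small rewrite from `mfenced` to the `mrow`-with-`mo` form. This JavaScript sketch works on a hypothetical object model (`expandMfenced`, `esc`, and `toXml` are illustrative names) and simplifies the MathML 3 rules: defaults of `(`, `)` and `,`, whitespace in `separators` ignored, the last separator repeated when there are too few, extra separators ignored, and no inner `mrow` for a single argument:

```javascript
// Hypothetical object model: tokens are { tag, text, attrs? } and
// containers are { tag, attrs?, children }.
function expandMfenced(node) {
  if (!node.children) return node; // token element: nothing to expand
  const children = node.children.map(expandMfenced);
  if (node.tag !== "mfenced") return { ...node, children };

  const attrs = node.attrs || {};
  const open = "open" in attrs ? attrs.open : "(";
  const close = "close" in attrs ? attrs.close : ")";
  // Whitespace in the separators attribute is ignored; each remaining
  // character is one separator.
  const seps = [...("separators" in attrs ? attrs.separators : ",").replace(/\s+/g, "")];

  // Interleave the arguments with separators, repeating the last separator
  // if there are too few and ignoring any extras.
  const inner = [];
  children.forEach((child, i) => {
    if (i > 0 && seps.length > 0) {
      const sep = seps[Math.min(i - 1, seps.length - 1)];
      inner.push({ tag: "mo", attrs: { separator: "true" }, text: sep });
    }
    inner.push(child);
  });

  return {
    tag: "mrow",
    children: [
      { tag: "mo", attrs: { fence: "true" }, text: open },
      ...(children.length > 1 ? [{ tag: "mrow", children: inner }] : inner),
      { tag: "mo", attrs: { fence: "true" }, text: close },
    ],
  };
}

// Escape XML special characters in text and attribute values.
function esc(s) {
  return String(s).replace(/&/g, "&amp;").replace(/</g, "&lt;")
    .replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

// Serialize a node to a compact MathML string.
function toXml(node) {
  const attrs = Object.entries(node.attrs || {})
    .map(([k, v]) => ` ${k}="${esc(v)}"`)
    .join("");
  const body = node.children ? node.children.map(toXml).join("") : esc(node.text);
  return `<${node.tag}${attrs}>${body}</${node.tag}>`;
}

const mi = (text) => ({ tag: "mi", text });
const fenced = {
  tag: "mfenced",
  attrs: { separators: ";" },
  children: [mi("a"), mi("b"), mi("c")],
};

console.log(toXml(expandMfenced(fenced)));
```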

> is there somewhere like the documentation that similar information about modifications is available, without pestering the devs directly?

Not really.

> After today, Pledgie is closing, will there be an alternative option for individual donations?

I was unaware that it was closing. I will have to talk to the managing partner of the consortium and see what alternative they suggest.

Thank you for your willingness to contribute! That means a lot to us.

dpvc commented 6 years ago

> After today, Pledgie is closing, will there be an alternative option for individual donations?

We have a new PayPal donation button on the MathJax.org website.