mathjax / MathJax

Beautiful and accessible math in all browsers
http://www.mathjax.org/
Apache License 2.0

mediawiki texvc: Commands printed with backslash #1236

Closed physikerwelt closed 9 years ago

physikerwelt commented 9 years ago

https://github.com/wikimedia/mathoid/blob/master/test/files/mathjax-texvc/test-p.md shows that some commands are printed as the original tex input and no error was thrown. Is there a way to throw an error on an unknown command? PS: Note that those commands were enclosed in a \mbox

pkra commented 9 years ago

I don't understand this bug report, I'm afraid.

physikerwelt commented 9 years ago

For example, \textvisiblespace should be rendered as it looks in \LaTeX and not as the string \textvisiblespace (https://github.com/wikimedia/mathoid/raw/master/test/files/mathjax-texvc/png/33.png)

physikerwelt commented 9 years ago

and \wrongcommand should print out a warning or an error and not just render \wrongcommand

pkra commented 9 years ago

For example, \textvisiblespace should be rendered as it looks in \LaTeX and not as the string \textvisiblespace

Thanks, I understand that part.

What I lack information on is what you've actually been doing that creates this result. The markdown file only tells me the commands and provides an image.

When I render that content with MathJax, things work as expected -- those macros that are in mediawiki-texvc.js render fine, the others fail. http://codepen.io/pkra/pen/xGBKYX.

dpvc commented 9 years ago

The combined configuration files include the noErrors extension which causes any TeX that throws an error to be rendered as the original TeX code rather than the error message. If you want to see the error messages instead, then use

MathJax.Hub.Config({
  TeX: {noErrors: {disabled: true}}
});

Peter's example doesn't use a combined configuration file, and he didn't load noErrors.js, so he sees the messages. You don't give your configuration, but I suspect it includes noErrors.js.

physikerwelt commented 9 years ago

@dpvc: Thank you very much. @pkra: Yes, now they fail, and I can investigate why my patch (https://github.com/wikimedia/MathJax/commit/71d31d6e98b5b654804f4151cf4b54ba73064de7) does not work. It seems that $\mbox{\AA}$ does not work whereas plain \AA does (after applying the patch).

physikerwelt commented 9 years ago

@pkra and of course thank you too;-)

pkra commented 9 years ago

No problem. Just like LaTeX, \mbox{} is text mode for MathJax and we don't support text mode.

You can switch back into math mode, e.g., \mbox{$\AA$}, but that won't get you what you want either because \AA is not supported; see https://github.com/mathjax/MathJax/issues/795 for an explanation. However, as discussed when you added the mediawiki extension, we would accept an \AA macro in that particular extension.

pkra commented 9 years ago

PS: "we don't support text mode" is technically wrong; we try as hard as possible not to because it's very different from math mode.

physikerwelt commented 9 years ago

OK. The problem is that we need the output $\mbox{\AA}$ for the PDF generation using regular LaTeX. So whenever a user types \AA, this is converted to \mbox{\AA} in a preprocessing step. The commands defined using "MathJax.InputJax.TeX.Definitions.Add" work in math mode, which is perfectly fine and a great help. Now I need to find out if there is something like "MathJax.InputJax.Text.Definitions.Add" to add new macros that get evaluated when MathJax is in something like a text mode.

physikerwelt commented 9 years ago

Otherwise we would need to touch the preprocessing step to produce different kinds of output depending on whether the output is for LaTeX or MathJax... that's something @d00rman and I would like to avoid.

pkra commented 9 years ago

Why not use \AA in math mode only? It's what most packages that provide \AA allow, I thought.

physikerwelt commented 9 years ago

If you use \LaTeX to render $\AA$, the circle is not centered on the A. So rendering $\mbox{\AA}$ is really what you want when you use LaTeX. Test this in a LaTeX document: $\AA$$\mbox{\AA}$$\mbox{$\AA$}$

dpvc commented 9 years ago

$\mathrm{\AA}$ should work in both LaTeX and MathJax.

physikerwelt commented 9 years ago

@dpvc Thank you. It looks exactly the same for me. I'll try that and see if @cscott can live with changing \mbox to \mathrm

physikerwelt commented 9 years ago

... mh the problem is more involved... texvcjs is supposed to be idempotent which works for mbox but not for mathrm... So when I change mbox to mathrm all the tests fail... I have the feeling that overwriting the meaning of commands with backreferences to the command itself is conceptually bad

cscott commented 9 years ago

I'm sorry, I've read this discussion and I'm completely lost. The problem is with the \AA command? As I understand it, MathJax doesn't render $\mbox{\AA}$ correctly because it doesn't support text mode at all?

The solution is probably for MathJax (or the WMF MathJax extension) to support the small subset of \mbox which is used by texvc. That's just simple commands which produce unicode characters.

texvcjs can help you out, since we have an AST where we've already parsed the \mbox contents IIRC.

ps. And it's not just \AA, texvcjs contains the following:

// text-mode literals; enclose in \mbox
module.exports.other_literals2 = arr2set([
    "\\AA",
    "\\Coppa",
    "\\coppa",
    "\\Digamma",
    "\\euro",
    "\\geneuro",
    "\\geneuronarrow",
    "\\geneurowide",
    "\\Koppa",
    "\\koppa",
    "\\officialeuro",
    "\\Sampi",
    "\\sampi",
    "\\Stigma",
    "\\stigma",
    "\\textvisiblespace",
    "\\varstigma"
]);

That is (I believe) the complete list of "text mode literals" which are present inside \mbox.

physikerwelt commented 9 years ago

@cscott the problem is that MathJax does not evaluate the macros defined in the mediawiki texvc extension inside mbox elements (i.e. the commands specified under other_literals2 in texvcjs). Whether MathJax supports text mode is not clear to me. Currently WMF does not modify MathJax at all; only an extension is used to support custom markup. However, this extension is now (since mathoid 0.2.8) almost useless since texvcjs already transforms the custom commands to standard LaTeX. Only a few transformations end in markup that can be processed by LaTeX but not by MathJax.

dpvc commented 9 years ago

texvcjs is supposed to be idempotent which works for mbox but not for mathrm

I'm sorry, I don't know what that means in this context. Can you explain the problem in more detail?

The solution is probably for MathJax (or the WMF MathJax extension) to support the small subset of \mbox which is used by texvc.

There is a function that gets called on the text within \mbox, \hbox, \text, etc., that currently just wraps the results in an <mtext> element (and makes sure initial and trailing spaces are not lost). You could certainly override that in your extension so that it looks for the macros that you want to support and inserts the proper characters. Such processing is unlikely to be added to MathJax core, however.

physikerwelt commented 9 years ago

@dpvc texvcjs is the program that converts insecure user input to a restricted subset of LaTeX considered secure. Currently it converts \AA to \mbox{\AA} and \mbox{\AA} to \mbox{\AA}. I changed this behaviour in my local branch so that it converts \AA to \mathrm{\AA}, but unfortunately it converted \mathrm{\AA} to \mathrm{\mathrm{\AA}}, which was not desired. (I could solve this problem in the meantime... but for some reason texvcjs now no longer realizes that some commands like \euro require the euro package to be loaded in LaTeX.) However, this is off-topic here; all that matters is that it is not trivial for me to change texvcjs to use mathrm instead of mbox.

Can you give some hints on how to overwrite the function that is called for mbox from an extension?
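The idempotency problem described in this comment can be illustrated with a small standalone sketch. This is not how texvcjs works internally (it parses the input rather than using regular expressions); the function names `wrapNaive` and `wrapIdempotent` are hypothetical and only show why a rewrite rule whose output contains its own input pattern needs a special case.

```javascript
// Naive rule: wrap every \AA in \mathrm{...}.
// The output still contains \AA, so a second pass wraps it again.
function wrapNaive(tex) {
  return tex.replace(/\\AA(?![A-Za-z])/g, "\\mathrm{\\AA}");
}

// Idempotent rule: treat an already wrapped \mathrm{\AA} as one unit,
// so it is replaced by itself instead of being wrapped a second time.
function wrapIdempotent(tex) {
  return tex.replace(/\\mathrm\{\\AA\}|\\AA(?![A-Za-z])/g, "\\mathrm{\\AA}");
}

console.log(wrapNaive(wrapNaive("\\AA")));           // nested \mathrm
console.log(wrapIdempotent(wrapIdempotent("\\AA"))); // single \mathrm
```

The second function is the regex analogue of special-casing the wrapped form, which is what the existing \mbox{\AA} handling has to do as well.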

pkra commented 9 years ago

Moritz, this is a bit screwed up. I raised this problem when you asked to keep the mediawiki extension in the core MathJax repo to simplify mathoid maintenance. As I recall from a F2F discussion, those macros were supposed to be taken care of by texvcjs. Now it turns out that's not the case. :disappointed:

While the question of processing \mbox{\AA} like TeX/LaTeX may seem trivial from a mathoid/texvc perspective, it's a significant problem for MathJax because the line between math and text mode is a highly non-trivial one and we avoid crossing it for good reasons. We realize that the situation is easier for tools like texvc that have a much simpler TeX input structure (and of course it's trivial if you're feeding into real TeX engines).

If I understood you correctly, the real problem is in a PDF generation pipeline. I would suggest solving the problem there. A modern LaTeX setup should process most Unicode just fine (even more so when using engines like XeTeX, which incidentally also understands MathML directly); this would allow you to create a simple macro mapping to Unicode, solving the problem in the straightforward way. It might move your PDF pipeline forward as well, e.g., help with the problems of complex graphemes discussed on phabricator and on our tracker.

As for overwriting the function, you might want to take a look around our code base; for example, the color extension overwrites the original color macro; the Hbox function in the TeX input isn't complex beyond the basic TeX input parts.

But to repeat Davide's comment: such a change to mbox would not fit in the core repo.

physikerwelt commented 9 years ago

@pkra: First and foremost, many thanks to @dpvc and you for the detailed explanation of the behavior, which you expected and I did not. With regard to the mediawiki-texvc extension: I think the code currently in the extension is perfectly fine, since it defines simple user-defined commands as one would usually do in a LaTeX workflow using newcommand. The changes we might make now to this extension are not something that should be kept in the long run. So I'm not too happy to put them into that extension, because they deal with cases where MathJax does not behave in the same way as LaTeX does. My experience with people using \LaTeX is that some of them do not like special chars (even for simple cases, for example \"a instead of ä). However, it's not clear to me how you would make an upright Angström symbol using Unicode. I think, independent of the Math extension and the use of MathJax in the Wikipedia context, it would be great for MathJax users to know how to write a user-defined command that would render \AA as expected. As far as I understand it would be \AA -> \mathrm(\u00C5), right?

cscott commented 9 years ago

\AA -> \mbox{Ä} should be fine, I would think. (Replying by email to Moritz Schubotz's comment of Aug 9, 2015, above.)

cscott commented 9 years ago

.. Which should be done in the mathjax WMF extension or in a special texvcjs mode, BTW, since vanilla LaTeX does not always handle Unicode particularly well (depending on the tex engine) and so some users of texvcjs would prefer not to convert TeX commands to Unicode.

dpvc commented 9 years ago

it would be great for MathJax users to know how to write a user-defined command that would render \AA as expected. As far as I understand it would be \AA -> \mathrm(\u00C5), right?

If you mean for addition to a configuration file like the mediawiki-texvc.js file, then

    AA: ["Macro", "\\mathrm{\u00C5}"]

should work (note the braces rather than parens, and the double backslash for \mathrm but not \u00C5). But perhaps

    AA: ["Macro", "\\unicode{xC5}"]

would be simpler (it produces an <mtext> element, so it is upright automatically). This is not standard LaTeX (but then neither is \mathrm{Ä}). Note that \def\AA{\mathrm{\u00C5}} as part of a math expression would not work, since \uXXXX is not defined in LaTeX. I suppose one could make such a macro, however.
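The escaping rule mentioned above (double backslash for \mathrm, single for \u00C5) is plain JavaScript string behaviour and can be checked outside MathJax. The sketch below prints what the TeX parser would actually receive for each macro body. The guarded registration at the end is an assumption about how the entry would be added from a configuration or extension file in MathJax v2 via `MathJax.InputJax.TeX.Definitions.Add`; the variable names are hypothetical.

```javascript
// "\\" is one literal backslash; "\u00C5" is the single character Å.
const mathrmBody = "\\mathrm{\u00C5}";
const unicodeBody = "\\unicode{xC5}";

console.log(mathrmBody);   // the TeX source \mathrm{Å}
console.log(unicodeBody);  // the TeX source \unicode{xC5}

// Macro table in the same shape as the entries quoted above.
const extraMacros = {
  AA: ["Macro", unicodeBody]
};

// Only runs where MathJax v2 is loaded (assumed API usage, not verified here).
if (typeof MathJax !== "undefined") {
  MathJax.Hub.Register.StartupHook("TeX Jax Ready", function () {
    MathJax.InputJax.TeX.Definitions.Add({ macros: extraMacros });
  });
}
```

Writing "\\u00C5" instead would pass the six characters \u00C5 to the TeX parser, which has no such command.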

dpvc commented 9 years ago

texvcjs is the program that ... converts \AA to \mbox{\AA} and \mbox{\AA} to \mbox{\AA}. I changed this behaviour in my local branch so that it converts \AA to \mathrm{\AA}, but unfortunately it converted \mathrm{\AA} to \mathrm{\mathrm{\AA}}

The texvcjs program must special-case the \mbox{\AA} to \mbox{\AA} somewhere; can't you use that same mechanism to make \mathrm{\AA} remain \mathrm{\AA} as well?

Can you give some hints on how to overwrite the function that is called for mbox from an extension?

You would need to do something like:

MathJax.Hub.Register.StartupHook("TeX Jax Ready",function () {
  var PARSE = MathJax.InputJax.TeX.Parse;
  var INTERNALTEXT = PARSE.prototype.InternalText;
  PARSE.Augment({
    InternalText: function (text,def) {
      text = text.replace(/\\AA/g,"\u00C5");
      return INTERNALTEXT.call(this,text,def);
    }
  });
});

This makes a copy of the InternalText() function and replaces it with one that does some pre-processing on the text (in this case, converts any occurrence of \AA to the literal U+00C5 character), then calls the original function on the modified text and returns the result.
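The same pre-processing idea can be extended from \AA alone to a lookup table. The sketch below is one possible shape, under stated assumptions: `TEXT_MODE_CHARS` and `replaceTextLiterals` are hypothetical names, the table covers only three of the texvcjs literals, and the chosen code points are assumptions rather than anything texvcjs or MathJax prescribes. The guarded part reuses the InternalText override shown above.

```javascript
// Hypothetical subset of the text-mode literals and assumed code points.
const TEXT_MODE_CHARS = {
  AA: "\u00C5",               // LATIN CAPITAL LETTER A WITH RING ABOVE
  euro: "\u20AC",             // EURO SIGN
  textvisiblespace: "\u2423"  // OPEN BOX
};

// Replace known control words; leave unknown ones (e.g. \wrongcommand) alone.
// Matching the whole control word avoids touching longer names such as \AAx.
function replaceTextLiterals(text) {
  return text.replace(/\\([A-Za-z]+)/g, function (match, name) {
    return Object.prototype.hasOwnProperty.call(TEXT_MODE_CHARS, name)
      ? TEXT_MODE_CHARS[name]
      : match;
  });
}

// Only runs where MathJax v2 is loaded.
if (typeof MathJax !== "undefined") {
  MathJax.Hub.Register.StartupHook("TeX Jax Ready", function () {
    var PARSE = MathJax.InputJax.TeX.Parse;
    var INTERNALTEXT = PARSE.prototype.InternalText;
    PARSE.Augment({
      InternalText: function (text, def) {
        return INTERNALTEXT.call(this, replaceTextLiterals(text), def);
      }
    });
  });
}
```

Because `replaceTextLiterals` is a plain function, it can be tested on its own before being wired into the parser.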

physikerwelt commented 9 years ago

https://github.com/wikimedia/texvcjs/commit/5e7c01378eac5fb50407d819bdc542f4c5e6f8c3 implements the first suggestion of @dpvc ... however, I'm not sure if consensus for this change, even with amendments, can be achieved.

physikerwelt commented 9 years ago

OK. With \mathrm it at least renders something that in some cases looks like the expected rendering: https://github.com/physikerwelt/mathoid-server/blob/P/test/files/mathjax-texvc/test-p.md

pkra commented 9 years ago

Here's what MathJax-node produces from \mathrm{Å}.

<svg xmlns:xlink="http://www.w3.org/1999/xlink" width="1.5ex" height="3ex" style="vertical-align: -1ex; margin-left: 0ex; margin-right: 0ex; margin-bottom: 1px; margin-top: 1px;" viewBox="0 -875 610 1319.5" xmlns="http://www.w3.org/2000/svg">
<defs></defs>
<g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)">
<text font-family="STIXGeneral,'Arial Unicode MS',serif" font-style="" font-weight="" stroke="none" style="font-family: monospace" transform="scale(71.759) matrix(1 0 0 -1 0 0)">Å</text>
</g>
</svg>


dpvc commented 9 years ago

@pkra, not sure if you are registering a complaint or just indicating what the output is. The output is correct, and displays as expected in the browser for me.

physikerwelt commented 9 years ago

Is there a way to include a font to MathJax-node config so that the text is transformed to a path (like it's done for the rest of the input)?

pkra commented 9 years ago

@dpvc this was just to compare with Moritz's problematic sample. It seems the problem comes from elsewhere.

@physikerwelt no, that would require something sophisticated for extracting the path and its metrics and a modification of MathJax to accept such data on the fly. Alternatively, MathJax needs to be extended to allow a more complex font fallback routine to re-use a glyph from another supported font, e.g., STIX. Both would require significant resources.

physikerwelt commented 9 years ago

@pkra Yes. You are correct. On modern browsers with support for that font the SVG looks good. It also does for mathoid see https://github.com/physikerwelt/mathoid-server/blob/P/test/files/mathjax-texvc/test.md (I added a link to the SVG image in the overview) It's exactly the same SVG source as posted by Peter. However, the SVG->PNG conversion does not look good yet.

Both would require significant resources.

What kind of resources? Humans or compute time?

physikerwelt commented 9 years ago

Is that related to the warning we see in the console: "SVG - Unknown character: U+A7 in MathJax_Main,MathJax_Size1,MathJax_AMS"? Are there any alternative approaches to represent the Angström symbol and the textvisiblespace? I think those two are the most significant.

physikerwelt commented 9 years ago

Just for reference the pull request that changes \mbox{\AA} to \mathrm{AA} in texvcjs https://github.com/wikimedia/texvcjs/pull/7

pkra commented 9 years ago

Humans or compute time?

Both, but I was thinking of humans primarily. It would require several improvements to our dev tools or, alternatively, brand new tools built on other technologies.

Are there any alternative approaches to represent the Angström symbol and the textvisiblespace...

For Angstrom, U+212B seems correct. You could use TeX to hack something (see #795 for examples) but then you lose the semantics. Similarly for textvisiblespace.

dpvc commented 9 years ago

Is that related to the warning we see in the console:

SVG - Unknown character: U+A7 in MathJax_Main,MathJax_Size1,MathJax_AMS

This one is actually from \S, but yes, \AA will cause a similar issue. These are characters that just aren't in the MathJax web font. You could try using the Latin-Modern web font, which has greater coverage in the Latin-1 Supplement and Latin Extended-A and B unicode blocks.

Note, however, that technically, you probably want U+212B (ANGSTROM SIGN) for \AA rather than U+00C5 (LATIN CAPITAL LETTER A WITH RING ABOVE), as that is the Unicode position specifically designed for it. Latin-Modern has both defined. You may have other issues using Latin-Modern, however, as there may be some characters in the MathJax web font that aren't in Latin-Modern (I haven't done a careful comparison), and the Latin-Modern font data are not in as good shape as the MathJax web font. But you might give it a try. Use

mjAPI.config({MathJax: {SVG: {font: "Latin-Modern"}}});

before your mjAPI.start() call.

(OOPS, looks like Peter answered while I was writing).

dpvc commented 9 years ago

U+2423 is a visible space, and Latin-Modern includes it, so that might work for your \textvisiblespace. Otherwise, you would have to make it up from rules, and (as Peter points out) that loses the semantics completely. The \AA could be \mathring{\mathrm{A}}, but that puts space between the ring and the A, and you won't like that. Other than that, I have no viable suggestion.
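When choosing between the two code points mentioned in this thread, it may help to know how they relate. The short check below uses only standard JavaScript string methods; it shows that U+212B and U+00C5 are distinct code points but become the same character under Unicode NFC normalization, so any later step that normalizes text will not preserve the distinction. Whether a particular pipeline normalizes is not something this thread states.

```javascript
const ringA = "\u00C5";     // LATIN CAPITAL LETTER A WITH RING ABOVE
const angstrom = "\u212B";  // ANGSTROM SIGN

// Distinct code points as written...
console.log(ringA === angstrom);                   // false

// ...but canonically equivalent: NFC maps U+212B to U+00C5.
console.log(angstrom.normalize("NFC") === ringA);  // true
```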

cscott commented 9 years ago

The fundamental problem is that MathJax doesn't handle TeX text mode, and the extension on Wikipedia can include arbitrary text inside \text, \mbox, \hbox, or \vbox. However, texvcjs doesn't allow any escape sequences inside text mode [*]. As a special exception, the special cases listed above generate an \mbox with a single command in it, corresponding to the appropriate character.

So this isn't arbitrary text mode handling at all, but I think a proper patch should start from the basis that we are going to do some text-mode processing, rather than try to coerce the characters into math mode. They should use the text font, not the MathJax font, for example.

I think the patch suggested in https://github.com/mathjax/MathJax/issues/1236#issuecomment-129222235 is the right solution here, especially since we have already sanitized the input so the number of possible escapes you will find is limited, and there's no chance for mischief or mis-matching there.

[*] The characters allowed in text mode (boxchars in the source) are `[-0-9a-zA-Z+,=():\/;?.!\'\x80-\ud7ff\ue000-\uffff]. I don't think there are any TeX meta-characters in there; correct me if I'm wrong.

pkra commented 9 years ago

I think a proper patch should start from the basis that we are going to do some text-mode processing

Just to repeat: that kind of solution won't be part of the MathJax core.

They should use the text font, not the MathJax font, for example.

You won't get that result without more hacking since Mathoid uses MathJax-node and some additional stuff after that.

cscott commented 9 years ago

@pkra sure, but the code @dpvc provided looks like a fine thing to have in the Wikimedia extension, right?

MathJax.Hub.Register.StartupHook("TeX Jax Ready",function () {
  var PARSE = MathJax.InputJax.TeX.Parse;
  var INTERNALTEXT = PARSE.prototype.InternalText;
  PARSE.Augment({
    InternalText: function (text,def) {
      text = text.replace(/\\AA/g,"\u00C5");
      return INTERNALTEXT.call(this,text,def);
    }
  });
});

WRT the text font: exactly. That's why the \mathrm solution isn't right, and why they were having trouble with fonts. Let's do this correctly: in the Wikimedia MathJax extension, let's make the minor tweaks needed to correctly display the small subset of text mode that we use.

dpvc commented 9 years ago

@cscott, MathJax-node can't use the surrounding text font, as it doesn't know what that is, and doesn't have the metric information for it even if it did. So it can't tell how big any of the symbols are and so can't reserve the proper space for them. (MathJax in the client can use the surrounding text font because it can use the active DOM to measure the size of the characters when it needs them, but MathJax on the server can't.) That is why \mathrm or \mathsf are probably your best choices.

As Peter points out, to do better than that would require substantially more work on your part, and would involve changes that would make the mediawiki-texvc extension no longer appropriate for inclusion in the core MathJax distribution.

physikerwelt commented 9 years ago

@pkra @dpvc would it be possible to overwrite mbox in mathjax and define a new command mbox that calls mathrm?

pkra commented 9 years ago

Sure, anything can be redefined. But wouldn't this wreck things for complex Unicode constructs in mbox/text/etc (cf., #474)?

dpvc commented 9 years ago

As Peter says, you can certainly do that. But note that \mbox does more than just set the font; among other things, it also sets the style to \textstyle, so if used in a superscript or an in-line fraction, the result will be larger than usual. If you want the MathJax output to be consistent with the PDF output, you will need to deal with that, too. (And if you don't want superscripts and in-line fractions to be too large, then you don't want to use \mbox in your PDF output, either.)

Again, as Peter points out, converting \mbox to \mathrm will mess up any other use of \mbox that your page authors have put it to. I haven't looked, but I would not be surprised to see \mbox in regular use in Wikipedia. I suppose you could have your replacement \mbox read the argument and check it against the list of single-control-sequence replacements that you care about, and if one is found do the \mathrm version otherwise do the usual \mbox thing.

Alternatively, you could use a TeX input jax pre-filter to do a regex substitution like /\\mbox\{(\\AA|\\koppa)\}/ going to "{\\textstyle\\mathrm{$1}}", where you list all the control sequences that you want to convert along with the \\AA and \\koppa, separated by |. This would require the regex to be run on every math equation, however, which is less efficient, though it would be easier for you to write.
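The regex substitution described above can be tried as a plain function before it is attached to any pre-filter. The sketch below is an illustration under assumptions: `TEXT_LITERALS` and `rewriteMboxLiterals` are hypothetical names, the list holds only the two control sequences named in the comment, and the wiring into the TeX input jax pre-filter is not shown.

```javascript
// Hypothetical list; extend it with the other control sequences to convert.
const TEXT_LITERALS = ["AA", "koppa"];

// Matches \mbox{\AA}, \mbox{\koppa}, ... and captures the control sequence.
// Resulting pattern: \\mbox\{(\\(?:AA|koppa))\}
const MBOX_LITERAL = new RegExp(
  "\\\\mbox\\{(\\\\(?:" + TEXT_LITERALS.join("|") + "))\\}",
  "g"
);

// Keep \mbox's \textstyle behaviour while switching to \mathrm.
function rewriteMboxLiterals(tex) {
  return tex.replace(MBOX_LITERAL, "{\\textstyle\\mathrm{$1}}");
}

console.log(rewriteMboxLiterals("x^{\\mbox{\\AA}}"));
console.log(rewriteMboxLiterals("\\mbox{some text}")); // left unchanged
```

Only an \mbox whose entire argument is one listed control sequence is rewritten, so other uses of \mbox by page authors pass through untouched.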

physikerwelt commented 9 years ago

@cscott @d00rman what do you think? How should we make progress with the broken commands?

cscott commented 9 years ago

Option one: https://github.com/mathjax/MathJax/issues/1236#issuecomment-130845773 (And it's certainly possible to expose the text font to mathjax, with a little work. But that isn't strictly necessary.) The objections to that seem pedantic.

Option two: add a special "mathjax mode" to texvcjs which does https://github.com/mathjax/MathJax/issues/1236#issuecomment-129202199 -- but this has to be optional (ie a command-line option), since including unicode characters in the TeX output will break many TeX installations.

d00rman commented 9 years ago

Option two: add a special "mathjax mode" to texvcjs which does #1236 (comment) -- but this has to be optional (ie a command-line option), since including unicode characters in the TeX output will break many TeX installations.

I'd be in favour of this option, as it seems to me it's easier to maintain. However, we need to keep in mind that we'd need to have a list of symbols we want to translate, not just \AA (just stating the obvious).

cscott commented 9 years ago

I'd reluctantly accept the special "mathjax" mode, but I strongly prefer option one because it's more honest. There's an inherent conflict between MathJax (which goes out of its way not to support text mode) and the existing <math> content on Wikipedia (which in my experience tends to use text mode extensively, probably more than it should). This is going to be a continual source of rendering issues. We should bite the bullet and admit that WMF will need greater support for text mode than MathJax provides, and implement the basic infrastructure needed to do so. The code in https://github.com/mathjax/MathJax/issues/1236#issuecomment-130845773 is a fine first step: it will allow us to implement future text mode hacks and tweaks as necessary when they arise.

The special "mathjax" mode just kicks the can down the road a bit. We solve one immediate issue, but will have made no progress on the larger issue, and have no means to address the next text mode rendering problem we will encounter.

physikerwelt commented 9 years ago

For reference: https://github.com/wikimedia/texvcjs/commit/e333ceb6f90f75236c7cf4b632f18e939ec315fd