mathjax / MathJax-node

MathJax for Node
Apache License 2.0

Specifying a fallback font? #145

Closed: Pomax closed this issue 8 years ago

Pomax commented 8 years ago

If I run the following LaTeX through mathjax-node, I get an error "SVG - Unknown character: U+E9 in MathJax_Main,MathJax_Size1,MathJax_AMS":

Bézier(n,t) = \sum_{i=0}^{n}
\underset{binomial\ term}{\underbrace{\binom{n}{i}}}
\cdot\
\underset{polynomial\ term}{\underbrace{(1-t)^{n-i} \cdot t^{i}}}
\cdot\
\underset{weight}{\underbrace{w_i}}

U+E9 is the "é", and the resulting SVG contains a plain text node that looks quite different from the SVG around it:

...
<text font-family="STIXGeneral,'Arial Unicode MS',serif" font-style="" font-weight="" stroke="none" style="font-family: monospace" transform="scale(71.759) matrix(1 0 0 -1 0 0)">é</text>
...

Is there a way to manually specify the fallback font stack? (also, font-style and font-weight are both empty, which feels like it might be wrong?)

dpvc commented 8 years ago

MathJax must know the bounding box for each character that it typesets in order to be able to leave the correct room for the character in the formula. MathJax only knows those bounding boxes for the glyphs that are actually in its fonts. In the case of an unknown character, like the accented character that you used, MathJax does not have the required information to do the typesetting.

In MathJax in a browser, unknown characters are handled by requesting the character from any font that has it (using a list of likely fonts as suggestions), and then measuring the width of the character by hand using the DOM. But MathJax-node doesn't have a true DOM; it uses jsdom, which does not provide the element sizes that MathJax needs (for example, it does not implement offsetHeight and offsetWidth, getBoundingClientRect(), or the SVG getBBox()). So MathJax-node doesn't know the bounding box and can't determine it empirically.

So as a last resort in these cases, MathJax-node uses the monospace font and assumes the characters are all the same width. This is sub-optimal, certainly, but it does allow MathJax-node to typeset the characters with some hope of success.
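To see ahead of time which characters in an input string will take this fallback path, one could scan the string against a coverage check. The sketch below is an illustration only: `isCovered` and `monoWidthEm` are hypothetical inputs (MathJax's real coverage comes from its pre-built font data, and its actual fallback width is not stated here), and the "ASCII only" predicate in the example is a crude stand-in for a real glyph table.

```javascript
// Sketch: list characters that a (hypothetical) coverage check rejects, and
// estimate their width the way a fixed-width fallback would: every unknown
// character gets the same advance width.
function fallbackReport(text, isCovered, monoWidthEm) {
  const unknown = [];
  for (const ch of text) {            // for...of iterates by code point
    const cp = ch.codePointAt(0);
    if (!isCovered(cp)) {
      // Same "U+XX" style as the MathJax warning message
      unknown.push("U+" + cp.toString(16).toUpperCase());
    }
  }
  return {
    unknown,
    estimatedWidthEm: unknown.length * monoWidthEm, // hypothetical uniform width
  };
}

// Treat only ASCII as covered (a crude stand-in for MathJax_Main's coverage).
const report = fallbackReport("Bézier(n,t)", cp => cp < 0x80, 0.5);
console.log(report); // { unknown: [ 'U+E9' ], estimatedWidthEm: 0.5 }
```

The uniform width is exactly why the result can look off: a narrow "i" and a wide "m" would be given the same room.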

For your particular case, there are two options: you can ask MathJax-node to use one of its fonts that has larger coverage, or you can produce the character in another way. The STIX font has the desired character, but not everyone likes the way it looks. You might try Latin-Modern, which (I believe) does include the accented letters but is based on the TeX Computer Modern fonts, so it will look more familiar. This font has other limitations, but it should be pretty good if you like the TeX look rather than the STIX look.

Alternatively, you can use \operatorname{B\acute{e}zier} to obtain the identifier for the function (and I would recommend using \operatorname{} in either case, to get the spacing and font style correct).

Pomax commented 8 years ago

Hm, as a long-time user of Xe(La)TeX, used to writing my LaTeX with full support for every language under the rainbow, I'm in the other camp: TeX is for writing beautiful maths, and that includes using beautiful letters rather than faking them with imprecise calls like \acute{e}. I could, of course, write a trivial preprocessing step that replaces accented letters with that form, but I'd rather get the typeface designer's much better thought-out version.
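For readers who do want the workaround, the "trivial preprocessing step" could look roughly like this. It is a sketch, not part of MathJax-node: it relies on Unicode NFD normalization to split a precomposed letter into a base letter plus combining marks, and the mark-to-command table is an assumption covering only common TeX math accents.

```javascript
// Sketch: rewrite precomposed accented letters (e.g. "é") as TeX math
// accents (e.g. "\acute{e}"). The table below is an assumption; extend it
// for other combining marks as needed.
const ACCENTS = {
  "\u0300": "grave",
  "\u0301": "acute",
  "\u0302": "hat",
  "\u0303": "tilde",
  "\u0304": "bar",
  "\u0306": "breve",
  "\u0307": "dot",
  "\u0308": "ddot",
  "\u030C": "check",
};

function accentsToTeX(input) {
  let out = "";
  for (const ch of input) {
    // NFD splits "é" into "e" followed by U+0301 COMBINING ACUTE ACCENT
    const [base, ...marks] = Array.from(ch.normalize("NFD"));
    if (marks.length === 0 || !marks.every(m => m in ACCENTS)) {
      out += ch; // plain character, or a mark with no command here (e.g. cedilla)
      continue;
    }
    // Use dotless i/j under accents, following the usual TeX convention
    let tex = base === "i" ? "\\imath" : base === "j" ? "\\jmath" : base;
    for (const m of marks) {
      tex = "\\" + ACCENTS[m] + "{" + tex + "}"; // innermost mark applied first
    }
    out += tex;
  }
  return out;
}

console.log(accentsToTeX("Bézier(n,t)")); // B\acute{e}zier(n,t)
```

These are math-mode accents, so the output suits math contexts such as \operatorname{}; inside \text{} a text accent like \'{e} would be the appropriate form instead.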

Given that this runs offline/server-side, where there is no pressure on using fonts that are as small as possible, is there a way to tell mathjax-node to use a custom font as its main typesetting font instead, like the STIX or XITS fonts?

I am somewhat assuming it loads glyph metrics based on the actual font it's provided here (like XeTeX), rather than on precomputed metrics files (like original TeX's TFM files); if it needs the latter, then I guess the question would also be "and how can one pregenerate the metrics file as well?"

(I'm sure it's for speed reasons, but the STIXMathJax fonts in the MathJax repository are extremely reduced compared to the official STIX fonts, which are 150 to 200 KB each and have full Latin and Latin Extended-A support.)

dpvc commented 8 years ago

As far as I know, there is no "other camp". I gave you two possible approaches: changing the font to one that includes the characters you need (like the STIX font), or making the character within the default font using the glyphs available. I wasn't recommending either choice.

is there a way to tell mathjax-node to use a custom font as its main typesetting font instead, like the STIX or XITS fonts?

It was, in fact, my suggestion that you do just that. I recommended either the STIX or the Latin-Modern font. I did neglect to tell you how to do that, however. Use the --font option, as in --font STIX or --font Latin-Modern. Both of these include good coverage of the Latin-1 and Latin Extended-A ranges. The STIX font is taken from the original STIX distribution (STIXGeneral, etc), and I believe includes all the glyphs in those original fonts. Note that the STIX-Web fonts are broken up into a large number of smaller fonts that are called in when needed.

I am somewhat assuming it loads glyph metrics based on the actual font it's provided here (like XeTeX), rather than on precomputed metrics files

Unfortunately, your assumption is incorrect. Since MathJax-node is JavaScript, and JavaScript doesn't have support for obtaining font metrics, MathJax has to rely on pre-built metric files. Those are built from the original font files, but the metrics need to be in a JavaScript form in order to be used by MathJax.

then I guess the question would also be "and how can one pregenerate the metrics file as well?"

The tools we use for this are in the mathjax/MathJax-dev repository, but they are not really in shape for others to use easily, and there is still some hand work that needs to be done in most cases to build the master font data file.

Better font support is certainly on our to-do list, but we have limited resources and can't cover everything that we would like to.

Pomax commented 8 years ago

Thanks! Yeah, I was missing the bit that explained how to change the font; with that I should be good to go! Does the CLI --font option correspond to an options object of the form {..., font: "STIX", ...}?

Also, allow me (as someone who has written multiple OpenType parsers and JS-based generators, and who writes about OpenType a lot) to refute the "javascript doesn't have support for obtaining font metrics" claim: both OpenType.js (excellent for client-side use) and Fontkit (excellent for server-side use) do bang-up jobs as full-fledged font loaders for extracting metrics (and many other things). I've used the latter to extract glyphs from huge fonts (e.g. 12MB Japanese calligraphy fonts) without any issues.
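As a rough illustration of the point about Fontkit, the sketch below turns glyph data into a small JavaScript metrics table. It assumes a font object shaped like Fontkit's (`unitsPerEm`, `hasGlyphForCodePoint`, `glyphForCodePoint`, and glyphs with `advanceWidth` and `bbox`). The output layout (thousandths of an em, with height, depth, width, left, right) is illustrative only and is not the format of MathJax's own pre-built font data files.

```javascript
// Sketch: build a metrics table from a Fontkit-style font object.
// The table layout is hypothetical, not MathJax's font-data format.
function buildMetrics(font, codePoints) {
  const scale = 1000 / font.unitsPerEm; // express values in thousandths of an em
  const table = {};
  for (const cp of codePoints) {
    if (!font.hasGlyphForCodePoint(cp)) continue; // skip uncovered characters
    const glyph = font.glyphForCodePoint(cp);
    let { minX, minY, maxX, maxY } = glyph.bbox;
    // Empty glyphs (e.g. a space) may report a non-finite bounding box
    if (![minX, minY, maxX, maxY].every(Number.isFinite)) {
      minX = minY = maxX = maxY = 0;
    }
    table[cp] = {
      height: Math.max(0, Math.round(maxY * scale)), // ink above the baseline
      depth: Math.max(0, Math.round(-minY * scale)), // ink below the baseline
      width: Math.round(glyph.advanceWidth * scale), // advance width
      left: Math.round(minX * scale),                // left edge of the ink
      right: Math.round(maxX * scale),               // right edge of the ink
    };
  }
  return table;
}

// Usage with Fontkit (the font path is hypothetical):
//   const fontkit = require("fontkit");
//   const font = fontkit.openSync("STIXGeneral.otf");
//   console.log(buildMetrics(font, [0xE9]));
```

Writing the result out as a .js file would give the kind of "metrics in a JavaScript form" described earlier in the thread, although MathJax's real data files also carry glyph outlines and other details this sketch omits.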

dpvc commented 8 years ago

Use

mjAPI.config({MathJax: {SVG: {font: "STIX"}}});

to set the font if you are writing your own tool.

Yes, we looked at OpenType.js when it first came out. Very clever and nicely done. I almost said "no native font support" because of this, but didn't want to cloud the waters. Guess I should have. Of course, MathJax predates OpenType.js, so it wasn't available when MathJax was originally developed. Also, as I recall, it only works with web-based fonts, which is not the use case we needed, since we already had the required data for our web-based fonts; what we needed was information about the local fonts on the user's system. In addition, I think it uses XMLHttpRequest to get the fonts, and that has security controls that make it more difficult to load files from sites other than the one where the main page is loaded, which would have made the CDN more difficult. We also still had to support IE6, IE7, and IE8 at the time. Finally, it requires loading the font twice (once through XMLHttpRequest, and once as a normal font). Although there are ways to address the last three problems, we did not pursue it further at the time.

I have not really looked closely at Fontkit, but it does look like a very useful tool for node-based MathJax. Right now, we are not doing a separate code-base for MathJax-node, other than some glue to hook it into node, so using a separate font system on MathJax-node is beyond the current approach.

This coming year, we will be doing a major rewrite of MathJax for MathJax 3.0, and these packages will certainly be something to consider for that. The server-side end of things is going to be part of the design from the outset, rather than a second-class citizen.

Pomax commented 8 years ago

That's great! I'm looking forward to MathJax 3 =)

I seem to have an issue with the SVG option, though; I use the following code to use the STIX font, but it's throwing terminal errors rather than warnings now:

var API = require("mathjax-node/lib/mj-single");

// Set up the MathJax processor
API.config({
  MathJax: {
    SVG: {
      font: "STIX"
    },
    TeX: {
      extensions: [
        "AMSmath.js",
        "AMSsymbols.js",
        "autoload-all.js",
        "color.js"
      ]
    }
  }
});
API.start();

// convert the passed LaTeX to SVG form
API.typeset({
  math: "Bézier",
  format: "TeX",
  svg: true
}, function (data) {
  if (data.errors) {
    console.error(data.errors);
  }
  console.log(data.svg);
});

When this hits the LaTeX with é it no longer claims U+E9 is a problem, but instead reports:

Math Processing Error: Cannot read property 'VARIANT' of undefined
Math Processing Error: Cannot read property 'VARIANT' of undefined
Math Processing Error: Cannot read property 'VARIANT' of undefined
Math Processing Error: Cannot read property 'VARIANT' of undefined
Error: Cannot read property 'setAttribute' of undefined

and then doesn't yield SVG output. If I use the Latin-Modern font instead, there do not appear to be any errors and the output looks correct from what I can tell.
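One way to make this failure mode explicit is to treat a non-empty data.errors as a hard failure instead of logging data.svg regardless. The wrapper below is a hypothetical convenience, not part of mathjax-node; it relies only on what the code above already uses: typeset(options, callback) and the data.errors and data.svg fields of the callback argument.

```javascript
// Sketch: wrap the callback-style typeset call in a Promise so that errors
// reported in data.errors become a rejection rather than an undefined SVG.
function typesetAsync(api, options) {
  return new Promise((resolve, reject) => {
    api.typeset(options, data => {
      if (data.errors && data.errors.length > 0) {
        reject(new Error(data.errors.join("\n")));
      } else {
        resolve(data);
      }
    });
  });
}

// Usage (requires mathjax-node, with API configured and started as above):
//   typesetAsync(API, { math: "B\\acute{e}zier", format: "TeX", svg: true })
//     .then(data => console.log(data.svg))
//     .catch(err => console.error("Typesetting failed:\n" + err.message));
```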

dpvc commented 8 years ago

Sorry, my fault. Switch "STIX" to "STIX-Web" and that should take care of it.
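To avoid tripping over the font name again, the configuration could be built through a small helper that rejects names it does not recognize. This is a hypothetical sketch: the accepted list is deliberately limited to the names discussed in this thread ("STIX-Web", "Latin-Modern") plus "TeX", MathJax's default SVG font, and the "STIX" to "STIX-Web" alias simply encodes the fix above. The object shape mirrors the API.config call shown earlier.

```javascript
// Sketch: build the mathjax-node config with a font-name check.
// The allowlist is intentionally short; extend it for other MathJax fonts.
const SVG_FONTS = new Set(["TeX", "STIX-Web", "Latin-Modern"]);
const FONT_ALIASES = { STIX: "STIX-Web" }; // hypothetical convenience alias

function buildConfig(font, extensions = []) {
  const name = FONT_ALIASES[font] || font;
  if (!SVG_FONTS.has(name)) {
    throw new Error(`Unsupported SVG font "${font}"`);
  }
  return {
    MathJax: {
      SVG: { font: name },
      TeX: { extensions },
    },
  };
}

// Usage (requires mathjax-node):
//   const API = require("mathjax-node/lib/mj-single");
//   API.config(buildConfig("STIX", ["AMSmath.js", "AMSsymbols.js"]));
//   API.start();
```

Failing fast on an unknown name surfaces the mistake at configuration time, rather than as "Cannot read property 'VARIANT' of undefined" once typesetting starts.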

Pomax commented 8 years ago

Thanks so much!