mathjax / MathJax

Beautiful and accessible math in all browsers
http://www.mathjax.org/
Apache License 2.0

MathML converted to Svg doesn't display the umlaut German character with an appropriate font #1425

Closed encephalopathy closed 8 years ago

encephalopathy commented 8 years ago

My Goal: I would like to convert the MathML into svg data and save that to a file so I can use it for authoring purposes later.

The Problem: The MathML gets converted from an svg to jax just fine and displays in the browser with a font close enough to the font of the surrounding characters. However, if the svg data is saved directly to a file before it is converted to jax and viewed directly in a browser, the svg data displays with an invalid font.

The Details: I have added a zip attachment that includes screenshots and the svg data to demonstrate this. The MathML data I used for the conversion is provided below. I have tried both configs, TeX-AMS-MML_SVG-full and TeX-AMS-MML_HTMLorMML-full; neither seems to work.

<math>
<mstyle mathsize="normal">
    <mrow>
      <mspace width=".2em" />
      <mi>Umhüllungszustand</mi>
      <mspace indentalign="left" linebreak="newline" />
      <mo>+</mo>
      <mi>Punkte</mi>
      <mspace width=".2em" />
      <mi>Überflutungsgebiet</mi>
    </mrow>
  </mstyle>
</math>

Mathjax_Umlaut_Bug.zip

pkra commented 8 years ago

I'm afraid I can't reproduce the problem. In any case, you might want to use the STIX fonts instead of the default fonts which include more glyphs. You might also want to check out mathjax-node.

pkra commented 8 years ago

Oh, never mind. I saw the screenshots. That's simply expected behavior, I'm afraid. Since the default fonts don't cover those glyphs, MathJax has to resort to text elements and specify a font fallback suggestion. It's up to the system to load a font, and that is outside MathJax's control.

encephalopathy commented 8 years ago

Whoa, I completely forgot about this while I was sick. I uploaded a project to reproduce this issue. The steps to reproduce it are:

  1. Open up Svg_Test.html
  2. Open up the dev console.
  3. Write svg.Create("<math><mstyle mathsize=\"normal\"><mrow><mspace width=\".2em\" /><mi>Umhüllungszustand</mi><mspace indentalign=\"left\" linebreak=\"newline\" /><mo>+</mo><mi>Punkte</mi><mspace width=\".2em\" /><mi>Überflutungsgebiet</mi></mrow></mstyle></math>"); into the console.
  4. See that the text is displayed correctly in the browser, then copy the svg data that is printed in the console and save it to a file with the .svg extension.
  5. Open that file and notice that the umlaut text is different.

SVG_Umulaut_Issue.zip

dpvc commented 8 years ago

MathJax tries to match its output to the surrounding text size, and it scales the SVG to match the ex-height of that font. This works fine for the paths that make the glyphs for the characters in the MathJax fonts, but when you use a character that isn't in the font, MathJax uses a <text> node in the SVG to try to get a character from one of the fonts installed on your system. But because the SVG is scaled, that would scale the <text> element as well, so MathJax must undo the scaling used for the SVG itself. Since the scaling is in ex-units, the unscaling is based on the actual size of the surrounding font.

In your example, you have set the body font to 20pt, and so MathJax's SVG output is set up to scale the unknown characters based on that size. But when you load the SVG as a raw SVG file, the default font size is not the same. If I remove the font-size styling, then the SVG loads properly as a separate file.
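To see which characters in a saved SVG depend on this <text> fallback (and therefore on system fonts and on the baked-in unscaling factor), a rough check along the following lines may help. This is only a sketch: findTextFallbacks is a hypothetical helper name, not part of MathJax, and it assumes the serialized markup looks like MathJax v2's SVG output, with a scale(...) factor in the transform attribute of each <text> element.

```javascript
// Hypothetical helper (not a MathJax API): scan a serialized SVG string for
// <text> elements, which MathJax emits for characters missing from its fonts.
function findTextFallbacks(svgSource) {
  var pattern = /<text\b([^>]*)>([\s\S]*?)<\/text>/g;
  var results = [];
  var match;
  while ((match = pattern.exec(svgSource)) !== null) {
    var attrs = match[1];
    // Font list the browser is asked to try for this character.
    var family = /font-family="([^"]*)"/.exec(attrs);
    // Factor used to undo the scaling of the surrounding SVG.
    var scale = /scale\(([-\d.]+)\)/.exec(attrs);
    results.push({
      text: match[2].trim(),
      fontFamily: family ? family[1] : null,
      scale: scale ? parseFloat(scale[1]) : null
    });
  }
  return results;
}

// Example input; the attribute values here are made up for illustration.
var sample =
  '<g transform="translate(100)">' +
  '<text font-family="serif" stroke="none" ' +
  'transform="scale(50) matrix(1 0 0 -1 0 0)">ü</text></g>';
console.log(findTextFallbacks(sample));
```

An empty result would suggest that every character was drawn as a path, so the file should not depend on installed fonts or on the font size of the page it was generated in.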

dpvc commented 8 years ago

Note that you can take advantage of the SVG output's useGlobalCache (by setting it to false) to simplify your ParseSvg() function. Adding

SVG: {
  useGlobalCache:false,
  scale: 100
}

to your MathJax configuration lets you simplify the ParseSvg() function to be

Svg.prototype.ParseSvg = function ParseSvg() {
    var root = this.buffer;
    var content = null;
    var svgImg = root.getElementsByTagName('svg');
    if (svgImg.length > 0) {
        content = (new XMLSerializer()).serializeToString(svgImg[0]);
        // patch IE serialization issue:
        // http://stackoverflow.com/questions/19610089/unwanted-namespaces-on-svg-markup-when-using-xmlserializer-in-javascript-with-ie
        if (content) {
            content = content.replace('xmlns:NS1="" ', '');
            content = content.replace('NS1:xmlns:xlink="http://www.w3.org/1999/xlink"', 'xmlns:xlink="http://www.w3.org/1999/xlink"');
        }
        console.log(content);
    }
}

Just thought you might want to take advantage of that.

oelgoetz commented 7 years ago

I have the same problem as encephalopathy. Installing the STIX fonts didn't fix the problem. They don't contain glyphs for "äöüÄÖÜß" either. Can anybody suggest a font that includes them?

pkra commented 7 years ago

Ah, that was my mistake earlier on. Asana-Math, Gyre-Pagella, Gyre-Termes and Latin-Modern have the necessary glyphs.
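For anyone landing here later, a minimal sketch of selecting one of these fonts for the SVG output, assuming MathJax v2 and its documented SVG `font` option. The variable name svgFontConfig is just for illustration, and "Gyre-Pagella" is an arbitrary pick from the list above. The call has to run before MathJax starts up, for example from inside a `<script type="text/x-mathjax-config">` block placed before the script tag that loads MathJax.js.

```javascript
// Sketch only: assumes MathJax v2. The font names accepted by the SVG
// output's `font` option include the ones mentioned above.
var svgFontConfig = {
  SVG: {
    font: "Gyre-Pagella" // or "Asana-Math", "Gyre-Termes", "Latin-Modern"
  }
};

// Guarded so the snippet is harmless where MathJax is not loaded.
if (typeof MathJax !== "undefined" && MathJax.Hub) {
  MathJax.Hub.Config(svgFontConfig);
}
```

If the switch takes effect, the glyph ids in the output should no longer start with MJMAIN, which is the prefix the default TeX font uses.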

oelgoetz commented 7 years ago

Thank you pkra for your quick answer. But the svg processor still does not put glyphs in the output for these characters. Example: An "a" is included as the following path in the <defs> section of the svg:

<path id="MJMAIN-61" stroke-width="1" d="M 137 305 T 115 305 T 78 320 T 63 359 Q 63 394 97 421 T 218 448 Q 291 448 336 416 T 396 340 Q 401 326 401 309 T 402 194 V 124 Q 402 76 407 58 T 428 40 Q 443 40 448 56 T 453 109 V 145 H 493 V 106 Q 492 66 490 59 Q 481 29 455 12 T 400 -6 T 353 12 T 329 54 V 58 L 327 55 Q 325 52 322 49 T 314 40 T 302 29 T 287 17 T 269 6 T 247 -2 T 221 -8 T 190 -11 Q 130 -11 82 20 T 34 107 Q 34 128 41 147 T 68 188 T 116 225 T 194 253 T 304 268 H 318 V 290 Q 318 324 312 340 Q 290 411 215 411 Q 197 411 181 410 T 156 406 T 148 403 Q 170 388 170 359 Q 170 334 154 320 Z M 126 106 Q 126 75 150 51 T 209 26 Q 247 26 276 49 T 315 109 Q 317 116 318 175 Q 318 233 317 233 Q 309 233 296 232 T 251 223 T 193 203 T 147 166 T 126 106 Z" />

And later this is referenced as:

<use xmlns:NS2="http://www.w3.org/1999/xlink" NS2:href="#MJMAIN-61" />

In contrast, this is what the svg output contains for the sequence "äöüÄÖÜß":

<g transform="translate(1501)">
<text font-family="STIXGeneral,'Arial Unicode MS',serif" stroke="none" transform="scale(60.358) matrix(1 0 0 -1 0 0)">
ä
</text>
</g>
<g transform="translate(1930)">
<text font-family="STIXGeneral,'Arial Unicode MS',serif" stroke="none" transform="scale(60.358) matrix(1 0 0 -1 0 0)">
ö
</text>
</g>
<g transform="translate(2412)">
<text font-family="STIXGeneral,'Arial Unicode MS',serif" stroke="none" transform="scale(60.358) matrix(1 0 0 -1 0 0)">
ü
</text>
</g>
<g transform="translate(2895)">
<text font-family="STIXGeneral,'Arial Unicode MS',serif" stroke="none" transform="scale(60.358) matrix(1 0 0 -1 0 0)">
Ä
</text>
</g>
<g transform="translate(3592)">
<text font-family="STIXGeneral,'Arial Unicode MS',serif" stroke="none" transform="scale(60.358) matrix(1 0 0 -1 0 0)">
Ö
</text>
</g>
<g transform="translate(4290)">
<text font-family="STIXGeneral,'Arial Unicode MS',serif" stroke="none" transform="scale(60.358) matrix(1 0 0 -1 0 0)">
Ü
</text>
</g>
<g transform="translate(4987)">
<text font-family="STIXGeneral,'Arial Unicode MS',serif" stroke="none" transform="scale(60.358) matrix(1 0 0 -1 0 0)">
ß
</text>
</g>

I'm also puzzled by the value of the font-family attribute in the svg: Is it now "STIXGeneral" or "Arial Unicode MS"? Finally: are you really sure that all the fonts you mentioned really contain umlaute? I've downloaded several of them and tried them out but when I take a look at the OTF file the specimen text is always "Franz jagt im komplett verwahrlosten Taxi quer durch Bayern" - based on this it is impossible to determine whether these font sets include umlaute. Could you give me a hint where my mistake is? Did I miss something I have to change in the MathJax settings?

pkra commented 7 years ago

Your code excerpt indicates that MathJax is still using the default fonts (since MJMAIN is mentioned).

You'll need to share a live code sample if you want more help.

The documentation for configuring the SVG output is at http://docs.mathjax.org/en/latest/options/SVG.html.

A demo that (among other things) shows all fonts at once can be found at https://codepen.io/pkra/pen/pjzVej.

dpvc commented 7 years ago

Peter is correct that you have not actually switched the font properly (your configuration may be in error). As he says, we can't help much without seeing the actual page.

But referring to your comment about paths versus glyphs, MathJax's SVG output uses paths for the characters that it knows (the ones in its fonts) in order to avoid issues with downloading web fonts (and detecting when they are available), and font rendering differences between browsers. So the path that you cite below is the correct output for the letter "a" in the MathJax TeX font.

For characters that aren't in its fonts, of course it doesn't have the paths for those, and so uses <text> elements in hopes that you will have a font on your system that has the desired characters. It uses a default set of fonts known to have good Unicode coverage (STIXGeneral and Arial Unicode MS, as you have noticed), but if you don't have these on your computer, the characters may be found in other fonts on your system.

Once you correctly change from the MathJax TeX font to one of the others, then those characters will also be represented via paths.

Yes, the fonts Peter mentioned include the umlauted versions of the characters you are using. I have verified that when properly configured, SVG does output the proper paths (not <text> elements) for these characters.

Davide

oelgoetz commented 7 years ago

Thanks Davide, thanks Peter,

unfortunately I'm not familiar enough with JavaScript and the MathJax package to understand in which file I have to configure the "MathJax.Hub.Config() call". Neither do I really understand, where the "jax array of my configuration" is located as it is mentioned in Peter's link http://docs.mathjax.org/en/latest/options/SVG.html. I admit that I am a complete newbie in MathJax and I did not decide to use MathJax myself - I just use a third party software called "MadCap Flare" which again uses the MathJax package since a recent software update to render equations. In earlier versions they used MathType which rendered MathML into png files and everything worked perfect. Now they use MathJax to render the MathML into SVG files and for me as a user the trouble with the umlaute started. So far I found out where the "path-to-MathJax" is in MadCap Flare's Program directory (it is a package with the MathJax.fileversion="2.6.1") but I have no idea how they finally call the MathJax package to produce the svg from my MathML source in the output.

Davide mentioned "SVG does output the proper paths (not <text> elements for these characters)." Are you referring to <text> or <mtext>? Because the MathML source I use looks like this:

<MadCap:equation xmlns:dsi="http://www.w3.org/1998/Math/MathML">
   <math xmlns="http://www.w3.org/1998/Math/MathML" display="block" id="mathobjid">
      <mrow>
         <mstyle mathsize="400%">
            <mtext mathvariant="normal">abcäöüÄÖÜß</mtext>
         </mstyle>
      </mrow>
   </math>
</MadCap:equation>

This piece of MathML can be located anywhere in my source XML file. The MadCap compiler will compile the xml file into an html-file that refers to an svg file containing the code I described above when the equation is displayed on the web page.

I've already reported the problem to the MadCap support but so far nothing happened for a number of weeks - which is usually a good predictor that nothing else will happen in the future ... This is why I started to search for a solution to the case on my own. I'd be happy if you could give me a hint in which files I have to do the changes that are mentioned in the MathJax doc. As I mentioned already - I don't think I'm able to find it on my own.

pkra commented 7 years ago

@oelgoetz you'll have to follow up with MadCap, I'm afraid.

oelgoetz commented 7 years ago

OK ... I'll try my best ... Thanks a lot for your patience and the quick replies anyways.