mathjax / MathJax-node

MathJax for Node
Apache License 2.0

Problem parsing Tex #440

Closed aarjan closed 5 years ago

aarjan commented 5 years ago

I have a simple LaTeX formula that the browser MathJax library parses without problems, but it produces an error in the Node implementation.

const mathjax = require('mathjax-node');
mathjax.start();
mathjax.config({
  jax: ['input/TeX', 'output/CommonHTML'],
  extensions: [
    'tex2jax.js',
  ],
  TeX: {
    extensions: ['AMSmath.js', 'AMSsymbols.js', 'noErrors.js', 'noUndefined.js'],
  },
  displayAlign: 'left',
  tex2jax: {
    inlineMath: [['$', '$'], ['\\(', '\\)']],
    processEscapes: true,
  },
});
const input = `\[ A = \begin{bmatrix} 1 & 3 & 5 \\ 2 & 5 & 4 \\ -2 & 3 & -1 \end{bmatrix} \]`;
mathjax.typeset({ math: input, html: true, speakText: false, format: ['TeX'] }, data => console.log(data));

It returns an error when parsing the ampersand (&): TeX parse error: Misplaced &

pkra commented 5 years ago

You have to be careful with backslashes in template literals, too. Cf. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals#Raw_strings

pkra commented 5 years ago

The template literal will expand to

[ A = \u0008egin{bmatrix} 1 & 3 & 5 \\ 2 & 5 & 4 \\ -2 & 3 & -1 end{bmatrix} ]
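The difference can be checked directly in Node. The following is a minimal sketch with a shortened matrix (the variable names are only for illustration): an untagged template literal processes escape sequences, so `\b` becomes a backspace character and `\[`, `\e`, `\]` lose their backslash, while `String.raw` and doubled backslashes both preserve the TeX source.

```javascript
// Untagged template literal: escape sequences are processed ("cooked").
const cooked = `\[ A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \]`;

// String.raw keeps every backslash exactly as typed.
const raw = String.raw`\[ A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \]`;

// Doubling the backslashes in an ordinary string yields the same text as String.raw.
const doubled = '\\[ A = \\begin{bmatrix} 1 & 2 \\\\ 3 & 4 \\end{bmatrix} \\]';

console.log(cooked.includes('\b'));      // whether "\b" turned into a backspace
console.log(cooked.includes('\\begin')); // whether the \begin command survived
console.log(raw === doubled);            // whether both safe spellings agree
```

Without `\begin{bmatrix}` there is no alignment environment left, which is consistent with the "Misplaced &" error reported above.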

aarjan commented 5 years ago

@pkra I used String.raw`\[ A = \begin{bmatrix} 1 & 3 & 5 \\ 2 & 5 & 4 \\ -2 & 3 & -1 \end{bmatrix} \]`, but it didn't work.

dpvc commented 5 years ago

There are a number of problems with your script. The immediate problem is the TeX string, which you have tried to address with the String.raw construct above. That does take care of the backslashes properly, but you have included \[ and \] in the string, which are not needed. Since you are passing the actual math string to mathjax-node, there is no need for math delimiters. These probably produce the error Undefined control sequence \[.

The other way to handle the string, of course, is to double all the backslashes, as in

const input = 'A = \\begin{bmatrix} 1 & 3 & 5 \\\\ 2 & 5 & 4 \\\\ -2 & 3 & -1 \\end{bmatrix}';

Note that you specify whether the math is display or not using the format property of the mathjax.typeset() call, setting it to TeX for display-style TeX and inline-TeX for in-line style TeX.
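If the strings arrive with delimiters already attached, one option is to strip them and derive the format value from them. This is a minimal sketch; `stripDelimiters` is a hypothetical helper, not part of mathjax-node, and it only handles a single expression wrapped in `\[...\]` or `\(...\)`.

```javascript
// Hypothetical helper (not a mathjax-node API): remove one pair of TeX
// delimiters and report the matching `format` value for mathjax.typeset().
function stripDelimiters(source, defaultFormat = 'TeX') {
  const text = source.trim();
  if (text.startsWith('\\[') && text.endsWith('\\]')) {
    return { math: text.slice(2, -2).trim(), format: 'TeX' };        // display math
  }
  if (text.startsWith('\\(') && text.endsWith('\\)')) {
    return { math: text.slice(2, -2).trim(), format: 'inline-TeX' }; // in-line math
  }
  // No delimiters found: pass the string through unchanged.
  return { math: text, format: defaultFormat };
}

const displayInput = stripDelimiters(String.raw`\[ A = \begin{bmatrix} 1 & 3 \\ 2 & 5 \end{bmatrix} \]`);
const inlineInput = stripDelimiters(String.raw`\( I_3 \)`);
console.log(displayInput, inlineInput);
```

The resulting `math` and `format` values can then be passed to `mathjax.typeset()` as the `math` and `format` properties.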

A second issue is that your format is incorrect. It should not be an array, but just a string, so

format: 'TeX'

not

format: ['TeX']

as you currently show it.

Third, you should configure mathjax-node before starting it, so your mathjax.start() should come after the mathjax.config() call. If you call mathjax.config() after MathJax is started, most of the configuration will have no effect.

Finally, mathjax.config() is configuring mathjax-node, not MathJax itself, so most of your configuration is ignored because the properties are not mathjax-node configuration properties. The only one that is an actual mathjax-node configuration property is extensions, but in mathjax-node, this is supposed to be a comma-separated string, not an array, so it is not having the desired effect in any case.

If you want to set the configuration for the copy of MathJax that is used by mathjax-node, you can use a MathJax property in the mathjax.config() call. For example:

mathjax.config({
    MathJax: {
        displayAlign: 'left'
    }
});

Note, however, that most of the properties you are setting are inappropriate for mathjax-node in any case. First, you should not specify a jax property, since mathjax-node already loads all the jax that it can use (and that includes all the ones you have requested anyway). Second, since mathjax-node doesn't need to search a page for math (since you are passing it the math strings directly), there is no need for any of tex2jax, mml2jax, or asciimath2jax, as these will never be used. Similarly, mathjax-node has no menu or zoom interaction (there is no browser involved to perform the interaction), so you don't need MathMenu or MathZoom. Since mathjax-node does not return the container element that would include the assistive MathML, there is no use in loading AssistiveMML, either. And because there is no menu, a11y/accessibility-menu is also meaningless here.

The upshot is, your entire extensions array is unnecessary, and should be dropped.

As for the TeX configuration, mathjax-node already includes the AMSmath and AMSsymbols extensions. You could request the noErrors and noUndefined extensions, though I would not recommend using noErrors. In that case, you would use

     extensions: 'TeX/noErrors.js, TeX/noUndefined.js',

instead.

The displayAlign property needs to be in the MathJax block, not at the top level, as I illustrate above.

Finally, the tex2jax configuration is not needed, since tex2jax is not used in mathjax-node. (As I mentioned, you give mathjax-node the TeX strings directly, so there is no need for mathjax-node to look for delimiters.)
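The points above can be summarized as a mechanical translation from a browser-style configuration to the object given to mathjax.config(). The sketch below is only an illustration of that mapping: `toNodeConfig`, `PAGE_ONLY_KEYS`, and `PRELOADED_TEX` are hypothetical names, and the lists of dropped keys and preloaded extensions are taken from this thread rather than from the mathjax-node source.

```javascript
// Hypothetical helper (not part of mathjax-node): reshape a browser-style
// MathJax v2 configuration into the object passed to mathjax.config().

// Top-level keys described in this thread as having no effect in mathjax-node.
const PAGE_ONLY_KEYS = new Set(['jax', 'extensions', 'tex2jax', 'mml2jax', 'asciimath2jax']);
// TeX extensions that, according to this thread, mathjax-node already includes.
const PRELOADED_TEX = new Set(['AMSmath.js', 'AMSsymbols.js']);

function toNodeConfig(browserConfig) {
  const mathJaxBlock = {};
  const extensions = [];
  for (const [key, value] of Object.entries(browserConfig)) {
    if (PAGE_ONLY_KEYS.has(key)) continue;
    if (key === 'TeX') {
      // TeX extensions move to the comma-separated top-level `extensions` string.
      const { extensions: texExtensions = [], ...otherTeXOptions } = value;
      for (const name of texExtensions) {
        if (!PRELOADED_TEX.has(name)) extensions.push(`TeX/${name}`);
      }
      if (Object.keys(otherTeXOptions).length > 0) mathJaxBlock.TeX = otherTeXOptions;
    } else {
      // Everything else (displayAlign, for example) belongs in the MathJax block.
      mathJaxBlock[key] = value;
    }
  }
  return { extensions: extensions.join(', '), MathJax: mathJaxBlock };
}

const nodeConfig = toNodeConfig({
  jax: ['input/TeX', 'output/CommonHTML'],
  extensions: ['tex2jax.js'],
  TeX: { extensions: ['AMSmath.js', 'AMSsymbols.js', 'noUndefined.js'] },
  displayAlign: 'left',
  tex2jax: { inlineMath: [['$', '$'], ['\\(', '\\)']], processEscapes: true },
});
console.log(nodeConfig);
```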

So most of what you have is superfluous. A reduced script that does what you want is

const mathjax = require('mathjax-node');
mathjax.config({
  extensions: 'TeX/noUndefined.js, AssistiveMML.js',
  MathJax: {displayAlign: 'left'}
});
mathjax.start();
const input = 'A = \\begin{bmatrix} 1 & 3 & 5 \\\\ 2 & 5 & 4 \\\\ -2 & 3 & -1 \\end{bmatrix}';
mathjax.typeset({ math: input, html: true, speakText: false, format: 'TeX' }, data => console.log(data));

and the mathjax.start(); is not strictly necessary, as mathjax.typeset() will perform the start automatically, if needed.

aarjan commented 5 years ago

@dpvc @pkra Thank you very much for your detailed explanation. I am sorry that I tried to jumble together every combination without going thoroughly through the docs.
One last thing I wanted to ask: the browser version renders this text perfectly, as in the demo at https://www.mathjax.org/#demo

<p><strong>Diagonal matrix:</strong> The matrix whose elements except those in leading/principle/main diagonal are all zero. It is both upper and lower triangular matrix.</p>
<p>Example:</p>
<p>\[ D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & -3 \end{pmatrix} \]</p>
<p><strong>Scalar matrix:</strong> The diagonal matrix with all non-zero elements equal is scalar matrix. A scalar matrix with all non-zero elements 1 is called <strong>identity matrix</strong>. (multiplicative identity of matrices). All other scalar matrices are 'scalar' times the identity, hence the name.</p>
<p>Example:</p>
<p>\[ F = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix} \]</p>
<p>is an scalar matrix, which can be written as:</p>
<p>\[ F = 2 \times \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = 2I_3 \]</p>
<p><br />where, \( I_3 \) denotes identity matrix of order 3 (3 x 3).</p>

When I escaped all the backslashes and set the output to html, the expressions were not rendered correctly, whereas in svg they are correct. But then all the normal text is concatenated without spaces. Do I need to escape the spaces too? I also added this config

tex2jax: {
    inlineMath: [['\\(', '\\)']],
    processEscapes: true,
  }

which I suppose is needed for finding inline math expressions, but it didn't work either.

P.S. I want to send such math expressions directly as HTML if possible, or as PNG/SVG, to a Flutter mobile application, where they are displayed in WebViews, since it has no LaTeX plugin support.

Thank you again for your warm support.

aarjan commented 5 years ago

Finally got it working through https://github.com/pkra/mathjax-node-page. Thank you for your support!

aarjan commented 5 years ago

@dpvc The mathjax-node-page lib works well, but I cannot render the same input using mathjax-node. For example, with the input \\[ A = \\begin{pmatrix} a_{11} & a_{12} \\\\ a_{21} &a _{22} \\end{pmatrix} \\] it has trouble parsing the escaped delimiters such as \\( or \\[, whereas the same input works well with mathjax-node-page. Even if I remove the delimiters, as you pointed out in your example above, it doesn't render well.

The problem is that the mathjax-node-page package returns redundant information even for small formulas.

dpvc commented 5 years ago

It is not clear what "it doesn't render well" means. Can you be more specific about the issue you are seeing? Also, I'm not sure what "redundant information" you are talking about in mathjax-node-page; can you clarify what that is?

aarjan commented 5 years ago

@dpvc I was trying to minimize or compress the HTML, but found that SVG is a better choice than HTML for compressed data.

I don't understand why I get TeX parse error: Misplaced & with this input <p>\\( &gt; { h \\over {2\\pi}}\\)</p> using mathjax-node, while it works well with mathjax-node-page. I tried it with the configuration you gave, but it didn't work.

Also, I want to convert the output to PNG. Is that possible with mathjax-node-page using https://github.com/pkra/mathjax-node-svg2png? I would be happy if I could convert all the formulas in my HTML data and replace them with the corresponding PNG-encoded data. For example:

<p style="text-align: justify;">A matrix is an rectangular array of numbers. The numbers may be real, complex, constants or variables. Each 'number' is called an 'element' of the matrix. It is customary to denote a matrix by capital letter of english alphabet and its elements by subscripted small letter of same alpjabet. The subscripts represent the 'location' of the element in the matrix. And the whole arrangement is enclosed in brackets. (or parenthesis). Here is an example:</p>
<p style="text-align: justify;">\\[ A = \\begin{pmatrix} a_{11} & a_{12} \\\\ a_{21} &a _{22} \\end{pmatrix} \\]</p>
<p style="text-align: justify;">Such matrix is written more compactly as \\( A = \\{a_{ij} \\} \\), and \\(i \\in \\{1,2,...m\\}, j \\in \\{1,2,...,n\\} \\) being understood, \\(m\\) being the number of rows and \\(n\\) being number of columns.</p>
<p style="text-align: justify;">Row of a matrix: A row of numbers means 'horizontal' linear arrangement of numbers. The first index of subscript denotes, which row the element belongs to.</p>
<p style="text-align: justify;">Column of a matrix: A column of numbers means 'vertical' linear arrangement of numbers.The second index of subscript denotes, which column the element belongs to.</p>
<p style="text-align: justify;">A matrix can be thought of a number of rows or columns. If there are 'm' rows in a matrix, each with 'n' elements, this is equivalent to saying, the matrix has 'n' columns with 'm' elements each. This is simply concluded as:</p>
<p style="text-align: justify;"><em>The matrix has 'm' rows and 'n' columns. Its <strong>order</strong></em><strong> </strong><em>is \\(' m \\times n' \\). </em></p>
<p style="text-align: justify;">Consider a matrix,</p>
<p style="text-align: justify;">\\[ A = \\begin{pmatrix} 1& 2 \\\\ 3 & -2 \\\\ 2 & 3  \\end{pmatrix} \\]</p>
<p style="text-align: justify;">The number of rows is 3, number of columns is 2. The matrix has order \\(3 \\times 2 \\), or \\(A\\) is a \\( 3 \\times 2\\) matrix.</p>
<p style="text-align: justify;">Also, \\( a_{11} = 1, a_{12} = 2, a_{21} = 3, a_{22} = -2, a_{31}=2, a_{32} = 3 \\)</p>
<p style="text-align: justify;">It is customary to denote \\(i^{th}\\) row of matrix by \\(R_i \\) and \\(j^{th}\\) column by \\(C_j\\). Any row of a matrix may be changed by adding a linear combination of other rows, called <strong>Elementary Row Operations.</strong> Of course, these change the original matrix, but are of great importance in solving simultaneous equations, and evaluating determinants. </p>
<p style="text-align: justify;">If all the elements of second row are changed by adding twice the elements of first row and subtracting thrice the elements of third row (of same column), we write: \\( R_2 \\to R_2 + 2R_1 - 3R_3 \\).</p>
<p style="text-align: justify;">Exapmle:</p>
<p style="text-align: justify;">\\[ A = \\begin{bmatrix} 1 & 3 & 5 \\\\ 2 & 5 & 4 \\\\ -2 & 3 & -1 \\end{bmatrix} \\]</p>
<p style="text-align: justify;">Then operating, \\( C_1 \\to C_1 - 2C_2  + 5 C_3 \\), the new matrix is:</p>
<p style="text-align: justify;">\\[B = \\begin{bmatrix} 1 - 2 \\times 3 + 5 \\times 5 & 3 & 5 \\\\ 2 - 2 \\times 5 + 5 \\times 4 & 5 & 4 \\\\ -2 - 2 \\times 3 + 5 \\times (-1) & 3 & -1 \\end{bmatrix} \\]</p>
<p style="text-align: justify;">\\[B = \\begin{bmatrix} 20 & 3 & 5 \\\\ 12 & 5 & 4 \\\\ -13 & 3 & -1 \\end{bmatrix} \\]</p>

P.S. I am converting the whole data into SVG with mathjax-node-page for display in a Flutter mobile application; the problem is that it doesn't support SVG, so I need to convert the formulas into PNGs.

dpvc commented 5 years ago

I don't understand, why i get Tex parse error: Misplaced & with this input <p>\\( &gt; { h \\over {2\\pi}}\\)</p> using mathjax-node but works well for mathjax-node-page.

This is because mathjax-node takes a TeX string, not an HTML string, as its input. The &gt; is an HTML entity for >, but it is not valid TeX; the browser would turn &gt; into > as it processes the HTML page, which happens before MathJax runs. If you are extracting the math from an unprocessed HTML page, you will need to translate the entities yourself before passing them to mathjax-node.
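Translating the entities can be done with a small function before the string reaches mathjax-node. The following is a minimal sketch under stated assumptions: `decodeEntities` and `NAMED_ENTITIES` are hypothetical names, and only numeric references and a handful of named entities are covered; a complete HTML entity table would be needed for arbitrary input.

```javascript
// Hypothetical helper: decode the HTML entities most likely to appear in TeX
// that was copied out of unprocessed HTML. Unknown names are left untouched.
const NAMED_ENTITIES = new Map([
  ['amp', '&'], ['lt', '<'], ['gt', '>'], ['quot', '"'], ['apos', "'"],
]);

function decodeEntities(text) {
  // A single pass over the string, so "&amp;gt;" becomes "&gt;" and is not decoded twice.
  return text.replace(/&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, (whole, body) => {
    if (body[0] === '#') {
      const isHex = body[1] === 'x' || body[1] === 'X';
      const codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range references as they are instead of throwing.
      return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : whole;
    }
    return NAMED_ENTITIES.has(body) ? NAMED_ENTITIES.get(body) : whole;
  });
}

const tex = decodeEntities(String.raw`&gt; { h \over {2\pi}}`);
console.log(tex);
```

Note that a literal `&amp;` inside a matrix environment must be decoded as well, since the alignment character in TeX is the bare `&`.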

The reason it works in mathjax-node-page is because mathjax-node-page uses a virtual DOM to process the page as HTML (just as it would be in a browser), and so &gt; is converted to > before it is processed by MathJax within that virtual DOM.
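For anyone who prefers to stay with mathjax-node, the delimiter search can be approximated by hand. This is a rough sketch, not how mathjax-node-page or tex2jax actually work: `findMath` is a hypothetical function that scans a string for `\[...\]` and `\(...\)` with a regular expression. One known limitation is that a row break with an optional argument, such as `\\[2pt]`, would be mistaken for an opening delimiter.

```javascript
// Hypothetical helper: locate \[...\] (display) and \(...\) (in-line) math in
// an HTML string and report the `format` value mathjax-node expects for each.
function findMath(html) {
  const pattern = /\\\[([\s\S]*?)\\\]|\\\(([\s\S]*?)\\\)/g;
  return Array.from(html.matchAll(pattern), match => {
    const isDisplay = match[1] !== undefined;
    return {
      math: (isDisplay ? match[1] : match[2]).trim(),
      format: isDisplay ? 'TeX' : 'inline-TeX',
      start: match.index,                 // offset of the opening delimiter
      end: match.index + match[0].length, // offset just past the closing delimiter
    };
  });
}

const html = String.raw`<p>\[ F = 2I_3 \]</p><p>where \( I_3 \) denotes the identity.</p>`;
const expressions = findMath(html);
console.log(expressions);
```

Each `math` value would still need its HTML entities translated before being typeset, and the `start` and `end` offsets can be used to splice the rendered output back into the page.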

aarjan commented 5 years ago

@dpvc Thank you very much. I got it working.

chaosforfun commented 4 years ago

Thank you, this helped me a lot.

smalltimer commented 3 years ago

Hi! Thanks @dpvc, your responses have been very helpful in diagnosing and resolving the issues I am having with my gatsbyjs website... almost. I am not able to make the $\ref{}$ tags work. Is this a known limitation of mathjax-node? I am using the gatsby ssr plugin (link, repo), so is it something to do with how the plugin uses mathjax-node?

Cheers!

dpvc commented 3 years ago

In order to handle references, MathJax maintains a list of the \tag{} values that are associated to \label{} values on the given page. But mathjax-node doesn't have a "page" to work with, it just gets individual expressions, so that list is not in place, and all references will be undefined. Mathjax-node does allow you to maintain a page "state" by passing a state parameter that it will use to keep the list of labels (and other similar data), but your plugin does not have that.

You would have to modify the plugin to use the state parameter, and even with that, you would only be able to refer to equations that had already been processed (MathJax itself sets aside equations that have undefined references and reprocesses them at the end in order to provide for forward references).

It turns out that the plugin handles all the in-line math first, then all the display math, but since most \ref{} calls are found in in-line math, that means even if they are "backward" links in the document, the display equations to which they link won't have been processed yet, so they will be undefined when the plugin processes them. So in addition to using the state variable, you should also switch the order of processing so display equations are done first, and then inline ones. That will give you the best chance of having your references be defined. (Using \ref within a displayed equation for a forward link would then be the only situation that wouldn't work.)
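The effect of the processing order can be seen in a toy model. The sketch below is not MathJax's implementation: `processInOrder` is a hypothetical function that keeps a shared label table (standing in for the state parameter), numbers each `\label{}` as it is encountered, and replaces each `\ref{}` with the number if the label is already known or with `???` otherwise.

```javascript
// Toy model only, not MathJax code: a shared label table plays the role of `state`.
function processInOrder(expressions) {
  const state = new Map();
  let nextTag = 1;
  return expressions.map(tex => {
    // Record every label defined by this expression.
    for (const [, label] of tex.matchAll(/\\label\{([^}]*)\}/g)) {
      state.set(label, String(nextTag++));
    }
    // Resolve references only against labels that have already been seen.
    return tex.replace(/\\ref\{([^}]*)\}/g, (whole, label) => state.get(label) ?? '???');
  });
}

const display = ['E = mc^2 \\label{eq:energy}'];
const inline = ['\\text{see } \\ref{eq:energy}'];

const inlineFirst = processInOrder([...inline, ...display]);
const displayFirst = processInOrder([...display, ...inline]);
console.log(inlineFirst[0]);
console.log(displayFirst[1]);
```

With the in-line expressions first, the reference is looked up before its label exists; with the display expressions first, the same reference resolves.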

So modifying the plugin's index.js file to do something like

module.exports = async ({ markdownAST }, pluginOptions = {}) => {
  const nodes = [];
  const state = {};

  // for some reason this doesn't work:
  /*mjAPI.config({
    MathJax: pluginOptions
  });
  mjAPI.start();*/

  visit(markdownAST, `math`, node => {
    nodes.push({ node, format: 'TeX' })
  });

  visit(markdownAST, `inlineMath`, node => {
    nodes.push({ node, format: 'inline-TeX' })
  });

  for (const { node, format } of nodes) {
    await new Promise((resolve, reject) => {
      mjAPI.typeset({
        math: node.value,
        format: format,
        html: true,
        state: state
      }, data => {
        if (!data.errors) {
          node.type = `html`;
          node.value = data.html;
          resolve();
        } else
          reject(data.errors);
      });
    });
  }
}

might do the trick (this is untested, but should point you in the right direction).

smalltimer commented 3 years ago

Hi @dpvc, you were right on both counts: 1. switching the order of the inline and display nodes, and 2. using the state. Now the plugin works as it should. A huge, huge thanks for your time and your effort in figuring all of this out - I really appreciate it. I don't think I would have managed to do this, as I am not a node programmer and simply want to make a website where I can write my science and thoughts down :)

But to give back to the community I will try to fork the original plugin (the maintainer has not responded to the issue I raised for a while now) and make a v2 or something, giving credit to both the original developer and to your post here. Of course, I have no idea how to make an 'official' plugin for gatsby, so it will take some time :D

Thanks once again for sharing the results of your time and effort.

Cheers!

dpvc commented 3 years ago

Thanks for letting us know that it worked. Glad you were able to get your site running the way you wanted to. And good luck with the plugin.

You could try to make a pull request to the original repository from your modified fork. Perhaps the author would be willing to incorporate your changes.

smalltimer commented 3 years ago

Yup, that's the plan - I will try to push to the current repo before trying to make a new plugin. I think I will not have time to maintain the code, so I'd rather the original author took it under their wing.