jgm / pandoc

Universal markup converter
https://pandoc.org

RST Reader ignores isolated inline math in grid table cell #5708

Closed ociule closed 5 years ago

ociule commented 5 years ago

If an inline math role is the only content of a grid table cell, it is not recognized as inline math in most situations (but it is in others, see below!).

Adding any text before the inline math ~or dividing it into two math inline tags~ fixes the issue.

Seen with pandoc 2.7.1 on Windows.

Result:

test

Here's the rst:


This inline math does work here (which is disturbing!):

+--------+-----------------------------+
| Test   | Test                        |
+========+=============================+
| Test   | :math:`SMYS+2 \cdot \sigma` |
+--------+-----------------------------+

But it's broken here:

+-------------------+------------------------------+
| Distribution type | Mean                         |
+===================+==============================+
| normal            | :math:`SMYS+2 \cdot \sigma`  |
+-------------------+------------------------------+

Also broken without the header:

+--------+-----------------------------+
| normal | :math:`SMYS+2 \cdot \sigma` |
+--------+-----------------------------+

But works with a different text in the first col:

+---+-----------------------------+
| a | :math:`SMYS+2 \cdot \sigma` |
+---+-----------------------------+

Also broken with any other latex inside that math:

+--------+-----------------------------+
| normal | :math:`1234`                |
+--------+-----------------------------+

Works with simple table:

==================== ===================
 normal                :math:`1234`
==================== ===================

The math works anywhere else: :math:`SMYS+2 \cdot \sigma`

.. math::
  SMYS+2 \cdot \sigma

Current workaround that works in table cells is to divide it into two inline math pieces:

:math:`SMYS` :math:`+2 \cdot \sigma`

To test, simply copy the sample rst code in the raw code block above to test.rst and run:

$ pandoc test.rst -o test.html

This points to a bug in the RST reader. The whitespace inside the cell content seems to be significant: depending on it, the parsing sometimes works.

ociule commented 5 years ago

The correct workaround is:

*SMYS*:math:`+2 \cdot \sigma`

Simply dividing the inline math content into two will not help: the first part is ignored.

mb21 commented 5 years ago

Happens only if the first cell has exactly 6 characters. This fails:

+--------+----------------+
| 123456 | :math:`a + b`  |
+--------+----------------+

while this succeeds:

+--------+----------------+
| 12345  | :math:`a + b`  |
+--------+----------------+

jgm commented 5 years ago

It is using the default role, title-reference, in the first case, instead of math. It's hard to see why. The raw contents of the cell are the same in both cases (verified with trace): ":math:`a + b`\n\n".

This is really mysterious! parseFromString should return the same output with the same input. Unless something is different in parser state?

jgm commented 5 years ago

Aha. I think it must be due to the use of atStart in unmarkedInterpretedText. (This is to prevent backticks in the middle of a word from being parsed as interpreted text.)

atStart is a bit of a kludge; it works like this:

-- succeeds only if we're not right after a str (ie. in middle of word)
atStart :: Monad m => RSTParser m a -> RSTParser m a
atStart p = do
  pos <- getPosition
  st <- getState
  -- single quote start can't be right after str
  guard $ stateLastStrPos st /= Just pos
  p

So I think what's happening is that stateLastStrPos is not being reset when we do parseFromString, so it's affected by the result of parsing the other cell!
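To make the suspected mechanism concrete, here is a minimal, self-contained sketch. It is a toy model, not pandoc's code: `St`, `Inline`, `parseCell`, `parseRow` and `parseRowReset` are hypothetical names, and columns are plain integers. It assumes that each cell is parsed with positions restarting at column 1 while the state carries over, and that the guard is checked at the backtick after `:math:`. Under those assumptions a six-character word in the previous cell ends at column 7, which is also the column of the backtick (`:math:` is six characters long), so the guard fails.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical, simplified stand-in for the reader's user state.
newtype St = St { lastStrPos :: Maybe Int } deriving (Eq, Show)

data Inline = Str String | Math String | Other String deriving (Eq, Show)

-- Parse one cell. Columns restart at 1 for every cell (as a sub-parse over
-- the cell's text would), but the state is threaded through unchanged.
parseCell :: St -> String -> (Inline, St)
parseCell st s
  | ":math:`" `isPrefixOf` s =
      let tickCol = 1 + length ":math:"        -- column of the opening backtick
      in if lastStrPos st /= Just tickCol      -- atStart-style guard
           then (Math (takeWhile (/= '`') (drop 7 s)), st)
           else (Other s, st)                  -- guard failed: math role is lost
  | otherwise =
      let w = takeWhile (/= ' ') s
      in (Str w, St (Just (1 + length w)))     -- remember where the word ended

-- Thread the state from cell to cell (models the buggy behaviour).
parseRow :: [String] -> [Inline]
parseRow = go (St Nothing)
  where
    go _  []     = []
    go st (c:cs) = let (i, st') = parseCell st c in i : go st' cs

-- Start every cell from a fresh state (models resetting the marker).
parseRowReset :: [String] -> [Inline]
parseRowReset = map (fst . parseCell (St Nothing))

main :: IO ()
main = do
  print (parseRow      ["123456", ":math:`a + b`"])  -- math role lost
  print (parseRow      ["12345",  ":math:`a + b`"])  -- math role kept
  print (parseRowReset ["123456", ":math:`a + b`"])  -- math role kept
```

In this model only the first row loses the math role, which matches the six-character observation above. Whether the real reader fails at exactly this column is an inference, not something verified against the source.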

jgm commented 5 years ago

We're using the generic gridTable defined in Text.Pandoc.Parsing (T.P.Parsing), and it can't reset stateLastStrPos because it's agnostic about the state type. Hm.
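One possible way around the "agnostic about the state type" problem, sketched here with hypothetical names only (`HasLastStr`, `resetLastStr`, `RstState`, `parseCells`, `cell`), is to put the reset behind a type class, so that a generic helper can clear the marker without knowing the concrete state. This is an illustration of the idea under those assumptions, not a description of how pandoc resolved the issue.

```haskell
import Data.Maybe (isNothing)

-- Hypothetical class: any parser state that tracks the last word position.
class HasLastStr st where
  resetLastStr :: st -> st

-- Hypothetical stand-in for a reader-specific state.
data RstState = RstState
  { lastStrPos :: Maybe Int
  , notes      :: [String]
  } deriving (Eq, Show)

instance HasLastStr RstState where
  resetLastStr st = st { lastStrPos = Nothing }

-- State-agnostic helper: run a sub-parse on each cell, clearing the marker
-- first. It needs only the class constraint, not the concrete state type.
parseCells :: HasLastStr st
           => (st -> String -> (a, st)) -> st -> [String] -> ([a], st)
parseCells _ st []     = ([], st)
parseCells p st (c:cs) =
  let (x,  st')  = p (resetLastStr st) c
      (xs, st'') = parseCells p st' cs
  in (x : xs, st'')

-- Toy cell parser: reports whether the marker was clear on entry, then
-- records where this cell's text ended.
cell :: RstState -> String -> (Bool, RstState)
cell st s = (isNothing (lastStrPos st), st { lastStrPos = Just (1 + length s) })

main :: IO ()
main = print (fst (parseCells cell (RstState Nothing []) ["123456", ":math:`a + b`"]))
-- prints [True,True]
```

Each cell sees a clear marker on entry, so nothing parsed in one cell can affect the guard in the next.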

mb21 commented 5 years ago

ha, that was odd 😅 good old state ;)

ociule commented 5 years ago

Thanks for the great work guys! @mb21 good catch on it being broken with text of length six.