plastex / plastex

plasTeX is a Python package that processes LaTeX documents into an XML-DOM-like object which can be used to generate various types of output.

How to get "inner tex"? #48

Closed kskyten closed 7 years ago

kskyten commented 7 years ago

I'm trying to write a renderer for MathBook XML and I need to be able to get the "inner tex" from the latex source. So for example if I have $a = 1$ or \begin{equation}a = 1\end{equation} I would get a = 1 from both of them. I also need to be able to split multiline math into individual lines such that I can render

\begin{eqnarray}
a = 1\\
b = 2\\
c = 3
\end{eqnarray}

as

<md>
<mrow>a = 1</mrow>
<mrow>b = 2</mrow>
<mrow>c = 3</mrow>
</md>

How do I accomplish this? Using node.source gets the whole snippet, including the environment tags. Currently I use this plus a regexp to clean it up, which seems like a pretty ugly solution considering plasTeX can parse LaTeX. Using node.textContent doesn't seem to preserve the LaTeX in a valid format. Joining the child nodes with ''.join(node.childNodes) comes close but doesn't work in all cases.

One other thing I noticed is that node.source doesn't preserve the exact source but instead adds whitespace. For example the source from $X_i$ is $X_ i$.

kesmit13 commented 7 years ago

To get the source of just what is inside of an environment, you can use the childrenSource property rather than source.

As for eqnarray, you can follow the pattern in the Renderers/XHTML/Math.zpts file for that macro. You'll see that it consists of rows and cells.

As for the source containing extra spaces, that is because plasTeX doesn't store the actual source. It has to tokenize, parse, and expand all macros, then reconstitute the source from that. So it isn't always a perfect match, but it should always compile and render in LaTeX the same way.
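
For the eqnarray case, here is a minimal sketch of the splitting step in plain Python. It assumes you already have the inner TeX as a string (for example from childrenSource) and that every row ends with a plain `\\`. The helper name rows_to_md is made up for illustration and is not part of plasTeX. It also isn't the rows-and-cells approach the XHTML renderer uses, just a string-level shortcut that will break on things like `\\[2pt]` or nested environments.

```python
from xml.sax.saxutils import escape


def rows_to_md(inner_tex):
    """Wrap each row of a multiline math body in <mrow>, inside <md>.

    inner_tex is assumed to be the TeX found between \\begin{...} and
    \\end{...}, with rows separated by a literal double backslash.
    """
    # Split on the TeX row separator and trim the whitespace that
    # reconstituted source may carry around each row.
    rows = [row.strip() for row in inner_tex.split("\\\\")]
    # Drop empty rows, e.g. after a trailing row separator.
    rows = [row for row in rows if row]

    lines = ["<md>"]
    # escape() makes &, < and > safe for XML output.
    lines.extend("<mrow>{}</mrow>".format(escape(row)) for row in rows)
    lines.append("</md>")
    return "\n".join(lines)


if __name__ == "__main__":
    inner = r"""
a = 1\\
b = 2\\
c = 3
"""
    print(rows_to_md(inner))
```

Stripping each row means the extra spaces in the reconstituted source don't matter at the row boundaries. Note that escape() turns an alignment `&` into `&amp;`, so adjust that if your target format expects something else there.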