jgm / pandoc

Universal markup converter
https://pandoc.org

LaTeX reader throwing error on valid LaTeX #2108

Closed pharpend closed 9 years ago

pharpend commented 9 years ago

I believe this is a resurfacing of #1866 .

$ pandoc lysa.ltx -f latex -t epub -o lysa.epub
pandoc: 
Error at "input" (line 409, column 39):
unexpected "{"
expecting letter or lf new-line
\ch{Answers to the exercises}
                                      ^

I get the same thing when trying to convert to HTML or Markdown.

Reproduction instructions

jgm commented 9 years ago

It would help if you gave the version of pandoc you're using, and a complete input sample we could use to reproduce this.

pharpend commented 9 years ago

Oops, sorry.

Pandoc 1.13.2.1

There is a whole suite of imported files in the git repo, but the root file is:

https://github.com/learnyou/lysa/blob/master/en/book/lysa.ltx

jgm commented 9 years ago

I can't reproduce this with the dev version (which will be 1.14), so I suspect this has already been fixed. You might try compiling pandoc from source.

pharpend commented 9 years ago

I can still reproduce it, albeit with a different error:

pandoc lysa.ltx -f latex -t epub -o lysa.epub
pandoc: 
Error at "5-more-sets.ltx" (line 59, column 65):
unexpected "}"
expecting "[", "{", "\\", "=" or digit
\addbibresource{lysa.bib}

Here is the bibliography file: https://github.com/learnyou/lysa/blob/master/en/book/bibliographies/lysa.bib

And here is the latex file about which pandoc is complaining: https://github.com/learnyou/lysa/blob/master/en/book/chapters/5-more-sets.ltx

nkalvi commented 9 years ago

I can also confirm that it compiles fine with 1.14 on Mac & Windows 8.1.

@pharpend I cannot find \addbibresource{lysa.bib} in https://github.com/learnyou/lysa/blob/master/en/book/chapters/5-more-sets.ltx.

nkalvi commented 9 years ago

@pharpend 5-more-sets.ltx by itself compiles fine; the error occurs only when it is included as part of lysa.ltx. I'll investigate further later.

pharpend commented 9 years ago

@nkalvi Yes, I should have been clearer.

I'm on Arch Linux, if that matters.

lierdakil commented 9 years ago

I'd like to point out that this is not valid LaTeX, so Pandoc is fine. Error messages do report the wrong line when the error happens in an included file, though, so we'll have to look into that.

As for the problem at hand, consider https://github.com/learnyou/lysa/blob/master/en/book/lysa.ltx#L246

Here, the \newcommand is defined without arguments, but it is obviously used later with a single argument. This is not how \newcommand is supposed to work. It should be \let, but I think Pandoc does not support \let expressions atm. An easy fix is as follows:

\newcommand{\inclgraph}[1]{\includegraphics[width=0.8\textwidth]{#1}}

lierdakil commented 9 years ago

P.S. Bear in mind that #1866 is fixed in master (a.k.a. 1.14), but not in 1.13.2.1, so the latter won't parse this regardless.

pharpend commented 9 years ago

@lierdakil Ah stupid me and my Haskell background. I'll fix that, see if this works.

BTW, my file uses a ton of \let declarations already, so if pandoc doesn't support \let, then that's another problem entirely.

lierdakil commented 9 years ago

Well, Pandoc will parse \let expressions as raw TeX commands (with the --parse-raw option), so writing a simple find-and-replace filter should not be too hard if you're somewhat familiar with Haskell (math expressions are represented verbatim in the AST though, so some string manipulation will be necessary). You could probably also replace most if not all \lets with \newcommands, since \let is TeX anyway and is all but deprecated in LaTeX, afaik.

Bear in mind that Pandoc has limited support for LaTeX, so there may be other problems when converting somewhat complicated LaTeX documents (and since LaTeX is a Turing-complete language, full support would be extremely hard to implement or even fathom).
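
The find-and-replace idea could also be applied to the .ltx source before pandoc ever sees it. The following is a minimal sketch of that, written in Python as a plain text pre-processing step rather than as the Haskell pandoc filter mentioned above. The function name `let_to_newcommand` and the regular expression are assumptions made for illustration; they are not part of pandoc. It only handles the simplest form, where both sides of the \let are control words.

```python
import re

# Matches \let\new\old and \let\new=\old, where both sides are control words
# (letters and @ only). Any other use of \let is left untouched.
LET_RE = re.compile(r"\\let\s*(\\[A-Za-z@]+)\s*=?\s*(\\[A-Za-z@]+)")


def let_to_newcommand(source: str) -> str:
    """Rewrite simple \\let aliases as \\newcommand definitions."""
    # A function is used as the replacement so that the backslashes in the
    # captured macro names are inserted literally.
    return LET_RE.sub(
        lambda m: "\\newcommand{%s}{%s}" % (m.group(1), m.group(2)),
        source,
    )
```

For example, `let_to_newcommand(r"\let\eps\varepsilon")` returns `\newcommand{\eps}{\varepsilon}`. Note that the two forms are not equivalent in TeX: \let copies the current meaning of the right-hand macro, while the \newcommand version expands it at the point of use, so a \let that deliberately saves an old definition before redefining it must not be rewritten this way.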

pharpend commented 9 years ago

IIRC, \let uses a somewhat different macro-expansion algorithm than \newcommand. Thus, there are some cases where I actually do want \let instead of \newcommand.

At least, that's what whoever added the first of those \let declarations told me.

pharpend commented 9 years ago

Okay, I improved my build script quite significantly, so testing this should be easier.

@lierdakil
Even if I fix the \newcommand issue you pointed out, this happens:

$ ./lysabuild --clean
$ ./lysabuild --sandbox >& /dev/null
$ cd tmp
$ pandoc lysa.ltx -f latex -t epub -o lysa.epub
pandoc: 
Error at "input" (line 409, column 39):
unexpected "{"
expecting letter or lf new-line
\tableofcontents
                                      ^

It appears to me that the reported location has little or nothing to do with the actual error, seeing as line 409 is

\ch{Answers to the exercises}

and the \tableofcontents line doesn't have trailing whitespace.

lierdakil commented 9 years ago

livid@livid /tmp/test/lysa/en/book/tmp $  ~/work/pandoc/dist/build/pandoc/pandoc -v
pandoc 1.14
Compiled with texmath 0.8.1, highlighting-kate 0.5.14.
...
livid@livid /tmp/test/lysa/en/book/tmp $ git rev-parse HEAD
33624f9e1f779afd1e2a05e180556c2b07286b82
livid@livid /tmp/test/lysa/en/book/tmp $ TEXINPUTS="." ~/work/pandoc/dist/build/pandoc/pandoc -f latex -t epub3 -o /tmp/lysa.epub lysa.ltx 
pandoc: Could not find media `{nq-bijection.png}', skipping...
pandoc: Could not find media `{nq-bijection-naive.png}', skipping...
pandoc: Could not find media `{nq-bijection-nolines.png}', skipping...
pandoc: Could not find media `{nz-bijection-joined.png}', skipping...
pandoc: Could not find media `{nz-bijection.png}', skipping...
pandoc: Could not find media `{x-squared-curve.png}', skipping...
pandoc: Could not find media `{VectorGraph2.png}', skipping...
pandoc: Could not find media `{VectorGraph1.png}', skipping...

Note that I have to define TEXINPUTS only because my system has it set. If TEXINPUTS is not set, pandoc defaults to ".".
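
The lookup behaviour described above can be sketched as follows. This is only an illustration in Python of "use TEXINPUTS if it is set, otherwise fall back to the current directory"; it is not pandoc's actual code, and the helper names `search_dirs` and `find_include` are hypothetical. It also ignores the extra TEXINPUTS conventions a real TeX installation supports.

```python
import os


def search_dirs(environ):
    """Return the directories to search for included files."""
    # Fall back to the current directory when TEXINPUTS is not set.
    raw = environ.get("TEXINPUTS", ".")
    # Drop empty entries; this sketch does not model their special meaning.
    dirs = [d for d in raw.split(os.pathsep) if d]
    return dirs or ["."]


def find_include(name, environ):
    """Return the first existing path for `name`, or None if it is not found."""
    for directory in search_dirs(environ):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

With a system-wide TEXINPUTS that does not list the working directory, a lookup like this would never check ".", which is why overriding the variable on the command line helps.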

lierdakil commented 9 years ago

Do confirm that you're using master pandoc, and not 1.13.2.1. As I said, 1.13.2.1 will not parse this regardless, due to #1866.

pharpend commented 9 years ago

@lierdakil With regard to the files-not-found, that's my fault for not being more verbose. See my edits to the original post.

I was using 1.13.2, apparently. I installed pandoc from master earlier in the day, but some other package I was working on uninstalled that, and instead installed 1.13.2.

headdesk

Okay, now I get this error:

% pandoc lysa.ltx -f latex -t epub3 -o lysa.epub
pandoc: Could not find media `{nq-bijection.png}', skipping...
pandoc: Could not find media `{nq-bijection-naive.png}', skipping...
pandoc: Could not find media `{nq-bijection-nolines.png}', skipping...
pandoc: Could not find media `{nz-bijection-joined.png}', skipping...
pandoc: Could not find media `{nz-bijection.png}', skipping...
pandoc: Could not find media `{x-squared-curve.png}', skipping...
pandoc: Could not find media `{VectorGraph2.png}', skipping...
pandoc: Could not find media `{VectorGraph1.png}', skipping...

Now, that's odd, because all of those files are in the working directory.

Pandoc now makes a very crappy EPub, so I guess that's an improvement!

pharpend commented 9 years ago

Ditto for HTML: it makes a crappy HTML file, which I guess is better than no HTML file.

I have to do some tetris with Hakyll, but I'll see if I can put the HTML file up on my website.

Hakyll doesn't support pandoc-1.14 (I tried), so I have to cabal install hakyll, edit my website.

lierdakil commented 9 years ago

The "Could not find media" errors happen because pandoc is being somewhat stupid. The root of the problem is

\newcommand{\answergraph}[1]{\begin{center}\inclgraph{{#1}}\end{center}}

Note the extra curly braces around #1 -- pandoc parses this verbatim as {filename.png}, which obviously does not exist in the current directory. Try removing them. You can \usepackage{grffile} if you want filenames with dots/spaces/etc.
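
To illustrate what "parsed verbatim" means here, the sketch below shows one way a redundant brace group around a filename argument could be stripped before looking the file up. It is written in Python purely as an illustration; the function `strip_outer_braces` is hypothetical and does not describe how pandoc handles (or will handle) this case.

```python
def strip_outer_braces(arg: str) -> str:
    """Remove brace groups that wrap the whole argument, e.g. {{x.png}} -> x.png."""
    while len(arg) >= 2 and arg.startswith("{") and arg.endswith("}"):
        depth = 0
        for i, ch in enumerate(arg):
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
            if depth == 0 and i < len(arg) - 1:
                # The first brace closes before the end, as in {a}{b},
                # so the outer pair is not a wrapper around everything.
                return arg
        if depth != 0:
            # Unbalanced braces: leave the argument alone.
            return arg
        arg = arg[1:-1]
    return arg
```

Applied to the name from the warnings above, `strip_outer_braces("{nq-bijection.png}")` gives `nq-bijection.png`, which is the file that actually exists in the working directory.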

pharpend commented 9 years ago

Okay, that seems like another bug with pandoc, should I open a new issue?

pharpend commented 9 years ago

Here's the HTML and EPub files pandoc is generating: http://learnyou.org/lysa-dist/

(see lysa.epub and lysa.ltx)

lierdakil commented 9 years ago

You are free to. That said, I'm unsure how curly braces in arguments should be handled -- I'm no LaTeX expert.

lierdakil commented 9 years ago

BTW, you'll want to generate HTML with pandoc -s, since it adds head, html, etc.

lierdakil commented 9 years ago

Looking at your filenames, you don't need curly braces at all, at least not with pdflatex. So you can safely remove those for the time being.

\newcommand{\answergraph}[1]{\begin{center}\inclgraph{#1}\end{center}}

nkalvi commented 9 years ago

@pharpend @lierdakil

Looking at http://pandoc.org/README.html#latex-macros

\newcommand{\tuple}[1]{\langle #1 \rangle}

$\tuple{a, b, c}$

The macros need to be surrounded by $ when used. Surrounding all the macros in 5-more-sets.ltx with $ eliminates all errors, and the EPUB is output as expected.

lierdakil commented 9 years ago

@nkalvi that's about the Markdown parser. The LaTeX parser is different.

pharpend commented 9 years ago

Here is a sample of the HTML pandoc is generating: http://ix.io/i7a . Download that file, and view it in a web browser.

Here: http://learnyou.org/raw/lysa.html

It's not perfect, especially for tables.

The pandoc command I'm using is

        pandoc lysa.ltx -f latex -s --mathjax -t html5 \
            | sed 's,"//,"http://,' \
            | sed 's,http:,https:,' \
            > lysa.html

nkalvi commented 9 years ago

@lierdakil Thanks and welcome back :smile:

The syntax is applicable to LaTeX math too, right? (though the macro will not be expanded): http://pandoc.org/README.html#latex-macros

For output formats other than LaTeX, pandoc will parse LaTeX \newcommand and \renewcommand definitions and apply the resulting macros to all LaTeX math. So, for example, the following will work in all output formats, not just LaTeX:

pharpend commented 9 years ago

@nkalvi No no, that file is for Markdown input. We are talking about LaTeX input.

lierdakil commented 9 years ago

@pharpend, well, with tables it's a given, since pandoc doesn't know how to parse the tabu environment. You can see the list of all supported block-level environments in https://github.com/jgm/pandoc/blob/master/src/Text/Pandoc/Readers/LaTeX.hs#L1019

You may also want to take a look at inlineEnvironments, blockCommands and inlineCommands (near lines 279 and 413)

As I said, LaTeX support is very limited.

pharpend commented 9 years ago

@lierdakil Thank you.

I'm not particularly willing to spend hours editing my book just to appease pandoc. (Rather, it seems more work needs to be done on Pandoc so it will work with stuff like my book).

I'm not in any rush to convert this to an EPub or anything, it would just be nice.

I would vote for leaving this issue open, and fixing pandoc incrementally until it can parse a rather large and complicated LaTeX document (such as my book) flawlessly.

nkalvi commented 9 years ago

@pharpend @lierdakil I'm sorry about the confusion.

lierdakil commented 9 years ago

@pharpend If you really want better LaTeX support, consider contributing. Since the core team is shorthanded as it is, the LaTeX reader is not very likely to improve dramatically anytime soon. At the very least, if you have any ideas on how to parse complicated LaTeX without basically reimplementing texlive in Haskell, please share them. Sorry if this sounds a little harsh/demanding/disrespectful; English is not my native language, so I struggle a bit with wording. No offence meant.

If you want to leave this issue open, please edit title and first post to reflect that this is now a feature request. Otherwise, it might end up closed as resolved/duplicate.

Anyway, good luck with your book.

P.S. BTW, I skimmed over Russian translation, and it seems a little bit... odd. Like it was translated by a non-native speaker. Just an FYI.

jgm commented 9 years ago

+++ Nikolay Yakimov [Apr 25 15 19:40 ]:

Could not find media errors are happening because pandoc is being somewhat stupid. Root of the problem is

\newcommand{\answergraph}[1]{\begin{center}\inclgraph{{#1}}\end{center}}

Note extra curly braces around #1 -- pandoc parses this verbatim as {filename.png}

This would be worth fixing.

jgm commented 9 years ago

+++ Peter Harpending [Apr 25 15 19:41 ]:

Okay, that seems like another bug with pandoc, should I open a new issue?

Yes.

jgm commented 9 years ago

+++ Peter Harpending [Apr 25 15 20:48 ]:

I would vote for leaving this issue open, and fixing pandoc incrementally until it can parse a rather large and complicated LaTeX document (such as my book) flawlessly.

No, it's more useful to have bug reports that are focused more precisely on small issues that can be fixed with a relatively small amount of work.

Note that pandoc will do best on latex documents that don't use lower-level tex primitives like \let, which for the most part it doesn't support, and that stick to basic latex + common packages like amsmath. You can't expect to use arbitrary packages and have pandoc know about them.

I think adding support for \let might not be too hard, so this is a good thing to add an issue for. Ditto the braces in filenames issue. But I think it's better to close big, vague issues.

jgm commented 9 years ago

One more thing on this: if you really want to be able to produce PDF and EPUB versions of your book, then I'd suggest writing it in pandoc's Markdown, which will give you perfect translations to both those formats. You could use pandoc to get a rough import, which would require manual fixing up.

This sacrifices some control, since LaTeX is much more expressive than pandoc's Markdown, but you can use filters to add little bits of expressive power when needed.

pharpend commented 9 years ago

Thank you, @jgm

nkalvi commented 9 years ago

@jgm Though this issue is closed, I just want to understand the handling of \newcommand in Pandoc's LaTeX reader.

In Pandoc's LaTeX reader, a macro that expands to something requiring parameter(s) is expected to be defined with those parameters. Failure to do so results in a parse error (as happened initially in this case). LaTeX doesn't seem to have this requirement. I use the example from this issue:

\documentclass{article}
\usepackage{graphicx} 
\begin{document}
\thispagestyle{empty}
% Following will work fine in LaTex, but will fail with parse error in Pandoc 
% since a parameter is expected (msg: unexpected "}" expecting "[", "{", "\\", "=" or digit)
\newcommand{\inclgraph}{\includegraphics[width=0.8\textwidth]}
\begin{figure}[ht]
  \centering
  \inclgraph{setminus.png}
  \caption{Set subtraction}
  \label{fig:setminus}
\end{figure}
\end{document}

If this is a requirement, it would be helpful to have it in the documentation.

jgm commented 9 years ago

@nkalvi, this looks like a bug, maybe open a separate issue with this example?

nkalvi commented 9 years ago

Opened one. Hope it is properly written.