pkubowicz / opendetex

Improved version of Detex - tool for extracting plain text from TeX and LaTeX sources

Outputs a largely empty file #36

Closed mashaalmemon closed 6 years ago

mashaalmemon commented 7 years ago

Hi There,

I am running a LaTeX file through opendetex but getting a largely empty file as a result (I say "largely" because it contains newlines, but nothing else).

I am running it on Mac OS X with opendetex installed via Homebrew. It appears to have installed v2.8.1.

Running the simple plain vanilla command like so:

detex input_astronomy_3.tex > input_astronomy_3.txt

Input file and resultant output files attached. Any help with this would be much appreciated.

input_astronomy_3.tex.zip input_astronomy_3.txt

mashaalmemon commented 7 years ago

Note that I also tried installing from the source in this repo, with exactly the same result.

williardx commented 7 years ago

Also experiencing the same problem on Mac OS X.

diego898 commented 7 years ago

Any update on this?

JoostHuizinga commented 6 years ago

I can confirm that a LaTeX file that includes any instances of \usepackage or \newcommand (and probably all recognized commands) in the preamble will cause a mostly empty file to be written. Uncommented text that occurs before the \usepackage or \newcommand statements will be printed, but everything after them will not.

This occurs on Mac OS X 10.12.6 with Apple LLVM version 9.0.0. I installed opendetex by cloning revision 08b045a1feea91d132bf3b228ba9ade969726f88 and calling make.

JoostHuizinga commented 6 years ago

Some more experimentation reveals that the error only seems to occur when there are one or more characters (including spaces and tabs!) after the first argument of \usepackage or \newcommand.

Thus, this works (provided there are no spaces after the last bracket):

\usepackage{babel}

But this does not:

\usepackage{babel}%

Also, this works:

\newcommand{\test}
{test}

But this does not:

\newcommand{\test}{test}

JoostHuizinga commented 6 years ago

After even more experimentation, I suspect that lex isn't properly handling the following in detex.l: <LaMacro>"}""\n"{0,1}. I think this line is supposed to indicate that a LaTeX macro ends with a brace, some amount of whitespace, and then a newline. However, lex seems to be interpreting it as a brace immediately followed by a newline. Hence, if there is any character after the brace, the LaTeX macro is never finished according to lex and, as a result, all remaining content in the document is ignored.
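To make the hypothesis above concrete, here is a minimal sketch in Python. It is a toy model, not the actual lexer: the two regexes are Python stand-ins for two possible readings of the rule (a brace followed by an optional newline, versus a brace that must be immediately followed by a newline), and the helper `macro_ends_at_first_brace` is a hypothetical name introduced only for this illustration. It assumes the macro state is only left if the rule matches at the first closing brace, which is a simplification of whatever detex.l really does.

```python
import re

# Python stand-ins for two readings of <LaMacro>"}""\n"{0,1}.
# These are NOT lex rules; they only model the hypothesis from the thread.
OPTIONAL_NEWLINE = re.compile(r"\}\n?")  # brace, then zero or one newline
REQUIRED_NEWLINE = re.compile(r"\}\n")   # brace immediately followed by a newline


def macro_ends_at_first_brace(pattern, source):
    """Return True if `pattern` matches starting at the first '}' in `source`."""
    first_brace = source.index("}")
    return pattern.match(source, first_brace) is not None


# The four snippets reported in this thread, paired with whether detex
# was reported to produce correct output for them.
CASES = [
    ("\\usepackage{babel}\n", True),
    ("\\usepackage{babel}%\n", False),
    ("\\newcommand{\\test}\n{test}\n", True),
    ("\\newcommand{\\test}{test}\n", False),
]

for source, reported_ok in CASES:
    strict = macro_ends_at_first_brace(REQUIRED_NEWLINE, source)
    lenient = macro_ends_at_first_brace(OPTIONAL_NEWLINE, source)
    print(f"{source!r}: reported_ok={reported_ok} strict={strict} lenient={lenient}")
```

Under these assumptions, the "required newline" reading agrees with all four reported outcomes, while the "optional newline" reading would end the macro in every case. That is consistent with the suspicion that lex treats the newline as mandatory, though it does not prove it.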

However, flex does seem to work.

So, if you have this issue, you could try the following:

pkubowicz commented 6 years ago

Should be fixed when 2.8.3 is released.