pkubowicz / opendetex

Improved version of Detex - tool for extracting plain text from TeX and LaTeX sources

handle unterminated \verb #82

Closed: kberry closed this issue 10 months ago

kberry commented 2 years ago

If the input reaches EOF inside a \verb argument, the program doesn't notice and keeps reading past EOF indefinitely. The result is a crash (on x86_64-linux, compiled with gcc) in __memmove_avx_unaligned_erms, after a long stretch of erratic behavior.

This patch rewrites the \verb handling to notice eof, either in the case of the file ending with "\verb" (no delimiter character), or "\verb|" (ends after the first delimiter). It also recognizes \n and gives a warning if seen, since LaTeX requires the argument to \verb to be on one line.

+++ detex-src/detex.l   (working copy)
@@ -346,15 +349,30 @@ VERBSYMBOL        =|\\leq|\\geq|\\in|>|<|\\subseteq|\subs
                                                        footnoteLevel = currBracesLevel;
                                                        ++currBracesLevel;
                                                        }
-<Normal>"\\verb" /* ignore \verb<ch>...<ch> */ {   if (fLatex) {
-                                                       char verbchar, c;
-                                                       verbchar = input();
-                                                       while ((c = input()) != verbchar)
-                                                           /*if (c == '\n')
-                                                               NEWLINE;*/
-                                                               putchar(c);
-                                                   }
-                                                   IGNORE;
+<Normal>"\\verb" /* ignore \verb<ch>...<ch> */ {
+  /* Sorry to use different formatting, but it seemed better not
+     to cram all this code over in the rightmost 20 chars. */
+  if (fLatex) {
+    char verbchar, c;
+    verbchar = input();
+    if (verbchar != EOF) {
+      while ((c = input()) != verbchar && c != '\n' && c != EOF) {
+        putchar(c);
+      }
+    }
+    /* would be nice to include input filenames and line numbers */
+    if (verbchar == EOF || c == EOF) {
+      /* do this test first in case verbchar is eof */
+      ErrorExit("\\verb not complete before eof");
+    }
+    if (c == '\n') {
+      char delim[2];
+      delim[0] = verbchar;
+      delim[1] = 0;
+      Warning("\\verb not terminated before eol, delimiter", delim);
+    }
+  }
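
For readers who want to see the control flow without the flex machinery, the checks in the patch can be sketched as a standalone C function. This is only an illustration, not code from detex: `scan_verb` and the `verb_status` codes are hypothetical names, and it reads from a `FILE *` with `getc` instead of flex's `input()`. It also holds characters in `int` rather than `char`, on the assumption that this keeps EOF distinguishable from every valid byte regardless of whether `char` is signed.

```c
#include <stdio.h>

/* Hypothetical result codes for this sketch; not part of detex. */
enum verb_status { VERB_OK, VERB_EOL, VERB_EOF };

/*
 * Consume the argument of \verb from `in`, assuming the caller has
 * already read the five characters "\verb". The argument text is
 * copied to `out`, mirroring the putchar(c) in the patch.
 */
static enum verb_status scan_verb(FILE *in, FILE *out)
{
    /* int, not char: EOF must not collide with a real byte value. */
    int delim = getc(in);
    if (delim == EOF)
        return VERB_EOF;        /* file ended right after "\verb" */

    int c;
    while ((c = getc(in)) != delim && c != '\n' && c != EOF)
        putc(c, out);

    if (c == EOF)
        return VERB_EOF;        /* file ended before the closing delimiter */
    if (c == '\n')
        return VERB_EOL;        /* LaTeX requires \verb to fit on one line */
    return VERB_OK;
}
```

In terms of the patch, a caller would treat `VERB_EOF` like the `ErrorExit` branch and `VERB_EOL` like the `Warning` branch.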

Test input file:
```
\begin{document} % force interpretation as LaTeX
\verb+ok+
\verb|
\verb
```

FYI, I committed this (and the other changes I just submitted) to TeX Live in r64408. Build/source/texk/detex/detex-src/