thutt / lrstar

Port of lrstar parser generator to Linux
BSD 3-Clause "New" or "Revised" License
2 stars 2 forks source link

Missing line termination/length on lrstar_lexer::print_line #11

Open mingodad opened 1 month ago

mingodad commented 1 month ago

When testing grammars/ANTLR we get incorrect output from lrstar_lexer::print_line due to missing line termination/length see bello a possible fix for it: Possible fix:

-------------------------- src/include/lrstar_lexer.h --------------------------
index f02d187..57d3802 100644
@@ -96,8 +96,10 @@ public:
             linenumb_printed = linenumb;

             if (*ls != 26) {    // Not end of file?
-               printf("%6d  %s\n", linenumb, ls);
-               fprintf(stderr, "%6d  %s\n", linenumb, ls);
+               char *eol = ls;
+               while(*eol && *eol != '\n' &&  *eol != 26 ) ++eol;
+               printf("%6d  %.*s\n", linenumb, eol - ls, ls);
+               fprintf(stderr, "%6d  %.*s\n", linenumb, eol - ls, ls);
             }
          }
       }

Using the https://github.com/thutt/lrstar/blob/24.0.018/grammars/ANTLR/test.input.txt :

./antlr --output /dev/null test.input.txt
==26303== 

Input File ...

ANTLR parser.

     1  /*
 * [The "BSD license"]
 *  Copyright (c) 2013 Terence Parr
 *  Copyright (c) 2013 Sam Harwell
 *  All rights reserved.
...

With the fix:

./antlr --output /dev/null test.input.txt
==26804== 

Input File ...

     1  /*
     2   * [The "BSD license"]
     3   *  Copyright (c) 2013 Terence Parr
     4   *  Copyright (c) 2013 Sam Harwell
     5   *  All rights reserved.
...

But still there is something else going on because after line 118 it start repeating again once more:

   114     :  AT (actionScopeName COLONCOLON)? identifier actionBlock
   115     ;
   116  
   117     // Scope names could collide with keywords; allow them as ids for action scopes
   118  actionScopeName
ANTLR parser.

     1  /*
     2   * [The "BSD license"]
     3   *  Copyright (c) 2013 Terence Parr
     4   *  Copyright (c) 2013 Sam Harwell
     5   *  All rights reserved.
mingodad commented 1 month ago

The problem of start repeating again is due to output to both stdout and stderr but probably we only want stderr see the complete possible fix bellow:

-------------------------- src/include/lrstar_lexer.h --------------------------
index f02d187..8e42cc5 100644
@@ -88,7 +88,6 @@ public:
          char *ls;
          if (linenumb > linenumb_printed) {
             if (linenumb == 1) {
-               printf  (        "\n");
                fprintf (stderr, "\nInput File ...\n\n");
             }

@@ -96,8 +95,9 @@ public:
             linenumb_printed = linenumb;

             if (*ls != 26) {    // Not end of file?
-               printf("%6d  %s\n", linenumb, ls);
-               fprintf(stderr, "%6d  %s\n", linenumb, ls);
+               char *eol = ls;
+               while(*eol && *eol != '\n' &&  *eol != 26 ) ++eol;
+               fprintf(stderr, "%6d  %.*s\n", linenumb, eol - ls, ls);
             }
          }
       }
thutt commented 1 month ago

The way that Paul originally wrote the logging facilities is certainly unique. I haven't looked at it in any detail, but I'm sure there's room for some pretty extensive cleanup. I notice that in your proposed patch, ASCII 25 is being used to determine EOF; that's non-portable (but, IIRC, the original code also puts ASCII 26 at the end of the file when it is read into memory, so it should be ok... though very Windows specific). It would be nice if these uses of 26 were turned into a preprocessor symbol, or a constant value present in a header file.