uwol / proleap-cobol-parser

ProLeap ANTLR4-based parser for COBOL
MIT License

Inline comments not recognized when there is no character after *> #81

Open maximilianocatarinozup opened 4 years ago

maximilianocatarinozup commented 4 years ago

Hello,

First, thank you for the awesome job. I'm trying to parse COBOL code that has a header with the structure below:

   *>************************************************************************
   *> Program:    Program name
   *>
   *> Purpose:    Description
   *>
   *> Author:       Name
   *>
   *> Date-Written: Date
   *>
   *> Tectonics:    compile command
   *>
   *> Usage:        Usage example
   *>
   *>************************************************************************

On the lines that contain only *>, whether or not followed by spaces, I get the error java.lang.RuntimeException: syntax error in line 8:7 extraneous input > expecting.

I ran a test with some text after the "*>" and the exception does not occur. I'm using version 2.4.0 with the TANDEM format.

I'm new to COBOL and I'm not sure if something is wrong. Is this the expected behavior?

sjbutler commented 4 years ago

The exception you see comes from the ANTLR lexer during the second stage of parsing, and is a symptom of a problem in the preprocessor.

I've had the same problem, but haven't had a chance to refine the fix and contribute it with tests.

On line 22 of io/proleap/cobol/preprocessor/sub/line/rewriter/impl/CobolInlineCommentEntriesNormalizerImpl.java, if you replace the regular expression with \\*>(\\S+.*)?$ and recompile, the parser will work with '*>' comments that have no space after the marker. However, it is not a clean solution, as additional stars are introduced somewhere else in the preprocessor (I think).
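To see which lines the suggested pattern picks up, here is a minimal standalone sketch using only java.util.regex. It does not call the ProLeap preprocessor; the class name InlineCommentRegexDemo and the helper found are hypothetical, and how the normalizer uses a match is not shown here.

```java
import java.util.regex.Pattern;

class InlineCommentRegexDemo {
    // The replacement pattern suggested above, written as a Java string literal.
    static final Pattern SUGGESTED = Pattern.compile("\\*>(\\S+.*)?$");

    // Hypothetical helper: true if the pattern is found anywhere in the line.
    static boolean found(String line) {
        return SUGGESTED.matcher(line).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "   *>",                            // bare marker, nothing after it
            "   *>   ",                         // marker followed only by spaces
            "   *>text",                        // text directly after the marker
            "   *> Program:    Program name",   // space, then text
            "   *>****"                         // stars directly after the marker
        };
        for (String sample : samples) {
            System.out.println("[" + sample + "] -> " + found(sample));
        }
    }
}
```

With this pattern, a bare "*>" at the end of a line and a "*>" followed directly by a non-space character are found, while a "*>" followed by a space is not. That includes a marker followed only by trailing spaces, so that case may still need separate handling.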

Maybe @uwol can see a better solution?

maximilianocatarinozup commented 4 years ago

Hi @sjbutler, thank you for the answer. I'll change the regular expression. I think the pattern \\*>[^\n\r]\\* could be better. I'll do some tests to check.

I believe I also found an unsupported PIC syntax. I'll check the docs first.
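One thing worth checking while testing: as quoted, the pattern ends in \\*, which in a Java string literal is an escaped, literal asterisk and not a quantifier. The sketch below compares the pattern as written with a variant that assumes the trailing star was meant as a quantifier. That reading is an assumption, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class AlternativePatternDemo {
    // Pattern exactly as quoted above: "*>", one non-newline character, then a literal "*".
    static final Pattern AS_WRITTEN = Pattern.compile("\\*>[^\n\r]\\*");

    // Assumption: the trailing star was intended as a quantifier,
    // so this matches "*>" followed by the rest of the line (possibly empty).
    static final Pattern WITH_QUANTIFIER = Pattern.compile("\\*>[^\n\r]*");

    // Hypothetical helper: true if the pattern is found anywhere in the line.
    static boolean found(Pattern pattern, String line) {
        return pattern.matcher(line).find();
    }

    public static void main(String[] args) {
        String[] samples = { "   *>", "   *>   ", "   *> Purpose:", "   *>****" };
        for (String sample : samples) {
            System.out.println("[" + sample + "]"
                + " as written: " + found(AS_WRITTEN, sample)
                + ", with quantifier: " + found(WITH_QUANTIFIER, sample));
        }
    }
}
```

As written, only lines where "*>" is followed by some character and then an asterisk are found, so a bare "*>" is missed. The quantifier variant finds every line that contains "*>".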