uwol / proleap-cobol-parser

ProLeap ANTLR4-based parser for COBOL
MIT License

Parser returns free-format only regardless of the fixed-format input #37

Open Reinhard-Prehofer opened 6 years ago

Reinhard-Prehofer commented 6 years ago

We see a nasty side effect when processing a file with the following statement: replacedString = new CobolPreprocessorImpl().process(inputFile, copyBookFiles, COBOL_FIXED_FORMAT); The resulting replacedString is an "optimized" COBOL string that is usable only as free-format COBOL, or at least only by a COBOL compiler that accepts lines longer than 80 (or rather 72) columns. When the Cobol85Parser is used for COBOL optimizations (for example goto-elimination algorithms) and the optimized COBOL files are then sent back to the IBM host, this is a problem.

Just have a look at lines 001800 and 001810, where even the continuation indicator in column 7 is used because the line was too long.

001770 01  ZEILE3B.                                                     abc11S05
001780     05  Z3B-VS              PIC X       VALUE ' '.               abc11S05
001790     05  FILLER              PIC X(11)   VALUE SPACE.             abc11S05
001800     05  FILLER              PIC X(50)   VALUE 'für ausgeschiedeneabc11S05
001810-                                        ' Besoldungsfälle'.      abc11S05
001820     05  FILLER              PIC X(71)   VALUE SPACE.             abc11S05
001830     05  Z3B-FIB             PIC X       VALUE '2'.               abc11S05

We are fully aware of the underlying parsing reasons and agree that the result is much more readable. The only "but" is that the resulting replacement string exceeds the 80-column limit and will be rejected on the host. A second side effect, which is a separate matter for discussion: the (in)famous columns 1-6 and 73-80 are very often used for tagging or even rudimentary version control (who applied a change, and when). This information also gets lost.

       01  ZEILE3B.
           05  Z3B-VS              PIC X       VALUE ' '.
           05  FILLER              PIC X(11)   VALUE SPACE.
           05  FILLER              PIC X(50)   VALUE 'für ausgeschiedene Besoldungsfälle'.
           05  FILLER              PIC X(71)   VALUE SPACE.
           05  Z3B-FIB             PIC X       VALUE '2'.

Is there any workaround you can suggest, other than reformatting the returned string back into the fixed-format corset?
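For illustration, the reformatting workaround mentioned above could look roughly like the following. This is a minimal sketch and not part of proleap-cobol-parser: the class name FixedFormatWrapper and the method wrap are hypothetical, and it assumes the processed line keeps columns 1-7 blank and contains only simple alphanumeric literals. A literal that crosses column 72 is cut exactly at column 72 and continued with a hyphen in column 7 and a fresh opening quote in column 12; otherwise the line is broken at the last blank outside a literal. Doubled quotes inside literals and comment lines are not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of proleap-cobol-parser.
final class FixedFormatWrapper {
    static final int MAX_COL = 72;                            // last column of Area B
    static final String CONTINUATION_PREFIX = "      -    "; // '-' in column 7, text resumes in column 12
    static final String PLAIN_PREFIX = "           ";        // 11 blanks, text resumes in column 12

    static List<String> wrap(String line) {
        List<String> out = new ArrayList<>();
        String rest = line;
        while (rest.length() > MAX_COL) {
            char openQuote = 0;        // quote character of the literal we are inside, 0 if none
            int lastSpaceOutside = -1; // last blank outside a literal that is a usable break point
            boolean seenText = false;  // true once non-blank text has been seen after column 7
            for (int i = 0; i < MAX_COL; i++) {
                char c = rest.charAt(i);
                if (c != ' ' && i >= 7) {
                    seenText = true;
                }
                if (openQuote == 0) {
                    if (c == '\'' || c == '"') {
                        openQuote = c;
                    } else if (c == ' ' && seenText && i > 11) {
                        lastSpaceOutside = i;
                    }
                } else if (c == openQuote) {
                    openQuote = 0;
                }
            }
            if (openQuote != 0) {
                // Column 72 lies inside a literal.
                if (rest.charAt(MAX_COL) == openQuote) {
                    // A quote right after the cut is ambiguous on a continuation line.
                    throw new UnsupportedOperationException("quote at the cut position needs manual handling");
                }
                out.add(rest.substring(0, MAX_COL));
                rest = CONTINUATION_PREFIX + openQuote + rest.substring(MAX_COL);
            } else if (lastSpaceOutside > 0) {
                // Break between two words; no continuation indicator is needed.
                out.add(rest.substring(0, lastSpaceOutside));
                rest = PLAIN_PREFIX + rest.substring(lastSpaceOutside).trim();
            } else {
                throw new IllegalArgumentException("no safe break point found");
            }
        }
        out.add(rest);
        return out;
    }
}
```

Applied to the long FILLER line shown above, this yields a first line that ends with 'für ausgeschiedene in column 72 and a continuation line carrying ' Besoldungsfälle'., which mirrors the original lines 001800 and 001810 apart from the indentation of the continuation.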

Reinhard-Prehofer commented 6 years ago

The parser changes comments "too much". At least on the mainframe this leads to an avalanche of warnings, since the lines exceed column 80!

       ID DIVISION.
      ******************************************************************
      *************  Y2600120  Version 008  VOM 18.01.95  **************
      ******************************************************************
       PROGRAM-ID.   Y2600120.
       AUTHOR. *      B. Rxxer.
       DATE-WRITTEN. *06.1988.
      ******************************************************************
      *                                                                *
      *       Dieses Programm stzt die Lieferscheinkennzeichen in      *
      *       der Lieferscheinfile D2600-Liefer.                       *

is changed to

       ID DIVISION.
      *> *****************************************************************
      *> ************  Y2600120  Version 008  VOM 18.01.95  **************
      *> *****************************************************************
       PROGRAM-ID.   Y2600120.
       AUTHOR. *>CE       B. Rxxer.
       DATE-WRITTEN. *>CE 06.1988.
      *>CE *****************************************************************
      *>CE                                                                 *
      *>CE        Dieses Programm stzt die Lieferscheinkennzeichen in      *
      *>CE        der Lieferscheinfile D2600-Liefer.                       *

Looking closely, you can see that the pseudo-ANSI comment marker *> is written with a trailing space, so each former 80-column comment line is now 82 columns long; with the *>CE convention the line grows even more. If the parser is used for COBOL enhancements and the changed sources have to be brought back to the mainframe, a CobolReformatter has to be used or implemented to remove all of these extra formatting artifacts. I would opt for strictly adhering to the input format: especially with fixed format, please don't break the given barriers!
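As a rough illustration of what such a reformatter would have to do for comment lines, the sketch below maps the markers seen in the output above back to a single asterisk in column 7. The class name CommentMarkerRestorer and the method restore are hypothetical, and the marker strings are taken only from the sample output in this issue. Inline markers, such as the one after AUTHOR., are deliberately left alone.

```java
// Hypothetical helper, not part of proleap-cobol-parser.
final class CommentMarkerRestorer {
    static final int INDICATOR_INDEX = 6;               // column 7, zero-based
    static final String COMMENT_ENTRY_MARKER = "*>CE "; // as seen in the sample output
    static final String COMMENT_MARKER = "*> ";         // as seen in the sample output

    // Replaces a marker that starts in column 7 with a plain '*' and leaves other lines unchanged.
    static String restore(String line) {
        // Check the longer marker first, since both start with "*>".
        if (line.startsWith(COMMENT_ENTRY_MARKER, INDICATOR_INDEX)) {
            return line.substring(0, INDICATOR_INDEX) + "*"
                    + line.substring(INDICATOR_INDEX + COMMENT_ENTRY_MARKER.length());
        }
        if (line.startsWith(COMMENT_MARKER, INDICATOR_INDEX)) {
            return line.substring(0, INDICATOR_INDEX) + "*"
                    + line.substring(INDICATOR_INDEX + COMMENT_MARKER.length());
        }
        return line;
    }
}
```

Running restore over every line of the processed source would shrink the comment lines back to their original width; whether the restored text matches the original character for character depends on how the preprocessor treated the original spacing, which this sketch does not verify.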

Reinhard-Prehofer commented 6 years ago

An additional and nastier side effect of breaking the fixed format:

The original input format already respects the line-size barriers (of course) and is "nicely" formatted to fit into these column swimlanes, like:

022900     IF S1-RSP-NORMAL                                             Y2600120
023000                 NEXT SENTENCE                                    Y2600120
023100     ELSE        DISPLAY 'Y2600120 - Fehler beim suchen Liefersche22.12.89
023200-                        'in' S1-RSP ' ' VIEW-VB                  22.12.89
023300                 GO TO B51000-EX.                                 Y2600120
002400                                                                  22.12.89

Having a closer look at the outcome of the parser

           IF S1-RSP-NORMAL
                       NEXT SENTENCE
        ELSE        DISPLAY 'Y2600120 - Fehler beim suchen Lieferschein' S1-RSP ' ' VIEW-VB
                       GO TO B51000-EX.

we see, of course, that this long line breaks the 72/80-column barrier by far. Even worse, logic is now needed to break the string into chunks that fit into columns 8-72, since this DISPLAY statement now causes a compile error rather than just a warning. I am fully aware that the parser needs concatenated strings/literals for ease of parsing, but this creates unwanted side effects like the ones mentioned above. In addition, columns 1-6 and 73-80 are often used for very rudimentary, old-style version-control tags or for information on who changed the line. I am not really asking for a feature that preserves these column 1-6 and 73-80 swimlanes, but ... [I have already been asked for it and was able to turn the question down ...]
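If someone did want to keep the column 1-6 and 73-80 tags, one conceivable approach is to capture them from the original line before preprocessing and reattach them afterwards. The sketch below is only an assumption about how that could be done: FixedLineAreas and its methods are hypothetical names, and pairing each processed line with the right original line (which is no longer one-to-one once continuation lines have been merged) is left to the caller.

```java
// Hypothetical helper, not part of proleap-cobol-parser.
final class FixedLineAreas {
    // Right-pads a line with blanks so that fixed column positions can be addressed safely.
    static String pad(String s, int width) {
        StringBuilder sb = new StringBuilder(s);
        while (sb.length() < width) {
            sb.append(' ');
        }
        return sb.toString();
    }

    // Columns 1-6 of the original line (sequence number area).
    static String sequenceArea(String originalLine) {
        return pad(originalLine, 80).substring(0, 6);
    }

    // Columns 73-80 of the original line (identification area).
    static String identificationArea(String originalLine) {
        return pad(originalLine, 80).substring(72, 80);
    }

    // Puts the saved tags around columns 7-72 of a processed line.
    // Whatever the processed line holds in columns 1-6 is overwritten.
    static String withTags(String processedLine, String sequence, String identification) {
        if (processedLine.length() > 72) {
            throw new IllegalArgumentException("processed line exceeds column 72");
        }
        if (sequence.length() != 6 || identification.length() != 8) {
            throw new IllegalArgumentException("tags must be 6 and 8 characters long");
        }
        return sequence + pad(processedLine, 72).substring(6) + identification;
    }
}
```

For the line 001770 from the first comment, sequenceArea would return 001770 and identificationArea would return abc11S05, and withTags applied to the processed 01 ZEILE3B. line would rebuild the original 80-column line.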