uwol / proleap-cobol-parser

ProLeap ANTLR4-based parser for COBOL
MIT License

tagging the expansion of copy-books (begin-copy-book/end-copy-book) #38

Open Reinhard-Prehofer opened 6 years ago

Reinhard-Prehofer commented 6 years ago

This is more of an architectural issue than a real error. I am fully aware that the contents of every copybook must be included in parsing, to avoid undeclared variables and the like. Currently, a call like replacedString = new CobolPreprocessorImpl().process(inputFile, copyBookFiles, COBOL_FIXED_FORMAT); returns a String in which all COPY statements are expanded as if the copybook contents had been written directly into the COBOL code. As far as I have seen, there is no hint or tag showing where the original copybook started or ended.

For example, this input:

004780 01  LA.                                                          Y26RZKGS
004790     05 LAS           PIC 9       VALUE 0.                        Y26RZKGS
004800     05 LADST         PIC 999.                                    Y26RZKGS
004810     05 LASVNR        PIC 9(8).                                   19.08.08
004820     05 LABDST        PIC 9(8).                                   Y26RZKGS
004830     05 LAPRJ         PIC 9(2).                                   Y26RZKGS
004840     05 LAD           PIC 9(2)    VALUE 00.                       Y26RZKGS
004850                                                                  Y26RZKGS
004860     COPY VOKAKGS.                                                08.10.96
004870                                                                  Y26RZKGS
004880*=================================================================14.09.07
004890 PROCEDURE DIVISION.                                              Y26RZKGS

leads to

       01  LA.
           05 LAS           PIC 9       VALUE 0.
           05 LADST         PIC 999.
           05 LASVNR        PIC 9(8).
           05 LABDST        PIC 9(8).
           05 LAPRJ         PIC 9(2).
           05 LAD           PIC 9(2)    VALUE 00.

      *> *****            V O R L A U F K A R T E                    *****
      *> *****            xxxxxGELDSTATISTIK                        *****
      *> *****************************************************************
       01  VOKAKGS.
      *> ***   STELLE 1    = VERARBEITUNGSART , N - Nxxxxxxxxx
      *> ***                                    S - SONDERxxxxxxxx
           05  VK-VER               PIC X.
      *> ***   STELLE 2-3  = MELDEMONAT
           05  VK-MELDM             PIC 99.
      *> ***   STELLE 4-5  = MELDEJAHR
           05  VK-MELDJ.
               10  VK-MELDJJ12          PIC 99.
               10  VK-MELDJJ34          PIC 99.
      *> ***   STELLE 8-9  = LIEFERUNGSNR.
           05  VK-LNR               PIC 99.
      *> ***             VOKA-REST
           05  FILLER               PIC X(71).       
      *> =================================================================
       PROCEDURE DIVISION.

We would highly appreciate a mechanism that makes the start and end of the original copybook easy to recognise. This could be a naming convention based on a predefined comment, for example:

123456* START-of-copybook: <COPYBOOK-statement>
    <here is the content of the copybook>
123456* END-of-copybook: <COPYBOOK-statement>

Why and when would such a feature be needed? If you use the parser for code optimization (for example GO TO elimination or dead code removal), the current approach eliminates the copybooks entirely from the resulting new COBOL sources, which of course is out of the question. Start/end tags around expanded copybooks, along the lines of my suggestion, would be a rather straightforward way to find the correct place to re-insert the original COPY statement. The tagged (commented) statement should be the original COPY statement in full, including any REPLACING ... BY clauses and multi-line statements, so in effect it will often be more than a single line.
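
To make the idea concrete, here is a minimal sketch of how such tags could be written and later collapsed back into the original COPY statement. Everything in it is an assumption for illustration: the class `CopyBookMarkers`, its methods `wrap` and `collapse`, and the marker texts are hypothetical and not part of ProLeap. It also assumes the original COPY statement has been normalised onto a single line inside the marker comment and is restored starting in column 12; a real implementation would need to handle multi-line statements. It requires Java 11 for `String.repeat`.

```java
public class CopyBookMarkers {

    // Hypothetical marker texts; any unambiguous comment convention would do.
    static final String BEGIN = "*> BEGIN-COPY-BOOK: ";
    static final String END = "*> END-COPY-BOOK: ";

    // Six blanks so that the comment indicator lands in column 7.
    static final String COMMENT_INDENT = " ".repeat(6);
    // Eleven blanks so that the restored statement starts in column 12.
    static final String AREA_B_INDENT = " ".repeat(11);

    /** Surrounds already expanded copybook text with begin/end marker comments. */
    public static String wrap(String copyStatement, String expandedContent) {
        String body = expandedContent.endsWith("\n") ? expandedContent : expandedContent + "\n";
        return COMMENT_INDENT + BEGIN + copyStatement + "\n"
                + body
                + COMMENT_INDENT + END + copyStatement + "\n";
    }

    /** Replaces every marked region with the COPY statement stored in its begin marker. */
    public static String collapse(String source) {
        String[] lines = source.split("\n", -1);
        int count = lines.length;
        // Ignore the empty piece that split() yields after a final newline.
        if (count > 0 && lines[count - 1].isEmpty()) {
            count--;
        }

        StringBuilder out = new StringBuilder();
        int depth = 0; // greater than zero while inside an expanded copybook
        for (int i = 0; i < count; i++) {
            String trimmed = lines[i].trim();
            if (trimmed.startsWith(BEGIN)) {
                // Only the outermost marker is restored; nested copybooks
                // belong to the outer copybook's own source.
                if (depth == 0) {
                    out.append(AREA_B_INDENT)
                       .append(trimmed.substring(BEGIN.length()))
                       .append("\n");
                }
                depth++;
            } else if (trimmed.startsWith(END)) {
                depth--;
            } else if (depth == 0) {
                out.append(lines[i]).append("\n");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String expanded = "       01  LA.\n"
                + wrap("COPY VOKAKGS.", "       01  VOKAKGS.\n           05  VK-VER PIC X.\n")
                + "       PROCEDURE DIVISION.\n";
        System.out.print(collapse(expanded));
    }
}
```

With markers like these in the preprocessor output, a tool that rewrites the expanded source could run a step like `collapse` as its last pass, so that the generated COBOL keeps its COPY statements instead of the inlined copybook text.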