uwol / proleap-cobol-parser

ProLeap ANTLR4-based parser for COBOL
MIT License

Error in walking through AST due to comment statements #98

Open sundar-sarvam opened 1 year ago

sundar-sarvam commented 1 year ago

I am trying to run the parser on this COBOL file (with the full repo downloaded): https://github.com/aws-samples/aws-mainframe-modernization-carddemo/blob/main/app/cbl/CBACT02C.cbl (I think this is IBM dialect only). However, I get the error below, apparently triggered by the comment line "* You may obtain a copy of the License at":

Full error:

Exception in thread "main" io.proleap.cobol.asg.exception.CobolParserException: syntax error in line 12:33 mismatched input 'the' expecting {IN, OF, ON, REPLACING, SUPPRESS, '.', NEWLINE}
        at io.proleap.cobol.asg.runner.ThrowingErrorListener.syntaxError(ThrowingErrorListener.java:22)
        at org.antlr.v4.runtime.ProxyErrorListener.syntaxError(ProxyErrorListener.java:41)
        at org.antlr.v4.runtime.Parser.notifyErrorListeners(Parser.java:544)
        at org.antlr.v4.runtime.DefaultErrorStrategy.reportInputMismatch(DefaultErrorStrategy.java:327)
        at org.antlr.v4.runtime.DefaultErrorStrategy.reportError(DefaultErrorStrategy.java:139)
        at io.proleap.cobol.CobolPreprocessorParser.copyStatement(CobolPreprocessorParser.java:3609)
        at io.proleap.cobol.CobolPreprocessorParser.startRule(CobolPreprocessorParser.java:326)
        at io.proleap.cobol.preprocessor.sub.document.impl.CobolDocumentParserImpl.processWithParser(CobolDocumentParserImpl.java:90)
        at io.proleap.cobol.preprocessor.sub.document.impl.CobolDocumentParserImpl.processLines(CobolDocumentParserImpl.java:59)
        at io.proleap.cobol.preprocessor.impl.CobolPreprocessorImpl.parseDocument(CobolPreprocessorImpl.java:66)
        at io.proleap.cobol.preprocessor.impl.CobolPreprocessorImpl.process(CobolPreprocessorImpl.java:86)
        at io.proleap.cobol.preprocessor.impl.CobolPreprocessorImpl.process(CobolPreprocessorImpl.java:78)
        at io.proleap.cobol.asg.runner.impl.CobolParserRunnerImpl.parseFile(CobolParserRunnerImpl.java:197)
        at io.proleap.cobol.asg.runner.impl.CobolParserRunnerImpl.analyzeFile(CobolParserRunnerImpl.java:97)
        at io.proleap.cobol.asg.runner.impl.CobolParserRunnerImpl.analyzeFile(CobolParserRunnerImpl.java:106)
        at CobolParserDemo2.parseCobolFile(App2.java:26)
        at CobolParserDemo2.main(App2.java:16)

This isn't expected, right?

My Java app:

import java.io.File;
import io.proleap.cobol.asg.metamodel.Program;
import io.proleap.cobol.asg.metamodel.CompilationUnit;
import io.proleap.cobol.CobolBaseVisitor;
import io.proleap.cobol.CobolParser;
import io.proleap.cobol.asg.metamodel.data.datadescription.DataDescriptionEntry;
import io.proleap.cobol.asg.runner.impl.CobolParserRunnerImpl;
import io.proleap.cobol.preprocessor.CobolPreprocessor;
import java.io.IOException;

class CobolParserDemo2 {

    public static void main(String[] args) {
        try {
            Program program = parseCobolFile("/Users/.../aws-mainframe-modernization-carddemo/app/cbl/CBACT02C.cbl");
            walkAST(program);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static Program parseCobolFile(String filePath) throws IOException {
        File inputFile = new File(filePath);
        CobolPreprocessor.CobolSourceFormatEnum format = CobolPreprocessor.CobolSourceFormatEnum.TANDEM;
        return new CobolParserRunnerImpl().analyzeFile(inputFile, format);
    }
    public static void walkAST(Program program) {
        CobolBaseVisitor<Boolean> visitor = new CobolBaseVisitor<Boolean>() {
            @Override
            public Boolean visitDataDescriptionEntryFormat1(final CobolParser.DataDescriptionEntryFormat1Context ctx) {
                DataDescriptionEntry entry = (DataDescriptionEntry) program.getASGElementRegistry().getASGElement(ctx);
                String name = entry.getName();
                System.out.println("DataDescriptionEntry Name: " + name); // This will print the name, if you want to see it.
                return visitChildren(ctx);
            }
        };

        for (final CompilationUnit compilationUnit : program.getCompilationUnits()) {
            visitor.visit(compilationUnit.getCtx());
        }
    }
}

Any clue why this might happen, and are there any workarounds? My aim is to find the line range in a code file for different constructs such as IF ... ELSE, PERFORM ... END-PERFORM, etc.

uwol commented 1 year ago

https://github.com/aws-samples/aws-mainframe-modernization-carddemo/blob/main/app/cbl/CBACT02C.cbl#L1 is not in line format TANDEM (which would start with the line indicator in column 1), but seems to be in line format FIXED.

Please consult https://github.com/uwol/proleap-cobol-parser/blob/a43c8c4cde99e25608f0710e67156e16057a68d3/src/main/java/io/proleap/cobol/preprocessor/CobolPreprocessor.java#L19

Best Ulrich
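
To illustrate why the line format matters here, the following is a minimal, self-contained sketch. It is not ProLeap's preprocessor code: the `LineFormat` enum and both helper methods are hypothetical stand-ins, and the sample line is assumed to carry its asterisk in column 7, as is usual for fixed-format sources. Read with a column-1 indicator, the license line no longer looks like a comment, which is presumably why the words "copy of the" reached the preprocessor's COPY statement rule.

```java
// Illustrative stand-in only; not ProLeap's implementation.
class IndicatorColumnDemo {

    /** Hypothetical mirror of the two line formats discussed in this thread. */
    enum LineFormat {
        FIXED(7),   // fixed format: indicator area is column 7
        TANDEM(1);  // Tandem format: indicator is column 1

        final int indicatorColumn; // 1-based column number

        LineFormat(int indicatorColumn) {
            this.indicatorColumn = indicatorColumn;
        }
    }

    /** Returns the character in the indicator column, or ' ' if the line is too short. */
    static char indicatorOf(String line, LineFormat format) {
        int index = format.indicatorColumn - 1;
        return index < line.length() ? line.charAt(index) : ' ';
    }

    /** Treats a line as a comment when its indicator is '*' or '/'. */
    static boolean isComment(String line, LineFormat format) {
        char indicator = indicatorOf(line, format);
        return indicator == '*' || indicator == '/';
    }

    public static void main(String[] args) {
        // Six blanks (sequence area), then '*' in column 7.
        String line = "      * You may obtain a copy of the License at";
        System.out.println("FIXED sees comment:  " + isComment(line, LineFormat.FIXED));
        System.out.println("TANDEM sees comment: " + isComment(line, LineFormat.TANDEM));
    }
}
```

In the Java app from the issue, the corresponding change would be to pass CobolPreprocessor.CobolSourceFormatEnum.FIXED instead of TANDEM to analyzeFile.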

sundar-sarvam commented 1 year ago

Thanks, this helped! To get the PERFORM and CALL statements (which change the control flow of the program), I was trying to use the code below:

        Program program = parseCobolFile("..aws-mainframe-modernization-carddemo/app/cbl/CBACT02C.cbl");

        // Walk every paragraph of every compilation unit and print what the ASG knows about it.
        for (CompilationUnit compilationUnit : program.getCompilationUnits()) {
            final ProgramUnit programUnit = compilationUnit.getProgramUnit();
            final ProcedureDivision procedureDivision = programUnit.getProcedureDivision();
            for (Paragraph paragraph : procedureDivision.getParagraphs()) {
                System.out.println("Name: " + paragraph.getName());
                System.out.println("Statements: " + paragraph.getStatements());
                System.out.println("Calls: " + paragraph.getCalls());
            }
        }

But it prints the following output:

Name: 9910-DISPLAY-IO-STATUS
Statements: [io.proleap.cobol.asg.metamodel.procedure.ifstmt.impl.IfStatementImpl@7a606260, io.proleap.cobol.asg.metamodel.procedure.exit.impl.ExitStatementImpl@5dbab232]
Calls: [name=[9910-DISPLAY-IO-STATUS], paragraph=[name=[9910-DISPLAY-IO-STATUS]], name=[9910-DISPLAY-IO-STATUS], paragraph=[name=[9910-DISPLAY-IO-STATUS]], name=[9910-DISPLAY-IO-STATUS], paragraph=[name=[9910-DISPLAY-IO-STATUS]]]

I need to find out which PERFORM statements call which paragraphs, and which PROGRAMs CALL which other PROGRAMs (something like a dictionary with the caller as key and the called as value). How can I do this using the ProLeap COBOL parser? I tried different methods of Paragraph, such as getCalls, but none of them helps.

uwol commented 1 year ago

Hello @sundar-sarvam , so in my understanding you want to navigate in the ASG (1) from a called Paragraph to all ProcedureCalls calling the Paragraph, and then (2) from each ProcedureCall to the containing PerformStatement.

(1) already works in your example.

(2) should probably work by calling ASGElement.getParent() on each ProcedureCall, which might give you a PerformProcedureStatement, and a second getParent() might give you the PerformStatement. You could write a helper function which calls getParent recursively until a certain condition is met or a certain ancestor class has been found.

I did not implement this to try it out, but I am quite sure this should work. Else you can paste your code and I can take a look.

Best
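
The helper suggested above could look roughly like the following sketch. It is written against hypothetical stand-in types (`Node`, `BasicNode` and the three `...Node` classes) so that it runs on its own; with ProLeap, the assumption is that `ASGElement` and its `getParent()` would take the place of `Node`, and that has not been verified here. The walk uses a loop rather than recursion, which visits the same chain of parents.

```java
import java.util.Optional;

// Stand-in for an ASG node; only the parent link matters for this sketch.
interface Node {
    Node getParent();
}

class BasicNode implements Node {
    private final Node parent;

    BasicNode(Node parent) {
        this.parent = parent;
    }

    @Override
    public Node getParent() {
        return parent;
    }
}

// Hypothetical node kinds named after the ASG elements mentioned in the comment above.
class PerformStatementNode extends BasicNode {
    PerformStatementNode(Node parent) { super(parent); }
}

class PerformProcedureStatementNode extends BasicNode {
    PerformProcedureStatementNode(Node parent) { super(parent); }
}

class ProcedureCallNode extends BasicNode {
    ProcedureCallNode(Node parent) { super(parent); }
}

class AncestorFinder {

    /** Walks up the parent chain and returns the nearest ancestor of the given type, if any. */
    static <T> Optional<T> findAncestor(Node start, Class<T> type) {
        for (Node current = start.getParent(); current != null; current = current.getParent()) {
            if (type.isInstance(current)) {
                return Optional.of(type.cast(current));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Build the chain PerformStatement -> PerformProcedureStatement -> ProcedureCall.
        PerformStatementNode perform = new PerformStatementNode(null);
        PerformProcedureStatementNode performProcedure = new PerformProcedureStatementNode(perform);
        ProcedureCallNode call = new ProcedureCallNode(performProcedure);

        boolean found = findAncestor(call, PerformStatementNode.class).isPresent();
        System.out.println("Enclosing perform statement found: " + found);
    }
}
```

Returning an Optional keeps the "no such ancestor" case explicit, which matters for calls that do not sit inside a PERFORM at all.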

sundar-sarvam commented 1 year ago

Thanks @uwol. I will implement what you suggested. Also, can you point to some examples where CALL <PROGRAM> is parsed? A program can typically call another program as well, right (with the two living in different files)? To analyse these calls, I will need to extract the CALL statements. Also, will I be able to get the (start -> end) line numbers of a PERFORM statement in a file using the parser? Visual Studio Code (and other editors) offer folding ranges via LSP, which also give you the line numbers, but I was wondering if this is possible through the parser itself!

uwol commented 1 year ago

Hi,

Best Ulrich