Open sundar-sarvam opened 11 months ago
Internally, che4z uses the ProLeap COBOL parser: https://github.com/eclipse-che4z/che-che4z-lsp-for-cobol/blob/development/server/parser/src/main/antlr4/org/eclipse/lsp/cobol/core/parser/CobolParser.g4
Also, according to its license (https://github.com/BroadcomMFD/cobol-control-flow/blob/main/LICENSE.md), https://github.com/BroadcomMFD/cobol-control-flow uses the ProLeap COBOL parser internally.
So in the end, creating JSON from the AST should be simple and the AST should work for this...
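To illustrate why the JSON step is simple, here is a minimal sketch in Python. It does not use ProLeap or ANTLR classes: `Node` is a hypothetical stand-in for a parse-tree node, and the field names `rule`, `text` and `children` are assumptions chosen for this example. A real tree would come from the parser rather than being built by hand.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a parse-tree node (not a ProLeap class)."""
    rule: str                       # grammar rule or token type name
    text: Optional[str] = None      # source text, set only on leaf tokens
    children: List["Node"] = field(default_factory=list)


def to_dict(node: Node) -> dict:
    """Recursively convert a node and its children into plain dicts."""
    result = {"rule": node.rule}
    if node.text is not None:
        result["text"] = node.text
    if node.children:
        result["children"] = [to_dict(child) for child in node.children]
    return result


# Tiny hand-built tree for the statement: MOVE 1 TO WS-COUNT
tree = Node("moveStatement", children=[
    Node("MOVE", "MOVE"),
    Node("literal", "1"),
    Node("TO", "TO"),
    Node("identifier", "WS-COUNT"),
])

ast_json = json.dumps(to_dict(tree), indent=2)
print(ast_json)
```

The same recursive walk should carry over to a real parse tree, as long as each node exposes a rule name and a list of children.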
Awesome! Does proleap-cobol-parser support different COBOL dialects such as IBM Enterprise COBOL, GnuCOBOL, MicroFocus COBOL, etc.?
COBOL-85 and IBM
MicroFocus deviates heavily from COBOL-85, so I never tried to integrate it into the grammar. I am not sure whether modularizing the grammar or writing a separate grammar would be the better option. The grammar is already quite large :-)
Oh ok, thanks! Is there any plan on your end to work on something like a control flow graph or function call graph, like what Broadcom has done (and kept closed source)? It seems to be crucial for any COBOL analysis project.
Currently not due to limited resources.
In general, the idea behind ProLeap is to develop an open source COBOL toolchain. If a company wants certain features implemented as open source in the toolchain, it could contribute them or ask us to implement them with a project budget. A function call graph has not been requested so far, but it is an interesting idea.
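For anyone wondering what a paragraph-level call graph would involve, here is a rough Python sketch. It is not based on ProLeap or on Broadcom's tool: it scans `PERFORM` statements with regular expressions instead of using a parser. It assumes the sequence area has already been stripped, that paragraph labels start in column 1 and that statements are indented. The names `build_call_graph`, `PARAGRAPH_RE` and `PERFORM_RE` are made up for this example.

```python
import re

# Assumption: a paragraph label is a single name followed by a period,
# starting in column 1; statements are indented.
PARAGRAPH_RE = re.compile(r"^([A-Z0-9][A-Z0-9-]*)\.\s*$")
# First word after PERFORM; may be a paragraph name or a keyword like UNTIL.
PERFORM_RE = re.compile(r"\bPERFORM\s+([A-Z0-9][A-Z0-9-]*)")


def build_call_graph(source: str) -> dict:
    """Map each paragraph name to the paragraphs it PERFORMs, in order."""
    graph = {}
    current = None
    for line in source.upper().splitlines():
        label = PARAGRAPH_RE.match(line)
        if label:
            current = label.group(1)
            graph[current] = []
            continue
        if current is None:
            continue  # ignore anything before the first paragraph
        graph[current].extend(PERFORM_RE.findall(line))
    # Keep only edges to known paragraphs; this drops inline
    # PERFORM UNTIL / PERFORM VARYING, whose "target" is a keyword.
    return {name: [t for t in targets if t in graph]
            for name, targets in graph.items()}


SAMPLE = """\
MAIN-PARA.
    PERFORM INIT-PARA
    PERFORM PROCESS-PARA UNTIL WS-DONE = 'Y'
    STOP RUN.
INIT-PARA.
    MOVE 0 TO WS-COUNT.
PROCESS-PARA.
    PERFORM LOG-PARA
    MOVE 'Y' TO WS-DONE.
LOG-PARA.
    DISPLAY WS-COUNT.
"""

graph = build_call_graph(SAMPLE)
for name, targets in graph.items():
    print(name, "->", targets)
```

This ignores `PERFORM ... THRU`, sections, `GO TO`, `CALL` and copybooks, so a usable version would need to walk the parser's syntax tree rather than raw text.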
https://tree-sitter.github.io/tree-sitter/ - This is a general-purpose parser used for many languages. Would integrating ProLeap's state-of-the-art parser with tree-sitter be a useful contribution? I think there might be a language clash, as ANTLR is in Java while tree-sitter is written in C/C++.
Might be. The grammar files of ProLeap can be found here. You could try to rewrite the grammar for tree-sitter and then re-implement the preprocessor and, optionally, the ASG analyzer on top of your tree-sitter grammar and the tree-sitter APIs.
However, as this project is strongly based on ANTLR, nothing in that direction is planned from our side so far.
I am trying to use this Python library: https://pypi.org/project/antlr-ast/, which I hope is ANTLR for Python (I want to implement the ANTLR4 grammar for COBOL parsing in Python). Is it similar to ANTLR4 in Java?
I have been comparing ProLeap's COBOL parser with Broadcom's LSP, which also has a parser here: https://github.com/eclipse-che4z/che-che4z-lsp-for-cobol. I want to build this: https://github.com/BroadcomMFD/cobol-control-flow (which is not open source) using either the ProLeap COBOL parser or Broadcom's LSP COBOL parser. What are the pros and cons of each?