uwol / proleap-cobol-parser

ProLeap ANTLR4-based parser for COBOL
MIT License
134 stars 73 forks

Broadcom's LSP for COBOL #96

Open sundar-sarvam opened 11 months ago

sundar-sarvam commented 11 months ago

I have been comparing ProLeap's COBOL parser to Broadcom's LSP for COBOL, which also includes a parser: https://github.com/eclipse-che4z/che-che4z-lsp-for-cobol. I want to build something like https://github.com/BroadcomMFD/cobol-control-flow (which is not open source), using either the ProLeap COBOL parser or the parser from Broadcom's LSP. What are the pros and cons of each?

uwol commented 11 months ago

Internally, che4z uses the ProLeap COBOL parser: https://github.com/eclipse-che4z/che-che4z-lsp-for-cobol/blob/development/server/parser/src/main/antlr4/org/eclipse/lsp/cobol/core/parser/CobolParser.g4

Also https://github.com/BroadcomMFD/cobol-control-flow according to https://github.com/BroadcomMFD/cobol-control-flow/blob/main/LICENSE.md internally uses ProLeap Cobol Parser.

So in the end, creating JSON from the AST should be simple and the AST should work for this...
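To illustrate the "JSON from the AST" step, here is a minimal Python sketch. It does not use the ProLeap or ANTLR APIs: the `Node` class, the `to_dict` helper, and the rule labels are hypothetical stand-ins for whatever tree the parser actually hands back, so treat it as an outline of the recursion rather than working integration code.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a parse-tree node (not a ProLeap/ANTLR class)."""
    rule: str                       # grammar rule label, illustrative only
    text: Optional[str] = None      # source text for leaf-like nodes
    children: List["Node"] = field(default_factory=list)


def to_dict(node: Node) -> dict:
    """Recursively convert a node and its children into plain dicts and lists."""
    out = {"rule": node.rule}
    if node.text is not None:
        out["text"] = node.text
    if node.children:
        out["children"] = [to_dict(child) for child in node.children]
    return out


# Hand-built example tree; a real one would come from the parser.
tree = Node("paragraph", children=[
    Node("paragraphName", text="MAIN-PARA"),
    Node("performStatement", children=[Node("procedureName", text="INIT-PARA")]),
    Node("stopStatement", text="STOP RUN"),
])

print(json.dumps(to_dict(tree), indent=2))
```

With a real parse tree, the same recursion would run over the parser's own node type, reading the rule name and children from it instead of from `Node`.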

sundar-sarvam commented 11 months ago

Awesome! Does proleap-cobol-parser support different COBOL dialects like IBM Enterprise COBOL, GnuCOBOL, MicroFocus COBOL, etc.?

uwol commented 11 months ago

COBOL-85 and IBM

MicroFocus deviates heavily from COBOL-85, so I never tried to integrate it into the grammar. I am not sure whether modularizing the grammar or writing a separate grammar would be the better option. The grammar is already quite large :-)

sundar-sarvam commented 11 months ago

Oh ok, thanks! Is there any plan on your end to work on something like a control flow graph or function call graph, like what Broadcom has done (and kept closed source)? It seems to be a crucial thing for any COBOL analysis project.

uwol commented 11 months ago

Currently not due to limited resources.

In general, the idea behind ProLeap is to develop an open source COBOL toolchain. If a company wants certain features to be implemented as open source in the toolchain, it could contribute the feature or ask us to implement it with a project budget. A function call graph has not been requested so far, but it is an interesting idea.
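For readers wondering what the function call graph mentioned above could look like, here is a rough Python sketch. It assumes the PERFORM targets of each paragraph have already been collected from the parse tree; the `performs` mapping and the paragraph names are made up for illustration, and the sketch ignores fall-through between paragraphs, GO TO, and PERFORM ... THRU, which a real analysis would have to handle.

```python
from collections import deque

# Hypothetical input: paragraph name -> paragraphs it PERFORMs.
# In practice this would be collected while walking the parser's tree.
performs = {
    "MAIN-PARA": ["INIT-PARA", "PROCESS-PARA"],
    "INIT-PARA": [],
    "PROCESS-PARA": ["WRITE-PARA"],
    "WRITE-PARA": [],
    "UNUSED-PARA": ["WRITE-PARA"],
}


def reachable(graph, entry):
    """Breadth-first walk over PERFORM edges, returning paragraphs in visit order."""
    seen = {entry}
    order = [entry]
    queue = deque([entry])
    while queue:
        current = queue.popleft()
        for target in graph.get(current, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


order = reachable(performs, "MAIN-PARA")
# Paragraphs never reached via PERFORM from the entry point.
unreached = sorted(set(performs) - set(order))

print(order)
print(unreached)
```

Paragraphs listed in `unreached` are only candidates for dead code, because COBOL control can also fall through from one paragraph to the next without a PERFORM.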

sundar-sarvam commented 11 months ago

https://tree-sitter.github.io/tree-sitter/ - This is a general parser framework used for many languages. Would integrating ProLeap's state-of-the-art parser with tree-sitter be a useful contribution? I think there might be a language clash, as ANTLR is in Java while tree-sitter is written in C/C++.

uwol commented 11 months ago

Might be. The grammar files of ProLeap can be found here. You could try to rewrite the grammar for tree-sitter, and then re-implement the preprocessor and, optionally, the ASG analyzer on top of your tree-sitter grammar and the tree-sitter APIs.

However, as this project is strongly based on ANTLR, nothing in that direction is planned from our side so far.

sundar-sarvam commented 11 months ago

I am trying to use this Python library: https://pypi.org/project/antlr-ast/, which hopefully is ANTLR but in Python (as I want to use the ANTLR4 grammar for COBOL parsing in Python). Is this similar to ANTLR4 in Java?