Heidernlee closed this issue 4 years ago.
@Heidernlee Thanks for the note. I haven't had much time to continue updating the PL1 grammar, but I will at some point. Yes, that looks like a bug. Can you tell me where you found the EBNF for PL1? That would be very useful. The grammar in this Git repo was imported from a Yacc grammar in an open-source program on SourceForge. Either the import was bogus, or I ran a transform over the imported grammar that wasn't working correctly. At the time, I was doing the transforms through VS2019, and they were buggy. I now have a command-line tool that generates reproducible results from the open-source Yacc grammar. Alternatively, I can redo the grammar from scratch using the EBNF you mentioned.
@kaby76 Thanks for your response. The EBNF for PL1 is from here: https://www.cs.vu.nl/grammarware/browsable/os-pli-v2r3/ (maybe it's the only PL1 grammar on the Internet as of 2020...). I'm just getting started with grammar parsing and Antlr4, and the meaning of LL/LR/LALR has always confused me. I noticed that other Antlr4 parsers (G4 files) always use forms like [XX]+ or [XX]*, which produce a tree with one parent and a flat list of children. But your parser uses a lot of recursive rules instead of [+] or [*]. Does that mean your G4 is not LL but LALR or something else?
Thanks for the link.
I just checked the editdataspec rule again, and I think it's okay. The rule came from converting the Yacc grammar into Antlr; I just hadn't yet transformed the grammar to use EBNF. EBNF syntax for grammars (e.g., "symbol+" or "symbol*") not only makes the grammar easier to read, but also results in a faster parse and the flatter parse tree you mention.
This grammar is LR because it was derived directly from a Yacc/Bison grammar. Strictly speaking, it's not LL because it has left recursion. I did have to rewrite the grammar to remove indirect left recursion, but I left some direct left recursion where Antlr could handle it. Internally, Antlr converts a directly left-recursive rule into an EBNF-style loop before generating its LL parser, so the rule
editdataspec : ( '(' datalist ')' '(' editformatlist ')' ) | editdataspec ( '(' datalist ')' '(' editformatlist ')' ) ;
accepts the same input as
editdataspec : ( '(' datalist ')' '(' editformatlist ')' )+ ;
It's often possible to convert an LR grammar to LL and vice versa, with the caveat that the parse trees may not be the same.
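To illustrate the left-recursion point, here is a minimal hand-written recursive-descent sketch in Java. It is not the code Antlr generates. The class name `EditSpecSketch` and the simplification that `datalist` and `editformatlist` are each a single token (e.g. `d1`, `f1`) are hypothetical. Both methods parse with the same loop and accept the same token sequences. The EBNF `+` form attaches every group to a single node, while the left-recursive form wraps the tree built so far on each iteration, which gives the deeper, left-nested tree.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a simplified editdataspec where datalist and
// editformatlist are each one token. Not the Antlr-generated parser.
class EditSpecSketch {
    private final List<String> tokens;
    private int pos = 0;

    EditSpecSketch(String... tokens) {
        this.tokens = List.of(tokens);
    }

    private String next() {
        if (pos >= tokens.size()) {
            throw new IllegalStateException("unexpected end of input");
        }
        return tokens.get(pos++);
    }

    private void expect(String t) {
        String actual = next();
        if (!actual.equals(t)) {
            throw new IllegalStateException("expected " + t + " but found " + actual);
        }
    }

    private void expectEnd() {
        if (pos != tokens.size()) {
            throw new IllegalStateException("unexpected token " + tokens.get(pos));
        }
    }

    private boolean atGroupStart() {
        return pos < tokens.size() && tokens.get(pos).equals("(");
    }

    // One '(' datalist ')' '(' editformatlist ')' group.
    private String group() {
        expect("(");
        String data = next();
        expect(")");
        expect("(");
        String format = next();
        expect(")");
        return "(group " + data + " " + format + ")";
    }

    // editdataspec : ( group )+ ;
    // The + becomes a loop; every group is a direct child of one node (flat tree).
    String parseEbnf() {
        pos = 0;
        List<String> children = new ArrayList<>();
        children.add(group());
        while (atGroupStart()) {
            children.add(group());
        }
        expectEnd();
        return "(editdataspec " + String.join(" ", children) + ")";
    }

    // editdataspec : group | editdataspec group ;
    // A naive recursive-descent translation of the second alternative would call
    // itself before consuming any token and recurse forever, so the recursion is
    // replaced by the same loop. Each iteration wraps the tree built so far,
    // producing a left-nested tree.
    String parseLeftRecursive() {
        pos = 0;
        String tree = "(editdataspec " + group() + ")";
        while (atGroupStart()) {
            tree = "(editdataspec " + tree + " " + group() + ")";
        }
        expectEnd();
        return tree;
    }

    public static void main(String[] args) {
        EditSpecSketch p = new EditSpecSketch(
                "(", "d1", ")", "(", "f1", ")", "(", "d2", ")", "(", "f2", ")");
        // prints (editdataspec (group d1 f1) (group d2 f2))
        System.out.println(p.parseEbnf());
        // prints (editdataspec (editdataspec (group d1 f1)) (group d2 f2))
        System.out.println(p.parseLeftRecursive());
    }
}
```

With a single group the two trees are identical. With two or more groups, the left-recursive version adds one extra level of nesting per group, which is why converted grammars can yield different parse trees for the same input.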
I think my plan for a PL1 grammar will be to scrape the Lämmel/Verhoef grammar directly from the website, update the current Antlr grammar to use EBNF, and then compare the result with the imported Yacc grammar. I will likely also go to IBM's Language Reference and try to scrape a grammar from it with a tool. The link to Tom Everett's version of PL1 is gone, but I have enough versions here to get a good PL1 grammar. I believe in scraping grammars from existing sources rather than typing them in from scratch, in order to reduce errors.
It's good to learn about the various parsing algorithms. The Dragon Book is excellent, so definitely pick up a copy.
@kaby76 Thanks for your response. Now I finally understand the difference between LL and LR...
After testing with many PL1 programs, I found something weird: parsing works fine when I use the G4 file directly, but an exception is thrown when I parse into a tree with Java. For example:
Very simple PL1 source code:
PUT FILE (SYSPRINT)
EDIT ('AS','AB') (2(B,A,COL(119),A))
(C,C) (2(B,A(120))) ;
Using the G4 file directly, I get a perfect parse tree. But when I tried to parse the same code with Java, an exception was thrown. Is the same exception thrown in your environment?
BTW, I've already bought the Dragon Book from Rakuten Marketplace. Many thanks.
@kaby76 Sorry, I fixed it; it was caused by the wrong dependency version of the Java package. I'll keep working on this PL1 parser, and I hope one day it can be in EBNF format ^ ^
Mainframe modernization is a very well-paid job in Japan these days. My next step is an RPGII parser... Hope my boss gives me a break ~
Many thanks for providing this Antlr4 grammar. I came here by following this issue: [https://github.com/antlr/grammars-v4/issues/1752]. It's so hard to find a good PL1 parser in 2020...
This G4 file successfully parsed 80% of my sample PL1 source code, so thanks again. Here's my question about this G4 file. The EBNF for "EDIT" looks like ["EDIT" { "(" data_list ")" "(" format_list ")" }+]. There's a [+] in it, which I think means the "data_list"/"format_list" pair appears one or more times. But your G4 file has [editdataspec : '(' datalist ')' '(' editformatlist ')'], with no [+].
Is there any reason to define the "EDIT" grammar like that?