Open DonMathi opened 3 years ago
You are correct. The rule shouldn't have the alt-operator, and there are other errors. I am still refining the scraper and the refactorings for the C++14 grammar. The new grammar is in https://github.com/kaby76/scrape-c-plus-plus-spec/blob/main/CPlusPlus14Parser.g4 but it is not compiling yet. The original scraped grammars are in https://github.com/kaby76/scrape-c-plus-plus-spec/tree/main/scraper and those should look exactly like what you see in the specs. I have checked those multiple times, so I think they are correct, but they are not functioning grammars. The script https://github.com/kaby76/scrape-c-plus-plus-spec/blob/main/trash.sh takes the c++14.g4 grammar and tries to produce a working grammar, but it is not finished.
I am also looking into your CPlusPlus14Parser.g4 file, and there are some problems with the use of `attribute_specifier_seq ?`
together with the rule `attribute_specifier_seq : attribute_specifier* ;`
You can now optionally (`?`) recognize something that can already match an empty string (`*`), and ANTLR complains about that. But if you change the `*` to a `+`, then it doesn't complain.
There are some more similar places where this is the issue.
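A minimal sketch of the shapes discussed above, assuming a simplified grammar: `some_declaration` is a hypothetical caller rule used only for illustration, not one taken from the repository. The C++ grammar summary defines attribute-specifier-seq as one or more attribute-specifier, so putting `+` on the sequence and keeping `?` at the call sites appears to be the closer match to the Spec.

```antlr
// Sketch only. `some_declaration` is a hypothetical caller used for illustration.

// Problematic shape: an optional (?) reference to a rule that can already
// match nothing (*).
//   attribute_specifier_seq : attribute_specifier* ;
//   some_declaration        : attribute_specifier_seq? Identifier Semi ;

// Option A: make the sequence non-empty and keep the ? at each call site.
attribute_specifier_seq : attribute_specifier+ ;
some_declaration        : attribute_specifier_seq? Identifier Semi ;

// Option B: keep the * in the sequence and drop the ? at every call site.
//   attribute_specifier_seq : attribute_specifier* ;
//   some_declaration        : attribute_specifier_seq Identifier Semi ;
```

Either option leaves exactly one place where "nothing" can be matched, which is what removes the complaint.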
The last thing that ANTLR is complaining about is the use of a fragment in the parser grammar. I'm not sure how to fix that.
preprocessing_token : FHeader_name | Identifier | pp_number | Character_literal | User_defined_character_literal | String_literal | User_defined_string_literal | preprocessing_op_or_punc | ~Newline ;
fragment FHeader_name : '<' FH_char_sequence '>' | '"' FQ_char_sequence '"' ;
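One possible way around this, sketched under assumptions: ANTLR accepts `fragment` rules only in lexer grammars, and a parser rule can refer to tokens but not to fragments. Moving the rule into the lexer and dropping `fragment` turns it into a token the parser can reference. The token name `Header_name`, the split into a separate lexer grammar, and the character-class definitions below are assumptions for illustration (the character classes follow the Spec's wording for h-char and q-char).

```antlr
// --- Lexer grammar (assumed to be a separate lexer .g4 file) ---
// No `fragment` keyword: this is now a real token.
Header_name : '<' FH_char_sequence '>' | '"' FQ_char_sequence '"' ;

// Helper rules stay fragments, which is legal inside a lexer grammar.
fragment FH_char_sequence : FH_char+ ;
fragment FH_char          : ~[\r\n>] ;   // any character except new-line and >
fragment FQ_char_sequence : FQ_char+ ;
fragment FQ_char          : ~[\r\n"] ;   // any character except new-line and "

// --- Parser grammar ---
preprocessing_token
    : Header_name
    | Identifier
    | pp_number
    | Character_literal
    | User_defined_character_literal
    | String_literal
    | User_defined_string_literal
    | preprocessing_op_or_punc
    | ~Newline
    ;
```

A caveat on this design: an always-active `Header_name` token would compete with `<` and with `String_literal`, so a working lexer would probably have to restrict it to `#include` lines, for example with a lexer mode or a predicate.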
Thank you for the info. I'll update trkleene to do the right thing for attribute_specifier_seq.
The plan is to continue to make changes to the Trash tool set. However, the main focus is still just the C++ Spec scraper, which reads one of the several dozen C++ Spec pdfs and outputs a non-functioning, but acceptable, Antlr syntax grammar as it is in Appendix A. I purchased the three official ISO specs a couple of weeks ago. And, they are very different from the drafts and each other. The requirement for the scraper is to produce an identical grammar in Antlr syntax to that of a Spec. Trash is used to bring that grammar into a functioning, optimized Antlr grammar.
The grammars output by the Trash refactoring script do not compile yet and are mainly produced to find errors in scraping the Spec.
I have visually checked c++14.g4 against the Spec several times, but I will need to do that a couple more times, and add another layer of code to the scraper so that small differences in the Spec are corrected automatically. I haven't checked by eye whether c++17.g4 and c++20.g4 are correct.
I plan to add a large (~25k files) test suite of Clang, GNU, and Windows C++ source code to test the parser.
So, I am still very far away from completion.
Great work you are doing
Thank you. Much appreciated! And, thanks for looking at this even if it's unstable still. I'll let you know here when things are a little more stable with the grammar. --Ken
On 11/22/2021 2:43 AM, Kjeld Mathias Petersen wrote:
Great work you are doing
Hi Kaby76
I'm looking into some details about your g2/Cpp14Parser.g4 file, and it seems to me that
opaque_enum_declaration : enum_key attribute_specifier_seq ? | Identifier enum_base ? | Semi ;
doesn't look right. To me there are too many `|` in here. The PDF looks like this: [image]
What am I missing?
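For reference, the C++14 grammar summary gives opaque-enum-declaration as a single sequence with no alternatives: enum-key, an optional attribute-specifier-seq, an identifier, an optional enum-base, then a semicolon. A sketch of how that would read in ANTLR syntax, assuming the rule and token names used elsewhere in this thread:

```antlr
// One sequence, no `|`: the alt-operators in the scraped rule are spurious.
opaque_enum_declaration
    : enum_key attribute_specifier_seq? Identifier enum_base? Semi
    ;
```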