rubberduck-vba / Rubberduck

Every programmer needs a rubberduck. COM add-in for the VBA & VB6 IDE (VBE).
https://rubberduckvba.com
GNU General Public License v3.0
1.9k stars 298 forks source link

Feedback on the ANTLR grammer #6173

Open Beakerboy opened 10 months ago

Beakerboy commented 10 months ago

I’m using ANTLR to parse VBA files and see Rubberduck cited on the most recent version of the grammer file.

When exporting a Form object from Excel VBA, the header looks like this:

VERSION 5.00
Begin {C62A69F0-16DC-11CE-9E98-00AA00574A4F} Login 
   Caption         =   "Please Log In"
   ClientHeight    =   1920
   ClientLeft      =   120
   ClientTop       =   465
   ClientWidth     =   2295
   OleObjectBlob   =   "Login.frx":0000
   StartUpPosition =   1  'CenterOwner
End

This file fails to parse on three places:

I have an updated grammer file that I was hoping to get some feedback on. Let me know if there is a reason the main file should not be updated with my changes.

I’ve:

i Imagine the “real” changes will need to be more nuanced than I’ve made it. For example

Vogel612 commented 10 months ago

We're doing some preprocessing steps and I don't remember off the top of my head whether we actually parse exported Form components. There is basically multiple different coexisting gestalts of VBA code, not all of which we target (i.e. we have a separate grammar for dealing with preprocessor directives and we parse both exported as well as 'code pane' code in Rubberduck).

@MDoerner probably has smarter things to say about this than me :)

MDoerner commented 10 months ago

We actually only parse the form headers in order not to fail on them. Afterwards, we ignore that part of the parse tree.

The current grammar's rules around forms are purely based on examples I have seen as there does not seem to be a specification anywhere.

If you want, you can open a PR for the changes. However, if this too much hassle for you, I could also incorporate the changes and see whether our tests still pass.

Beakerboy commented 10 months ago

if this too much hassle for you

Not at all. I was planning on doing that, but thought I’d start the conversation here beforehand. There is a separate Visual Basic 6 grammar that seems to have the changes I was planning on making. Is that to be expected?

MDoerner commented 10 months ago

Oh, now I realize that you referred to the VBA grammar on the Antlr repo. I think we provided that at some point. However, our grammar at https://github.com/rubberduck-vba/Rubberduck/blob/next/Rubberduck.Parsing/Grammar/VBAParser.g4 has evolved since.

Two bugs in there were actually only fixed last month.

Beakerboy commented 10 months ago

Nice! I didn’t notice a separate grammar within this repository. Should the latest copy be sent over to the ANTLR group? I noticed there were some recent changes over there to fix bugs with line continuation and line numbering.

MDoerner commented 10 months ago

By the way, our grammar assumes that precompiler directives are already taken care of. To do that we first parse with our precompiler parser https://github.com/rubberduck-vba/Rubberduck/blob/next/Rubberduck.Parsing/Preprocessing/VBAConditionalCompilationParser.g4, evaluate which code is dead and then set the corresponding tokens to hidden before reusing the token stream for the actual parse.

MDoerner commented 10 months ago

Hm, one could see whether one can merge enhancements in.

Beakerboy commented 10 months ago

I’m just dealing with Linting raw VBA code that is exported from Excel without Rubberduck, so I don’t think the precompiler directives will come into play.

I just published a github action to lint VBA code as part of a CI / CD process. I also created a github organization called VBA-actions, so if any of you have any actions you would like to publish separately from the Rubberduck project, let me know and I can add you to the org.