Early to Late Binding Refactoring

rubberduck-vba / Rubberduck

Every programmer needs a rubberduck. COM add-in for the VBA & VB6 IDE (VBE).

https://rubberduckvba.com

GNU General Public License v3.0

1.9k stars 299 forks source link

Early to Late Binding Refactoring #1184

Open comintern opened 8 years ago

comintern commented 8 years ago

One of the dreary tasks that I find myself performing fairly often is converting from early binding to late binding. I like to take advantage of auto-complete when I'm writing the code, but usually want to release the finished product as late bound so I don't wind up with referencing issues.

It would be awesome to right-click the declaration line below, then Refactor->Convert to Late Bound...

Private Sub Example()
    Dim app As Excel.Application
    Set app = New Excel.Application
    Target (app)
End Sub

Private Sub Target(foo As Excel.Application)
    foo.Bar
End Sub

...and then wind up with something more like this:

Private Sub Example()
    Dim app As Object
    Set app = CreateObject("Excel.Application")
    Target (app)
End Sub

Private Sub Target(foo As Object)
    foo.Bar
End Sub

Bonus points if it would remove the reference. Many bonus points if it were possible to Refactor->Convert to Early Bound, but the feasibility of that might be a stretch.

retailcoder commented 8 years ago

Okay. I think this is definitely possible... but not both ways... and only once the 2.0 COM reflection code is merged into the main repo.

Here's the approach I'm thinking of:

When reflecting COM types, keep the GUID in the Declaration object.
When the resolver establishes that app is an Excel.Application object, Rubberduck knows the GUID to use.
Once we know what GUID to use, hit the registry and locate the CLSID key for it; get the ProgId key under it and read the default value, which contains the string to use for CreateObject.
At the declaration site, rewrite the asTypeClause to As Object
At the reference assignment site(s), rewrite the right-side and replace New {Type} with the CreateObject({ProgId}) call
"Tracking" which procedures/parameters the reference is passed to, what copies are made, etc., is going to be way too expensive to be reasonable, and will leave other variables' paths (and dead code too) unaffected anyway - best approach would be to actually seek anything declared with the "AsType" we're after, and simply replace it with As Object. Then we're sure all references are gone.
Reparse & resolve; if no Declaration object remains that's using a type in the target library, the reference can be safely removed.

Reverting the operation might not be possible, at least not in one sweep.

I have a very vague idea of how it could happen:

Get the ProgId string from the CreateObject call
Hit the registry and search for the ProgId key under HKCR; get the CLSID under it.
Search the registry for the CLSID/GUID, and get the InprocServer32 key for it.
.......
Locate the library and add the reference to the project if it's not there.
Track the procedures/parameters the reference is passed to, what copies are made, etc., and change the declaration type from Object to ........

Meh, no cookie for me

bclothier commented 7 years ago

May I propose an alternative?

Instead of dealing with all "what is this?", refactor by adding the compile flag and rewrite code as accordingly:

Private Sub DoIt()
#If LateBind Then
  Dim xlApp As Object 'Excel.Application
#Else 
  Dim xlApp As Excel.Application
#End If

  Set xlApp = CreateObject("Excel.Application")

  ....

End If

That way, the refactor is now a cut'n'paste affair; just replace the original early-binding line, and insert the additional lines. It's oversimplifying a bit (since we want to gather all possible references and dump them into a single #If/#End If block and convert all Set xxx = New xxx into a CreateObject call, but once refactored, no further metaknowledge is necessary.

However, I want to call this out that might cause issues with the parser. I often write what I call "2-headed procedures"... like so:

#If LateBind Then
Public Sub ChoppedLiver(xlWorksheet As Object)
   Dim xlRange As Object
   Const xlSomeExcelConstant As Long = 1
#Else 
Public Sub ChoppedLiver(xlWorksheet As Excel.Worksheet)
   Dim xlRange As Excel.Range
#End If

  Set xlRng = xlWorksheet.Range(1,1).InvokeThatMethod(xlSomeExcelConstant)

End Sub

The code will compile just dandy and the pattern help cut down on the redundancy of the early/late binding, especially when you need to expose the parameters that needs to be derived from some external libraries. Only one head can be active at a time.

HTH.

PS: I should add that this is likely to be one-way, from early bound to late bound. If the code already contained the late-bound objects the refactor should just leave those well alone. I think because most of time we write early bound code, then convert to late binding late, the unidirectional refactor shouldn't be an issue, at least for me.

retailcoder commented 7 years ago

@bclothier the grammar cannot deal with # preprocessor instructions and actual VBA syntax at once; what we do, is that we preprocess the VBA code to pick up the "live" and the "dead" code; during that phase we replace the "dead" code in each module content strings with whitespace that preserves the line position of everything, and the resulting string is what we feed our actual VBA parser: Rubberduck knows nothing of the "dead" code in the module.

So the "dead" branch(es) don't even get tokenized. We know "dead" code is dead because we can read #Const values and interpret the Boolean expressions in #If...#Else...#EndIf statements.

However if you don't have #Const values and instead the preprocessor constants are defined at project-level... there is no easy way to get the project-level preprocessor constants without popping the project properties dialog and grabbing the contents of the textbox in there, so we didn't do it - because we don't have a way to know whether the values have changed without popping the project properties dialog and grabbing the likely identical contents of the textbox in there; in other words there'd be an annoying flicker every time Rubberduck started a reparse... so we dropped the idea - and that's the only thing that could break our parser, at least as far as preprocessor constants are concerned.

That said nothing prevents us from injecting such code, including the #Const declaration. That's an interesting idea; ideally though, we'd only add as little code as needed to make it work. Precompiler directives everywhere quickly becomes hard to read.

bclothier commented 7 years ago

Thanks for the explanation.

Yes, project-level constants will be a big issue and like you, I've been forced to settle for that silly "pop open the dialog to automate setting the const".

One lame workaround is to support some kind of project-level annotation so that RD can read that instead of trying to pop open the dialog. But that means it can get out of sync, etc. etc. This wouldn't be so lame if you had a means of reliably detecting and trapping whenever the user has opened the project properties so that you can inspect the changes, and suffer flicker only once for a new VBA project.

As for the minimizing the preprocessor code, I am in total agreement with you. However, I'd say that if this is a problem, this is a different refactor problem than the one we are discussing here. Normally if I have a VBA project and I know I am going to use Excel.Application 1000 times in various place, you can assume that I will define a internal function ExcelApp() to abstract away the GetObject/CreateObject, which then allow me to focus on #Ifing only the local variables I need to do whatever. The constants that comes from the external library could be similarly defined in the same module that houses the ExcelApp module.

So it would make sense to have an inspection that warns of often CreateObject("xxx") used everywhere and recommending refactor it into a common procedure. That would then make the # of preprocessor instructions less of a problem.

retailcoder commented 7 years ago

This wouldn't be so lame if you had a means of reliably detecting and trapping whenever the user has opened the project properties so that you can inspect the changes, and suffer flicker only once for a new VBA project.

You're onto something here. It wouldn't need to flicker - project-level preprocessor constants are empty by default no? And at this point we can certainly hook up the project properties dialog and detect it opening and closing... hmm...

comintern commented 7 years ago

Picking up change events from the Properties dialog is trivial with the WinEvents APIs - the only issue is that it only receives update events when it's visible. It's also really easy to subclass now (too easy as it turned out - I think at one point it was accidentally subclassed) so tracking its state is also trivial. Just thought I'd throw that out there...

bclothier commented 7 years ago

The flicker is necessary in the case where you have an existing VBA project with project-level compilation constants already defined. Perhaps when RD is first started on parsing a new VBA project, and it detects it has no project-level annotations, it should flicker to inspect the current setting. That would be one-time only and subsequent parsing can just refer to the project-level annotations. That would reduce the need to define the #Const constant (which I really dislike using - it's more likely I need it for multiple modules than only one module).

ThunderFrame commented 6 years ago

For late-bound to early-bound conversions, it would be nice to have a user-configurable list of bindings that Rubberduck should consider available to the user, such as Microsoft Scripting, and VBScrpt Regular Expressions. That would allow a developer to identify binding opportunities in code, but only for specific white-listed references.

For early-bound to late-bound, you could work from the same white-list, but Rubberuck could also present a dialog that allows the user to choose which references are maintained as early-bound, and which should be converted to late-bound. Ideally, Rubberduck would know the unremovable references of the host app, so that a user doesn't try to late-bind Excel in Excel - but then, what if you really do want to late-bind a second instance of Excel (or maybe even a different version), from Excel? The reference conversions gets messy there.

ThunderFrame commented 6 years ago

linking #3750