Open Dob4115 opened 5 years ago
Thank you for reporting. This is always a thing we want to improve.
However, you might want to read about what we already do currently to minimize the parsing time, particularly the section titled "Doing Only What Is Necessary".
The request to load the code pane immediately is kind of already an issue: #3697.
Note that inspections, which isn't a part of the parsing process may be automatically run after a reparse. You may want to try and turn off that via the settings dialog.
We also suggest that you run the log with Trace
log and review if there are any cases where the parser is falling back to slower LL mode. There are certain expressions that we cannot resolve using the faster SLL mode, in which case, we must fallback to the slower LL mode and the log will indicate that. If there are several entries about fallbacking to LL, that would be a contributing factor and requires some code changes.
Consider furthermore that because we are limited to a granularity of a module level, a module with 10000 lines is much more expensive to parse, even you only changed a single character compared to a module with only 100 lines. More particularly so if the module itself has several references. Again, this is a case where code needs to be modified.
If you have a specific module that you think it should run fast but is very slow, please share the log with us so we can analyze it.
Of course, we always need more help -- building a fast and reliable parser is a very hard problem and we welcome anyone who want to contribute toward that goal.
Thanks for the feedback! I'll add that Rubberduck, like many open-source software tools, is maintained by a small team of volunteers, and as stipulated by its license, is provided as-is, in the hope that it will be useful. We love that you care enough for what we do to file a bug report, that's awesome - feel free to show that care and appreciation by starring our project.
As for how you can speed up the parser/resolver:
Worksheets("Sheet1")
returns a Variant
: this makes the resolver work harder when trying to resolve the rest of the expression, whereas if the sheet is pulled into a Worksheet
variable, the resolver will perform much better.bang!notation
and review logs to see if parsing is being slowed down by SLL failures and retries. Simplify complex expressions by extracting local variables.Note that these are good coding practices in general, not just ways to make Rubberduck work better.
Cheers!
Thanks for the additional information. I looked through the options several times but had not seen the ability to not perform inspections... I turned that off and it is a little faster.
You said:
I logged opening the VBE (initial parse took 1:12 according to the log) and then a couple minutes later, after not changing anything, I reparsed (this took 1:02). I was expecting more than 10 seconds difference, does this make sense? I also am interested, if parsing is in the background, why Excel (not just the IDE) freezes for most of the parsing. The freezing is new in the latest stable version (previously I used 2.4.0 stable).
I looked through the log and did see some instances of SLL failing, but not many (to my untutored eye, however!). I also saw a bunch of
MainParseSyntaxErrorException: extraneous input ',' expecting {<EOF>, ':', REM, NEWLINE, ''', WS, LINE_CONTINUATION`
(e.g., at 2019-04-08 15:01:47.4469). Would these be slowing things significantly?
I looked further at the instance I provide and the line of code at the address specified (L197C29) is:
.Sorted = False
This does not have 29 characters... I'm a little confused about what I can change to eliminate the error. RubberduckLog.0.txt
The log you provided has two parsing runs. The first parsing run takes ~62 seconds according to the log, the second takes 40.
The first run also provokes multiple (14) SLL failures and falls back to LL mode. None of these appear in the second run.
These times break down into the following specific timings (according to the log)
Run 1:
Run 2:
I've skipped some items with times below 10ms. Note that the second run does not parse user modules. Also all operations that it does perform are completed faster than in the preceding run.
I also want to add a word on the syntax error you found in the log:
Unless Rubberduck explicitly tells you there is a syntax error while the VBE compiles the code just fine, we won't be able to do very much about it.
Rubberduck uses the "SLL prediction mode" of the ANTLR-generated parser to speed up the parsing process. That prediction mode does not guarantee that the grammar is applied correctly, though.
This can result in "faux syntax errors". When reparsing using the slower, but 100% correct prediction mode ("LL"), most of these errors vanish. Only if the LL parsing fails, does Rubberduck tell you there is a syntax error.
SLL failures cost quite a lot; they involve catching and logging an exception, so .NET needs to unwind the call stack, and then the parse run gets retried with the slower prediction mode.
As for L197C29
not pointing to anything, see if the same position in the exported code file makes more sense - Rubberduck exports each module to a temp file that is then processed with a simplified grammar, to pick up module & member attributes (unless that was changed): it's possible that line 197 of the exported code file yields more information about what the problem might be.
If you can find & share the code SLL is choking on, we can probably help tweak it to help the parser process it orders of magnitude faster - I completely agree that this shouldn't be needed (tweaking your code, that is), but VBA being a painfully ambiguous language to parse, it's entirely possible some phrasings perform better than others.
I see you're running 32-bit Office on a 64-bit Windows... 32-bit limitations might be interfering too, memory-wise, causing more pressure on the hard drive - and waiting on the hard drive for memory swaps can't be helped by throwing more threads at the problem. This could be why the process is "freezing" (no repro here, 64-bit Excel 2016, Win7 Pro x64).
I'm doing some investigating... do underscores in class module methods trigger an LL parse? That's the only thing I can figure. Here is one piece of code that triggers an LL reparse (from the exported codemodule... it has one tab indent):
`2019-04-08 15:01:37.7998;WARN-2.4.1.4627;Rubberduck.Parsing.VBA.Parsing.TokenStreamParserBase;SLL mode failed while parsing the CodePaneCode version of module uf_TreeviewControls at symbol , at L50C26. Retrying using LL.;
[Line 50 follows]
TVHandler.MoveNode tr_Demote
`
(Yes, I know undescores are a bad thing and I understand why... just haven't yet cleaned up my legacy sloppy coding-style!)
32-bit office on 64-bit windows is because I still use the TreeView and ListView controls.
The underscore isn't likely - I setup a repro and verified it does not cause a fallback.
Public Enum foo
f_bar
End Enum
Public Sub Test()
Cells(1, 1).Value = foo.f_bar
End Sub
It might be something to do with the type of TVHandler
or the method MoveNode
, or the fact that tr_Demote
isn't fully qualified and is ambiguous.
from the exported codemodule
Sorry for the confusion - in "the CodePaneCode version of module uf_TreeViewControls", CodePaneCode
is coming from an enum:
(unrelated: why do we have two spaces between enum
and CodeKind
?)
AttributesCode
would be off the exported module; CodePaneCode
is straight off the code panes (SnippetCode
is used for ad-hoc parsing in other areas, not during normal parse tasks). In other words, L50C26 would be around line 50 in the actual editor - the SLL failure is likely unrelated to tr_Demote
, its underscore, or anything in line 50 of the exported module.
Aha... in that case, here is line 50 from the editor... one indent:
TVHandler.AddNewNode , tr_ChildLast
TVHandler
is a custom class that I use to interact with a TreeView control. AddNewNode
looks like this:
Public Function AddNewNode(Optional Relative As MSComctlLib.Node, Optional Relationship As TreeRelationship = tr_ChildLast)
Must be the skipped argument throwing it off, see if using named arguments helps it:
TVHandler.AddNewNode Relationship:=tr_ChildLast
Or, since the specified optional argument has the same value as the default parameter value, consider not supplying it:
TVHandler.AddNewNode
That said, AddNewNode
being a Function
, I'd expect it to return the created node object, to be captured at the call site:
Set node = TVHandler.AddNewNode
Hope it helps!
Am I the only one this is giving "Rubberduck Opportunities" inspection ideas? We could warn about skipped optional arguments thwarting SLL prediction mode (perhaps not in these words) and make a quickfix that introduces named arguments and removes the skipped arguments, making Rubberduck work better.
Is there any way to tell from the logfile how much extra time the SLL failures cost? There are 14 such failures (logfile line 873 is the one I've mentioned).
Would these errors account for Excel freezing in the latest version of RubberDuck, in contrast to the last version, which did not freeze the main Excel window while parsing?
@Dob4115 That's nearly impossible to tell. We'd need to know how much time a completion in SLL mode would've taken to subtract that from the LL parsing time. But we can't know that, nor do we time individual module parses.
It's just not a metric that gives us as development team a lot of information. Parsing basically takes as long as it takes, aside from a few exceptions where ANTLR is bugged or the grammar could be optimized for a specific case (which comes with costs for other cases ...)
We're pretty happy with parsing times as they are, especially considering how many quirks of VBA and the VBE we're handling correctly...
Is there any way to tell from the logfile how much extra time the SLL failures cost?
Takes a bit (lots!) of mucking around with Excel, but yeah... kind of.
I managed to parse enough of the logs to make a data set that can be pivoted, then calculated the difference between [Started] and [Ready] states for each module.
So with a population of 210 modules (!) and pivoting on the MIN timestamp per state/module, time to "Ready" state is between 0:47 and 1:00, with an average of 0:54 and a standard deviation of 0:04.
But "Ready" state is the whole entire process, including parse tree traversals and identifier resolution - which SLL failures don't affect at all.
Now, there's more than a single parse run going on here; the 14 failures are really off 7 modules - and now we're starting to get a clearer picture of how SLL failures are affecting the run:
As you can see, the effect of SLL failures and LL retries is clearly not where the majority of the processing is going. A little bit of simple diffing math, and we can break down the duration of each subsequent parser state:
Note that resolving declarations only begins after all modules reach "Parsed" state, and the SLL-Fail timestamp includes unwinding the call stack to log the exception's stack trace, which doesn't appear to be insignificant.
Also note... the bulk of the "parsing" time is spent resolving the declarations, i.e. traversing the parse trees and building a usable representation that our internal API can consume. That step comletely runs on background threads (that's part of why many small modules is better than few massive ones), and shouldn't be locking up the host for any reason - unless you're running with limited resources.
Attaching the workbook I used to do these pivots, if anyone wants to dig further into this:
Nice analysis! So let me clarify my original "problem" a bit. There are two things that, in combination, give me grief:
If these two issues were resolved, I could at least move around and start reviewing code while waiting for the parser to finish. (I'd be ok with editing while parsing, though it's obvious that the current parse would be of limited value for the newly edited code!)
As I said before, the freezing is new with this build of RD. I didn't think specs were a problem:
Intel(R) Xeon(R) W-2135 CPU @ 3.70GHz
16GB of 2666MHz memory.
Also, I'm not trying to point fingers or downplay what the RD team has already achieved, particularly regarding parsing. Between the addin itself and the blog posts, my coding is getting much more maintainable! (The posts on clean code alone were revolutionary, thank you.) I understand that parsing has gotten much scrutiny and am not trying to suggest that "there has to be a faster way to do it." I'm just trying to better understand what's going on.
Thanks also for spending the time to answer my questions and perform research!
Specs look very much right - note that memory is being artificially constrained by running a 32-bit host: all that RAM is essentially useless, and I suspect a lot of disk activity is going on to compensate: that is what's slowing it down. As already mentioned and proven by the logs, pretty much everything that happens during a parse, happens on a background thread - there's nothing more we can do to avoid "freezing"... unless we move the parsing out-of-process.... which is wicked complex at this point.
I have done some research under a profiler. I see two leads we should really follow:
The second point is the freeze you observe. To fix that we probably just need to read the settings only once and not repeatedly. Moreover, we might want to look for more heavy operations and push them to background tasks.
Regarding the first point, we should evaluate whether there are better methods to collect the data and then write it to the cache in one swoop.
@MDoerner , your second point makes sense with what I am seeing. Everything runs ok for, say, 2/3's of the parse, then the GUI locks up until the code explorer "suddenly" populates.
Version 2.4.1.4627 OS: Microsoft Windows NT 10.0.17134.0, x64 Host Product: Microsoft Office 2016 x86 Host Version: 16.0.4822.1000 Host Executable: EXCEL.EXE
I have several large Excel addins containing library functions (one is 800KB+). A reparse on my machine takes ~45 seconds to a minute, complete with Excel and VBE freezes/blank windows, which renders Excel unusable during any reparse.
Is there any way to speed up parsing or, at least make it less invasive? I have turned off rubberduck for the moment so I can work. A couple ideas I had: