Improve `global` detection

penguinencounter commented 10 months ago

resolves #112

[x] Integrate ---@plugin
[ ]

JanSharp commented 10 months ago

I was aware of this issue however I've not added parsing for strings because it'd be even more performance cost, and the plugin is already performance heavy. I just added single line comment detection to each module that inserts newlines.

With #110 you could fix global being used "as an expression" inside of a s tring literal by adding ---@plugin disable-next-line: global or similar.

Looking at your implementation there are some issues, such as table allocations that can be removed as well as an O(n) operation that can be changed to O(log(n)) using binary search, which can then be improved further by remembering the lower bound making it even closer to O(1) in most cases. This is noticable with files with many comments, string literals and global usage, a 1.5k line file with ~175 global, ~246 comments and ~237 string literals goes from 59 ms to 667 ms.

However by fixing the issues mentioned above this is actually promising because it could replace the single line comment check from every file and they could all use the same data. This ignore logic would actually be very similar to what I've done in #110.

So what I'd like to do, if it's alright with you, is extend upon your lexer, imporving performance and integrate it with all the modules, potentially combining it with the data structure in #110. After that I'd compare performance with and without the lexer and we'll see if it's good enough to be kept. If it isn't, then you'll have to use ---@plugin disable... once it's released.

(I'll also go ahead and check if there's any other locations where there's a missing space before inserted --)

penguinencounter commented 10 months ago

@JanSharp Thanks for looking! For the performance changes, would you rather

I close this PR so you can open a new one
I add you as a contributor to my fork
something else?

Either way, you can go ahead with the changes :)

JanSharp commented 10 months ago

Sicne I can't bulk reply to the other performance comments I'll do it here: I've had the same thoughts, I think the while take("=") do is probably faster with a string.match, because it's fewer function calls and I also thought about string interning. Speaking of, I also thought about not using string.sub for any of the parsing for that same reason.

For the other ones, I'm not sure if string.match or string.find is faster, something I wanted to test just to make sure. The difference is most likely pretty minor though.

@JanSharp Thanks for looking! For the performance changes, would you rather

I close this PR so you can open a new one

I add you as a contributor to my fork

something else?

Either way, you can go ahead with the changes :)

Option 2 is probably the cleanest/straight forward, if that's alright with you.

And also thank you in advance for the PR. Like I mentioned previously I can't guarantee if it'll be fast enough, but you've set the ground works for actually figuring it out, where I previously simply assumed it would never be close.

JanSharp commented 10 months ago

I just realized that I had the debugger attached when testing performance earlier today. The same file mentioned previously actually went from 27.2 ms to 33.5 ms. Oops, my bad. And fyi I'm testing performance by running it inside of Factorio, so with Factorio Lua. Simply because I'm already using Factorio for plugin dev anyway (to actually view generated diffs). I'll have to do a least some tests in the real plugin environment :see_no_evil:

JanSharp commented 10 months ago

Oh I already did that, I just didn't push the commits, sorry. I've got a habbit of not pushing commits unfortunately.

penguinencounter commented 10 months ago

oh oops shall I move the branch back, or have you already merged it on your side

JanSharp commented 10 months ago

I can merge it no worries. It's better this way anyway so you get credit for it in the git blame. (I felt bad for moving that code making it look like I touched all of it in git history even though you did the work)

JanSharp commented 10 months ago

Huh github doesn't show the other commits in the list in the PR, but you can look at them through other means. Either way, feel free to push more changes if you want to.

My next step is going to be moving the ---@plugin parsing into this parser actually, which is a pretty big change, but I realized that it's the simplest and presumably an efficient solution to having both ---@plugin disabling and string/comment disabling co-exist.

penguinencounter commented 10 months ago

@JanSharp the commits are there, they're just earlier in the list because you committed them locally yesterday

JanSharp commented 10 months ago

Oh dear, I scrolled up because I thought that was the case... yet I missed them. :sweat_smile:

JanSharp commented 10 months ago

Progress update, with the same test file as before: ~27.8ms on 35db845 (upstream/next, next) Merge remote-tracking branch 'origin/pr/110' into next ~35.4ms on f5f1aab Merge branch 'next' into global-is-not-a-string-or-comment ~30.2ms on 7f3f6fa Change code ranges to use binary search ~29.2ms on ba123ae (global-is-not-a-string-or-comment) Move ---@plugin parsing into the comment parser So I can tell you already that it's going to be kept. I even had the idea to move some of the modules into the parser, because when profiling I've noticed that string.match/string.gmatch is the most expensive part in pretty much all the modules. But that'll be a separate PR most likely.

JanSharp commented 10 months ago

Alright, it's done. This is actually 10% +-5% faster than next branch, thanks to the removal of preceding_text captures from 2 modules (7ecc551). Aside from that this is a big change and unfortunately due to potentially overlapping ranges the creation of the ranges and ranges_flags arrays has grown in complexity significantly. This is caused by disable-next-line and disable-line, since the ranges they affect could contain other ranges such as comments or strings. It's written with generated source files which have really long lines in mind, which is why all searches through the ranges array are done through binary searches, even when it's just searching through the last/current line. The biggest issue right now is that even though I've tested it quite a bit, there's no guarantee that there are no bugs. It's too complex for that. Automated testing is on my list of things to add, but damn that'll be a lot of work :sweat_smile: For the record I did check if anything else is inserting -- which would combine with a - and cause syntax errors, but nope global was the only one.

Anyway, @penguinencounter and @justarandomgeek if you'd both like to git checkout the PR and use this version of the plugin in your mods folder (or similar) just to see if something obvious is broken that'd be awesome. With that I'm calling it good.

JanSharp commented 10 months ago

I just noticed that the package-lock.json file got modified in 6fe5df0. I don't know enough to say if that's good/bad or doesn't matter.

justarandomgeek commented 10 months ago

Anyway, @penguinencounter and @justarandomgeek if you'd both like to git checkout the PR and use this version of the plugin in your mods folder (or similar) just to see if something obvious is broken that'd be awesome. With that I'm calling it good.

Ah, excellent, i had been watching the progress initially but got distracted the last couple days, i'll check this out soon!

I just noticed that the package-lock.json file got modified in 6fe5df0. I don't know enough to say if that's good/bad or doesn't matter.

Mostly harmless. It gets touched any time any of the deps change and get pulled in. I'll almost certainly smash it again anyway.

justarandomgeek / vscode-factoriomod-debug

Improve `global` detection #113