julia-vscode / TestItemRunner.jl

Run Julia test items

Performance: Restrict the directories searched for testitems to `src/` and `test/` #45

Open · NHDaly opened this issue 1 year ago

NHDaly commented 1 year ago

@davidanthoff: We have found that the performance for our large repository is absurdly bad.

For example, consider this video, where you can see that it takes about 2 minutes to pick up changes in our testitems:

https://user-images.githubusercontent.com/1582097/214360420-4ea5b44a-d04d-4cc6-9bdf-f34a4fcc0a3c.mov

I believe this is because our directory is very large, containing a bunch of test data, benchmark datasets, etc. You can see how big it is just by running `tree` in the directory: merely listing the files, without even reading them, takes a minute! Compare that to `tree src` or `tree test`, which both finish in about 30 milliseconds:

```
$ time tree > /dev/null
tree > /dev/null  13.15s user 33.66s system 73% cpu 1:04.09 total
$ time tree src > /dev/null
tree src > /dev/null  0.02s user 0.01s system 85% cpu 0.036 total
$ time tree test > /dev/null
tree test > /dev/null  0.02s user 0.01s system 91% cpu 0.034 total
```
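
For reference, the same measurement can be reproduced from Julia; this is a quick sketch, assuming the paths are relative to the repository root, and the cost here is pure directory enumeration with no file contents read:

```julia
# Count files reachable under a directory without reading any of them.
count_files(dir) = sum(length(files) for (_, _, files) in walkdir(dir); init = 0)

@time count_files(".")     # the whole repository, including data sets
@time count_files("src")   # just the source tree
```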

I don't see any reason why anyone should need to put tests anywhere other than `src/` and `test/`.

I think that by being a bit more prescriptive here, we can deliver a much better experience. There is also a benefit to the community in having clear conventions: it becomes easier to compare across projects when the tests always live in the same places.
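
A minimal sketch of what the proposed restriction could look like (the helper name is made up; this is not TestItemRunner's actual crawl):

```julia
# Only crawl src/ and test/ when collecting files that may hold test items.
function testitem_candidates(project_root::AbstractString)
    files = String[]
    for subdir in ("src", "test")
        dir = joinpath(project_root, subdir)
        isdir(dir) || continue
        for (root, _, names) in walkdir(dir), name in names
            endswith(name, ".jl") && push!(files, joinpath(root, name))
        end
    end
    return files
end
```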

davidanthoff commented 1 year ago

That would mean we have to restrict the language server to those folders. Test item detection is just part of the regular parsing of Julia files in the LS, and I believe it adds very little overhead on its own; the catch is that the LS generally tries to find all Julia files in the workspace.
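
For concreteness, here is a rough sketch of what detection during parsing amounts to; the actual LS uses CSTParser and tracks precise source locations, so this only illustrates the idea:

```julia
# Walk a parsed expression tree and record @testitem macro calls.
function collect_testitems(ex, found = String[])
    if ex isa Expr
        if ex.head === :macrocall && ex.args[1] === Symbol("@testitem")
            push!(found, ex.args[3])  # the quoted test-item name
        end
        foreach(arg -> collect_testitems(arg, found), ex.args)
    end
    return found
end

src = """
@testitem "arithmetic" begin
    @test 1 + 1 == 2
end
"""
collect_testitems(Meta.parseall(src))  # => ["arithmetic"]
```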

I don't think we can restrict the LS to a fixed subset of folders in general; users simply have too many different project structures. But what we could do is add an option to exclude certain folders, so that users in a situation like yours have a way to make things work. That would probably be enough for your use case, right?
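
A sketch of what such an exclude option might do under the hood (the setting and names here are hypothetical; nothing like this exists yet):

```julia
# Directories the user has configured to be skipped entirely.
const EXCLUDED_DIRS = Set(["data", "benchmarks", ".git"])

function julia_files(dir::AbstractString, out = String[])
    for entry in readdir(dir)
        path = joinpath(dir, entry)
        if isdir(path)
            # Prune excluded subtrees before descending into them.
            entry in EXCLUDED_DIRS || julia_files(path, out)
        elseif endswith(entry, ".jl")
            push!(out, path)
        end
    end
    return out
end
```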

Although, watching your video right now, I'm very confused... Live edits are sent from the VS Code client directly to the LS, and I would have thought that the number of files in the workspace has no impact on latency there at all. Unless the LS is just generally busy with other work (though it's unclear what that could be), I'm not sure what is going on. Here are some things to try out:

If that is so, then I'm pretty sure this is unrelated to the number of files and due to something else. Maybe the static analysis is so slow that it saturates the LS process, or something like that...

NHDaly commented 1 year ago

Interesting, makes sense! No matter what else we do to investigate this, can we eliminate whatever caching is going on in the video that causes it to run out-of-date test code? That seems really bad, since you can think your newly written test passed even though it didn't. It seems to me that if the LS is stuck, the extension should simply wait to run the tests until the LS is able to send the new test content.

Or alternatively, is there some way to change how the tests are run, so that running a test always involves fetching the latest contents of the test file at the cached file name and line-number position?
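
A sketch of that re-fetch idea, with made-up names; the point is that only a location is cached, never the source text:

```julia
# Cache a (file, line-range) reference per test item, not its source.
struct TestItemRef
    file::String
    first_line::Int
    last_line::Int
end

function latest_source(ref::TestItemRef)
    lines = readlines(ref.file)
    # If the file shrank, the cached range is stale; refuse to run it.
    ref.last_line <= length(lines) ||
        error("test item out of range; re-detect before running")
    return join(lines[ref.first_line:ref.last_line], '\n')
end
```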

Even if we identify and fix the cause of the delay in my video, I would want this situation to be impossible. My concern is that even if we make detection faster, the same failure could still occur if you are fast enough between making a change and starting the test.

NHDaly commented 1 year ago

I wonder whether we might want to separate the process handling test items from the process doing the rest of the work in the LS. Even if we can shrink the delay, very large projects could still see some lag, and it would be great if running the tests were as low-latency as possible... What do you think about splitting these?
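
As a rough illustration, detection could run on a dedicated worker process; this sketch uses the standard Distributed library with assumed function names, not the extension's actual architecture:

```julia
using Distributed

# Spawn one worker dedicated to test-item detection, so a slow crawl
# cannot block the rest of the language server's work.
detector = addprocs(1)[1]

@everywhere function find_testitems(dir::AbstractString)
    hits = Tuple{String,Int}[]  # (file, line) pairs
    for (root, _, files) in walkdir(dir), f in files
        endswith(f, ".jl") || continue
        path = joinpath(root, f)
        for (lineno, line) in enumerate(eachline(path))
            occursin("@testitem", line) && push!(hits, (path, lineno))
        end
    end
    return hits
end

# The call runs asynchronously on the worker; the caller stays responsive.
future = remotecall(find_testitems, detector, "test")
items = fetch(future)
```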

NHDaly commented 1 year ago

Regarding the rest of your ask:

I'll try that, yeah. I will note that the issue was repeatable, so it stays slow for a while. But I'll try again and make sure to wait until mouseover is working.

I will note that for the last few months I've noticed https://github.com/julia-vscode/julia-vscode/issues/1093 happening again, even though I've made the changes discussed in that thread (moved the .julia dir out of the repo). There is almost always a julia task running at 100% CPU in Activity Monitor. So maybe that's related?

I see that both for our internal repo (RAICode) and when I have the JuliaLang/julia repo open.

NHDaly commented 1 year ago

Ah, it looks like I've been suffering from https://github.com/julia-vscode/julia-vscode/issues/3037. I wonder if this was causing the server to keep restarting?

EDIT: Aha, yes, now that I'm watching the Output tab, it is indeed crash-looping. I'll keep an eye on that from now on! I've never paid much attention to this tab before.

davidanthoff commented 1 year ago

So I think the TODO here is that we should just remove all the info about detected tests while the LS is down. That is essentially a state where the information is stale, and it is probably better to remove it from the UI than to show outdated results.
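
A minimal sketch of that invariant, with hypothetical names (the real extension state lives on the VS Code side; this only shows the idea):

```julia
# Detected test items are only trustworthy while the LS is alive.
mutable struct TestItemState
    items::Vector{String}
    ls_alive::Bool
end

function on_ls_down!(state::TestItemState)
    # Stale detections are worse than none: drop them from the UI
    # instead of letting the user run outdated test code.
    empty!(state.items)
    state.ls_alive = false
end
```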