haskell / haskell-language-server

Official haskell ide support via language server (LSP). Successor of ghcide & haskell-ide-engine.
Apache License 2.0
2.64k stars 354 forks source link

Overhauling the HLS testsuite #3736

Open fendor opened 1 year ago

fendor commented 1 year ago

Problem

The test suite of HLS is gigantic. That's great! Running the test suite of HLS takes a long time. That's bad. Even worse, the test suite has become immensely flaky. Usually, we require at least one CI rerun for failed CI jobs per commit. Then we add insult to injury, since tests fail for components that the PR doesn't even touch. We concluded, that a lot of the flakiness and bad performance originates from lsp-test which is basically a parser generator for parsing a very specific order of lsp messages.

However, HLS is not the most deterministic language server out there, so sometimes messages come out of the expected order. For example, published diagnostics for module A are sent before module B, which leads the test to fail. Rerunning the same test might yield a different order of messages, causing the test to succeed.

The performance and flakiness of our CI has become a real problem, which is why need to come up with solutions.

As a comparison, rust-analyzers while test-suite takes 1.30-min while HLS usually takes up to an hour. It is our dedicated goal to be within the 5-min mark for running our entire test suite.

Solution

There are many ways to reach our destination. So we outline a couple here:

1) Reducing flakiness in lsp-test. We have already identified some common sources of flakiness. The most important is loading too many modules into the same test session. Making sure, each test-case loads only exactly what it needs for the test, reduces the flakiness. 2) I argue that lsp-test is often the wrong tool for testing. To be precise, LSP is the wrong API boundary for most of our tests! lsp-test sets up an entire HLS session, negotiates capabilities, etc... All not needed for testing a plugin Code Action! Thus, we take https://matklad.github.io/2021/05/31/how-to-test.html as our role model and try to gain as much as possible from it.

The rest of the issue, will outline how we envision fixing the test suite using the knowledge gained from point 2. We heavily piggyback from the various pieces of wisdom from https://matklad.github.io/2021/05/31/how-to-test.html.

We have some goals for our test suite:

Now, we discuss some potential API boundaries and test API for Plugins. Ghcide internals may require a different API boundary.

Plugin Tests

Plugins have a rather well-defined interface that is declaratively tested by lsp-test right now. Our tests should be as declarative as possible.

For the API boundaries that we want to test, we identify the following:

michaelpj commented 1 year ago

Just brainstorming. It seems to me that many plugins could have their tests split into two:

And then it's relatively reasonable to assume that the composition of these things works.

The second part does seem like it would benefit from being run in a HLS build session. But it maybe it doesn't even need a LSP server! We could set up the VFS manually, and then just query the state of a rule.

fendor commented 1 year ago

Absolutely! I am working on a prototype for basically all the things you are describing right now.

Slightly unintended side effect is, that we need to re-invent test utils for the different API boundary. That's fine, but a bit more work.

We could have nice builders for VFS that can be persisted if desired (to test FOI onDisk/notOnDisk), in addition enabling a lot of parallelism.

soulomoon commented 7 months ago

Absolutely! I am working on a prototype for basically all the things you are describing right now.

Slightly unintended side effect is, that we need to re-invent test utils for the different API boundary. That's fine, but a bit more work.

We could have nice builders for VFS that can be persisted if desired (to test FOI onDisk/notOnDisk), in addition enabling a lot of parallelism.

Really nice ideas, how is the current state of it.


Current test take up 30 min. Seems much better than the previous 1.30-min test (9.6, ubuntu-latest) succeeded now in 29m 42s

fendor commented 6 months ago

Really nice ideas, how is the current state of it.

A sub-part of this proposal has been merged, the Virtual File Tree: https://github.com/haskell/haskell-language-server/pull/3767

A WIP branch implements the main part of this proposal https://github.com/fendor/haskell-language-server/tree/enhance/plugin-tests-no-lsp-test The hls-eval-plugin tests show how this change test case definition in practice.

I put it a little bit on the back burner, as the performance gain (at least for hls-eval-plugin) wasn't as substantial as hoped, once https://github.com/haskell/haskell-language-server/pull/3767 was used to write tests. It could still be useful, but I think it is more worthwhile to make sure that the tests use the Virtual File Tree than to migrate to a different testing library.

michaelpj commented 6 months ago

Did we figure out whether it gives us any benefit in terms of being able to run tests in parallel?

fendor commented 6 months ago

The main issue that keeps us from parallelism is we use getCurrentDirectory in HLS and ghcide, and the plugin tests do not launch a new HLS process but only a thread: https://github.com/haskell/haskell-language-server/blob/master/hls-test-utils/src/Test/Hls.hs#L609

In a way, the plugin tests have implicit globals they rely on. If we manage to fix that, e.g. ban getCurrentDirectory, makeAbsolute and canonicalizePath or launching a separate executable, we should be able to run tests in parallel right away.

soulomoon commented 6 months ago
soulomoon commented 6 months ago

waitForDiagnosticsWithSource wait for 5 seonds unconditionally. We should avoid using it or fix it up eliminate the unconditional wait.

fendor commented 5 months ago

hls-hlint-plugin-tests does the same thing. I am trying to fix it in #4144