rish987 / nvim-lspconfig-test

Language-specific integration testing of nvim+LSP via neovim/nvim-lspconfig (intend to upstream to nvim-lua)

Feedback #54

Open mjlbach opened 3 years ago

mjlbach commented 3 years ago

You weren't on Matrix, so I thought I would give my feedback here :)

  1. Automated issues: I'm not sure how I feel about the issues. Where are these supposed to be filed? It would be kind of weird to automatically file an issue on failed CI, since that could be because someone made an error while pushing/rebasing/merging a PR. I think we should just do what normal CI does (see the luacheck example in lspconfig) and return a list of failed servers when CI fails, which will block the merge.

  2. Installation/encapsulation of state: I think it's problematic to install everything in the same VM. Servers may for whatever reason pollute global state with config files, extra dependencies, etc., or, in the worst case, have clashing binary names. I would say we should consider Dockerizing the installation steps and running the tests in Neovim separately (perhaps parallelized) in multiple containers.

  3. Allowing users to "test" servers by dropping into a container. This builds on 2, but one feature I thought would be cool is if we bundle a minimal config (or defaults.nvim) in the container and allow users to easily run a server against a test project. This would be so handy when filing issues against lspconfig, because they could just `podman run neovim-lsp/lean3ls` and have a development environment with an example lean project, the lean server, and neovim to try to repro.

Anyways, terrific work, curious to hear your thoughts. We can go ahead and move this to nvim-lua whenever you'd like.

rish987 commented 3 years ago

Yeah not on matrix yet (will join soon) but I generally find things easier to keep track of on GH if you don't mind...

  1. You'll find that automated issues can only be created on push or on schedule on the master branch, not on PRs. It's easy to disable, but I think we should have a way to automatically know if something breaks (for example if the language server or one of its dependencies changes, requiring corresponding config changes).
  2. Are you referring to when running locally? I believe GH Actions is already doing things in parallel in separate containers (see the recent workflow run).
  3. This sounds like a great enhancement! We should already have most of what we need for this (for leanls/lean3ls anyways), but I don't have much experience with VMs/containers yet. Will create a new issue for it.

Thanks! Feel free to move to nvim-lua whenever you feel it's ready.

mjlbach commented 3 years ago
  1. You'll probably have to spell this out a bit more for me. We're not going to merge a PR that breaks these tests into lspconfig. Is the idea that in the event we do, for some reason, merge a PR in lspconfig that does break these tests, it will create an issue here only upon merging to lspconfig master? Seems a bit weird and edge-casey, but I guess it's fine.
  2. No, I mean remotely. We can spawn all tests in parallel in containers. I don't believe they run in parallel as it stands.
  3. Yes. I'm also working on getting this going with nix, I'll push some work soon.
rish987 commented 3 years ago
  1. Wasn't thinking of that case in particular -- the main thing here is that this repo is going to have a LOT of dependencies in the future which can change without immediate notice (since VSCode is probably their first priority).

    As a hypothetical, say that Lean 4 is changed such that it requires additional command-line arguments to enable a certain (on-spec) LSP feature. Without automatic issue filing, this could escape our notice for a couple of days, confusing users in the meantime, or send us looking for issues with lean.nvim rather than with the fundamental LSP config.

    So perhaps we can try it out for a bit, and decide later whether to keep/disable? Let me know what you think.

rish987 commented 3 years ago
  1. See e.g. this workflow, where 4 jobs ([leanls, lean3ls] x [macos, linux]) are run in parallel. According to the GH docs, jobs run in parallel and in separate environments by default.
mjlbach commented 3 years ago

Ah so you want to do a separate job for each language server? I guess that's fine.

rish987 commented 3 years ago

Yep that's exactly the way it is right now since we're using a job matrix (see here).

mjlbach commented 3 years ago

Yes, I use a job matrix in lspconfig as well; I'm just not sure that's particularly readable if we have 70 (x4)+ jobs.

rish987 commented 3 years ago

Yeah for actions that seems best at the moment (perhaps it makes better use of GitHub's hardware for faster runs?), but once we have what you're talking about with containers for running locally we can decide whether to just use that for a single job.

mjlbach commented 3 years ago

I would use a matrix for each runner, but run each language server test in parallel within the same job. Otherwise it will get extremely long and hard to read on PRs haha

williamboman commented 2 years ago

As for job matrices: GitHub has a hard cap of 256 permutations. This will be fine for the time being, but it's worth keeping in mind if there's an interest in leveraging matrices to test servers on multiple platforms (Linux x MacOS x Windows x x64 x arm x armhf = 💥).

I wonder if it'd make sense to build this as a reusable workflow action instead? This'd allow other repos to easily plug into these tests, for example:

Think:

```yaml
jobs:
  lsp-server-tests:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest, macos-latest]
    steps:
      - uses: rish987/nvim-lspconfig-tests@v0.6
        with:
          nvim: ./build/bin/nvim
          lspconfig: /path/to/lspconfig
```
williamboman commented 2 years ago

I've spent some time familiarizing myself with the current tests and the new async API in plenary. I'm taking a slightly different approach in terms of how interactions are done and how things are asserted. There are no direct interactions with any underlying LSP APIs; everything revolves around user actions. I'd also like to explore whether snapshot testing could prove helpful here, where a snapshot would essentially be a serializable representation of the current Neovim state (for starters this could be the buffers and their contents, open windows, and cursor positions).
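
To sketch what I mean by such a snapshot (purely illustrative; the helper name and the exact fields captured here are made up, not actual test code):

```lua
-- Hypothetical sketch: serialize the parts of Neovim state a snapshot might
-- capture (loaded buffers with their contents, open windows, cursor positions).
local function capture_snapshot()
  local snapshot = { buffers = {}, windows = {} }
  for _, buf in ipairs(vim.api.nvim_list_bufs()) do
    if vim.api.nvim_buf_is_loaded(buf) then
      snapshot.buffers[buf] = {
        name = vim.api.nvim_buf_get_name(buf),
        lines = vim.api.nvim_buf_get_lines(buf, 0, -1, false),
      }
    end
  end
  for _, win in ipairs(vim.api.nvim_list_wins()) do
    snapshot.windows[win] = {
      buf = vim.api.nvim_win_get_buf(win),
      cursor = vim.api.nvim_win_get_cursor(win),
    }
  end
  return snapshot
end
```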

This is 100% POC test code so don't scrutinise it too much haha:

The LSP server is installed prior to running the tests, simply by running

```sh
$ nvim --headless --noplugin -u "minimal_init.lua" -c "LspInstall --sync <server_name>"
```
rish987 commented 2 years ago

> I wonder if it'd make sense to build this as a reusable workflow action instead? This'd allow other repos to easily plug into these tests

This sounds perfect, though I don't have much experience with making workflow actions so feel free to have a go at it if you do.

> There are no direct interactions with any underlying LSP APIs; everything revolves around user actions.

I figured that testing user actions and UI features was the responsibility of core. Really, my thoughts were just to check that the server connects at all, and that it responds at all (and reasonably) to the requests we expect it to support.
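
Something along these lines is what I had in mind (just a rough sketch against the core client API, assuming a single attached client and a Neovim version that exposes vim.lsp.get_active_clients() and server_capabilities):

```lua
-- Rough sketch: assert that some client is attached to the test buffer and
-- that it advertises support for a request we care about (hover, here).
local bufnr = vim.api.nvim_get_current_buf()
local clients = vim.lsp.get_active_clients({ bufnr = bufnr })
assert(#clients > 0, "no LSP client attached to the test buffer")

-- Older Neovim versions exposed this under `resolved_capabilities` instead.
local client = clients[1]
assert(client.server_capabilities.hoverProvider, "server does not advertise hover support")
```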

Also judging from that code, it looks like we want to go in the direction of minimizing checks on the specifics of the server response. I agree that we should have such a "default" layer as a minimum for all newly tested language servers. Then all we would need at a minimum is a test file written in that language and a mapping from request name to cursor position, and for each of those we could just check that there is a non-error response.
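
As a sketch of what that "default" layer could look like (the request names, cursor positions, and timeout below are placeholders, and real params would need some per-method handling):

```lua
-- Hypothetical "default" layer: for each request we expect the server to
-- support, place the cursor at a known position in the test file and check
-- that a synchronous request comes back without an error.
local requests = {
  ["textDocument/hover"]      = { line = 4, col = 10 },
  ["textDocument/definition"] = { line = 4, col = 10 },
}

local bufnr = vim.api.nvim_get_current_buf()
for method, pos in pairs(requests) do
  vim.api.nvim_win_set_cursor(0, { pos.line, pos.col })
  local params = vim.lsp.util.make_position_params()
  local responses = vim.lsp.buf_request_sync(bufnr, method, params, 5000)
  assert(responses, method .. ": request timed out")
  for _, response in pairs(responses) do
    assert(response.err == nil, method .. ": server returned an error")
  end
end
```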

However, I think we should also have a "custom" layer so that servers can test things specific to their config (e.g. root directory search and off-spec requests). The current tests for Lean kind of combine these two layers, and I think it would be worth separating them.

I noticed you were using a kind of async test which would seem very useful for us; have you implemented this? It seems that's not in plenary.nvim yet, but CMIIW. (edit: nvm, just found it!)

williamboman commented 2 years ago

Yeah I'm a bit divided on which interface to test. My goal is the same as yours - I just want to make sure the server connects and is successfully responding. I figured that testing on a "higher level" would make tests easier to write (they could probably be 100% templated) and would be less prone to changes in core internals. Other than that they serve the exact same purpose.

My experience so far is a bit so-so tbh. I've ended up spending an enormous amount of time debugging tests, but I think it's mostly due to me struggling with the test runner (I opened a few issues in plenary), not the actual tests themselves.

> However, I think we should also have a "custom" layer so that servers can test things specific to their config (e.g. root directory search and off-spec requests). The current tests for Lean kind of combine these two layers, and I think it would be worth separating them.

For sure! I think this is a great resource for regression testing and certain off-spec (or in-spec but not really correct-spec :D) functionalities.

> I noticed you were using a kind of async test which would seem very useful for us; have you implemented this? It seems that's not in plenary.nvim yet, but CMIIW. (edit: nvm, just found it!)

Yeah, this was probably a mistake at this point tbh; it seems to still be quite early-stage. I've spent most of the time learning and working around some issues from this, but by now things seem like they've stabilized. The prime benefit, and also the reason I introduced it, was that it allows you to more easily test asynchronous behavior (I've since replaced the sleep(magic_number) with a custom assert.wait_for()).
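
Roughly, the idea behind such a polling helper (this is a simplified sketch built on vim.wait, not the actual implementation mentioned above):

```lua
-- Hypothetical polling helper: repeatedly evaluate `condition` until it
-- returns a truthy value or `timeout_ms` elapses, instead of sleeping for a
-- magic number of milliseconds.
local function wait_for(condition, timeout_ms)
  timeout_ms = timeout_ms or 5000
  -- vim.wait polls `condition` every 50ms and returns true if it succeeded.
  local ok = vim.wait(timeout_ms, condition, 50)
  assert(ok, "condition was not met within " .. timeout_ms .. "ms")
end

-- Example: wait until at least one LSP client has attached to the buffer.
wait_for(function()
  return #vim.lsp.get_active_clients({ bufnr = 0 }) > 0
end)
```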

I've now written tests in this style for tsserver, sumneko_lua and rust_analyzer, and given the amount of time it has taken me, this won't scale at all 😂. Again, I've mostly struggled with the test runner and some async behavior, so I'm not saying this is not a feasible approach.

Going forward I'd like to focus more on the quantity of servers, with the trade-off of simpler tests. As a first test target, I really just want to make sure that the server starts successfully, and that's it. We can later expand from there.