swyddfa / lsp-devtools

Tooling for working with language servers and clients.
https://lsp-devtools.readthedocs.io/en/latest/

Test-client hangs and integrating with Pygls #24

Closed · tombh closed this issue 1 year ago

tombh commented 1 year ago

Continuing on from the discussion in the Pygls v1 thread about hanging CI.

Basically, yes absolutely, I think it'd be great to formally have pytest-lsp as part of the Pygls ecosystem πŸ™‡

So let's see here what would be involved in addressing test-client hangs, which we've identified as a shared problem. Maybe we can even use pytest-lsp as the solution for Pygls' CI hangs, and as the excuse to rely on it for all of Pygls' testing. There are even some ongoing internal discussions about creating a pygls-org Github organisation, for which pytest-lsp would be an obvious candidate member.

Fun fact, the test client in pytest-lsp was originally based on the LSP client in pygls' test suite, it's just been adapted to talk to a subprocess rather than an os.pipe() and I'm facing the exact same issue.

What are the advantages of subprocess over os.pipe()? I assume the former is async?

I've been playing around with a few ideas (the most promising having a background thread cancel all pending tasks when the subprocess exits), but nothing that works in all cases...

So I assume that the exiting of the subprocess (namely the tested LSP server instance) isn't a problem? At the very least, just prepending POSIX's timeout reliably protects against hangs, right? E.g. timeout 5 python custom-pygls-server.py.

The exiting of client "processes" (or rather async futures, yes?) is the issue at hand, right? I was thinking that maybe a dedicated test client could have its own request method that all tests use, giving a single site for setting and guaranteeing a timeout. How does your background thread relate to that idea? Am I mixing up 2 different things?
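Roughly what I have in mind is the sketch below. Everything here is illustrative (the class, the fake transport and the names are made up, not an existing API); the point is just that there's exactly one place where a timeout is set and enforced:

import asyncio


class TimeoutClient:
    """Illustrative sketch: every request funnels through one method, so there
    is a single site where the timeout is applied."""

    def __init__(self, send_request, timeout: float = 5.0):
        # send_request is whatever actually puts the message on the wire and
        # returns an awaitable that resolves with the server's response.
        self._send_request = send_request
        self._timeout = timeout

    async def request(self, method: str, params: dict):
        try:
            return await asyncio.wait_for(self._send_request(method, params), self._timeout)
        except asyncio.TimeoutError:
            raise RuntimeError(f"no response to {method!r} after {self._timeout}s") from None


async def demo():
    async def never_replies(method, params):
        await asyncio.sleep(60)  # simulate a server that never answers

    client = TimeoutClient(never_replies, timeout=0.1)
    try:
        await client.request("textDocument/completion", {})
    except RuntimeError as exc:
        print(exc)  # the test fails fast with a message instead of hanging


asyncio.run(demo())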

Also, changing the subject a little ... looking through the code here, I briefly noticed those JSON files that seem to have serialised editor requests? What do you use them for? Is it because there is such divergence in how each editor sends LSP client requests?

alcarney commented 1 year ago

What are the advantages of subprocess over os.pipe()?

By moving the server out into a subprocess you get a "true" end-to-end test, in that pytest-lsp talks to the server over stdio just like a regular language client, plus it's more flexible.

I assume the former is async?

There is support for async subprocesses in Python, but pytest-lsp isn't using them currently... the core of pygls itself is synchronous, so using a synchronous subprocess seems to play nicer with the architecture - I'm not sure what magic pygls is using to enable async request/responses...
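For context, the synchronous version is basically just subprocess.Popen plus LSP's Content-Length framing over stdio, something along these lines (the server path is a placeholder, and real code also needs a reader for the responses):

import json
import subprocess

# Start the server as a real OS process and talk to it over stdin/stdout,
# just like an editor would. "my_server.py" is a placeholder.
server = subprocess.Popen(
    ["python", "my_server.py"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
)


def send(message: dict) -> None:
    # JSON-RPC messages over stdio are prefixed with a Content-Length header.
    body = json.dumps(message).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    server.stdin.write(header + body)
    server.stdin.flush()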

So I assume that the exiting of the subprocess (namely the tested LSP server instance) isn't a problem?

Correct. For pytest-lsp at least, the test hangs because it's awaiting something it will never get a response for, e.g.

result = await client.completion_request(...)

One of the primary reasons why a response will never come is that the server process has already crashed/exited on its own (rather than hung), so pytest-lsp needs to be able to detect this. This is where the background thread comes in: it continuously polls the server process to see if it's still running, cancelling all awaited tasks if it isn't.
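In simplified form, the watcher looks something like this (a sketch of the idea rather than the exact pytest-lsp code):

import asyncio
import threading
import time


def watch_server(process, loop: asyncio.AbstractEventLoop, poll_interval: float = 0.1):
    """Sketch: poll the server process from a background thread and, once it
    exits, cancel everything still pending on the client's event loop so
    awaiting tests fail immediately instead of hanging."""

    def cancel_pending():
        # Runs on the event loop's own thread via call_soon_threadsafe.
        for task in asyncio.all_tasks():
            task.cancel()

    def watcher():
        while process.poll() is None:  # None means the subprocess is still alive
            time.sleep(poll_interval)
        loop.call_soon_threadsafe(cancel_pending)

    threading.Thread(target=watcher, daemon=True).start()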

However, the other case I've seen is where the server sends a response, but the response crashes the handling code in the client (which is what prompted openlawlibrary/pygls#277). Because that code runs in yet another thread, the error doesn't bubble up to where the test is awaiting - I've yet to handle this case correctly.
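Here's a minimal, self-contained illustration of that second failure mode (nothing pygls-specific, the handler is just a stand-in for the client's response-handling thread):

import asyncio
import threading


async def main():
    loop = asyncio.get_running_loop()
    pending = loop.create_future()  # what the test is awaiting on

    def handle_response(message):
        # Stand-in for response handling running in another thread: a "bad"
        # message makes it raise before it can resolve the future.
        if "result" not in message:
            raise ValueError("malformed response")
        loop.call_soon_threadsafe(pending.set_result, message["result"])

    threading.Thread(target=handle_response, args=({},)).start()

    # The ValueError is printed by the thread machinery but never reaches this
    # await, so without a watcher (or a timeout) the test would hang forever.
    try:
        await asyncio.wait_for(pending, timeout=0.5)
    except asyncio.TimeoutError:
        print("hung: the handler's exception never reached the awaiting test")


asyncio.run(main())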

Does that make sense? Did I answer your questions?


I briefly noticed those JSON files that seem to have serialised editor requests? What do you use them for?

There's an initial handshake at the start of an LSP session that basically boils down to: the client sends an initialize request advertising the features it supports, the server replies with the capabilities it provides, and the client confirms with an initialized notification.
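Heavily trimmed, the client's opening message has roughly this shape (the clientInfo values here are just an example; the real captures contain each client's full capabilities object):

initialize_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "processId": None,
        "clientInfo": {"name": "Neovim", "version": "0.9.0"},
        "rootUri": None,
        "capabilities": {
            "textDocument": {
                "completion": {"completionItem": {"snippetSupport": True}},
            },
        },
    },
}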

Those JSON files capture the client side of that handshake, allowing pytest-lsp to mimic a particular language client. The dream is that when coupled with the helpers in check.py it will be easy to verify that a server complies with the spec and correctly adapts its responses to the features available in the connected client.

tombh commented 1 year ago

Ah right, I hadn't appreciated that os.pipe() wasn't like a full subprocess, but now that you mention it, I recall that Pygls' current tests can reach into the server state, because the server is just on another thread, right? Which, as you say, means that it's not truly end-to-end testing.

And wow, when you explain the virtues of pytest-lsp's subprocess approach, it makes it so clear what the advantages are, and why it should exist in the LSP ecosystem as a whole! Not just a Pygls tool. Really nice.

Ohh, the core of Pygls isn't async? I need to wrap my head around that, because it can indeed handle concurrent requests, and there are async def functions (although a quick search reveals there are only 4 in the whole code base!). But you're saying that maybe the top-level entry into the server isn't called with an await? Like the design isn't fundamentally an asyncio design. Which means all the gymnastics of managing concurrency is done with lower-level asyncio features, like asyncio.loop, asyncio.wrap_future(), etc? I wonder why that is? It's not a big deal, I'd just be interested to know the history and reasoning. Maybe the first version of Pygls wasn't async and async features have just been slowly added on.

Anyway, I don't think that async subprocess support is critical in pytest-lsp, because performance isn't critical either. If parallel tests are needed there are other ways of doing that, like pytest-xdist, right?

Ok, so yes, what you describe about the background process watching for a server crash is what I was understanding. So what I'm not clear on is: what are the advantages of that approach over just wrapping the way the client makes requests with something that times out? Is it because timing out functions isn't so well supported in Python, as this Stack Overflow thread seems to suggest?

Regarding https://github.com/openlawlibrary/pygls/issues/277, it's really interesting thinking about that now that I'm slowly building up layers of understanding. CI hangs and being able to test errors have been sticking points for me ever since I started getting to know Pygls. It seems that Pygls has never really got error handling completely tied down, and I wonder if that's because of a lack of end-to-end testing? Which pytest-lsp is here to save the day for! Of course, even in a perfect architecture the server could still crash or hang without notifying the client, which I think is the main failure mode we're exploring in this issue. But generally speaking, 99.9% of failure modes should somehow be sent to the client. Whether the client does anything with them or not is another story, but the critical point is that error reporting is essential from both a production perspective and a testing one. I'm not exactly sure what point I'm making, other than that all this is important beyond just the convenience of testing.

Wow, you're thinking of everything with those editor-specific request captures! Amazing ✨

alcarney commented 1 year ago

Ohh, the core of Pygls isn't async?

There's definitely an event loop involved, as each of the server.start_xxx() functions (apart from pyodide) starts one. However, the methods in JsonRPCProtocol (which feels like the core to me) are all sync... then again, I've just checked the source and I see it's a subclass of asyncio.Protocol - so I guess pygls is async first?! :man_shrugging:

So what I'm not clear on is what the advantages of that approach are over just wrapping the way the client makes requests with something that times out?

I think the biggest one is error messages. Say you make a request to a server and it times out: did the server crash? Did it reply, only for the reply to crash the test client? Or did it just take a bit too long to respond? You'd probably have to dig through the logs on both sides to piece it back together.

Personally, I feel like pytest-lsp is in control of enough of the variables that it should be able to catch these errors and fail the test with a message that describes what happened; it's "just" a case of figuring out all the places it needs to keep an eye on and which asyncio tasks it needs to cancel.

Not that I'm against timeouts (and maybe that's the initial solution), but I'm hopeful that there's a solution to be found that doesn't require them :)

It seems that Pygls has never really got error handling completely tied down. And I wonder if that's because of a lack of end to end testing?

Speaking with my esbonio hat on, I've actually found pygls to be pretty good at handling errors: they very rarely completely crash the server and are normally at least logged. (Tangent: maybe pygls should provide a basic logging handler similar to esbonio's, so that these messages can be forwarded to the client.)
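Something along these lines is what I mean, sketched from memory rather than lifted from esbonio (treat show_message_log as the rough idea; the exact call to use is an assumption in this sketch):

import logging


class ClientLogHandler(logging.Handler):
    """Sketch: forward the server's Python log records to the connected client
    as window/logMessage notifications."""

    def __init__(self, server):
        super().__init__()
        self.server = server

    def emit(self, record: logging.LogRecord) -> None:
        # In pygls, show_message_log sends a window/logMessage notification;
        # the exact method name here is an assumption for illustration.
        self.server.show_message_log(self.format(record))


# Usage: logging.getLogger("my_server").addHandler(ClientLogHandler(server))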

Perhaps the issues you've seen are more due to the fact that writing robust tests involving 2 event loops in separate threads talking to each other is hard? :smile:

tombh commented 1 year ago

Ohh right, so JsonRPCProtocol is a subclass of asyncio.Protocol, interesting. That's really good to bear in mind. My naive assumption was that for an application to be async it needs a top-level, or near enough, entrypoint that's awaited. But having a core that inherits from asyncio.Protocol seems like an equally worthy reason for a project to be considered async. Although, I wonder if a protocol is more about message passing than it is about concurrency. Anyway, this isn't critically important I don't suppose. I'm only curious about it because it's surprised us both. Does that reflect on us as developers, or is it a faint call from the bowels of Pygls seeking out improvement or just clarification? I don't know. But sometimes these little surprises can be indications of architectural hotspots, or weak spots so to speak: areas that, should they be improved, would have the greatest impact on the quality of the codebase as a whole.

That actually segues into my thoughts on the background-process-watcher approach to handling tests. One of the main reasons I added that LanguageServer.report_server_error() function is that I wanted to test error handling in the language server I'm building (BTW, I also considered that esbonio LSPHandler approach of sending errors as diagnostics, but then thought that sendMessage would be more idiomatic in the LSP world, I'm not sure though?). Seeing as there aren't any true e2e tests in the Pygls project itself, and being such a fan of esbonio (😎), which is successfully using e2e tests, it was obvious to me that I should invest in the same approach myself. Then, when I realised that an LSP client talking to a Pygls server has no way of knowing about "out of band" errors such as those arising from diagnostic-processing failures (they were only logged in the server, not sent to the client), I was even more sure that your approach was onto something. Namely, that it was surfacing an overlooked area of Pygls, which is exactly what one desires from a test suite.

So the segue from the previous point is that I think friction in testability is sometimes an indication of friction in the main codebase. It doesn't appear that LSP clients out in the wild have suffered too badly from poor visibility into "out of band" errors. But it's reasonable to at least consider that lack of client access to errors reduces the reporting of errors to server authors by end users. Which in turn might reduce the number of errors reported against Pygls on Github, and therefore ultimately contribute to a misperception of the significance (or lack thereof) of Pygls error reporting. Maybe it's an example of Survivorship Bias, if you've ever come across that?

The tl;dr is: I think test suite design and application design are one and the same thing. The obstacles to writing intuitive tests so often arise from obstacles in the main codebase. Basically, if maintainers are struggling to write tests then end users will also be struggling to use the project itself.

I don't mean that as a hard and fast rule, just as a guiding principle. I still don't feel familiar enough with Pygls to say whether this principle is actually at play here or not. In fact I would indeed weigh your familiarity with Pygls as more relevant than this principle.

Anyway, considering this principle, how does the background-process-watcher approach reflect real-world usage of a text editor against LSP servers? Is this how we want to encourage LSP client authors to interact with LSP servers? Namely, managing another process to watch the server. Or put another way, what would be the most LSP-client-like way of testing Pygls? Like how could we most take advantage of the native features of Pygls in order to test it?

Like I say, I very much defer to your wisdom here, I'm just offering some food for thought 🍣πŸ₯ž

alcarney commented 1 year ago

Apologies! I don't know how I missed the notification for this! :man_facepalming: :slightly_frowning_face:

is it a faint call from the bowels of Pygls seeking out improvement or just clarification?

I was looking at the start_xxx() functions again the other day. I think they could (potentially) be tidied up a bit, as they use what are now considered the low-level event loop APIs, as opposed to the high-level ones introduced (relatively) recently in Python 3.7.
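For reference, the difference I mean is roughly this (very much simplified, not pygls' actual code):

import asyncio


async def serve_forever():
    ...  # set up the transport, then run until shutdown


# Low-level style, roughly the shape of what the start_xxx() functions do today:
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
loop.run_until_complete(serve_forever())
loop.close()

# High-level style, available since Python 3.7:
asyncio.run(serve_forever())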

I also considered that esbonio LSPHandler approach of sending errors as diagnostics, but then thought that sendMessage would be more idiomatic in the LSP world, I'm not sure though?

Ah... when I linked LspHandler I was thinking of its use of window/logMessage to forward Python log messages to the client, rather than the diagnostics. In VSCode at least, these messages are shown in its "Output" panel.

how does the background-process-watcher approach reflect real-world usage of a text editor against LSP servers?

I'm assuming all clients have to keep an eye on the server process to some extent, just to be able to handle the situation when it suddenly crashes/stops. VSCode's client for example will refuse to restart your server if it crashes too often in quick succession.

what would be the most LSP-client-like way of testing Pygls?

Probably something along the lines of how pytest-lsp works. We'd write a number of server implementations covering parts of the spec and "speak" LSP to them over the various transports (stdio/tcp/ws/etc.)
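Roughly this shape (the fixture and helper names below are made up for illustration, they're not pytest-lsp's actual API):

import sys

import pytest


@pytest.fixture
async def client():
    # start_test_client is a hypothetical helper that spawns the server under
    # test as a subprocess and performs the initialize handshake.
    client = start_test_client([sys.executable, "servers/completion_server.py"])
    await client.initialize()
    yield client
    await client.shutdown()


@pytest.mark.asyncio
async def test_completions(client):
    result = await client.completion_request(uri="file:///workspace/file.txt", line=0, character=0)
    assert [item.label for item in result.items] == ["hello", "world"]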

tombh commented 1 year ago

I've taken even longer to reply 😬 I was hoping to revisit this area so I could give an informed response, but I've lost my grasp of it. I hope I'll get refreshed on it Soon™

alcarney commented 1 year ago

I'm going to close this since I think we have this covered now with the new pygls.Client and the end to end tests we're starting to add there? Feel free to re-open if that's not the case! :slightly_smiling_face: