Closed tombh closed 1 year ago
> What are the advantages of `subprocess` over `os.pipe()`?

By moving the server out into a subprocess you get a "true" end-to-end test, in that pytest-lsp talks to the server over stdio, just like a regular language client. It's also more flexible: pytest-lsp can test any LSP server, not just the ones written in Python. (The trade-off is that tools which want to live inside the server's process, like `coverage` or `debugpy`, become harder to use.)
> I assume the former is async?

There is support for async subprocesses in Python, but pytest-lsp isn't using them currently... the core of pygls itself is synchronous, so using a synchronous subprocess seems to play nicer with the architecture. I'm not sure what magic pygls is using to enable async request/responses...
> So I assume that the exiting of the subprocess (namely the tested LSP server instance) isn't a problem?

Correct. For pytest-lsp at least, the tests hang because they're awaiting on something that will never get a response, e.g.

`result = await client.completion_request(...)`

One of the primary reasons why a response will never come is that the server process has already crashed/exited on its own (rather than hung), so pytest-lsp needs to be able to detect this. This is where the background thread comes in: it continuously polls the server process to see if it's still running, killing all `await`ed tasks if it isn't.

However, the other case I've seen is where the server sends a response, but it crashes the response-handling code in the client (which is what prompted openlawlibrary/pygls#277). Because that code is running in yet another thread, the error doesn't bubble up to where the test has `await`ed; I'm still yet to handle this case correctly.
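The background-thread idea can be sketched roughly like this. This is a simplified toy, not pytest-lsp's actual code: `watch_server`, the `pending` list, and the crash-on-startup "server" are all invented for illustration.

```python
import subprocess
import sys
import threading
import time
from concurrent.futures import Future

def watch_server(process, pending, poll_interval=0.05):
    """Poll the server process; once it exits, fail every pending request."""
    while process.poll() is None:
        time.sleep(poll_interval)
    # Server is gone: unblock every request still waiting on a response.
    for future in pending:
        if not future.done():
            future.set_exception(RuntimeError("server process exited"))

# Stand-in "language server" that crashes immediately.
proc = subprocess.Popen([sys.executable, "-c", "import sys; sys.exit(1)"])
pending = [Future()]  # requests still awaiting a response

watcher = threading.Thread(target=watch_server, args=(proc, pending))
watcher.start()
watcher.join(timeout=5)

try:
    pending[0].result(timeout=1)
except RuntimeError as exc:
    print(f"request failed: {exc}")
```

Without the watcher, that `result()` call would block until its timeout; with it, the test fails fast with a message that says *why*.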
Does that make sense? Did I answer your questions?
> I briefly noticed those JSON files that seem to have serialised editor requests? What do you use them for?
There's an initial handshake at the start of an LSP session that basically boils down to

- client: Hi, I'm `<client name>` and I support X, Y, Z from the LSP spec
- server: Hi, I'm `<server name>` and I have implemented A, B, C from the LSP spec

Those JSON files capture the client side of that handshake, allowing pytest-lsp to mimic a particular language client. The dream is that, when coupled with the helpers in check.py, it will be easy to verify that a server complies with the spec and correctly adapts its responses to the features available in the connected client.
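To make that concrete, here's a guess at the *shape* of such a capture. The field names (`initialize`, `clientInfo`, `capabilities`) come from the LSP specification's `initialize` request; the client name, version, and capability values are made up for illustration.

```python
import json

# Hypothetical capture of the client half of the LSP handshake.
client_capture = {
    "method": "initialize",
    "params": {
        "clientInfo": {"name": "Neovim", "version": "0.9.0"},
        "capabilities": {
            "textDocument": {
                "completion": {
                    # e.g. this client claims snippet support
                    "completionItem": {"snippetSupport": True},
                },
            },
        },
    },
}

print(json.dumps(client_capture, indent=2))
```

Replaying a file like this lets the test suite pretend to be that specific editor.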
Ah right, I hadn't appreciated that `os.pipe()` wasn't like a full subprocess. But now that you mention it, I recall that Pygls' current tests can reach into the server state, because the server is just on another thread, right? Which, as you say, means that it's not truly end-to-end testing.
And wow, when you explain the virtues of pytest-lsp's subprocess approach, it makes it so clear what the advantages are, and why it should exist in the LSP ecosystem as a whole! Not just a Pygls tool. Really nice.
Ohh, the core of Pygls isn't async? I need to wrap my head around that, because it can indeed handle concurrent requests, and there are `async def` functions (although a quick search reveals there are only 4 in the whole code base!). But you're saying that maybe the top-level entry into the server isn't called with an `await`? Like the design isn't fundamentally an asyncio design. Which means all the gymnastics of managing concurrency is done with lower-level asyncio features, like `asyncio.loop`, `asyncio.wrap_future()`, etc? I wonder why that is? It's not a big deal, it would just be interesting to know the history and reasoning. Maybe the first version of Pygls wasn't async and async features have just been slowly added on.
Anyway, I don't think that async subprocess support is critical in pytest-lsp because performance isn't critical either. If parallel tests are needed, there are other ways of doing that, like pytest-xdist, right?
Ok, so yes, what you describe about the background thread watching for a server crash is what I was understanding. So what I'm not clear on is: what are the advantages of that approach over just wrapping the way the client makes requests with something that times out? Is it because timing out functions isn't so well supported in Python, as this Stack Overflow thread seems to suggest?
Regarding https://github.com/openlawlibrary/pygls/issues/277, it's really interesting thinking about that now, now that I'm slowly building up layers of understanding. CI hangs and being able to test errors have been sticking points for me ever since I started getting to know Pygls. It seems that Pygls has never really got error handling completely tied down, and I wonder if that's because of a lack of end-to-end testing? Which pytest-lsp is here to save the day for! Of course, even in a perfect architecture, the server could still crash or hang without notifying the client, which is, I think, the main failure mode we're exploring in this issue. But generally speaking, 99.9% of failure modes should somehow be sent to the client. Whether the client does anything with them or not is another story, but the critical point is that error reporting is essential from both a production perspective and a testing one. I'm not exactly sure what point I'm making here, other than that all this is important beyond just the convenience of testing.
Wow, you're thinking of everything with those editor-specific request captures! Amazing ✨
> Ohh, the core of Pygls isn't async?

There's definitely an event loop involved, as each of the `server.start_xxx()` functions (apart from pyodide) starts one. However, all the methods in `JsonRPCProtocol` (which feels like the core to me) are sync... then again, I've just checked the source and I see it's a subclass of `asyncio.Protocol`, so I guess pygls is async first?! :man_shrugging:
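That mix is less contradictory than it sounds: an `asyncio.Protocol` subclass has entirely synchronous callback methods, and the event loop invokes them for you. A minimal sketch (the `PingProtocol` class and the tiny echo server are invented here; this is not pygls code):

```python
import asyncio

class PingProtocol(asyncio.Protocol):
    """All callbacks are plain sync methods; the event loop drives them."""

    def __init__(self, reply: asyncio.Future):
        self.reply = reply

    def connection_made(self, transport):
        transport.write(b"ping")  # called synchronously by the loop

    def data_received(self, data):
        self.reply.set_result(data)  # resolve the future awaited below

async def main():
    # A tiny echo server for the protocol to talk to.
    async def echo(reader, writer):
        writer.write(await reader.read(4))
        await writer.drain()
        writer.close()

    server = await asyncio.start_server(echo, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    loop = asyncio.get_running_loop()
    reply = loop.create_future()
    transport, _ = await loop.create_connection(
        lambda: PingProtocol(reply), "127.0.0.1", port
    )
    data = await reply
    transport.close()
    server.close()
    await server.wait_closed()
    return data

data = asyncio.run(main())
print(data)
```

So sync-looking methods and an async-first architecture can coexist: the `await`s happen around the protocol, not inside it.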
> So what I'm not clear on is what the advantages of that approach are over just wrapping the way the client makes requests with something that times out?

I think the biggest one is error messages. Say you make a request to a server and it times out: did the server crash? Did it reply, only to crash the test client? Or did it just take a bit too long to respond? You'd probably have to dig through the logs on both sides to piece it back together.
Personally, I feel like pytest-lsp is in control of enough of the variables that it should be able to catch these errors and fail the test with a message that describes what happened. It's "just" a case of figuring out all the places it needs to keep an eye on and which asyncio tasks it needs to cancel.
Not that I'm against timeouts (and maybe that's the initial solution), but I'm hopeful that there's a solution to be found that doesn't require them :)
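For contrast, the timeout-only approach is easy to write with `asyncio.wait_for`, but illustrates the problem above: every failure collapses into the same generic error. (The `completion_request` stand-in is invented; it just never answers.)

```python
import asyncio

async def completion_request():
    # Stand-in for a request to a server that will never answer.
    await asyncio.sleep(3600)

async def main():
    try:
        await asyncio.wait_for(completion_request(), timeout=0.1)
    except asyncio.TimeoutError:
        # All we learn here is "it took too long": we can't tell whether
        # the server crashed, replied with something that broke the
        # client, or was simply slow.
        return "timed out"

result = asyncio.run(main())
print(result)
```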
> It seems that Pygls has never really got error handling completely tied down. And I wonder if that's because of a lack of end-to-end testing?
Speaking with my esbonio hat on, I've actually found pygls to be pretty good at handling errors; they very rarely completely crash the server and are normally at least logged. (Tangent: maybe pygls should provide a basic logging handler similar to esbonio's, so that these messages can be forwarded to the client.)
Perhaps the issues you've seen are more due to the fact that writing robust tests involving 2 event loops in separate threads talking to each other is hard? :smile:
Ohh right, so `JsonRPCProtocol` is a subclass of `asyncio.Protocol`, interesting. That's really good to bear in mind. My naive assumption is that for an application to be async it needs a top-level, or near enough, entrypoint that's awaited. But having a core that inherits from `asyncio.Protocol` seems like an equally worthy reason for a project to be considered async. Although, I wonder if a protocol is more about message passing than it is about concurrency. Anyway, this isn't critically important, I don't suppose. I'm only curious about it because it's surprised us both. Does that reflect on us as developers, or is it a faint call from the bowels of Pygls seeking out improvement or just clarification? I don't know. But sometimes these little surprises can be indications of architectural hotspots, or weak spots, so to speak: candidate areas that, should they be improved, would have the greatest impact on the quality of the codebase as a whole.
That actually segues into my thoughts on the background-process-watcher approach to handling tests. One of the main reasons I added that `LanguageServer.report_server_error()` function is because I wanted to test error handling in the language server I'm building. (BTW, I also considered that esbonio `LspHandler` approach of sending errors as diagnostics, but then thought that `sendMessage` would be more idiomatic in the LSP world; I'm not sure though?) Seeing as there aren't any actual true e2e tests in the Pygls project itself, and being such a fan of esbonio, and seeing that esbonio is successfully using e2e tests, it was obvious to me that I should invest in that same approach myself. Then when I realised that an LSP client talking to a Pygls server has no way of knowing about "out of band" errors, such as those arising from diagnostic-processing failures (they were only logged in the server, not sent to the client), I was even more sure that your approach was onto something. Namely, that it was surfacing an overlooked area of Pygls, which is exactly what one desires from a test suite. So the segue from the previous point is that I think friction in testability is sometimes an indication of friction in the main codebase. It doesn't appear that LSP clients out in the wild have suffered too badly from poor visibility into "out of band" errors. But equally, it's reasonable to at least consider that a lack of client access to errors reduces the reporting of errors to server authors by end users. Which in turn might reduce the number of GitHub issues reported on Pygls, and therefore ultimately might contribute to a misperception of the significance of Pygls error reporting, or lack thereof. Maybe it's an example of Survivorship Bias, if you've ever come across that?
The tl;dr is: I think test suite design and application design are one and the same thing. The obstacles to writing intuitive tests so often arise from obstacles in the main codebase. Basically, if maintainers are struggling to write tests then end users will also be struggling to use the project itself.
I don't mean that as a hard and fast rule, just as a guiding principle. I still don't feel familiar enough with Pygls to say whether this principle is actually at play here or not. In fact I would indeed weigh your familiarity with Pygls as more relevant than this principle.
Anyway, considering this principle, how does the background-process-watcher approach reflect real-world usage of a text editor against LSP servers? Is this how we want to encourage LSP client authors to interact with LSP servers? Namely, managing another process to watch the server. Or put another way, what would be the most LSP-client-like way of testing Pygls? Like how could we most take advantage of the native features of Pygls in order to test it?
Like I say, I very much defer to your wisdom here, I'm just offering some food for thought 🍣🍥
Apologies! I don't know how I missed the notification for this! :man_facepalming: :slightly_frowning_face:
> is it a faint call from the bowels of Pygls seeking out improvement or just clarification?
I was looking at the `start_xxx()` functions again the other day. I think they could (potentially) be tidied up a bit, as they use what are now considered the low-level event loop APIs, as opposed to the high-level ones introduced (relatively) recently in Python 3.7.
> I also considered that esbonio `LspHandler` approach of sending errors as diagnostics, but then thought that `sendMessage` would be more idiomatic in the LSP world, I'm not sure though?
Ah... when I linked `LspHandler` I was thinking of its use of `window/logMessage` to forward Python log messages to the client, rather than the diagnostics. In VSCode at least, these messages are shown in its "Output" panel.
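The forwarding idea boils down to a custom `logging.Handler`. A minimal sketch, where `send` stands in for something like pygls' `show_message_log()` (which emits a `window/logMessage` notification); here it's just a callable so the example runs without a real server:

```python
import logging

class ClientLogHandler(logging.Handler):
    """Forward Python log records to the language client."""

    def __init__(self, send):
        super().__init__()
        self.send = send  # e.g. server.show_message_log in a real setup

    def emit(self, record):
        # Format the record and hand it to the transport callable.
        self.send(self.format(record))

sent = []  # messages that "reached the client"
logger = logging.getLogger("my-server")
logger.addHandler(ClientLogHandler(sent.append))
logger.setLevel(logging.INFO)

logger.info("indexing complete")
print(sent)
```

Attach a handler like this to the server's root logger and every `logger.info(...)` in the codebase becomes visible in the editor's Output panel for free.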
> how does the background-process-watcher approach reflect real-world usage of a text editor against LSP servers?
I'm assuming all clients have to keep an eye on the server process to some extent, just to be able to handle the situation when it suddenly crashes/stops. VSCode's client for example will refuse to restart your server if it crashes too often in quick succession.
> what would be the most LSP-client-like way of testing Pygls?
Probably something along the lines of how pytest-lsp works. We'd write a number of server implementations covering parts of the spec and "speak" LSP to them over the various transports (stdio/tcp/ws/etc.)
I've taken even longer to reply 😬 I was hoping I would revisit this area so I could give an informed response, but I've lost my knowledge in this area. I hope I'll get refreshed on it Soon™
I'm going to close this since I think we have this covered now with the new `pygls.Client` and the end-to-end tests we're starting to add there?
Feel free to re-open if that's not the case! :slightly_smiling_face:
Continuing on from the discussion in the Pygls v1 thread about hanging CI.
Basically, yes absolutely, I think it'd be great to formally have pytest-lsp as part of the Pygls ecosystem!

So let's see here what would be involved to address test-client hangs, which we've identified as a shared problem. Maybe we could even use pytest-lsp as the solution for Pygls' CI hangs, as well as being the excuse to rely on it wholly in Pygls for all testing. There's even some ongoing internal discussion about creating a pygls-org GitHub organisation, for which pytest-lsp would be an obvious candidate member.

What are the advantages of `subprocess` over `os.pipe()`? I assume the former is async?

So I assume that the exiting of the subprocess (namely the tested LSP server instance) isn't a problem? At the very least, just prepending POSIX's `timeout` reliably protects against hangs, right? E.g. `timeout 5 python custom-pygls-server.py`.
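The same belt-and-braces guard can be applied from Python itself via `subprocess`'s own timeout support (a sketch; the sleeping "server" command is a stand-in):

```python
import subprocess
import sys

# Give the server a deadline and kill it ourselves if it blows through it.
proc = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(3600)"]
)
try:
    proc.wait(timeout=0.5)
except subprocess.TimeoutExpired:
    proc.kill()
    proc.wait()  # reap the process so returncode is set
    print("server killed after timeout")
```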
The exiting of client "processes" (or rather async futures, yes?) is the issue at hand, right? I was thinking that maybe a dedicated test client could have its own request method, that all tests used, with a single site for setting and guaranteeing a timeout. How does your background thread relate to that idea? Am I mixing up two different things?
Also, changing the subject a little ... looking through the code here, I briefly noticed those JSON files that seem to have serialised editor requests? What do you use them for? Is it because there is such divergence in how each editor sends LSP client requests?