openlawlibrary / pygls

A pythonic generic language server
https://pygls.readthedocs.io/en/latest/
Apache License 2.0
588 stars 105 forks source link

An attempt at migrating to `lsprotocol` #264

Closed alcarney closed 2 years ago

alcarney commented 2 years ago

Description

To help drive the conversation in #257 forward here is an attempt at migrating to using lsprotocol for all our type definitions. Note: This PR was done with the mindset of what's the minimum number of changes I can make to get something that works?. So it's very likely better solutions can be found than what I have here currently.

As of now I have something that mostly works in that most (but not all) of the current test suite passes, but I wouldn't be surprised if I managed to introduce a few bugs here and there.

There is a fair amount to digest but I've done my best to split it into separate commits. What follows is a brain dump of everything I've thought about/noticed while working on this - hopefully it's not too overwhelming! šŸ˜¬ See commits and review comments for the fine details

Breaking Changes

Ideally, I would've wanted to not introduce any breaking changes by migrating to lsprotocol, but now I'm not so sure if that will be possible. Here is a list of the breaking changes this PR introduces that I am aware of so far.

Serialization/De-serialization

This commit I'm least happy with is https://github.com/openlawlibrary/pygls/pull/264/commits/eb92fb784a30810f07e92359935c27bbab5ba1dc which attempts to integrate lsprotocol's converter into the serialization/de-serialization setup in pygls. However, the two libraries seem to take a slightly different approach which I think complicates things

pygls tries to hide most of the details surrounding JSON RPC from the user, asking them to only provide values for message fields such as params, and result. This means its LSP_METHODS_MAP only returns types representing the params/results fields of protocol messages

The METHOD_TO_TYPES map in lsprotocol on the other hand simply returns types representing the full JSON RPC message body.

For the most part I think I've managed to resolve the two perspectives without having to change too much of pygls' internals, but I wonder if a cleaner solution could be found if we opted to change pygls' approach to align more closely with lsprotocol

Anyway I'd be interested to hear people's thoughts on this.

XXXOptions vs XXXRegistrationOptions

Edit: After some more investigation, it turns out XXXRegistrationOptions extend XXXOptions to include additional fields required for dynamic registration. https://github.com/openlawlibrary/pygls/pull/264/commits/557f942f930d883107f216144c9701b7bd37889f adjusts how the type to validate against is chosen, see the commit message for more details

~An interesting difference to note is that the METHOD_TO_TYPES map in lsprotocol uses the XXXRegistrationOptions for a method which as far as I can tell is meant to be used with the register/unregister capability part of the spec since it includes the DocumentSelector field.~

~However the current LSP_METHODS_MAP is set up to provide the XXXOptions for a method as it uses the options provided via the @server.feature() decorator to populate its ServerCapabilities.~

~This means migrating to the new mapping will break any code currently in use as the type checking done in the @server.feature() decorator will fail. Note I don't think either approach is necessarily wrong, but I'd be interested to hear people's thoughts on resolving the two perspectives (even if we just ultimately declare it to be a breaking change)~

Questions for the lsprotocol team

Here are some thoughts/questions I had while working through this.

cc @tombh @dgreisen @karthiknadig

Code review checklist (for code reviewer to complete)

karthiknadig commented 2 years ago

Enum fields are now UPPERCASE rather than Captialised

I think we can generate Capitalized unless you feel like the UPPERCASE names make sense?

The type definition for SemanticTokensOptions seems to be missing support for.

Looks like a bug in code generator: https://github.com/microsoft/lsprotocol/issues/52

The type definition for the ServerCapabilities.workspace field appears to be missing

Looks like a bug in code generator: https://github.com/microsoft/lsprotocol/issues/53

Is it possible to have the generator define constants for each of the LSP method names?

Yes. We can do that. https://github.com/microsoft/lsprotocol/issues/54

This test case is currently failing due to the "result": None field being omitted from the serialized JSON - do you know how we can preserve it?

I can look into this. We have a list of properties where we require them to be preserved, we could add this one there. The "results" has some added behavior to it in the spec. I will get back to you on this.

Do you know how we could extend some of the types in lsprotocol to add additional methods?

We can add total_ordering on some of the types that could use it like Range, Position, and Location. But helper methods are out-of-scope.

karthiknadig commented 2 years ago

@alcarney I published a new version of lsprotocol, it adds total_ordering to Position, Capatilized the Enums, adds the missing types and fields, added __eq__ and __repr__ to Location, Position, and Range. Added constants for method names.

Do let me know if the new one addresses the issue with the result filed. We do have a List of fields that we try to preserve, and it might have to be tweaked.

alcarney commented 2 years ago

@karthiknadig awesome thanks.

I've aligned this PR to the new version, but it appears that the issue with the result field is still there

tombh commented 2 years ago

@alcarney! What an epic undertaking šŸ¤“ I feel bad that I've taken so long to absorb the enormity of what you've done. I'm still only starting to understand everything involved, but the highlight is that this definitely introduces a breaking change. Which I think will be the first time for pygls? So let's "go to town on it" (as they say in the UK anyway)! Meaning if we're going to be bumping the project by a whole number, what other breaking changes could be useful to introduce (this would be for another issue thread of course)? Maybe add a heated swimming pool in the back garden? šŸ¤£

Thank you so much for the isolated and descriptive commits, they really ease the cognitive load of getting to grips with everything. From my initial reading I think you've taken sensible and practical decisions, and so I don't see any problems. So I see 3 things to address at the moment:

  1. [ ] The missing "result": None field
  2. [ ] Get a more in-depth review of the "Fix" serialization/deserialization commit
  3. [ ] Write a "Migrating From Pygls v0.12 to v1.0" guide

What we should be aiming for now is a justifiably messy release candidate. So somewhat counter-intuitively I think the standards for "merging" this PR (it will be merged into a v1.0rc branch) are somewhat lower than normal. With that in mind what else do you see being needed on the TODO list?

Would you say that, for those new to this PR, the Align to breaking changes is the best place to get an overview of what the consequences of this PR are? Or is that commit just patches from +karthiknadig's updates?

karthiknadig commented 2 years ago

@tombh I will be looking into the missing "resutls": None test case. Created an issue on lsprotocol repo to track progress: https://github.com/microsoft/lsprotocol/issues/86

karthiknadig commented 2 years ago

@alcarney The issue with preserving result is that the converter does not know about that field being "special" since it is defined in pygls. You could add the following after the class definition, and it should preserve the field. This is a short term solution.

lsp_types._SPECIAL_CLASSES.append(JsonRPCResponseMessage)
lsp_types._SPECIAL_PROPERTIES.append("JsonRPCResponseMessage.result")

In the long term, the de-serialization should not depend on JsonRPCResponseMessage, JsonRPCRequestMessage, or JsonRPCNotification types, in deserialize_message. you should be able to refer to this dictionary lsprotocol.types.METHOD_TO_TYPES for the specific types for each request, notification or response. For this to work correctly, I need to add more converter hooks. Since the LSP spec also provides direction validation, it could be incorporated into the server side. You might get and error like this while de-serializing the response. The request and notification types are fine.

<class 'lsprotocol.types.SemanticTokensRegistrationOptions'> has no usable non-default attributes.

My modification was this, I save the response type along with the future, and get it as needed. I am using the test suite here to catch any missed converter hooks.

def deserialize_message(data, get_response_type, get_params_type=get_method_params_type):
    """Function used to deserialize data received from client."""
    if 'jsonrpc' in data:
        try:
            deserialize_params(data, get_params_type)
        except ValueError:
            raise JsonRpcInvalidParams()

        if 'id' in data:
            if 'method' in data:
                return METHOD_TO_TYPES[data['method']][0](**data)
            elif 'error' in data:
                return converter.structure(data, ResponseErrorMessage)
            else:
                return converter.structure(data, get_response_type(data['id']))
        else:
            return METHOD_TO_TYPES[data['method']][0](**data)

    return data
alcarney commented 2 years ago

Would you say that, for those new to this PR, the Align to breaking changes is the best place to get an overview of what the consequences of this PR are?

Yes, that commit will probably give the best impression of the kind of changes someone consuming pygls can expect to see. However, I'd say we'd want a decision on the Drop pygls.lsp.types, use lsprotocol.types directly fairly soon, as those changes have been kept separate so they can be dropped if required.


From karthiknadig's comment above it seems like the "Fix" serialization/deserialization commit could use some more thought as it sounds like the issue with the result field is because we're not using lsprotocol quite as intended...

Personally, I think there is a nice generic JSON RPC implementation buried in pygls which I (very selfishly) would like to be able to reuse for parts of esbonio, do you have any thoughts on making that a usecase pygls would officially support?

I ask because I think a "simple" fix to the result field issues could be to "lock" pygls to implementing only LSP as provided by lsprotocol and removing the generic JsonRpcMessage types etc.

tombh commented 2 years ago

I haven't been deep into the client/server JSON communication, so my naive understanding is that it's currently a comparatively adhoc implementation. Adhoc in the sense that it caters only to the specific requirements of Pygls, it can't easily be extended to provide extra features beyond LSP, such that esbonio might like. It's also adhoc in the sense that it doesn't formally adhere to the LSP standard as now defined in lsprotocol (although the result issue seems to be the only sticking point with that).

So from my understanding I think you're saying that you have a tension between, on the one hand, wanting to invest more in the JSON RPC to allow it to be more easily extended and, on the other hand, understanding that the most straightforward approach is to just formally adhere to lsprotocol and remove the Pygls JSON RPC?

If my understanding is right, then I think an example of esbonio's usecase for extending the JSON RPC would be good.

alcarney commented 2 years ago

From what I've seen, pygls handles the low level aspects of JSON RPC really well i.e. handling request/response cycles over various transports such as stdio/websockets etc - which we have to keep anyway as lsprotocol doesn't provide any of this.

lsprotocol instead, defines a layer that sits on top of JSON RPC and contains the specific types and methods that is the language server protocol.

The issue in the "Fix" serialization/deserialization commit is partly due to mismatched expectations between the two libraries. If we look at an example JSON RPC response message

{"jsonrpc": "2.0", "id": 1, "result": {"hello": "world"}}

So ideally, the two perspectives need to be aligned and I can think of three possibilities


I think an example of esbonio's usecase for extending the JSON RPC would be good.

Having the ability to define custom JSON RPC based protocols would be very useful for esbonio as you'd be able to

It's actually possible today to swap the types pygls uses out, though as shown below it's a bit clunky and of course not officially supported.

from functools import partial

import pygls.protocol
from pydantic import BaseModel
from pygls.lsp import get_method_params_type
from pygls.lsp import get_method_registration_options_type
from pygls.lsp import get_method_return_type
from pygls.protocol import JsonRPCProtocol
from pygls.server import Server

class ExampleResult(BaseModel):
    hello: str

MY_METHODS_MAP = {"example/method": (None, None, ExampleResult)}

# Override the default method definitions
pygls.protocol.get_method_return_type = partial(get_method_return_type, lsp_methods_map=MY_METHODS_MAP)
pygls.protocol.get_method_params_type = partial(get_method_params_type, lsp_methods_map=MY_METHODS_MAP)
pygls.protocol.get_method_registration_options_type = partial(get_method_registration_options_type, lsp_methods_map=MY_METHODS_MAP)

server = Server(protocol_cls=JsonRPCProtocol)

@server.lsp.fm.feature("example/method")
def example_method(ls: Server, params):
    return ExampleResult(hello="world")
tombh commented 2 years ago

This is a great explanation, thank you.

So my first thought, and it's just a thought, perhaps somewhat academic or philosophical: to what extent should the official LSP standard support custom client-server communication? I think the short answer is it shouldn't. Or at the very least I'm most certainly not saying the answer to this issue is upstream at lsprotocol! I merely pose the question to get a sense of what LSP most fundamentally is and what its responsibilities are or should be. Clearly there needs to be some flexibility somewhere, that's how innovation is nurtured and eventually matured into standards. On the one hand the LSP standard itself is certainly not static, but on the other hand its flexibility isn't such that we can expect upstream support for multiple Sphinx instances overnight!

So, where I think this gets interesting is when thinking about Pygls' role in all this. Maybe Pygls is the place to provide a more formal bridge between the static standard and the ever changing boundaries of innovation. Being one step removed from lsprotocol Pygls doesn't have such strict responsibilities, so maybe it should formally provide a way to override and extend the JSON RPC. Looking at it from this perspective I feel that your second option is the way to go:

lsprotocol provides our default types, but we provide a way to swap them out. However, the new types must provide all book keeping fields

Superficially one might think that such an approach was a half-way house lacking in commitment, that we'll someday find a better solution for. But I don't think that's case. I think it's a good opportunity to define Pygls' role and identity. Namely that it's critical for the LSP ecosystem that innovation is supported and welcomed.

karthiknadig commented 2 years ago

I made an attempt to switch entirely to lsprotocol here: https://github.com/alcarney/pygls/pull/1 . I did this to catch any missed cattrs hooks, found a couple of them. I will be making a lsprotocol update soon. I updated protocol.py to rely on lsprotocol for serialization and deserialization using actual request types instead of the JsonRPC* types. I created https://github.com/alcarney/pygls/pull/1 as an indicator of the extent of changes to tests with the new types. My main concern was the missing cattrs hooks.

alcarney commented 2 years ago

I think this is starting to come together, looks like the test suite passes now, though I have at the very least some linting issues to clear up

Thanks to @karthiknadig for the alcarney#1 PR, it was a big help in figuring out what to do next.

Most of the (important) new changes are in https://github.com/openlawlibrary/pygls/pull/264/commits/04875c5d71540194279753d859a7ec5c66b7b91e which replaces the old "Fix" serialization/deserialization commit and goes beyond the minimum amount of change mantra to a deeper refactoring.

Happy to talk through the changes in more detail later, but since it's quite late I'll leave you with just the highlights on changes made to the JsonRPCProtocol/LanguageServerProtocol classes :smile:

tombh commented 2 years ago

Awesome. As soon as you feel ready let's merge this into a RC branch. I'm happy to approve the changes as soon you're ready.

If a method is not known (as in the case of custom lsp commands) we fall back to pygls's existing generic RPC message classes. Wow, so is this best of both worlds??

alcarney commented 2 years ago

I think this is now in a place where is can be merged to a staging branch so people can start testing it - I'm sure there will be a few issues to find still!

tombh commented 2 years ago

Awesome! I've published it to Pypi (as 1.0.0a) and made a dedicated pre-release PR: https://github.com/openlawlibrary/pygls/pull/273

The new branch is v1-lsprotocol-breaking-alpha. I tried to change this PR's base "into" branch, but I added a new commit (just the version bump to get it released on Pypi) on the new branch, so Github wouldn't let me do it. Can you see how to rebase that commit into here? It'd be good to close this PR with a merge, rather than just close it as unmerged.

alcarney commented 2 years ago

Can you see how to rebase that commit into here?

Not sure sorry... I don't see that commit anywhere - have you pushed it?

It'd be good to close this PR with a merge

It's not the end of the world though if we don't, the main aim of this PR was only to move the conversation forward :)

tombh commented 2 years ago

Oh, I never pushed that commit šŸ¤¦! Sorry. Can you see it on #273 now? It's here if you don't.

alcarney commented 2 years ago

I see it now and have included it in this branch - though I'm not sure if that will help at all as the two branches are now identical - in theory there's nothing to merge?

tombh commented 2 years ago

Yeah you're right, now it says:

There are no new commits between base branch 'v1-lsprotocol-breaking-alpha' and head branch 'lsprotocol'
And so won't let me change the base branch šŸ˜¢

Ah well, not worry. Your code isn't going to disappear šŸ˜Š