openlawlibrary / pygls

A pythonic generic language server
https://pygls.readthedocs.io/en/latest/
Apache License 2.0
541 stars 100 forks source link

Provide alternate syntax for registering LSP feature #263

Open Bruntaz opened 1 year ago

Bruntaz commented 1 year ago

Currently, the supported method of registering features in a LangaugeServer is through the @server.feature() decorator. This works well for a lot of use-cases, but the standard usage relies on instantiating a global LanguageServer object, therefore making it difficult to create a custom LanguageServer which comes with registered functionality on instantiation.

The current pattern for registering features is as follows:

class MyLanguageServer(LanguageServer):
    def __init__(self):
        super().__init__()
        self.custom_thing = 123

server = MyLanguageServer()

@server.feature(TEXT_DOCUMENT_DID_OPEN)
def did_open(ls: MyLanguageServer, params: DidOpenTextDocumentParams):
    ls.custom_thing += 1
    ls.show_message(f"Custom thing = {ls.custom_thing}")

In order to register register a feature, the server object must already be instantiated, because it is a bound method on the instance of MyLanguageServer instead of a global decorator, which could be used anywhere. This means that this pattern is impossible:

class MyLanguageServer(LanguageServer):
    def __init__(self):
        super().__init__()
        self.custom_thing = 123

    @server.feature(TEXT_DOCUMENT_DID_OPEN)  # Error because `server` doesn't exist
    def did_open(self, params: DidOpenTextDocumentParams):
        self.custom_thing += 1
        self.show_message(f"Custom thing = {self.custom_thing}")

server = MyLanguageServer()

Note that it is also invalid syntax to write @self.feature(...), @feature(...) or @MyLanguageServer.feature(...) because none of them are in scope there.

There is a workaround for this due to how decorators are implemented in Python, but it is quite ugly and more confusing than necessary:

class MyLanguageServer(LanguageServer):
    def __init__(self):
        super().__init__()
        self.custom_thing = 123
        self.feature(TEXT_DOCUMENT_DID_OPEN)(lambda params: self.did_open(params))

    def did_open(self, params: DidOpenTextDocumentParams):
        self.custom_thing += 1
        self.show_message(f"Custom thing = {self.custom_thing}")

server = MyLanguageServer()

Would it be possible to add a syntax to register features that are methods on classes? I'm looking for something along the lines of:

class MyLanguageServer(LanguageServer):
    def __init__(self):
        super().__init__()
        self.custom_thing = 123

    @feature(TEXT_DOCUMENT_DID_OPEN)
    def did_open(self, params: DidOpenTextDocumentParams):
        self.custom_thing += 1
        self.show_message(f"Custom thing = {self.custom_thing}")

server = MyLanguageServer()

Or alternatively, a simplified way of registering features at runtime, along the lines of:

class MyLanguageServer(LanguageServer):
    def __init__(self):
        super().__init__()
        self.custom_thing = 123
        self.register_feature(TEXT_DOCUMENT_DID_OPEN, self.did_open)

    def did_open(self, params: DidOpenTextDocumentParams):
        self.custom_thing += 1
        self.show_message(f"Custom thing = {self.custom_thing}")

server = MyLanguageServer()
dgreisen commented 1 year ago

Can you provide a concrete use case for why you would want to register on the class rather than the instance?

Bruntaz commented 1 year ago

The server I want to write requires a database to pull data from. To set this up, I want to pass a path to the database on init of the server object so that the class can always guarantee the database it is connected to is valid. Because of this, the reference to the database (and other derived things being stored on the class) should be private members so that the area of the code that instantiates the server cannot modify it. In order to modify state on the language server instance in a global function, the state in the server object must be publicly accessible:

class MyLanguageServer(LanguageServer):
    def __init__(self, db_path: str, db: sqlite3.Connection):
        self._db_path = db_path
        self._db = db
        super().__init__()

    @feature(TEXT_DOCUMENT_DID_OPEN)
    def did_open(self, params: DidOpenTextDocumentParams):
        # If this was global, it wouldn't be able to safely access `self._db`
        result = self._db.execute("SELECT * FROM MyTable")
        ...

For efficiency and simplicity, I want to store several things from the database in caches in the class. Making them all public would imply that at the call-site it is appropriate to modify any of the attributes independently of each other, which could leave the class in an inconsistent state.

The cleanest way of using the feature decorator on an instance in this system that I can think of is this:

class MyLanguageServer(LanguageServer):
    def __init__(self, db_path: str, db: sqlite3.Connection):
        self._db_path = db_path
        self._db = db
        super().__init__()

    def did_open(self, params: DidOpenTextDocumentParams):
        # If this was global, it wouldn't be able to safely access `self._db`
        result = self._db.execute("SELECT * FROM MyTable")
        ...

def create_server(db_path: str, db: sqlite3.Connection):
    server = MyLanguageServer(db_path, db)

    @server.feature(TEXT_DOCUMENT_DID_OPEN)
    def did_open(ls: MyLanguageServer, params: DidOpenTextDocumentParams):
        ls.did_open(params)

    return server

if __name__ == "__main__":
    ...  # Commandline setup
    args = parser.parse_args()
    with sqlite3.connect(args.db) as conn:
        server = create_server(args.db, conn)
        ...

Which isn't very nice and means that if the code has multiple branches/functions to create the actual server instance then each one needs to define the features individually because they are bound to the instance instead of the class.

tombh commented 1 year ago

Apologies for the late reply. I think this sounds very reasonable. Considering there are workarounds I don't think this will be high priority, but I can certainly see how it would help clean up the code, so it is at least a priority.

I don't know the internals well enough to be sure, but I can't think of any downsides to this. Anything springs to mind?

tombh commented 1 year ago

I've just come across another downside to the current approach, namely that it doesn't take advantage of the ability to assert the RPC expectations with types. I'm not a Python aficionado, so I'm wondering about an approach like this:

class LanguageServer:
    def __init__(self):
        # Loop over methods in `self.methods` to find features to enable.
        # `LanguageServer.methods` (therefore the methods below) is a
        # collection of all the features that can be enabled, but just as
        # prototypes expressing the argument and return types.
        # Can be done with something like:
        # `if type(self).method == Base.method:` ...

    def did_open(self, params: DidOpenTextDocumentParams):
        pass

    def completion(self, params: CompletionParams) -> CompletionList:
        return []

    ... # All the available features

class CustomLanguageServer(LanguageServer):
    def did_open(self, params: DidOpenTextDocumentParams):
        CustomFunctionality(params)

    def completion(self, params: CompletionParams) -> CompletionList:
        return CustomCompletions(params)

I don't think there's a way to enforce those types when they're overridden? At the very least, it would mean you could just copy and paste the method bodies into your own CustomLanguageServer class to get you started.

But the point is that we could kill a few birds with one stone, so to speak. Firstly, I assume it solves the original need of this issue? Secondly, considering that we have an upcoming opportunity to make breaking changes in #273, it's at least worthwhile thinking about ideal solutions, now that Pygls is more mature. Though making this kind of change to feature definitions doesn't have to be a breaking change, it can also live alongside the existing method with a log warning about deprecation. Thirdly, #273 is mainly about introducing better support for types, so having a new way to define server features that encourages typing will compliment it.

alcarney commented 1 year ago

One advantage of the current decorator approach, is that it provides a place to provide additional options when registering methods. For example, specifying trigger characters for completion

@server.feature(
    COMPLETION,
    CompletionOptions(
        trigger_characters=[">", ".", ":", "`", "<", "/"], resolve_provider=True
    ),
)
def on_completion(ls: LanguageServer, params: CompletionParams):
    ...

The given CompletionOptions object is then passed through to the ServerCapabilitiesServer and used to populate the required fields.

It's a use case we'd have to keep in mind when designing any alternate syntaxes - maybe a solution would simply be to require a completion_options field alongside the method?

class CustomLanguageServer(LanguageServer):

    completion_options = CompletionOptions(trigger_characters=[">", ".", ":", "`"])

    def completion(self, params: CompletionParams) -> CompletionList:
        return CustomCompletions(params)

Another thing worth mentioning is that method implementations can come in three varieties, synchronous functions, async functions and functions that run in a separate thread (currently registered via @server.thread()). I don't think that places any restrictions on what the syntax can be, but it's probably worth keeping in mind

tombh commented 1 year ago

Good points. I'd completely overlooked them.

I wonder if a default keyword arg would work for those extra completion options? It could be inspected with inspect.signature.parameters(). That gives us the stronger typing and maintains a single place for definition, but makes it weird that there's a param that isn't used during actual requests and notifications.

Good to be thinking about it all though.

tombh commented 1 year ago

Just want to make a note about something else that has been frequently popping into my mind regarding how we register features.

No matter which approach we end up going with, I think it would be helpful to explicitly express whether a feature is a notification or a request. So for example:

@server.feature.notification(TEXT_DOCUMENT_DID_OPEN)
def did_open():
    ...

@server.feature.request(COMPLETION)
def completion():
    ...

Requests and notifications (and maybe commands is a 3rd type?) have slightly different behaviours and significantly different code paths. The most notable is that requests require direct responses, whilst notifications do not, indeed the spec states that any direct responses to a notification should be ignored by the client.

Whilst the current lack of distinction is not critical, I think the self-documenting requirement for stating the nature of the feature will help manage expectations and the ability to better reason about behaviour.

Anyway, just an idea I wanted to note down.

Edit: The thing that actually led me to this idea was that Pygls does not currently have a centralised way to handle all errors. Namely that errors from notifications are merely logged in the server and not expressed back to the client.

nthykier commented 2 months ago

Even with the decorator method, we can probably figure out a way for it to work with a class method. As an example, the python property decorator is quite magical in this regard and might be useful as in inspiration:


class C:

     @property
     def foo(self):
         ...

     @foo.setter
     def foo(self, new_value):
         ...

Conceptually, maybe something like

class CustomLanguageServer(LanguageServer):

    lsp_registry: ClassVar[FeatureRegistry] = FeatureRegistry()

    @lsp_registry.feature(...) # or require/notification
    def did_open(self, params):
         ....

    @lsp_registry.feature(...) # or require/notification
    def did_change(self, params):
         ....    

The FeatureRegistry.feature would then do a similar thing as the existing feature decorator except the wrap_with_server (the self attribute is already there for this purpose). The LanguageServer parent class could check for lsp_registry being provided in the subclass and if so, initialize its feature manager based on that attribute.

Probably it can be made even more smooth than this; like maybe we can have lsp_registry be defined already in the LanguageServer and repurpose @LanguageServer.feature so it works both as a class and as an instance method. A bit of introspection of the first parameter should be sufficient (or via the hybridmethod decorator from https://stackoverflow.com/questions/18078744/python-hybrid-between-regular-method-and-classmethod). It would be a bit more magical but it would retain having a "single" entry point that people need to know about (or two, if we split feature into request and notification).

Let me know if this is something that might be interesting and I can look into writing a patch/PR for it. I, like OP, also had a case where I wanted to do a custom subclass and initialize somethings in the __init__ method. :)

alcarney commented 2 months ago

repurpose @LanguageServer.feature so it works both as a class and as an instance method.

If it's possible to make that work, I'd definitely find that interesting :)

nthykier commented 2 months ago

repurpose @LanguageServer.feature so it works both as a class and as an instance method.

If it's possible to make that work, I'd definitely find that interesting :)

I think at its core it would be the following:

from functools import wraps

class hybridmethod(object):
    def __init__(self, func):
        self.func = func

    def __get__(self, obj, cls):
        context = obj if obj is not None else cls

        @wraps(self.func)
        def hybrid(*args, **kw):
            return self.func(context, *args, **kw)

        # optional, mimic methods some more
        hybrid.__func__ = hybrid.im_func = self.func
        hybrid.__self__ = hybrid.im_self = context

        return hybrid

class LanguageServer:

    ...

    @hybridmethod
    def feature(self_or_cls, ...):
         # We could also have the wrapper inject custom flag to separate (etc) instead of using `isinstance`
        if isinstance(self_or_cls, LanguageServer):
             print("Called with instance")
             # Existing logic here
        else:
             print("Called as class method")
             # New logic here

# Tests
# - This prints "Called with instance"
LanguageServer("foo", "v1.0").feature()
# - This prints "Called as class method
LanguageServer.feature()

The "harder" part is how to handle input the data in the classmethod variant, because we do not have an "instance" to hold the data and we may have to "Hold" the data and wait with acting on it until __init__ is called.

alcarney commented 2 months ago

I think at its core it would be the following:

That looks nice!

The "harder" part is how to handle input the data in the classmethod variant, because we do not have an "instance" to hold the data and we may have to "Hold" the data and wait with acting on it until init is called.

Actually, the data isn't directly held by the server, the @server.feature, and @server.command decorators are just proxies to the FeatureManager which actually holds the information... so perhaps the hybridmethod class could be adapted to be the "home" of the FeatureManager instance?

Then again... that might just create more problems - how would the various components that need it, get access the underlying instance? :thinking: Since the state is already separated, I imagine it's a solvable problem but might require a non-trivial re-wiring of the internals.

Also worth mentioning that if/when #418 lands, while most of the major components of pygls will remain the same, how they are assembled will be different - so I don't know if you want to look at that branch to see if it makes the changes you would have to make any easier/harder?

nthykier commented 2 months ago

Reviewing the situation a bit more, this basically ends up being a variant of @lsp_method which is used for built-in features. Probably have @lsp_method wrap the details in a @dataclass class and then update the LanguageServer.feature to be a @hybridmethod that uses the same data class as @lsp_method (just with slightly different settings - notably a boolean/enum to tell it appart from "built-ins" provided by @lsp_method).

Then the initialization that reads the @lsp_method attribute would then need to delegate to two different methods based on said boolean/enum (either going to the original registration or passing it to equivalent route that feature would go through).

Is there agreement on this approach? If so, then I am might have look at doing a PR for that.

alcarney commented 2 months ago

I think that makes sense, as long as both the current approach and the new syntax work I can't see there being an issue :) Do you have any thoughts on this @tombh?

tombh commented 2 months ago

It sounds great!

Though I'm afraid I've lost my insider knowledge on this area, as I haven't worked with this part of the code in a while. So I can't give any insightful technical feedback.

nthykier commented 1 month ago

Hi,

I have now taken a stab at implementing this feature. The conceptual work is at https://github.com/nthykier/pygls/pull/3. I have not opened a PR against this repo yet, since I would like to discuss the concept a bit before I have you do detailed code review (and I spend more time with the polish).

First off, the PR works expected, you can do:

class FooServer(LanguageServer):
   @LanguageServer.feature(...)
   def implementation(self, params):
       ...

ls = FooServer(...)

@ls.feature(...)
def some_adhoc_feature(ls, ...):
   ...

ls.start_io()

(See https://github.com/nthykier/pygls/blob/6f1d7c21d4b0724d0397dbbcc3e9e8b4ad881011/tests/test_class_registration.py as a concrete example).

So far so good. The part that I feel is a bit sub-optimal is that linters and type checkers do not understand @hybridmethod. This means you get warnings from PyCharm about the @LanguageServer.feature call since it thinks you are calling it as a class method and PyCharm expect it to be an instance method. This will be a paper cut for everyone using the class-based registration.

I am inclined to say we should strongly reconsider this part of the approach and replace it with a separate callable that does not trigger confusing in all existing tools.

It could be as simple as using @lsp_feature_method or @LanguageServer.feature_method instead of @LanguageServer.feature (possible with a tweak of the concrete name).

@tombh @alcarney, could I have you chime in here on the direction you would like to go? :)

tombh commented 1 month ago

That's great you've got something working! This is some pretty complicated meta-language stuff, so hats off. I'd need more time to really review this, but it seems reasonable so far. I think Alex might have more insight.

The multiple self._start_self_check()s do seem a bit hacky. But I don't know enough yet to know what might be better.

I don't fully understand the confusion that linters/type checkers get with @hybridmethod, other than it's too "magical". What would be the separate callable you mention?

alcarney commented 1 month ago

I am inclined to say we should strongly reconsider this part of the approach and replace it with a separate callable that does not trigger confusing in all existing tools.

It could be as simple as using @lsp_feature_method or @LanguageServer.feature_method instead of @LanguageServer.feature (possible with a tweak of the concrete name).

I think I agree, having separate decorators for class-based and instance-based registration makes a lot of sense, especially if it removes the "magic" that confuses the linters.

I'm not the biggest fan of the global state in PENDING_USER_LSP_METHODS, while I admit it's highly unlikely anyone will be defining multiple servers in the same process, it's not impossible and I wouldn't want to prevent it.

Would replacing

def _register_class_based_features(self):
    """Registers generic LSP features from this class."""

    for name in self.lsp_user_methods:
        ...

with

def _register_class_based_features(self):
    """Registers generic LSP features from this class."""

    for name in dir(self):
        ...

remove the need for PENDING_USER_LSP_METHODS? It looks like _register_class_based_features already checks to see a given method is a valid feature method or not. (I've not read the code that closely, so I wouldn't be surprised if I'm missing something!)

Overall, I think it looks really promising! It doesn't look too dis-similar to what I was experimenting with in this commit for pygls' built-in methods.

nthykier commented 1 month ago

That's great you've got something working! This is some pretty complicated meta-language stuff, so hats off. I'd need more time to really review this, but it seems reasonable so far. I think Alex might have more insight.

Thanks. :)

The multiple self._start_self_check()s do seem a bit hacky. But I don't know enough yet to know what might be better.

If it is the multiple calls, then we might be able to wrap them in a decorator or so. Though however we slice it, it is an internal pre-start hook

I don't fully understand the confusion that linters/type checkers get with @hybridmethod, other than it's too "magical". What would be the separate callable you mention?

Linters basically look at methods and look for self vs. cls and the @classmethod (or @staticmethod annotations). From those, they derive the "expected call pattern". The @hybridmethod is either, so they do not recognize it. For the first parameter, I can do one of three:

The alternative would be to have two methods; one for the instance based call (the ls.feature that you already know) and the other one for the class based one (LanguageServer.<some_name_tbh>).

nthykier commented 1 month ago

I am inclined to say we should strongly reconsider this part of the approach and replace it with a separate callable that does not trigger confusing in all existing tools.

I'm not the biggest fan of the global state in PENDING_USER_LSP_METHODS, while I admit it's highly unlikely anyone will be defining multiple servers in the same process, it's not impossible and I wouldn't want to prevent it.

I agree, if I could do it without global state, I would.

Would replacing

def _register_class_based_features(self):
    """Registers generic LSP features from this class."""

    for name in self.lsp_user_methods:
        ...

with

def _register_class_based_features(self):
    """Registers generic LSP features from this class."""

    for name in dir(self):
        ...

remove the need for PENDING_USER_LSP_METHODS? It looks like _register_class_based_features already checks to see a given method is a valid feature method or not. (I've not read the code that closely, so I wouldn't be surprised if I'm missing something!)

I started with that. It broke my consumer code because I had attributes (@properties) that were only applicable when the LanguageServer instance is in a certain state. This means that the dir + getattr approach is quite fragile with subclasses. As an example:

class Base:
  def __init__(self):
     for name in dir(self):
          if name == "square":
              print(f"{name}={getattr(self, name, None)}")

class Subclass(Base):
    def __init__(self, side):
        super().__init__()
        self.side = side

    @property
    def square(self):
        return self.side ** 2

s = Subclass(2)
# Prints "square=None", which logically should not be possible

That can lead to some funky debugging sessions that I would rather not expose our uses too, since the "during initialization" the object might not be in a stable state. As said, I broke my own code with this because I had "safety checks" to ensure certain properties were not called, before a delayed initialization had happened. The dir + getattr violated all of those safety checks. There is an argument to be had that most of these would probably be redundant in a fully converted class-based setup, since I could delay initialization till everything was ready. Though that implies we are asking people to do a "all or nothing" conversion (as in, you cannot just try one method to see how it works, like how I did when I tested whether the code worked).

One solution would be to move the dir + getattr checks to the start up phase, so they are first initialized when the server is started on the class-based method. Though that would mean logging happens later and it would change how interleaving instance vs. class registration would work. I would be fine with saying "do not mix these patterns" as an answer to the later and the former could be written off as a "non-issue".

Overall, I think it looks really promising! It doesn't look too dis-similar to what I was experimenting with in this commit for pygls' built-in methods.

Glad to hear it. :)

nthykier commented 1 month ago

[...]

That can lead to some funky debugging sessions that I would rather not expose our uses too, since the "during initialization" the object might not be in a stable state. As said, I broke my own code with this because I had "safety checks" to ensure certain properties were not called, before a delayed initialization had happened. The dir + getattr violated all of those safety checks. There is an argument to be had that most of these would probably be redundant in a fully converted class-based setup, since I could delay initialization till everything was ready. Though that implies we are asking people to do a "all or nothing" conversion (as in, you cannot just try one method to see how it works, like how I did when I tested whether the code worked).

[...]

Related, even when fully converting to the class-based setups, the consumer might need to initialize some of their attributes before the super().__init__(...) pattern. Personally, I find that a bit iffy because means I have to know about implementation details of the parent class that I often feel that I should not have to care about.

alcarney commented 1 month ago

I started with that. It broke my consumer code because I had attributes (@properties) that were only applicable when the LanguageServer instance is in a certain state. This means that the dir + getattr approach is quite fragile with subclasses. As an example:

Thanks for the explanation, I thought I was missing something :)

While mildly disappointing, the global state is not a deal breaker for me since it's something you could work around if needed to.

That can lead to some funky debugging sessions

because means I have to know about implementation details of the parent class

I agree with you, it's more important to prevent issues like those.

Viicos commented 1 month ago

On a side note, maybe there is some room to support a pattern that I saw being used in esbonio (@alcarney I really like it, this language server project probably deserves more recognition), that I'm also using in a ongoing attempt to implement a LS:

Taking the example of TEXT_DOCUMENT_COMPLETION feature, you might want to register different callbacks for different trigger characters. Esbonio has some kind of setup function (see also _configure_completion) to actually call all registered (esbonio-specific) features.