sublimelsp / LSP

Client implementation of the Language Server Protocol for Sublime Text
https://lsp.sublimetext.io/
MIT License
1.63k stars 181 forks source link

Provide `Transport` as parameter when invoking `plugin.on_post_start` #2491

Open twwildey opened 2 months ago

twwildey commented 2 months ago

Is your feature request related to a problem? Please describe.

When launching the LSP server for AWS Q Developer, this LSP server expects an AES encryption key to be written on stdin - before any other messages - in order to encrypt AWS credentials that would later be sent by the LSP client. This ensures plaintext AWS credentials are not persisted in logs by LSP clients. One can observe the need to receive an AES encryption key before any other messages by reading the following code in the LSP server for AWS Q Developer:

  1. https://github.com/aws/language-servers/blob/main/app/aws-lsp-codewhisperer-binary/src/iam-standalone.ts
  2. https://github.com/aws/language-server-runtimes/blob/7e44d7516d55797cf1f7e5614dfa9fce2236e7f3/runtimes/runtimes/standalone.ts#L90C13-L90C34

Describe the solution you'd like

Within an LSP plugin, the only opportunity for the plugin to access the process for the underlying Transport of an LSP server - before any messages are sent to it - is within the on_post_start hook for an AbstractPlugin:

  1. https://github.com/sublimelsp/LSP/blob/fc5a7c75c8cec2e6765e218f7df980315c5c9ff3/plugin/core/windows.py#L282
  2. https://github.com/sublimelsp/LSP/blob/cd7cb682e634657c63105b906bacda99611846fd/plugin/core/sessions.py#L943

However, in order for the AbstractPlugin to access the underlying process in on_post_start, it needs to receive the Transport instance created for the Session. Since the initialize_async method on Session will send the initialize message to the LSP server, there is no other opportunity for the LSP plugin to provide an AES key to the LSP server before other messages are sent:

  1. https://github.com/sublimelsp/LSP/blob/fc5a7c75c8cec2e6765e218f7df980315c5c9ff3/plugin/core/windows.py#L284
  2. https://github.com/sublimelsp/LSP/blob/cd7cb682e634657c63105b906bacda99611846fd/plugin/core/sessions.py#L1510

Given these constraints, it seems the best way to support the LSP server for AWS Q Developer is to provide the Transport instance that the plugin will use as an additional (keyword?) parameter to the on_post_start hook.

Describe alternatives you've considered

In order to get around this issue, I found that transports.py stores a singleton Weakset of processes launched by each Transport instance in a variable called _subprocesses: https://github.com/sublimelsp/LSP/blob/cd7cb682e634657c63105b906bacda99611846fd/plugin/core/transports.py#L285

Currently, I am importing this _subprocesses from transports.py and finding the "most recent" subprocess that the LSP plugin has not seen before. However, this is very brittle and relies on internal implementation details. It would be better for the on_post_start hook to explicitly provide the Transport instance, so there's no possible ambiguity in resolving the correct process to write the AWS encryption key to.

Additional context

I am authoring an LSP plugin for AWS Q Developer at this time, and Amazon will be unwilling to compromise the security profile of this LSP plugin by sending unencrypted AWS credentials over LSP messages to the LSP server for AWS Q Developer. I kindly ask that we consider this concern critical to the success of new LSP plugins that will be meaningful to Sublime Text users.

I would be happy to author a PR for this issue as well, but I'd like to get alignment with the community first before doing so, in order to address any material concerns they may have.

jwortmann commented 2 months ago

Thanks for the request and the detailed description. I haven't worked a lot on the code of the subprocess creation and stdio transports, so perhaps there are others who have more insight on that, but essentially I don't see any reason against exposing the Transport instance to the plugin. Then Transport should probably be added to the public API at https://github.com/sublimelsp/LSP/blob/755436391882ca5eec77b3aef534260c8b0c52f7/plugin/__init__.py#L29-L30

However, from the technical side I don't think we can simply add it in form of a keyword argument to on_post_start. Other plugins implementing that method with its current signature would break if they receive the additional argument. So I guess to keep backwards compatibility, we would need to do something like this (untested, so I can't guarantee that it works):

diff --git a/plugin/core/windows.py b/plugin/core/windows.py
index 29abdea9..136453f4 100644
--- a/plugin/core/windows.py
+++ b/plugin/core/windows.py
@@ -37,6 +37,7 @@ from .workspace import sorted_workspace_folders
 from collections import deque
 from collections import OrderedDict
 from datetime import datetime
+from inspect import signature
 from subprocess import CalledProcessError
 from time import perf_counter
 from typing import Any, Generator, TYPE_CHECKING
@@ -279,7 +280,10 @@ class WindowManager(Manager, WindowConfigChangeListener):
             transport_config = config.resolve_transport_config(variables)
             transport = create_transport(transport_config, transport_cwd, session)
             if plugin_class:
-                plugin_class.on_post_start(self._window, initiating_view, workspace_folders, config)
+                if len(signature(plugin_class.on_post_start).parameters) == 5:
+                    plugin_class.on_post_start(self._window, initiating_view, workspace_folders, config, transport)
+                else:
+                    plugin_class.on_post_start(self._window, initiating_view, workspace_folders, config)
             config.set_view_status(initiating_view, "initialize")
             session.initialize_async(
                 variables=variables,

... which is a bit awkward and results in a linter error (unless we define the signature of the abstract implementation as def on_post_start(cls, *args), but that would be even more awkward).

Of course we could suppress the linter error on this line if there is no other way, but perhaps you have another idea for an implementation?

Or perhaps it would be better (i.e. more high level) to instead pass the session instead of transport and to define a new method on Session which allows to send arbitrary data directly to the server (without the LSP/JSON-serialization) and could be used for the AES key, but is not logged to the log panel?

I would be happy to author a PR for this issue as well, but I'd like to get alignment with the community first before doing so

PRs are always welcome!

twwildey commented 2 months ago

I would be happy to align on a new "hook" API for AbstractPlugin to receive the Transport instance after it's created and before any messages are sent over it. However, it would effectively be called immediately before or after the on_post_start hook anyways, which makes this feel duplicative.

I find your proposal to use signature with plugin_class.on_post_start to maintain backwards compatibility superior in comparison.

I'm also sympathetic to passing the Session instance instead of the Transport instance, since the Session instance would be a higher-level API with access to the Transport instance anyways. However, the Session instance does not have a reference to the Transport instance until Session::initialize_async is invoked by WindowManager::start_async: https://github.com/sublimelsp/LSP/blob/cd7cb682e634657c63105b906bacda99611846fd/plugin/core/sessions.py#L1517

As such, we would either need to 1) invoke plugin_class.on_post_start within Sesson::initialize_async itself or 2) create a new "hook" API to be invoked by Sesson::initialize_async instead. Additionally, on_post_start must be invoked before Session::intialize_async invokes send_request_async on itself: https://github.com/sublimelsp/LSP/blob/cd7cb682e634657c63105b906bacda99611846fd/plugin/core/sessions.py#L1521

I'm seeing that plugin_class.on_post_start receives the initiating_view, which is then passed to an invocation of config.set_view_status by WindowManager::start_async before the invocation of Sesson::initialize_async. Looking at the code for set_view_status, it appears to set the status for the sublime.View only and does not rely on any of its other state: https://github.com/sublimelsp/LSP/blob/fc5a7c75c8cec2e6765e218f7df980315c5c9ff3/plugin/core/types.py#L834

As such, it doesn't seem to matter whether on_post_start is invoked before of after config.set_view_status is invoked. Any modifications to the initiating_view by plugin_class.on_post_start should not affect the invocation of config.set_view_status. We could even migrate the invocation of config.set_view_status into Session::initialize_async without issue.

rwols commented 2 months ago

We can keep this on_post_start compatible with other plugins in the following way. Currently it's documented to return None. But we can let it return an Optional[bytes] which would be understood to be some raw bytes to send before sending the initialize request. Would that work?

There are some changes needed in plugin/core/transports.py to send these raw optional bytes, but that should be doable.

rchl commented 2 months ago

I don't feel like we need to be shoe-horning this to an existing method if it doesn't feel right. It would just make the API more confusing.

A new method like on_transport_ready (or whatever fits) would IMO be clearer.

rchl commented 2 months ago

And note that stdio is only one of the ways the servers can use to communicate with so we might want to create a higher level method to abstract that. We should think if something like that fits with other methods (tcp, node-ipc in the future). If it doesn't then the method name should probably be very specific like write_to_stdin.

rwols commented 2 months ago

But you can't really send anything else after the initialize request because at that point the server and client are in a state machine that awaits requests and handles responses. So, I don't think we should really be advertising this transport object to plugins. And the requirement of it not being logged is also somewhat specific.

rwols commented 2 months ago

We should think if something like that fits with other methods (tcp, node-ipc in the future).

I don't understand this comment... Isn't the Transport abstraction the point of not having to care about the transport details (tcp, tls, node-ipc, websocket, ...)?

rchl commented 2 months ago

But you can't really send anything else after the initialize request

But why would that be an argument against what I've suggested? Maybe I'm missing something since I haven't re-read the thread recently?

We should think if something like that fits with other methods (tcp, node-ipc in the future).

I don't understand this comment... Isn't the Transport abstraction the point of not having to care about the transport details (tcp, tls, node-ipc, websocket, ...)?

My point is that we should think whether such functionality would also apply to other transport methods. Like, is it even possible to do something equivalent with those? Because if not then what should happen in those cases? Should the method throw or do nothing or...?