Open BoPeng opened 4 years ago
Yeah, the Language Server Protocol Specification doesn't say anything about multi-language documents, so we're kinda shooting in the dark here. Further, basically 0 language servers care about Jupyter's JSON format, or any special syntax kernel authors have added on top of their host language(s).
Our detection is currently based on existing Jupyter approaches like file extension sniffing, contents manager introspection, or, in the case of notebooks, the kernel and/or notebook metadata. If everything is just sos, we can't do much for you, nor do we offer many hooks into this system at this point.
Presently, we do handle a small number of transclusions on the front-end, on a kernel-by-kernel basis, and it's rather deeply embedded inside the code. #191 discusses some approaches on how to normalize this, as a set of regular expressions + templates, or maybe some portable grammar and declarative transformation rules. If that's adopted, whether it's handled on the server side or the client side, there will be some hooks to extend it, ideally without having to rebuild the client (ha).
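A minimal sketch of the "regular expressions + templates" idea, in TypeScript. The `TransclusionRule` shape, the `extractForeignCode` function, and the `%%bash` rule are hypothetical illustrations, not the design discussed in #191 and not anything jupyterlab-lsp ships:

```typescript
// Hypothetical rule shape: none of these names come from jupyterlab-lsp.
interface TransclusionRule {
  /** Language of the extracted (foreign) code. */
  language: string;
  /** Pattern with one capture group holding the foreign code. */
  pattern: RegExp;
  /** What stays in the host document; `$1` is replaced by the captured code. */
  hostTemplate: string;
}

interface ExtractionResult {
  hostCode: string;
  foreign: { language: string; code: string }[];
}

// Illustrative rule only: treat the body of a `%%bash` cell as bash code.
const rules: TransclusionRule[] = [
  {
    language: 'bash',
    pattern: /^%%bash\n([\s\S]*)$/,
    hostTemplate: 'get_ipython().run_cell_magic("bash", "", """$1""")'
  }
];

function extractForeignCode(
  cell: string,
  candidates: TransclusionRule[]
): ExtractionResult {
  for (const rule of candidates) {
    const match = rule.pattern.exec(cell);
    if (match !== null) {
      const code = match[1] ?? '';
      return {
        // A function replacer avoids special handling of `$` inside `code`.
        hostCode: rule.hostTemplate.replace('$1', () => code),
        foreign: [{ language: rule.language, code }]
      };
    }
  }
  // No rule matched: the whole cell stays in the host language.
  return { hostCode: cell, foreign: [] };
}
```

Because the rules are plain data, a kernel author could in principle ship them without rebuilding the client, which is the extension point the comment above is asking for.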
In a related effort, #268 (rough draft of an implementation on #278) suggests changing jupyter_lsp into a kernel, which handles all the management of language servers. If that approach is adopted, and your kernel supports kernel comms, you might be able to reuse the machinery there and offer your own solution. While that PoC presently treats the language server kernel as a singleton, it's important to me to not inject more "our way or the highway" pieces into the architecture: even for a single language server kernel implementation, it is important to be able to launch multiple instances that handle different documents, again without restarting your whole system.
However, as you've created a multi-language kernel with a special syntax, you've basically created a new language, which is certainly not unique: allthekernels, pidgy, and metakernel are all in the same boat, protocol-and-bits-on-disk-wise. In all these cases, you might end up having to create a multi-language Language Server. There are a number of toolkits for different languages for doing so, e.g. pygls or vscode-languageserver-node, which might then in turn have to handle spinning up other language servers, as you really don't want to be writing all these things yourself. Costs aside, an investment in writing a Language Server can pay dividends through usability in any Language Server client.
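As a rough sketch of what "handle spinning up other language servers" could mean for a multi-language server: split the document into per-language regions, hand each region to a delegate, and translate the results back to host-document coordinates. Everything here (`Region`, `Delegate`, `MultiLanguageRouter`) is a hypothetical stand-in; a real server would be built on a toolkit such as pygls or vscode-languageserver-node and would talk to real child servers rather than plain functions:

```typescript
// Hypothetical types for illustration only.
interface Region {
  language: string;
  /** Zero-based line in the host document where this region starts. */
  startLine: number;
  text: string;
}

interface Diagnostic {
  line: number;
  message: string;
}

/** Stand-in for a delegate language server: text in, diagnostics out. */
type Delegate = (text: string) => Diagnostic[];

class MultiLanguageRouter {
  private readonly delegates = new Map<string, Delegate>();

  register(language: string, delegate: Delegate): void {
    this.delegates.set(language, delegate);
  }

  diagnose(regions: Region[]): Diagnostic[] {
    const all: Diagnostic[] = [];
    for (const region of regions) {
      const delegate = this.delegates.get(region.language);
      if (delegate === undefined) {
        continue; // no server available for this language
      }
      for (const d of delegate(region.text)) {
        // Delegates report region-local lines; shift them to host lines.
        all.push({ line: d.line + region.startLine, message: d.message });
      }
    }
    return all;
  }
}
```

The coordinate translation is the part that cannot be skipped: each delegate only ever sees its own fragment, so every position it returns has to be mapped back before it reaches the client.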
Finally, there are also a number of upstream discussions occurring around this that may be worth your time to peruse:
Just a super fast thought from me: we may want to support this case, and it would be super easy if we settle on per-cell language definitions, but it requires a longer discussion and a consensus in the wider Jupyter community.
Will elaborate next weekend
@krassowski I think if lsp had this per-cell language definition it could work with SoS without much work on the SoS side because SoS kernel in a cell could be treated as another python kernel and its other functions don't have much of an overlap with lsp functionality. Am I wrong? @BoPeng
The problem could potentially be solved at the backend or frontend level.
If I am to implement an sos-language-server, it will of course try to start and use other language servers and act as a proxy. However, the language server protocol might not allow the passing of meta information to the server, so the sos language server might not be able to know the language of the content being passed. Hopefully the situation is not as bad as @bollwyvl said, "shooting in a dark".
It appears easier, and cleaner, to implement this at the frontend level since jupyterlab-lsp is designed to work with multiple language servers anyway. It should be good enough for jupyterlab-lsp to know which language server to talk to at the cell level. SoS currently has some customized messages for changing the cell-level kernel (e.g. https://github.com/vatlab/jupyterlab-sos/blob/master/src/index.ts#L497), so it could be quite trivial, as @krassowski pointed out, if jupyterlab-lsp provides a hook/api for jupyterlab-sos to dynamically change the language of the kernel. I can work on a PR if this is allowed by the architecture, and acceptable to the team.
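To make the request concrete, here is one possible shape for such a hook, sketched in TypeScript. No such API exists in jupyterlab-lsp as far as this thread shows: `CellLanguageRegistry`, `LanguageProvider`, and the kernel-name-to-language table are all hypothetical, and reading the language from a `kernel` metadata key is an assumption about how jupyterlab-sos would supply it:

```typescript
// Hypothetical hook: not an existing jupyterlab-lsp API.
interface CellInfo {
  source: string;
  metadata: Record<string, unknown>;
}

/** Returns a language for the cell, or undefined to defer to others. */
type LanguageProvider = (cell: CellInfo) => string | undefined;

class CellLanguageRegistry {
  private providers: LanguageProvider[] = [];
  private readonly defaultLanguage: string;

  constructor(defaultLanguage: string) {
    this.defaultLanguage = defaultLanguage;
  }

  /** Registers a provider and returns a function that unregisters it. */
  register(provider: LanguageProvider): () => void {
    this.providers.push(provider);
    return () => {
      this.providers = this.providers.filter(p => p !== provider);
    };
  }

  /** First provider with an answer wins; otherwise use the default. */
  languageOf(cell: CellInfo): string {
    for (const provider of this.providers) {
      const language = provider(cell);
      if (language !== undefined) {
        return language;
      }
    }
    return this.defaultLanguage;
  }
}

// Illustrative mapping from kernel names to language identifiers.
const kernelToLanguage: Record<string, string> = { ir: 'r', python3: 'python' };

// What an extension such as jupyterlab-sos might register (assumption).
const sosProvider: LanguageProvider = cell => {
  const kernel = cell.metadata['kernel'];
  return typeof kernel === 'string' ? kernelToLanguage[kernel] : undefined;
};
```

Returning an unregister function keeps the hook dynamic: an extension can add or remove its provider at runtime without the client being rebuilt or restarted.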
However, the language server protocol might not allow the passing of meta information to the server,
lsp had this per-cell language definition
language server protocol might not allow the passing of meta information to the server
I wouldn't hold your breath trying to get changes into LSP! I may be very mistaken, but you'd have to make the case in such pitches very strongly that it would benefit microsoft and vscode pretty directly, and probably land some reference implementation there.
per-cell language definitions
While useful, this doesn't solve the larger problem of per-token transclusions, e.g. line magics, or query languages embedded in strings (#197). Further, this would probably require a breaking change to nbformat, and probably the jupyter kernel messaging protocol, neither of which like to be changed much.
so the sos language server might not be able to know the language of the content being passed
Assuming your files-on-disk can be statically analyzed by sos-language-server: the way it would work for a "pure" language server today:
jupyter-lsp and sos-language-server
sos-language-server registers itself for whatever file extensions, mime types, and codemirror modes you created for the language sos:
  traitlets (e.g. jupyter_notebook_config.json) and setuptools entry_points
jupyter-lsp would advertise the sos spec on its REST API
sos kernel session gets started, finding the sos declaration
jupyterlab-lsp would open a new websocket for sos, to be used for all sos documents
jupyter-lsp would start sos-language-server
jupyterlab-lsp would start the LSP session with initialize
jupyter-lsp would proxy this and all messages verbatim to sos-language-server
sos-language-server
jupyterlab-lsp would finish setup with configuration/didChange (#245), textDocument/didOpen, etc.
sos-language-server would:
  sos syntax (with access to the whole file)
  jupyter-lsp
  initialize to each of those languages
  textDocument/publishDiagnostics
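For readers new to LSP, this is roughly what the `initialize` step looks like on the wire when a server is spoken to over stdio: a JSON-RPC request preceded by a `Content-Length` header counted in bytes. This is a sketch of the LSP base protocol framing in TypeScript, not code from jupyter-lsp; `frameMessage` and `utf8ByteLength` are illustrative helper names, and the parameters are the bare minimum:

```typescript
interface JsonRpcRequest {
  jsonrpc: '2.0';
  id: number;
  method: string;
  params: unknown;
}

/** Counts UTF-8 bytes, since Content-Length is in bytes, not characters. */
function utf8ByteLength(text: string): number {
  let bytes = 0;
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    if (unit < 0x80) {
      bytes += 1;
    } else if (unit < 0x800) {
      bytes += 2;
    } else if (unit >= 0xd800 && unit <= 0xdbff) {
      bytes += 4; // surrogate pair: one 4-byte code point
      i++; // skip the low surrogate
    } else {
      bytes += 3;
    }
  }
  return bytes;
}

/** Frames a message as header, blank line, JSON body. */
function frameMessage(message: JsonRpcRequest): string {
  const body = JSON.stringify(message);
  return `Content-Length: ${utf8ByteLength(body)}\r\n\r\n${body}`;
}

// Minimal initialize request; real clients send much richer capabilities.
const initializeRequest: JsonRpcRequest = {
  jsonrpc: '2.0',
  id: 0,
  method: 'initialize',
  params: { processId: null, rootUri: null, capabilities: {} }
};
```

A proxy that forwards messages verbatim, as described above, only needs to re-frame the body for whichever transport the next hop uses; it never has to understand the payload.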
appears easier, and cleaner to implement this at the frontend level
That's your call: as an extension to an extension to a client, the stuff would "only" work with jupyterlab-lsp, and only with the version of jupyterlab we support, and therefore would need to be upgraded in pretty tight lockstep to the Lab version. No doubt you could write your stuff in such a way that the "guts" could be used in another client.
dynamically change the language of the kernel. I can work on a PR if this is allowed by the architecture, and acceptable to the team.
As I mentioned, have a look at #191. If, instead of requiring hacking a bunch of typescript (which, yes, we should of course allow, expose, and dogfood to implement any of the below), sos could do one or more of:
jupyter_lsp
jupyter.lsp.transclusions which sos can use
... to mostly-statically describe "ways to transform code and into what language". The kernel-based approach could potentially offer said code transformation dynamically. This would support these concepts in a way that jupyterlab-lsp would only be a reference implementation, not the only implementation.
@bollwyvl Thanks for all the info. Let me dive into language server (protocol and implementation) and the source code of jupyterlab-lsp before getting back to you.
@BoPeng just wanted to let you know that I worked hard on restructuring the source code to make it more pleasant to look at. The improved cell-level syntax highlighting that we added here may also be of interest to you: https://github.com/krassowski/jupyterlab-lsp/pull/319. Please let us know if you are still interested in working on bridging SoS with jupyterlab-lsp; we are always happy to help!
Yes, this is on my TODO list, even relatively high, but I am swamped with other obligations (covid related projects, not surprisingly) and have not been able to work on this.
I had another look at the problem, and it is likely that an sos language server, as @bollwyvl suggested, is the best way to proceed. It would be a larger project than my current bandwidth allows, so it will take a while for sos users to make use of language servers.
Okay, instead of creating sos-language-server, why don't we just use per-cell language-server as we already do with cell magics for IPython? This should be simple to implement.
[...] why don't we just use per-cell language-server as we already do with cell magics for IPython? This should be simple to implement.
Are there any obstacles?
jupyterlab/debugger could/should/must also support multi-language notebooks. Are there similarities in implementation of the multi-language abstractions for LSP and for jupyterlab/debugger DAP support?
Okay, instead of creating sos-language-server, why don't we just use per-cell language-server as we already do with cell magics for IPython? This should be simple to implement.
That will make things much easier for SoS. SoS currently uses kernel metadata to specify the kernel of each cell, but I am willing to change that to whatever will be used by jupyterlab-lsp.
BTW, congratulations on the merge of https://github.com/jupyter/enhancement-proposals/pull/72 !
Okay, instead of creating sos-language-server, why don't we just use per-cell language-server as we already do with cell magics for IPython? This should be simple to implement.
@krassowski I would be interested in implementing this. I am a student currently writing my master's thesis, and the project I am working on would benefit from supporting language servers. Unfortunately, the current state of the LSP plugin (if I understand it correctly) doesn't fit our use case, because we use multiple languages in one notebook. Per-cell language servers would solve this issue, so I would like to contribute. I am not the most experienced developer, though, and I need to get a bit more familiar with the existing code, so a little guidance or at least a general idea on how to solve this would be very much appreciated. :)
You are very welcome to work on it. I will be available to help and guide you if you run into any problems, though I may have a longer response time than usual as the next two weeks are very busy for me. I will try to write up something with references to the code over the weekend.
Thanks! That sounds great! It may take some time, because I am just at the beginning of my thesis, but I will try my best. Some references would be very helpful indeed.
You are very welcome to work on it. I will be available to help and guide you if you run into any problems, though I may have a longer response time than usual as the next two weeks are very busy for me. I will try to write up something with references to the code over the weekend.
@krassowski Just a little update: I am still busy with some other parts of my thesis, but I'll have time to work on this issue soon. I know you're busy and I don't want to bother you, but I would really appreciate it if you could write a little guidance regarding the code and a general idea for solving the problem. That would help me a lot. Thanks in advance!
Very quickly: on the relevant implementation level, each cell (and file editor, but this is not relevant) is represented by ICodeBlockOptions.
Code blocks are appended one by one by VirtualDocument.append_code_block():
which calls VirtualDocument.prepare_code_block to extract fragments of code (which may be in different languages), which is actually implemented in VirtualDocument.extract_foreign_code to append the foreign code to the appropriate foreign virtual document:
There is also a notion of standalone snippets: even if consecutive cells use the same language, sometimes we do not want to merge them into the same virtual document (e.g. the %%python magic, which upon execution spawns a new interpreter, so it is independent of any previous %%python magics); this is handled by:
Back to appending code blocks: ICodeBlockOptions does not pass any cell metadata (it is not even aware of cell existence); it only passes the value and the reference to the editor. To condition extraction of virtual documents on cell metadata, this needs to be passed too. The actual append operations are executed in:
with these constructed from editors map in adapters:
which for notebooks are:
and for file editors there is only one editor:
We have to make the information on cell metadata available to the code extracting foreign virtual documents, so it might make sense to generalize the editors() getter so that it returns an object which includes both CodeEditor.IEditor and metadata. We may want to have this as a separate getter and reimplement get editors() as a simple extraction from the result of that new getter for backward compatibility.
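A sketch of that backward-compatible split, under stated assumptions: `IEditorLike` stands in for CodeEditor.IEditor so the example is self-contained, the getter name `editorsWithMetadata` is made up, and the array return type of `editors` is a guess rather than a copy of the real adapter code:

```typescript
// Stand-in for CodeEditor.IEditor, reduced to what the sketch needs.
interface IEditorLike {
  getValue(): string;
}

// Hypothetical richer record: the editor plus the metadata of its cell.
interface IEditorWithMetadata {
  editor: IEditorLike;
  metadata: Readonly<Record<string, unknown>>;
}

class NotebookAdapterSketch {
  private readonly cells: IEditorWithMetadata[];

  constructor(cells: IEditorWithMetadata[]) {
    this.cells = cells;
  }

  /** New getter (hypothetical name) exposing editors with cell metadata. */
  get editorsWithMetadata(): IEditorWithMetadata[] {
    return this.cells;
  }

  /** Old-style getter kept for existing callers, now a simple projection. */
  get editors(): IEditorLike[] {
    return this.editorsWithMetadata.map(entry => entry.editor);
  }
}

/** Tiny helper to build a fake editor for experimentation. */
function fakeEditor(value: string): IEditorLike {
  return { getValue: () => value };
}
```

Existing callers keep using `editors` unchanged, while the code that extracts foreign virtual documents can switch to the richer getter and branch on the metadata.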
Or we may want to go all-in, rewrite this code from scratch, and release a new major version.
One thing I very much want to include is the reference to the cell (its identifier) as a comment in the virtual document content so that we can reliably translate back-and-forth between the virtual document and the cells, enabling full-blown refactoring as described in https://github.com/jupyter-lsp/jupyterlab-lsp/issues/467. It might or might not be beneficial to rewrite the virtual document to live on the backend, but I think that we should first try to implement it in TypeScript.
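To illustrate the cell-identifier idea, here is a small TypeScript sketch. The `# cell-id: ` marker format, the function names, and the choice of a Python-style comment are all assumptions made for the example; the thread does not specify how the identifier would be written:

```typescript
interface CellSource {
  id: string;
  source: string;
}

interface CellPosition {
  cellId: string;
  /** Zero-based line within the cell. */
  line: number;
}

// Hypothetical marker; a real one would depend on the document language.
const CELL_MARKER = '# cell-id: ';

/** Concatenates cells, prefixing each with a comment carrying its id. */
function buildVirtualDocument(cells: CellSource[]): string {
  return cells.map(c => `${CELL_MARKER}${c.id}\n${c.source}`).join('\n');
}

/** Maps a virtual document line back to (cell id, line within cell). */
function toCellPosition(
  virtualDocument: string,
  virtualLine: number
): CellPosition | undefined {
  const lines = virtualDocument.split('\n');
  if (virtualLine < 0 || virtualLine >= lines.length) {
    return undefined;
  }
  // Walk upwards to the nearest marker line.
  for (let i = virtualLine; i >= 0; i--) {
    const line = lines[i] ?? '';
    if (line.indexOf(CELL_MARKER) === 0) {
      if (i === virtualLine) {
        return undefined; // the marker line itself belongs to no cell
      }
      return {
        cellId: line.slice(CELL_MARKER.length),
        line: virtualLine - i - 1
      };
    }
  }
  return undefined;
}
```

One known limitation of this naive version: a cell whose own source contains a line starting with the marker would confuse the lookup, so a real implementation would need a marker that cannot collide with user code.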
Many thanks for your great work on language server support. I have just tried jupyterlab-lsp, which works great for Python and R, but unfortunately does not work for SoS, a multi-language kernel that I have developed.
The idea behind SoS is that it is a superkernel that sits between the frontend and other kernels (see this illustration for details). It allows the use of multiple kernels in one notebook (through sos-notebook for classic jupyter and jupyterlab-sos for jupyterlab), and allows data exchange among live kernels.
The reason why jupyterlab-lsp does not work with SoS is simple: it does not know what language SoS is. If we are to solve this problem, there needs to be some way for SoS to notify jupyterlab-lsp of the language used for each cell. I can work at both the frontend and backend (e.g. write a language server for SoS), but I am not sure if cell-level language support is at all possible with jupyterlab-lsp. I would appreciate any insight from the developers on if and how this can be done. Thanks.