jupyter-lsp / jupyterlab-lsp

Coding assistance for JupyterLab (code navigation + hover suggestions + linters + autocompletion + rename) using Language Server Protocol
https://jupyterlab-lsp.readthedocs.io
BSD 3-Clause "New" or "Revised" License
1.79k stars 145 forks source link

Implement code overrides matching via tokens parsing #347

Open krassowski opened 4 years ago

krassowski commented 4 years ago

What are you trying to do?

Add token based code overrides. The code overrides make the existing language servers work with non-standard syntax which is added by the kernels on top of the underlying language. For example:

from IPython import get_ipython
get_ipython().run_line_magic('ls', '')

The reverse code replacement translates the above back to %ls. This is needed to make the LSP functions which modify the document work (e.g. rename/reformat/quick fix).

Some magics take variables as arguments. For example, %store data would save the value of data variable. A customised code replacement for this IPython magic would call a function using this variable, e.g.:

from IPython import get_ipython
get_ipython().run_line_magic('store', (data))  # mypy: ignore type check

This has several advantages:

How is it done today?

We only support regular expression-based code overrides. This was initially a "band-aid" to make LSP work with IPython, but we are hitting the limitations of regular expressions.

What are the limitations of using regular expressions?

The proposal

Why should it be declared by the kernel? Should not the LSP servers work harder to adopt to the language variation?

Now, I believe that magics are fun. Magics are what makes the IPython and others kernels easier to work with in the interactive setting. I would like to enable kernel developers, especially developers of kernels for languages which are not so well established (I do not expect to see huge changes in IPython magics) to add, change and refine magics without the fear that the cool language features will stop working for them.

Design notes

Before reading further, you may want to have a quick look at the current regular-expression-based implementation:

Design considerations:

Draft of the interface (to be converted to JSON schema once finished):

interface IArgumentStorageOptions {
   commentPrefix: string;
   commentSuffix?: string;
   /** Which characters should be escaped. */
   charactersToEscape: string[];
   /** Character(s) to be pretended before characters to be escaped */
   escapeCharacter: string;
}

interface ITokenType;

interface IToken {
  type: ITokenType;
  value: string;
}

interface ITokenMatcher {
  type: ITokenType;
  pattern?: string;
  /* If true, this is what should be extracted from the match (compare with regexp capturing groups) */
  capture: bool;
}

interface ITokenGroupMatcher {
   tokens: ITokenMatcher[];
  /** how many repetitions of the argument should be supported; can be Inf to support any number */
  repeats: number;
}

interface IArgument {
  id: string;
  match: (ITokenMatcher | ITokenGroupMatcher)[];
}

interface ITokenizer {
   // TODO
}

interface IArgumentMatch {
   id: string;
   // how to join values if multiple captured under id
   join?: string;
}

interface ITokenCodeOverride {
  tokenizer: ITokenizer;
  argumentStorage: IArgumentStorageOptions;
  match: (IToken | IArgument)[];
  replace: (string | IArgumentMatch)[];
  reverse: ITokenCodeOverride;
}

Example IPython magic for %store:

{
    tokenizer: python,
    argumentStorage: {
       commentPrefix: '#'
    },
    match: [
      {type: 'operator', pattern: '%'} as ITokenMatcher,
      {type: 'variable', pattern: 'store'} as ITokenMatcher,
      {
         id: 'store-argument',
         match: [ 
            {type: 'variable', capture: true} as ITokenMatcher,
            {
                tokens: [
                    {type: 'separator'} as ITokenMatcher,
                    {type: 'variable', capture: true} as ITokenMatcher
                ],
                repeats: Math.Inf
            } as ITokenGroupMatcher
         ]
       } as IArgument
   ],
   replace: [
      "get_ipython().run_line_magic('store', (",
      {id: 'store-argument', join: ', '} as IArgumentMatch,
      "))"
   ],
   reverse: {} // TODO
}

Questions to consider

How to make this work for kernel developers?

Testing overrides would need to be made easy for the kernel developers.

Will this work?

blois commented 4 years ago

Are there languages other than Python that have cell magics? This almost feels like an implementation detail of the LSP that it should be supporting 'Notebook Python' but the underlying implementation only supports Python.

Could cell magics be implemented via an extension to the existing Python LSP engines?

will kernel developers in Jupyter community be willing to go this route, or would they prefer that a LSP server is implemented for each kernel / existing language servers are amended for each kernel?

I'm of the (possibly unpopular) opinion that magics introduce significant unexpected tooling overhead. This includes lsp, syntax highlighting, lint, format and graduating code to files). Languages should think very hard before adding them.

krassowski commented 4 years ago

Thank you for chiming in @blois. I respect your point of view, and agree that if there were no magics, the overhead in maintaining the tooling would not be as high.

However, as a happy user of IPython I could not see it without magics; I use them heavily everyday and I believe they are a huge contributor to the IPython success as compared to some other kernels that explicitly decided not to support magics (excluding kernels of languages which do not need magics as the syntax of the language is very flexible in itself).

Could cell magics be implemented via an extension to the existing Python LSP engines?

No, I believe that implementing magics in LSP servers would not be realistically possible. In addition to the reasons listed in Why should it be declared by the kernel? Should not the LSP servers work harder to adopt to the language variation? (above):

Are there languages other than Python that have cell magics

Kernels that support magics in general:

Kernels that do not support magics:

I did not look specifically for cell magics - why are you focusing on cell magics only?

blois commented 4 years ago

I agree on the usefulness of magics, really trying to figure out a way to break the problem down some.

For moving support into the LSP- I wonder if it could be done with one LSP wrapping another and doing the translation at that level. Some of the implementation would be tricky but I'm not sure it's much different than what needs to be done at the editor level. It sounds like VSCode may be trying something along these lines for their notebook support? I see no mention of supporting magics there though.