Formalize and extend language transclusion

bollwyvl commented 4 years ago

Elevator Pitch

Allow a user to define rules for finding parts of files in a host language that embed other languages. Additionally support referring to the outer language within the inner language. Language servers, libraries, and other things a user might have installed should be easily and robustly able to ship and activate these on installation.

Motivation

Much like with #190 for #189, there are more languages that embed other languages than we'll be able to (or want to) define/maintain. The existing list includes things from IPython built-ins, but doesn't fully capture the scope of other magics, much less what other languages do, especially extremely crazy things like allthekernels, sos or metakernel.

A number of these kinds of things also allow for referring to the outer language, ranging from {variable} references up to full-blown templating languages, and having access to completion/diagnostics/reference/hover for these nested scopes would be very valuable.

Design Ideas

Add a separate spec to the JSON schema, e.g. transclusions, which can be defined/modified via traitlets config or entry_points without re-compiling lab. Move the existing "simple" ones from the hard-coded TypeScript into defaults on the Python side, or into simple JSON.

Notional "Simple" Example

LanguageServerManager:
  language_servers: ...
  transclusions:
    ipython-shell-magics:
      host:
        mime_types: 
          - text/x-python
        languages: 
          - python
        file_extensions:
          - .ipy
          - .ipynb
      guest:
        mime_types:
          - application/x-sh
        languages: 
          - sh
        file_extensions:
          - .sh
      patterns:
        line-magic:
          regex: "(?<=\\s)!(.*)$"
          extract: $1
          isolate: true
          host_inclusions:
            variable:
              pattern: "(?<!\\$)\\{([^\\}]+)\\}"
              extract: $1
        cell-magic:
          regex: "^%%bash[^\n]*\n([\\s\\S]+)$"
          extract: $1
          isolate: true
          host_inclusions:
            variable:
              pattern: "(?<!\\$)\\{([^\\}]+)\\}"
              extract: $1

TODO: the ! magic can also be assigned to variables, etc.

x = !ls
# is secretly
x: SList = some_function("""ls""")  # IPython's SList: a list of output lines, not a str

while

%%bash --bg --proc foo --out foo_out
ls
# is actually
foo: Popen = some_function("""ls""")
foo_out = foo.stdout
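To give the language server something it can check, the host-side virtual document could replace each magic with a placeholder call that still declares the variables the magic creates. A minimal sketch, assuming a hypothetical placeholder name __transclusion__ and parsing only the --bg, --proc and --out flags shown above with argparse (IPython's own %%script family accepts more options than this):

```python
import argparse
import re
import shlex
from typing import List

# Hypothetical name for the stand-in call emitted into the virtual document.
PLACEHOLDER = "__transclusion__"

ASSIGN_BANG = re.compile(r"^(\s*)(\w+)\s*=\s*!(.*)$")

# Mimics a subset of the %%bash options; unknown options are ignored below.
BASH_ARGS = argparse.ArgumentParser(prog="bash-magic", add_help=False)
BASH_ARGS.add_argument("--bg", action="store_true")
BASH_ARGS.add_argument("--proc")
BASH_ARGS.add_argument("--out")


def rewrite_line_magic(line: str) -> str:
    """Turn x = !ls into x = __transclusion__('ls') so that x stays defined."""
    match = ASSIGN_BANG.match(line)
    if not match:
        return line
    indent, name, command = match.groups()
    return f"{indent}{name} = {PLACEHOLDER}({command!r})"


def rewrite_bash_cell(cell: str) -> List[str]:
    """Declare the variables a backgrounded %%bash cell would create.

    Only the --bg case from the example above is handled.
    """
    header, _, body = cell.partition("\n")
    args, _unknown = BASH_ARGS.parse_known_args(shlex.split(header)[1:])
    lines = []
    if args.bg and args.proc:
        lines.append(f"{args.proc} = {PLACEHOLDER}({body!r})")
        if args.out:
            lines.append(f"{args.out} = {args.proc}.stdout")
    return lines


print(rewrite_line_magic("x = !ls"))
print(rewrite_bash_cell("%%bash --bg --proc foo --out foo_out\nls"))
```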

More complex things

These would be insufficient for complicated tricks like the ones some of the R-in-Python magics use to declare variables; for those, being able to register transclusions from an extension that depends on the to-be ILanguageServerManager (or a dedicated ITransclusionManager) would be required.

User-serviceable and discoverable

They probably need to be configurable in the frontend, e.g.

- adding/testing new transclusions
- disabling installed ones
- some kind of visual indication of them (gutter?)

krassowski commented 4 years ago

Pros:

- it just feels so natural to have this in the server spec

Cons:

- typescript extensions are easier to install and update (and as already mentioned allow for more than regexpr)
- the schema and maintenance get more complex when we have this in two places

krassowski commented 4 years ago

some kind of visual indication of them (gutter?)

+100

krassowski commented 4 years ago

Overall I agree that this is the right direction, as it would let anyone writing LanguageServerManager configuration see the available options; having them try regular expressions first and only then fall back to a TypeScript extension is reasonable, and we will need the LanguageServerManager anyway.

bollwyvl commented 4 years ago

typescript extensions are easier to install and update (and as already mentioned allow for more than regexpr)

I disagree. So long as npm, nodejs, and webpack are in the mix, this is fraught with peril. Putting a JSON file in a place (even checked into your project) is about as low-effort as you can get. Also, the overlap of kernel authors and JS/TS people is relatively small, while most packagers will suffer through a little JSON munging.


bollwyvl commented 4 years ago

While thinking about #201, another interesting one:

LanguageServerManager:
  language_servers: ...
  transclusions:
    dockerfile-run:
      host:
        mime_types: 
          - text/x-dockerfile
        languages: 
          - dockerfile
        file_extensions:
          - Dockerfile
      guest:
        mime_types:
          - application/x-sh
        languages: 
          - sh
        file_extensions:
          - .sh
      patterns:
        RUN:
          regex: "^\\s*RUN\\s+(.*)$"
          extract: $1
          isolate: true

The wrinkle here is getting the ARG and ENV tokens into scope.
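One possible approach to the ARG/ENV wrinkle, sketched in Python under the assumption that the extractor is allowed to emit a small shell preamble: collect the names declared by ARG and ENV instructions and prepend them as shell variable assignments to the extracted RUN commands, so a shell language server sees them as defined. The preamble format is an assumption, and line continuations, multi-pair ENV lines and multi-stage builds are ignored.

```python
import re
from typing import List

# [ \t]* rather than \s* so a match can never start on a previous line.
RUN = re.compile(r"^[ \t]*RUN[ \t]+(.*)$", re.MULTILINE)
# ARG NAME[=default] or ENV NAME=value (only the first name per line).
DECLARATION = re.compile(
    r"^[ \t]*(?:ARG|ENV)[ \t]+([A-Za-z_][A-Za-z0-9_]*)", re.MULTILINE
)


def extract_shell(dockerfile: str) -> str:
    """Build a virtual shell document: declarations first, then RUN bodies."""
    names: List[str] = []
    for name in DECLARATION.findall(dockerfile):
        if name not in names:
            names.append(name)
    # e.g. VERSION="${VERSION:-}" keeps the name defined without a real value
    preamble = [f'{name}="${{{name}:-}}"' for name in names]
    commands = RUN.findall(dockerfile)
    return "\n".join(preamble + commands)


dockerfile = """FROM python:3.11
ARG VERSION=1.0
ENV APP_HOME=/app
RUN pip install mypkg==$VERSION
RUN mkdir -p $APP_HOME
"""
print(extract_shell(dockerfile))
```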

blois commented 4 years ago

IPython includes some other magics as well, see https://ipython.readthedocs.io/en/stable/interactive/reference.html#automatic-parentheses-and-quotes

I do think it could be useful to expose an API on the kernel where you can ask for some code to be transformed into the underlying language then receive back the transformed code and a source map of the transformations. This is an established pattern for code transformations that can also be used for lint/format as well as highlighting appropriate locations of code errors from the runtime reported location.

This could even be done extensibly, so nested syntaxes could report their own transformations: an embedded SQL query could use it to map SQL errors back to the original source code locations.
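A minimal sketch of what such a transformation result could carry, assuming a simplified line-level source map (one entry per transformed line, pointing at the original line it came from); real source maps, as used in JavaScript tooling, are finer-grained. Transformed and transform are hypothetical, not an existing kernel or Jupyter messaging API, and transform only handles a bare !command line, purely to show how a runtime error location could be mapped back.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Transformed:
    code: str
    # line_map[i] is the 0-based original line that transformed line i came from
    line_map: List[int] = field(default_factory=list)

    def original_line(self, transformed_line: int) -> int:
        return self.line_map[transformed_line]


def transform(source: str) -> Transformed:
    """Hypothetical kernel-side transform: expand a !cmd line into plain Python."""
    out_lines: List[str] = []
    line_map: List[int] = []
    for i, line in enumerate(source.splitlines()):
        stripped = line.lstrip()
        if stripped.startswith("!"):
            indent = line[: len(line) - len(stripped)]
            # One magic line becomes two plain-Python lines; both map back to i.
            out_lines += [
                f"{indent}import subprocess",
                f"{indent}subprocess.run({stripped[1:]!r}, shell=True)",
            ]
            line_map += [i, i]
        else:
            out_lines.append(line)
            line_map.append(i)
    return Transformed("\n".join(out_lines), line_map)


result = transform("print(1)\n!ls\nprint(undefined)")
# A NameError reported on transformed line 3 points at original line 2.
print(result.original_line(3))
```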

krassowski commented 4 years ago

After battling with #281, I think that ultimately a hybrid approach might be best: the kernels doing their bit (e.g. with an API as proposed by @blois) plus our custom replacements for user-defined magics and non-standard scenarios.

krassowski commented 4 years ago

Probably deserves a JEP.

kristoffSC commented 2 years ago

Hi, I'm not sure if this is the best place to ask this question... We have a custom magic that is pretty much an enhanced SQL. We would like to use Jupyter LSP's syntax highlighting for our magic cells. Ideally, we would like to achieve this through some kind of config where we could define that %%MyCustomMagic should use the SQL syntax highlighting mode.

Does this use case fit the JupyterLab-LSP enhancement described in this thread?

Assuming I could find some time to contribute, what has to be done to add this functionality? We would be 100% happy with just the ability to reuse already existing Highlight modes.
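For the %%MyCustomMagic case, the declarative route proposed in this thread would boil down to one cell-magic pattern whose guest language is SQL. A sketch with the spec written as a Python dict mirroring the notional YAML from the design ideas; the key names come from that proposal, not from a shipped API, and extract_guest is a hypothetical helper.

```python
import re
from typing import Optional

CUSTOM_SQL = {
    "host": {"languages": ["python"]},
    "guest": {"languages": ["sql"], "mime_types": ["text/x-sql"]},
    "patterns": {
        "cell-magic": {
            # The magic name, optional arguments on the same line, then the body.
            "regex": r"\A%%MyCustomMagic(?:[ \t][^\n]*)?\n([\s\S]*)\Z",
            "extract": "$1",
            "isolate": True,
        }
    },
}


def extract_guest(spec: dict, cell: str) -> Optional[str]:
    """Return the guest-language code of a cell, or None if no pattern matches."""
    for pattern in spec["patterns"].values():
        match = re.match(pattern["regex"], cell)
        if match:
            return match.group(int(pattern["extract"][1:]))
    return None


print(extract_guest(CUSTOM_SQL, "%%MyCustomMagic --limit 10\nSELECT name FROM users"))
# SELECT name FROM users
```

Requiring a space or tab after the magic name keeps a differently named magic such as %%MyCustomMagicX from matching.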

bollwyvl commented 2 years ago

Thanks for the interest!

There are actually several ways this could go down.

- One of them might be to do all of this work outside of this repo, as a standalone plugin, since it only uses well-established public APIs, as demonstrated in the example extractor.
  - Now that we have an org, this could be, e.g., jupyter-lsp/jupyterlab-lsp-transclusions, which would depend on jupyter(lab)-lsp.
- Another is to add it to this repo.
  - This has benefits, and would allow us to refactor our existing examples into declarative ones, to prove the API.

Either way:

On the backend (python): we'd need a new handler that lightly wrapped a new member of the manager.

Much like the manager currently finds language_servers, the manager would initially find transclusions in Jupyter config files (e.g. jupyter_notebook_config.json), as described above in the design ideas.

These would be handy, perhaps, for a custom site deployer or packager, and might be able to accept our simplest examples.
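A minimal sketch of the discovery step, assuming transclusions live under a LanguageServerManager.transclusions key in JSON config files and that, as with other Jupyter config, files read later override earlier ones key by key. Real Jupyter config loading goes through traitlets and jupyter_core's config paths; this stdlib-only version, with hypothetical function names, only shows the merge semantics.

```python
import json
from pathlib import Path
from typing import Dict, Iterable


def deep_merge(base: Dict, override: Dict) -> Dict:
    """Recursively merge override into a copy of base; override wins."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def find_transclusions(config_files: Iterable[Path]) -> Dict:
    """Collect transclusions from JSON config files, in increasing priority."""
    found: Dict = {}
    for path in config_files:
        if not path.is_file():
            continue  # missing config files are simply skipped
        data = json.loads(path.read_text(encoding="utf-8"))
        section = data.get("LanguageServerManager", {}).get("transclusions", {})
        found = deep_merge(found, section)
    return found
```

A site-wide file could then ship the simple defaults while a user-level file tweaks a single pattern without repeating the whole spec.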

On the schema: we've updated the one in the design ideas a few times, but it's still rather rough. For example, we might want to break it down even further, e.g. by hoisting line-magic and cell-magic: it's not that big of a hit to limit them by language. It probably also needs some more "human" stuff in it, like a description, title and examples. Also, by defining it in the schema, we can get "free" typings in TypeScript, and Python will just keep doing what it does with runtime schema validation, reusing the same definitions.

Briefly, back to python: at some later date, we'd probably then also define a new entry_point so that these could be defined at the python level... this might be preferable for certain dynamic use cases, but would cause an additional test burden, initially.
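For the later entry_point route, discovery could look like the following sketch. The group name jupyter_lsp_transclusions is hypothetical, and each entry point is assumed to resolve to a zero-argument callable returning a dict of transclusion specs. A package would register one under that group in its packaging metadata.

```python
import logging
from importlib.metadata import entry_points
from typing import Callable, Dict

# Hypothetical entry point group; not defined by jupyter-lsp today.
GROUP = "jupyter_lsp_transclusions"

log = logging.getLogger(__name__)


def load_entry_point_transclusions(group: str = GROUP) -> Dict[str, dict]:
    """Merge the dicts returned by every callable registered under group."""
    eps = entry_points()
    # Python 3.10+ offers .select(); 3.8/3.9 return a dict keyed by group name.
    selected = eps.select(group=group) if hasattr(eps, "select") else eps.get(group, [])
    transclusions: Dict[str, dict] = {}
    for ep in selected:
        try:
            factory: Callable[[], Dict[str, dict]] = ep.load()
            transclusions.update(factory())
        except Exception as err:  # a broken plugin should not break the server
            log.warning("skipping transclusion entry point %s: %s", ep.name, err)
    return transclusions
```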

On the front end (TypeScript): this would either be a new plugin or a new extension. It would implement something like the example extractor, except that it would first fetch all of the transclusions from the server route, and then add each of them. It would probably also need a way to refresh its list of transclusions, and to update or unregister the ones it created that have since changed or disappeared.
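The refresh step is essentially a reconciliation between what the frontend registered last time and what the server now reports. It is sketched here in Python for consistency with the other examples (the real plugin would be TypeScript), with hypothetical register/unregister callbacks standing in for whatever the extractor registry actually offers:

```python
from typing import Callable, Dict, List, Tuple


def reconcile(
    registered: Dict[str, dict],
    fetched: Dict[str, dict],
    register: Callable[[str, dict], None],
    unregister: Callable[[str], None],
) -> Dict[str, dict]:
    """Bring registered in line with fetched; return the new registry state."""
    for name in sorted(registered.keys() - fetched.keys()):
        unregister(name)  # disappeared on the server
    for name, spec in fetched.items():
        if name in registered and registered[name] == spec:
            continue  # unchanged, keep the existing extractor
        if name in registered:
            unregister(name)  # changed: drop the stale extractor first
        register(name, spec)
    return dict(fetched)


calls: List[Tuple[str, str]] = []
state = reconcile(
    {"a": {"regex": "x"}, "b": {"regex": "y"}},
    {"a": {"regex": "x"}, "b": {"regex": "z"}, "c": {"regex": "w"}},
    lambda name, spec: calls.append(("register", name)),
    lambda name: calls.append(("unregister", name)),
)
print(calls)  # [('unregister', 'b'), ('register', 'b'), ('register', 'c')]
```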

And, here might be another place of configuration: loading transclusions from JupyterLab user settings. These would be handy for trying stuff out interactively, without having to restart the (lab) server. Initially, just with Advanced Settings, but eventually with a dedicated UI with preview, one could build up that same Schema.

One final thing (python+?, typescript, schema): definitely in the "nice to have" category, but a MIME renderer for transclusions. Basically, this would allow e.g. ipython to emit a special "display" message when, e.g. register_magic gets called... this is actually what should probably be happening, long term, but it would be great for us to demonstrate it here.

reuse already existing Highlight modes.

Yep: I think defining new modes is important, and could indeed fit in here. I don't remember whether I've built a schema for CodeMirror simple modes, but that would certainly be possible... but just saying "whatever CodeMirror does with it" is also a reasonable stance. The only real issue I've encountered there is that using regular expression flags is hard... but maybe I wasn't trying hard enough.
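On the regular expression flags problem, at least for transclusion patterns: one option is to keep the flags as a separate string field next to the pattern, in the spirit of JavaScript's new RegExp(source, flags), and translate them per runtime. A sketch of the Python side; the flags field name is an assumption, and only flags with a direct Python re equivalent are accepted.

```python
import re

# JavaScript-style flag letters that have a direct Python re counterpart.
FLAG_MAP = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL}


def compile_pattern(spec: dict) -> "re.Pattern[str]":
    """Compile {"regex": ..., "flags": "im"}, rejecting unsupported flags."""
    flags = 0
    for letter in spec.get("flags", ""):
        if letter == "g":
            continue  # "global" is a matching mode in JS, not a compile flag here
        if letter not in FLAG_MAP:
            raise ValueError(f"unsupported regex flag: {letter!r}")
        flags |= FLAG_MAP[letter]
    return re.compile(spec["regex"], flags)


print(compile_pattern({"regex": "^%%bash", "flags": "mi"}).findall("x\n%%BASH"))
# ['%%BASH']
```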

On top of the above changes, there would be:

- some Python, TS, and Robot tests
- some docs

Happy to help move this forward!

krassowski commented 2 years ago

And, here might be another place of configuration: loading transclusions from JupyterLab user settings. These would be handy for trying stuff out interactively, without having to restart the (lab) server. Initially, just with Advanced Settings, but eventually with a dedicated UI with preview, one could build up that same Schema.

This might be easiest to start with. Maybe by adding a mapping from host language to RegExpForeignCodeExtractor options (options as in the example). Actually, the value would be a list of RegExpForeignCodeExtractor options, as there can be multiple extractors added for each language. Or maybe it should be a flat list with an extra "host language" argument?
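The two shapes are easy to convert between, which may make the choice less binding. A sketch, assuming each flat entry carries a hypothetical host_language key next to the extractor options; the other keys shown are illustrative, loosely modeled on the example extractor rather than its exact options.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_host(flat: List[dict]) -> Dict[str, List[dict]]:
    """Flat list with a host_language field -> {host language: [options, ...]}."""
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for entry in flat:
        options = {k: v for k, v in entry.items() if k != "host_language"}
        grouped[entry["host_language"]].append(options)
    return dict(grouped)


def flatten(grouped: Dict[str, List[dict]]) -> List[dict]:
    """Inverse of group_by_host."""
    return [
        {"host_language": host, **options}
        for host, options_list in grouped.items()
        for options in options_list
    ]


settings = [
    {"host_language": "python", "language": "sql", "pattern": "^%%sql"},
    {"host_language": "python", "language": "bash", "pattern": "^%%bash"},
    {"host_language": "r", "language": "python", "pattern": "^py_run_string"},
]
print(group_by_host(settings)["python"][1])
# {'language': 'bash', 'pattern': '^%%bash'}
```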

Also cross-referencing https://github.com/jupyter-lsp/jupyterlab-lsp/issues/347 (to highlight that in the future, we will want to add transclusions defined by some grammars/token parsing, in addition to the current regular expression based approach).

bollwyvl commented 2 years ago

This might be easiest to start with.

Easiest to start with, sure, but it is not composable/shippable... it's basically no fun to have to tell someone to crack open Advanced Settings and paste in some rando JSON after doing a pip install.

In a Binder setting, sure: one can override settings with overrides.json, but I think we need at least some way to ship our existing simple ones, or we're just passing the buck.

transclusions defined by some grammars/token parsing

Yeah, sounds rad... so in light of that, perhaps #/transclusions/ipython-shell-magics/patterns/line-magic would get a "type": "regex" field. We could use that as a discriminator in the future.
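A sketch of how such a "type" discriminator could route each pattern to a matcher, defaulting to "regex" so that existing specs without the field keep working. Only the regex matcher is implemented; the "grammar" entry stands in for a future parser-based matcher and is purely hypothetical, as are the function names.

```python
import re
from typing import Callable, Dict, Iterator, List, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, extracted guest code)


def regex_matcher(spec: dict, text: str) -> Iterator[Span]:
    group = int(spec.get("extract", "$1")[1:])
    for m in re.finditer(spec["regex"], text, re.MULTILINE):
        yield m.start(group), m.end(group), m.group(group)


def grammar_matcher(spec: dict, text: str) -> Iterator[Span]:
    raise NotImplementedError("placeholder for a future grammar/token-based matcher")


MATCHERS: Dict[str, Callable[[dict, str], Iterator[Span]]] = {
    "regex": regex_matcher,
    "grammar": grammar_matcher,
}


def find_spans(spec: dict, text: str) -> List[Span]:
    kind = spec.get("type", "regex")  # specs without "type" are treated as regex
    if kind not in MATCHERS:
        raise ValueError(f"unknown transclusion pattern type: {kind!r}")
    return list(MATCHERS[kind](spec, text))


print(find_spans({"regex": r"^RUN\s+(.*)$"}, "FROM x\nRUN ls"))
# [(11, 13, 'ls')]
```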

As for what that might be: dunno, maybe jison or nearley.js? Either way, we'd have to ship a parser-generator runtime to work directly with the data, or, once again, we're back to "you have to build a labextension to play".

kristoffSC commented 2 years ago

Thank You both for answers.

In our use case, we would like to reuse the existing code-highlighting functionality for our custom magic. To be honest, we would be OK with configuring it manually in some properties, using RegExp etc., since it would only be done once.

I see you have a clear view of, and expectations for, what this feature should look like and how it should be implemented to fit your roadmap and product vision. However, this might be too much for our time capacity, so we (I mean me) need to find some fast path that would get us the result without breaking the product vision.

Can I use my custom extractor.spec.ts, extractor.ts and index.ts files, which I created based on SQL/BigQuery, as a JupyterLab extension (https://jupyterlab.readthedocs.io/en/stable/extension/extension_dev.html)? If yes, how can I do that? What about defaults.ts, which has to be modified as well when adding a new extractor?

In other words, I would not want to have to keep a fork of JupyterLab-LSP and build the whole thing just to add a few files. I can, however, maintain some extension code and plug it into JupyterLab-LSP.

BTW, should I start a new conversation thread for this, maybe? I don't want to add noise to this one.

bollwyvl commented 2 years ago

some fast path

Yep, we've tried to avoid fast paths on this repo if they're going to be painful/insoluble in the future (hence why this hasn't happened yet).

Can I use my custom extractor.spec.ts, extractor.ts and index.ts files that I created based on SQL/BigQuery as a JupyterLab extension

Yes, here's an example I helped set up for shipping Python-wrapped JS language servers (which you may not need) along with some extractors, etc.

https://github.com/jupyrdf/graph-lsp

You can then offer a pre-built extension on PyPI, or a source extension on npm (really not recommended in 2021).

should I start a new conversation thread for this maybe? I don't want to add noise to this one.

Also: please don't worry about the length of the thread: it's far worse to have to dig through many, many different ones!

kristoffSC commented 2 years ago

Hi @bollwyvl thanks for sharing https://github.com/jupyrdf/graph-lsp

I managed to modify it for my use case and build it using doit (as mentioned in the manual), and it seems to work.

However, it does not build when using bash .binder/postBuild, so I cannot build it as an extension package. Would you have some time to help me with this, or maybe explain a few things about the project setup? For example, it complains about "error Command "schema" not found." and

ValueError: "/mnt/c/GID/Dev/Lab/graph-lsp" is not a valid extension:
Missing extension module "lib/plugin.js"

when running jupyter labextension install

I would rather not use this thread for this, so maybe we could use some other channel? Here is my email: krzysiek.chmielewski@gmail.com

bollwyvl commented 2 years ago

Yeah, it's a rather opinionated project. I don't mean one should use that repo directly but rather the pattern for publishing it. You might get more leverage from https://github.com/jupyterlab/extension-cookiecutter-ts
