Defining custom cell handlers in Extension and interaction with computation engine

cderv commented 1 year ago

This came up through discussion in https://github.com/quarto-dev/quarto-cli/discussions/4761 by @coatless through the use case of webr Quarto extension (https://github.com/coatless/quarto-webr)

Main question is :

Is there a way to register engine requirements or options in a quarto extension, e.g. _extension.yaml?

Opening this issue to track idea and discussion. Below some thoughts on all this to consider (long writing to write all my thoughts for later reference)

webr extension syntax used is inspired by shinylive extension

---
title: "Testing"
format: html
filters:
- webr
---

```{webr}
1 + 1


When this is run with `engine: knitr` for example, there will be some interaction 

* A warning 

  ```r
  Warning message:
  In get_engine(options$engine) :
    Unknown language engine 'webr' (must be registered via knit_engines$set()).

A chunk processing by knitr which will lead to different output in intermediate .md file (because Quarto adds the .cell-code class for example to wrap knitr results .

jupyter

[trace] Parsed [CodeBlock ("",["{webr}"],[]) "1 + 1 "] at line 12
[trace] Parsed [CodeBlock ("",["{webr}"],[]) "message(\"hello\")"] at line 17

knitr

[trace] Parsed [CodeBlock ("",["webr","cell-code"],[]) "message(\"hello\")" at line 20
[trace] Parsed [Div ("",["cell"],[]) [CodeBlock ("",["webr","cell-code"],[] at line 23

We don't have those issue with our internal cell handler because we define a custom knitr engine for those and specific processing (no wrapping).

Behind this usecase, there is the generic question of How to use Quarto extension to define a CodeBlock handler in Lua and have it to play well with the rest ?

@jjallaire I know we discussed in the past the Quarto Extension for custom engine and it was not something straightfoward to consider because best integration would mean some typescript processing as we do for ojs, mermaid and dot. However, it seems there is a need to something, maybe just tooling, for Quarto extension that would define specific CodeBlock processing. Two use case for now shinylive and webr.

More thoughts on the current context based on my answer https://github.com/quarto-dev/quarto-cli/discussions/4761#discussioncomment-5319569

Using standard Pandoc fenced code attribute syntax prevent any processing by knitr. This means something like

```{.webr}
1 + 1

knitr will not parse and process this part of the code, and then Lua filter can target using `el.attr.classes:includes("webr")`. 
As notes `shinylive` uses `shinylive-python` as cell engine name. Using dash `-` makes knitr also not see that as a knitr engine to process. 

This is an easy solution to make distinction between computation (or internal) cell handlers and Lua based ones. 

Supports for YAML comments as option is a question though, and this is where `shinylive` solves this by using [typescript](https://github.com/rstudio/shinylive/blob/main/src/scripts/codeblock-to-json.ts) for [parsing comments too](https://github.com/rstudio/shinylive/blob/main/src/parse-codeblock.ts) called from [Lua directly](https://github.com/quarto-ext/shinylive/blob/e5ac95392fd40483f58294ec2ff22944b393eb95/_extensions/quarto-ext/shinylive/shinylive.lua#L71-L76)

This shows that probably some tooling is necessary in Quarto Lua API if we want to help with this pattern. 

OTOH, best would probably be to use common cell handler syntax like 
````markdown
```{webr}
1 + 1


This requires probably a way to indicate to computation engines like knitr to skip the parsing and processing of such cells - exactly what we do for `ojs`, `mermaid` and `dot`. Currently [we tell knitr to output those cells as is](https://github.com/quarto-dev/quarto-cli/blob/9debd347ae6b053fd2e445cffb25f7cdec94d082/src/resources/rmd/execute.R#L45-L68). 

Making that possible directly from extension configuration would allow to say to computation engine from the extension to pass through the code block. 

Anyhow, some challenges are: 

* What pattern do we want to promote for this ? 
    * Cell handler be done exclusively from Lua as shinylive and webr as of now  ?
    * or maybe some pre-processing defined by extension that would run as part of quarto render typescript pre processing ?
* What helpers tooling could be necessary in Quarto Lua API for extension developer ? 
* How does this integrate and behave correctly with YAML intelligence for options as comments and existing computation engine, with their own cell engine ?

jjallaire commented 1 year ago

I think what we want to do here is to provide a fully capable Lua API for registering custom engines (IIRC I actually think this might exist already in some form but we haven't taken it all the way to public API --- @cscheid ?)

Note that I expect that using that engine syntax might fail in pandoc 3 (for both webr and shinylive) as its now no longer valid pandoc markdown to use that syntax. So for the time being I think using e.g. ```{.webr} and manually parsing cell options is the way to go, but it seems like we could promote this to a fully exposed API (w/ correct handling in knitr, support for cell option parsing, etc.) in v1.4.

cderv commented 1 year ago

I think what we want to do here is to provide a fully capable Lua API for registering custom engines (IIRC I actually think this might exist already in some form but we haven't taken it all the way to public API)

Interesting ! Did not know that. 👀

Note that I expect that using that engine syntax might fail in pandoc 3 (for both webr and shinylive) as its now no longer valid pandoc markdown to use that syntax

Oh right I forgot about that. But somehow it works with quarto 1.3 so maybe we are 'patching' this behavior ? Converting to format: native we correctly get

CodeBlock ( "" , [ "{webr}" ] , [] ) "1 + 1 "

while using quarto pandoc we get

Para [ Code ( "" , [] , [] ) "{webr} 1 + 1" ]

it seems like we could promote this to a fully exposed API (w/ correct handling in knitr, support for cell option parsing, etc.) in v1.4.

It would be great to tackle this for 1.4

cscheid commented 1 year ago

I think what we want to do here is to provide a fully capable Lua API for registering custom engines (IIRC I actually think this might exist already in some form but we haven't taken it all the way to public API --- @cscheid ?)

There's code spread all over the place, but the Custom AST work allows us to start thinking about doing custom engineds for real. 1.4 is a good target, except that I'm going to be on crossrefs.

Note that I expect that using that engine syntax might fail in pandoc 3 (for both webr and shinylive) as its now no longer valid pandoc markdown to use that syntax. So for the time being I think using e.g. ```{.webr} and manually parsing cell options is the way to go, but it seems like we could promote this to a fully exposed API (w/ correct handling in knitr, support for cell option parsing, etc.) in v1.4.

With the custom reader we added in 1.3, the {webr} syntax remains valid in Lua filters. So I would vote to keep things consistent.

The main complication I see is that we'd have to figure out a way to tell knitr to stand down on those cells, and knitr runs before pandoc (that's why we have the {ojs} engine in knitr being a no-op, for example).

cderv commented 1 year ago

The main complication I see is that we'd have to figure out a way to tell knitr to stand down on those cells, and knitr runs before pandoc (that's why we have the {ojs} engine in knitr being a no-op, for example).

I would have thought it was the easy part 😅 My thinking was a new field in extension so that quarto can know this is an extension with a filter aiming to provide code cell handler, and then add the class name to the cells handler languages values already passed to knitr so that 'dummy' engine are defined for those. Would that not be enough ?

cscheid commented 1 year ago

The main complication I see is that we'd have to figure out a way to tell knitr to stand down on those cells, and knitr runs before pandoc (that's why we have the {ojs} engine in knitr being a no-op, for example).

I would have thought it was the easy part 😅 My thinking was a new field in extension so that quarto can know this is an extension with a filter aiming to provide code cell handler, and then add the class name to the cells handler languages values already passed to knitr so that 'dummy' engine are defined for those. Would that not be enough ?

It might be close to enough, but not exactly, because now the set of languages that needs to be handled is not static, but depends on Lua code. Naively, that would mean that IDEs now need to run Lua code to decide how to handle code cells. That's not a practical option.

Remember that part of quarto executes inside VS Code and RStudio, and needs to be able to know "what's going on" in a .qmd file, before Pandoc is executed. That's where the complication arises.

jjallaire commented 1 year ago

Practically though VS Code and RStudio already ignore cells they don't know about so I don't think that's a blocker

jjallaire commented 1 year ago

Another solution is that knitr could ignore engines it doesn't know about (passing them through). This could be an option disabled by default that we enable in Quarto?

cderv commented 1 year ago

Remember that part of quarto executes inside VS Code and RStudio, and needs to be able to know "what's going on" in a .qmd file, before Pandoc is executed. That's where the complication arises.

Oh yeah I have this in mind - for a complete solution with full experience, I agree this is not trivial

Practically though VS Code and RStudio already ignore cells they don't know about so I don't think that's a blocker

I was thinking the same - I didn't see that as an issue but more a feature-less experience. If engine are rendered correctly, then IDE supports for YAML intelligence is a bonus to me, and not a blocker.

Another solution is that knitr could ignore engines it doesn't know about (passing them through). This could be an option disabled by default that we enable in Quarto?

@jjallaire Yes definitely - we could detect rendering within Quarto from knitr too. what we do already for unknown engine (as unregistered with knit_engine$set())

Throw a warning -> we would not throw the warning when we are in quarto rendering
Output code chunk source if echo = TRUE using knit_hooks$get('source'). We probably need to adapt that as we want to keep the source markdown with ```{engine} and not ```engine as it would be done.

This could be done in knitr with a special "pass-through" engine and I'll make a PR. This will require updating knitr dependency requirement for Quarto. Should it be preferred to tweak knitr functions from Quarto R files directly ?

jjallaire commented 1 year ago

We definitely can't update the knitr dependency requirement (as that would cause a ton of existing code to fail).

What about this: In Quarto we do a knitr engine lookup and if it fails then we give it the ignore engine treatment? (we could do this for v1.3). The only downside I see here is that you can't point engines at arbitrary binaries but given how many engines are built it it seems like this hardly matters.

cderv commented 1 year ago

I'll come up with a solution within our knitr related code - it needs to be in hooks somehow probably as some engine can be defined after a packages is loaded from within the document. So we need to do this check in the right place, (or patch a knitr function called for each chunk engine processing as we do in other places)

jjallaire commented 1 year ago

If it helps with reducing regression risk for the fix we can accept far less than perfect here as we really only need to catch engines used by Quarto extensions (of which we only know of two right now). Honestly it would probably just fine to skip the webr and shinylive engines explicitly (I'm saying this only in the spirit of a low risk fix close to final freeze, obviously we need something more robust longer term).

cscheid commented 1 year ago

I'm 100% fine with a quick fix like that for 1.3. What I was saying was in the context of at least 1.4. I was assuming we'd do this for the milestone here (sorry, I should have been clearer)

jjallaire commented 1 year ago

Another approach which we sort of already use in VS Code is to to do special handling for engines with a suffix of -python or -r. Note here is our syntax highlighting rule for Python in VS Code

https://github.com/quarto-dev/quarto/blob/main/apps/vscode/syntaxes/build-lang.js#L317

Then shinylive uses this convention

https://github.com/quarto-ext/shinylive#usage

So we could also mask out engines that have a -r or -python suffix (e.g. webr-r or web-r). This also allows us to do completion and cell execution for these blocks in VS Code (by convention rather than by configuration or introspection).

cderv commented 1 year ago

Another approach which we sort of already use in VS Code is to to do special handling for engines with a suffix of -python or -r. Note here is our syntax highlighting rule for Python in VS Code

So that is why shinylive do use the convention - I was wondering.

So we could also mask out engines that have a -r or -python suffix (e.g. webr-r or web-r). This also allows us to do completion and cell execution for these blocks in VS Code (by convention rather than by configuration or introspection).

I know I write long detailed issue so it is a bit hidden but as said in original post

Using dash - makes knitr also not see that as a knitr engine to process.

knitr does not recognize ```{engine-dash} as a codeblock with engine to parse. A dash in engine name is not supported. So using this convention already works without any change to Quarto or knitr. The .qmd content is passed through in .md untouch because knitr parses it as Markdown text and not code block with an engine.

So even more low risk solution would be to advise extensions like webr to use this convention ?

jjallaire commented 1 year ago

Yes I would 100% agree with this (and it would allow us to easily support highlighting and completion for web-r cells in vscode)

jjallaire commented 1 year ago

Note that highlighting will work today but I can make completion work w/o much trouble

jjallaire commented 1 year ago

I just checked and Run Cell and Completions already work in VS Code for e.g. shinylive-python and web-r so this is definitely the way to go.

The visual editor currently doesn't handle these well though, I'll fix that ASAP.

jjallaire commented 1 year ago

The visual editor in VS Code now correctly handles e.g. shinylive-python and web-r (this includes completion and code execution). This is in version 1.76.0.

Note this will work for any prefixed executable block (e.g. foo-python and foo-r work equally well)

cderv commented 1 year ago

Awesome thank you ! I'll leave this open for the context @cscheid shared about a long term solution.

cderv commented 11 months ago

@cscheid I believe this is to postpone. Do you think we need to discuss for 1.5 or should we just push that to Future to deal later ?

cderv commented 11 months ago

For reference, more related discussion on this with holoviz-quarto by @MarcSkovMadsen

https://github.com/quarto-dev/quarto-cli/discussions/8049

Making defining new cell handler easier could help with such project

quarto-dev / quarto-cli

Defining custom cell handlers in Extension and interaction with computation engine #4839