tectonic-typesetting / tectonic

A modernized, complete, self-contained TeX/LaTeX engine, powered by XeTeX and TeXLive.
https://tectonic-typesetting.github.io/

V2 cli: extensions #1125

Open Caellian opened 10 months ago

Caellian commented 10 months ago

This suggestion builds on the shell-escape issue and proposes an alternative that's more trustworthy than shell-escape, at the cost of being a 🤏 harder to implement.

I propose adding an extension interface that loads dynamic libraries, which in turn register macros for specialized text processing that needs finer-grained control or better system integration than TeX can offer.

User perspective

The user installs an extension via tectonic -X add extension-name, or adds it to the project toml directly.

After that, they can use the macros provided by the extension:

\begin{codeblock}[lua]
function hello()
  return "world"
end
\end{codeblock}

Implementation

Tectonic goes through all extensions specified in the document toml and, for each one, tries to find it in the cache or otherwise downloads it from a registry.
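To make the lookup order concrete, here is a minimal sketch of "cache first, registry second". Nothing in it is existing Tectonic API: ExtensionCache, Registry, Resolved, FakeRegistry and the /tmp/tectonic-ext path are hypothetical names made up for illustration, and the download is faked rather than performed over the network.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Where a resolved extension came from (hypothetical type).
#[derive(Debug, PartialEq)]
enum Resolved {
    Cached(PathBuf),
    Downloaded(PathBuf),
}

/// Hypothetical registry client; a real one would fetch over the network.
trait Registry {
    fn download(&self, name: &str) -> Result<PathBuf, String>;
}

/// Hypothetical in-memory cache mapping extension names to library paths.
#[derive(Default)]
struct ExtensionCache {
    entries: HashMap<String, PathBuf>,
}

impl ExtensionCache {
    /// Return the cached library if present, otherwise ask the registry
    /// and remember the result for the next build.
    fn resolve(&mut self, name: &str, registry: &dyn Registry) -> Result<Resolved, String> {
        if let Some(path) = self.entries.get(name) {
            return Ok(Resolved::Cached(path.clone()));
        }
        let path = registry.download(name)?;
        self.entries.insert(name.to_string(), path.clone());
        Ok(Resolved::Downloaded(path))
    }
}

/// Stand-in registry for illustration only: it invents a path instead of downloading.
struct FakeRegistry;

impl Registry for FakeRegistry {
    fn download(&self, name: &str) -> Result<PathBuf, String> {
        if name.is_empty() {
            return Err("empty extension name".to_string());
        }
        Ok(PathBuf::from(format!("/tmp/tectonic-ext/{name}.so")))
    }
}
```

Calling resolve twice with the same name would hit the registry only the first time; an on-disk cache and version pinning are left out of this sketch.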

Extensions would look something like:

// Exported unmangled so the host can look the symbol up by name after loading the library.
#[no_mangle]
pub unsafe extern "C" fn register_macros(reg: *mut MacroRegistry) {
  // The host is expected to pass a valid, non-null pointer.
  let reg = reg.as_mut().unwrap();
  // register_macro(name, args, Fn(Args) -> String + 'static)
  reg.register_macro("some_macro", ..., handle_some_macro);
  // register_environment(name, args, Fn(Args, String) -> String + 'static)
  reg.register_environment("codeblock", ..., handle_code_block);
}

pub fn handle_some_macro(args: Args) -> String {
  // handle args
  // do stuff'n'things
  // return commands and/or text
  todo!()
}

pub fn handle_code_block(args: Args, inner: String) -> String {
  // handle args
  // do stuff'n'things
  // return commands and/or text
  todo!()
}
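For the host side, a rough sketch of what MacroRegistry could look like, under several assumptions: Args is taken to be a plain Vec<String> (the proposal leaves it unspecified), the args-specification parameter elided above is dropped entirely, and expand_macro / expand_environment are hypothetical helper names for how the engine might invoke a handler. The sample handle_code_block body is illustrative only.

```rust
use std::collections::HashMap;

/// Assumed argument representation; the proposal does not define `Args`.
type Args = Vec<String>;

type MacroFn = Box<dyn Fn(Args) -> String + 'static>;
type EnvFn = Box<dyn Fn(Args, String) -> String + 'static>;

/// Hypothetical host-side registry that extensions fill in at load time.
#[derive(Default)]
struct MacroRegistry {
    macros: HashMap<String, MacroFn>,
    environments: HashMap<String, EnvFn>,
}

impl MacroRegistry {
    fn register_macro<F>(&mut self, name: &str, handler: F)
    where
        F: Fn(Args) -> String + 'static,
    {
        self.macros.insert(name.to_string(), Box::new(handler));
    }

    fn register_environment<F>(&mut self, name: &str, handler: F)
    where
        F: Fn(Args, String) -> String + 'static,
    {
        self.environments.insert(name.to_string(), Box::new(handler));
    }

    /// Run a registered macro; `None` means no extension provides it.
    fn expand_macro(&self, name: &str, args: Args) -> Option<String> {
        self.macros.get(name).map(|handler| handler(args))
    }

    /// Run a registered environment on its body text.
    fn expand_environment(&self, name: &str, args: Args, inner: String) -> Option<String> {
        self.environments.get(name).map(|handler| handler(args, inner))
    }
}

/// Illustrative handler: wraps the body in `verbatim` and notes the language.
fn handle_code_block(args: Args, inner: String) -> String {
    let lang = args.first().map(String::as_str).unwrap_or("text");
    format!("% language: {lang}\n\\begin{{verbatim}}\n{inner}\n\\end{{verbatim}}")
}
```

The returned string would be fed back to the engine as TeX source. Boxed Rust closures are not FFI-safe, so a real dynamic-library boundary would likely need C-compatible function pointers instead.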

Motivation

This would provide the same functionality as shell-escape, but the user who's building the document can know that the extensions in the official registry won't run malicious code on their machine. It shifts the trust requirement from the author of the document to the registry.

It also simplifies building the document without shell-escape, because the user building it doesn't have to vet and run other executables before tectonic.

pkgw commented 8 months ago

I agree that this would be a somewhat more "constrainable" system than shell-escape, and thus very nice from that perspective! Another analogy to keep in mind would be the Rust compiler's "procedural macros", which are basically binary extensions to the compiler. One thing that people are doing in that world is starting to implement these kinds of things using WebAssembly, even in non-browser contexts, because WASM code is basically cross-platform (so, basically doing what the JVM was meant to achieve many years ago ...).

Caellian commented 8 months ago

After toying a bit with the tiaoma package, I'm noticing that the WASM plugin API is basically what I wanted. The only missing piece is allowing plugins to return content instead of just str, but that can be circumvented with eval.

The issue with WASM plugins (for some use cases) is that a lot of existing programs can't be turned into WASM modules, so I guess calling SOs/DLLs that can execute commands would still be useful.

pkgw commented 8 months ago

I can think of at least three possible ways of doing things: WASM plugins, native binary SO/DLL plugins, or native executable "plugins" (potentially with a non-trivial protocol communicated over pipes or something). They all have their engineering advantages and drawbacks, but I strongly suspect that the right choice is mostly about what people will actually use in practice, which is not quite the same thing as what makes for the cleanest engineering. If everyone wants to use a Python syntax highlighter, or something, that probably isn't great for either WASM or SO/DLLs.

All that is to say, it would probably be good to identify one or two specific example uses to help guide thinking about what kind of implementation would hit a realistic design sweet spot.
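As one data point for the third option, a native executable "plugin" can be driven with nothing but the standard library. This is only a sketch of the simplest possible protocol (whole input on stdin, whole result on stdout); run_executable_plugin is a hypothetical name, and any real protocol would likely need framing, error reporting, and timeouts.

```rust
use std::io::{Error, ErrorKind, Write};
use std::process::{Command, Stdio};

/// Run `program` as a plugin: send `input` on stdin, return what it prints on stdout.
/// Hypothetical helper; fine for small inputs, but a large input could block if the
/// child fills its stdout pipe before reading everything from stdin.
fn run_executable_plugin(program: &str, args: &[&str], input: &str) -> std::io::Result<String> {
    let mut child = Command::new(program)
        .args(args)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    // Write the input, then drop the handle so the child sees end-of-file.
    {
        let mut stdin = child.stdin.take().expect("stdin was requested as piped");
        stdin.write_all(input.as_bytes())?;
    }

    let output = child.wait_with_output()?;
    if !output.status.success() {
        return Err(Error::new(
            ErrorKind::Other,
            format!("plugin exited with {}", output.status),
        ));
    }
    String::from_utf8(output.stdout).map_err(|e| Error::new(ErrorKind::InvalidData, e))
}
```

On a Unix-like system, run_executable_plugin("cat", &[], "hello\n") would simply echo the input back; a Python syntax highlighter could be plugged in the same way by swapping the program name and arguments.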