tectonic-typesetting / tectonic

A modernized, complete, self-contained TeX/LaTeX engine, powered by XeTeX and TeXLive.
https://tectonic-typesetting.github.io/

V2 CLI: Include "tex" output type in Tectonic.toml #1135

Open j0hax opened 8 months ago

j0hax commented 8 months ago

Howdy! I am currently working with the V2 Interface.

Would it be possible to add a simple output option that produces one unified .tex file from index.tex, _preamble.tex, and the other source files?

Implementing this would provide great flexibility for further processing; specifically, I would like to pass my TeX source to Pandoc for conversion to EPUB.

Suggestion:

[doc]
name = 'my-document'
bundle = 'https://data1.fullyjustified.net/tlextras-2022.0r0.tar'

[[output]]
name = 'my-document'
type = 'pdf'

[[output]]
name = 'my-document-src'
type = 'tex'

pkgw commented 7 months ago

Yes, I have thought that something like this could be very interesting!

Implementing such an output form would, I think, be more of a challenge than one might expect. It's pretty intuitive to think "OK, I would like to take all of my input files and merge them all into a unified output", but one problem with (La)TeX is that a standard document might access hundreds of input files, and we almost surely don't want to inline all of them. Where do you draw the line? Fortunately, Tectonic offers what could be a very reasonable heuristic: input files from the "bundle" are not inlined, and everything else is.
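
To make that heuristic concrete, here is a minimal Python sketch. It is not anything Tectonic implements: the function name `should_inline` and the `src_dir` parameter are hypothetical, and it assumes that "not from the bundle" can be approximated as "exists as a file inside the document's own source directory".

```python
from pathlib import Path


def should_inline(name: str, src_dir: Path) -> bool:
    """Decide whether an input file should be merged into the unified output.

    Hypothetical rule: inline only files that live inside the document's
    own source directory. Anything else is assumed to come from the bundle
    (a package, class file, font map, ...) and is left as a reference.
    """
    root = src_dir.resolve()
    candidate = (src_dir / name).resolve()
    try:
        # Reject paths that escape the source directory (e.g. "../x.tex").
        candidate.relative_to(root)
    except ValueError:
        return False
    return candidate.is_file()
```

With a source directory containing only `index.tex`, `should_inline("index.tex", src_dir)` is true, while a request for `article.cls` is false because no such file exists locally, so it would be left for the bundle to supply.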

But I'm pretty sure that there's a much deeper problem to solve. The Tectonic TeX engine is actually expanding and evaluating TeX code as it processes its input(s), such that you can't really recover the input source code in any meaningful way. So, it seems to me that any kind of include-expansion would have to be a source-level operation, not actually invoking the processing engine.

The problem there is that TeX is fundamentally un-parseable, in the sense that the only way to extract the meaning of a piece of TeX code is to evaluate it in an engine. So any source-level processing is necessarily heuristic and easily defeated. For instance, if my file contains \input{results.tex}, that is easy for a parser to deal with, but what about something like:

\newcommand\myinput[1]{\input{res#1.tex}}

\myinput{ults}

And it is easy to devise much, much more pathological cases.
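
The limitation can be seen in a naive source-level flattener. The sketch below is purely illustrative and assumes a regex-based approach; `flatten`, `INPUT_RE`, and `src_dir` are hypothetical names, not part of Tectonic. It inlines literal `\input{...}` calls whose target exists locally and leaves everything else untouched.

```python
import re
from pathlib import Path

# Matches only the literal form \input{name}; no macro expansion is attempted.
INPUT_RE = re.compile(r"\\input\{([^{}]+)\}")


def flatten(text: str, src_dir: Path, depth: int = 0) -> str:
    """Heuristically inline \\input{...} files found under src_dir."""
    if depth > 20:
        raise RecursionError("include nesting too deep (possible cycle)")

    def expand(match: re.Match) -> str:
        name = match.group(1)
        path = src_dir / name
        # TeX allows the ".tex" extension to be omitted.
        if not path.is_file() and not name.endswith(".tex"):
            path = src_dir / (name + ".tex")
        if not path.is_file():
            # Unknown target (bundle file, or not a real file name): keep as-is.
            return match.group(0)
        included = path.read_text(encoding="utf-8")
        return flatten(included, src_dir, depth + 1)

    return INPUT_RE.sub(expand, text)
```

Given a local `results.tex`, this replaces `\input{results.tex}` with the file's contents. For the `\myinput` example it changes nothing: the only literal `\input` it sees is the one inside the macro definition, whose argument `res#1.tex` is not a real file name, and `\myinput{ults}` is never recognized as an include at all. It would also wrongly expand an `\input` that appears in a comment or in verbatim text, which is another way such heuristics are defeated.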

One day, it would be interesting for Tectonic to include a specification of a LaTeX-like language that can be parsed into an AST and understood in ways that don't require pushing it through a real, full TeX engine. If Tectonic could manage guarantees about source-level parseability, it would become more tractable to implement these kinds of transforms.
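
As a rough idea of what source-level parseability buys, here is a toy Python sketch. The subset it accepts is entirely hypothetical and far smaller than anything practical: plain text plus commands of the form `\name{arg}{arg}...`, with no macro definitions, no bare groups, and no catcode changes. Under those assumptions every include is statically visible in the AST.

```python
from dataclasses import dataclass, field


@dataclass
class Text:
    value: str


@dataclass
class Command:
    name: str
    args: list = field(default_factory=list)  # each argument is a list of nodes


def parse(src: str) -> list:
    """Parse the toy subset into a list of Text and Command nodes."""
    nodes, _ = _parse_nodes(src, 0, top_level=True)
    return nodes


def _parse_nodes(src: str, pos: int, top_level: bool):
    nodes, buf = [], []
    while pos < len(src):
        ch = src[pos]
        if ch == "}":
            if top_level:
                raise SyntaxError(f"unmatched '}}' at offset {pos}")
            break  # the caller consumes the closing brace
        if ch == "{":
            raise SyntaxError(f"bare group at offset {pos} is outside the subset")
        if ch == "\\":
            if buf:
                nodes.append(Text("".join(buf)))
                buf = []
            end = pos + 1
            while end < len(src) and src[end].isalpha():
                end += 1
            name = src[pos + 1:end]
            if not name:
                raise SyntaxError(f"expected a command name at offset {pos}")
            pos = end
            args = []
            while pos < len(src) and src[pos] == "{":
                inner, pos = _parse_nodes(src, pos + 1, top_level=False)
                if pos >= len(src):
                    raise SyntaxError("unclosed '{'")
                pos += 1  # skip the closing '}'
                args.append(inner)
            nodes.append(Command(name, args))
            continue
        buf.append(ch)
        pos += 1
    if buf:
        nodes.append(Text("".join(buf)))
    return nodes, pos


def find_inputs(nodes: list) -> list:
    """Collect the targets of every \\input whose argument is plain text."""
    found = []
    for node in nodes:
        if isinstance(node, Command):
            if (node.name == "input" and len(node.args) == 1
                    and all(isinstance(n, Text) for n in node.args[0])):
                found.append("".join(n.value for n in node.args[0]))
            for arg in node.args:
                found.extend(find_inputs(arg))
    return found
```

Because the subset has no way to define new commands, a walk such as `find_inputs(parse(src))` finds every include without running an engine. That guarantee disappears as soon as something like `\newcommand` is allowed, which is why the parseable language would need to be specified deliberately.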