jamespfennell / texcraft

Framework for building TeX engines and other kinds of TeX software
MIT License

Support loading the Plain TeX format #5

Open jamespfennell opened 1 year ago

jamespfennell commented 1 year ago

I've been working on this project on and off for two years in a somewhat scattershot way, mostly on things that interest me, like the recent serializable VMs work (#3). I think it would be interesting to change tack and instead work toward the larger goal of making Texcraft able to parse the plain TeX format.

The format is essentially just a large TeX file (plain.tex) and can be downloaded from CTAN. What makes it interesting is that it uses a lot of different TeX features, so supporting it necessarily means making a lot of progress on the project.

I've audited Appendix B of The TeXbook, which describes the format, and come up with this list of tasks that seem necessary for it to work and for it to be testable. The tasks are ordered by where they appear in plain.tex, so as more tasks are completed, the Texcraft interpreter can get further into the file.

Preamble

Codes

Registers

Parameters

There's essentially nothing to do here. This section sets default values for dozens of parameters. These parameters would generally be implemented in Texcraft at the same time as the associated feature. For testing, we could just implement them in one big throwaway component.
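A minimal sketch of such a throwaway component, assuming parameters can be treated as named integers stored in a map until their real features exist. `ThrowawayParameters`, `KNOWN`, `set` and `get` are hypothetical names for illustration, not Texcraft's API, and the list holds only a small subset of the parameters plain.tex assigns (for example, plain.tex sets `\tolerance=200`).

```rust
use std::collections::HashMap;

/// Hypothetical throwaway component: it records the integer parameters
/// plain.tex assigns, without implementing the features that read them.
#[derive(Default)]
struct ThrowawayParameters {
    integers: HashMap<&'static str, i32>,
}

impl ThrowawayParameters {
    /// A small, illustrative subset of the parameters plain.tex sets.
    const KNOWN: [&'static str; 6] = [
        "pretolerance", "tolerance", "hbadness", "vbadness", "linepenalty", "hyphenpenalty",
    ];

    /// Handles an assignment like `\tolerance=200`; unknown names are an error.
    fn set(&mut self, name: &str, value: i32) -> Result<(), String> {
        let key = Self::KNOWN
            .iter()
            .copied()
            .find(|k| *k == name)
            .ok_or_else(|| format!("undefined control sequence \\{name}"))?;
        self.integers.insert(key, value);
        Ok(())
    }

    /// Reads a parameter back; `None` if it has not been assigned yet.
    fn get(&self, name: &str) -> Option<i32> {
        self.integers.get(name).copied()
    }
}
```

Because the component is meant to be deleted once each parameter moves next to its feature, a flat map keyed by name keeps it cheap to write and easy to remove.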

Font information

Macros for text, math and output

I don't see anything here that needs special handling; it's just a bunch of \defs.

Hyphenation

jamespfennell commented 1 year ago

Just some more information on the \edef/\xdef situation, in which \the is handled specially when reading the tokens that define the macro. From looking at the Pascal code and experimenting, the rule seems to be the following: expansion happens normally, except when the command to be expanded is \the and the target of the \the command is a token list variable. In that case the variable's tokens are inserted into the macro body as-is, without being expanded further. (I think the only example of a token list variable in Knuth's TeX is a token list register defined using \toks. But of course Texcraft will support this as a variable type in general.)
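The rule above can be sketched as a small expansion loop. This assumes a heavily simplified, hypothetical model: tokens are either characters or named control sequences, macros take no parameters, and `\the` is always followed directly by a variable name. `Token`, `Variable`, `State` and `edef_expand` are illustrative names, not Texcraft types.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified token model (not Texcraft's real types).
#[derive(Clone, Debug, PartialEq)]
enum Token {
    Char(char),
    /// A control sequence, stored by name without the backslash.
    Cs(String),
}

/// Hypothetical variable kinds that `\the` can target.
enum Variable {
    Int(i32),
    TokenList(Vec<Token>),
}

struct State {
    /// Parameterless macros only, to keep the sketch short.
    macros: HashMap<String, Vec<Token>>,
    variables: HashMap<String, Variable>,
}

impl State {
    /// Reads an \edef body with full expansion, special-casing `\the`.
    fn edef_expand(&self, input: &[Token]) -> Vec<Token> {
        let mut out = Vec::new();
        // Unread tokens, with the next token to read at the end of the Vec.
        let mut stack: Vec<Token> = input.iter().rev().cloned().collect();
        while let Some(tok) = stack.pop() {
            if let Token::Cs(name) = &tok {
                if name == "the" {
                    // In this sketch the target variable is simply the next token.
                    let Some(Token::Cs(var)) = stack.pop() else {
                        panic!("\\the expects a variable");
                    };
                    match self.variables.get(&var) {
                        // Digits are character tokens, so skipping expansion changes nothing.
                        Some(Variable::Int(n)) => out.extend(n.to_string().chars().map(Token::Char)),
                        // The special case: copy the tokens out without expanding them.
                        Some(Variable::TokenList(toks)) => out.extend(toks.iter().cloned()),
                        None => panic!("undefined variable \\{var}"),
                    }
                    continue;
                }
                if let Some(body) = self.macros.get(name) {
                    // Push the replacement text back so it is expanded in turn.
                    stack.extend(body.iter().rev().cloned());
                    continue;
                }
            }
            // Characters and unexpandable control sequences pass through unchanged.
            out.push(tok);
        }
        out
    }
}
```

With `\foo` defined as `x` and a token list variable holding `\foo`, reading `\foo\the<var>` yields `x\foo`: the macro written directly is expanded, while the copy that came out of the variable is not.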

This hack shouldn't be too difficult to support because the logic will live entirely in the standard library rather than the VM. The special parser for \edef/\xdef can just use a special tag on \the.
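One way the tag idea could look, as a sketch only: each command may carry an opaque tag, and the \edef/\xdef reader compares the tag instead of the command's name. `Tag`, `THE_TAG`, `Command`, `Commands`, `let_` and `is_the` are all hypothetical names, not Texcraft's actual API.

```rust
use std::collections::HashMap;

/// Hypothetical opaque tag a command can carry (not Texcraft's real API).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Tag(u32);

/// Hypothetical tag attached to the `\the` primitive when it is registered.
const THE_TAG: Tag = Tag(1);

#[derive(Clone)]
struct Command {
    // A real command would also hold its implementation; only the tag matters here.
    tag: Option<Tag>,
}

#[derive(Default)]
struct Commands {
    by_name: HashMap<String, Command>,
}

impl Commands {
    fn register(&mut self, name: &str, tag: Option<Tag>) {
        self.by_name.insert(name.to_string(), Command { tag });
    }

    /// `\let\new=\old`: copies the command, tag included.
    fn let_(&mut self, new: &str, old: &str) {
        if let Some(cmd) = self.by_name.get(old).cloned() {
            self.by_name.insert(new.to_string(), cmd);
        }
    }

    /// The question the \edef/\xdef reader asks before expanding a command.
    fn is_the(&self, name: &str) -> bool {
        self.by_name.get(name).and_then(|c| c.tag) == Some(THE_TAG)
    }
}
```

Checking a tag rather than the name `the` matters because `\let\mythe=\the` gives an alias that TeX also treats as \the inside \edef; since `let_` copies the tag, the alias is recognized here too.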

This special parsing is also used for \message.

jamespfennell commented 1 year ago

Correction: Knuth's TeX does have other token list variables. Examples: \output, \everypar.