jamespfennell opened 1 year ago
For benchmarking, it would be awesome to time (de)serializing a VM that has loaded the Plain TeX format. However, it may take a lot of work before Texlang can read the Plain TeX format.
Some perf improvement ideas I had:
- In general, support serializing and deserializing iterators. In a bunch of places I create a new data structure (like a new instance of a map) and then serialize that. We could probably skip this intermediate step, making serialization and deserialization faster and less memory-intensive. Serialization should be trivial; deserialization may be tricky. Given serde's API, it will be impossible to actually deserialize to an iterator.
- When serializing the cat code map, don't serialize values that match the defaults (e.g., the INITEX defaults). When deserializing, initialize the map to the INITEX defaults and then apply the differences on top.
- Serialize cat codes as integers, irrespective of the serialization format.
- For registers, serialize continuous runs of 0s as something like `0<number of zeros>`. In many serde contexts registers are expected to mostly have their default values, so this will be much faster and more space-efficient. I had a fancier idea of dividing a vector into blocks of the form `<number of non-zero values><number of zeros><non-zero values>`, which I think is provably more space-efficient in all cases. If a block starts with 0, it means the vector is over.
- In the CS name interner, the `ends` vector is increasing. We should serialize the diffs between adjacent elements instead of the elements themselves. For formats with varint encoding, this will be more space-efficient.
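
A minimal sketch of the cat code ideas, assuming cat codes are stored as plain `u8` integers in a hypothetical 256-entry table (Texlang's real cat code map is keyed by `char` and may look different). The helper names `initex_defaults`, `diff_from_initex` and `apply_diff` are made up for illustration; the values in `initex_defaults` are TeX's standard INITEX cat codes for the 8-bit range.

```rust
// Hypothetical sketch: store only the cat codes that differ from INITEX.
// Cat codes are plain u8 integers; the 256-entry table is a simplification.

/// INITEX's initial cat codes for the 8-bit range.
fn initex_defaults() -> [u8; 256] {
    let mut codes = [12u8; 256]; // 12 = other character
    for c in b'a'..=b'z' {
        codes[c as usize] = 11; // letter
    }
    for c in b'A'..=b'Z' {
        codes[c as usize] = 11; // letter
    }
    codes[b'\\' as usize] = 0; // escape character
    codes[b'%' as usize] = 14; // comment character
    codes[b' ' as usize] = 10; // space
    codes[0] = 9; // ^^@, ignored
    codes[13] = 5; // ^^M, end of line
    codes[127] = 15; // ^^?, invalid
    codes
}

/// Serialize: keep only (character, cat code) pairs that differ from INITEX.
fn diff_from_initex(codes: &[u8; 256]) -> Vec<(u8, u8)> {
    let defaults = initex_defaults();
    (0..256usize)
        .filter(|&i| codes[i] != defaults[i])
        .map(|i| (i as u8, codes[i]))
        .collect()
}

/// Deserialize: start from the INITEX defaults and apply the differences on top.
fn apply_diff(diff: &[(u8, u8)]) -> [u8; 256] {
    let mut codes = initex_defaults();
    for &(c, code) in diff {
        codes[c as usize] = code;
    }
    codes
}
```

For example, a VM that has only changed `{` and `}` to cat codes 1 and 2 would serialize just two pairs instead of 256 entries.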
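
The register ideas could be sketched as follows. This is a rough illustration, not Texlang code: registers are modeled as an `&[i32]` slice, the encoders emit a flat `Vec<i32>` instead of going through serde, and all function names are hypothetical. For the block scheme, the sketch assumes one particular reading: each block covers a run of zeros followed by a run of non-zero values, so a block can only start with 0 at the very end, and trailing zeros are restored from the known number of registers.

```rust
// Hypothetical sketch of two zero-run encodings for a register vector.
// `decode_blocks` takes the total register count as known.

/// Simple scheme: non-zero values are written as-is; each run of zeros
/// becomes `0` followed by the length of the run.
fn encode_zero_runs(regs: &[i32]) -> Vec<i32> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < regs.len() {
        if regs[i] != 0 {
            out.push(regs[i]);
            i += 1;
        } else {
            let start = i;
            while i < regs.len() && regs[i] == 0 {
                i += 1;
            }
            out.push(0);
            out.push((i - start) as i32);
        }
    }
    out
}

fn decode_zero_runs(data: &[i32]) -> Vec<i32> {
    let mut regs = Vec::new();
    let mut i = 0;
    while i < data.len() {
        if data[i] == 0 {
            // A 0 marker is always followed by the length of the zero run.
            regs.resize(regs.len() + data[i + 1] as usize, 0);
            i += 2;
        } else {
            regs.push(data[i]);
            i += 1;
        }
    }
    regs
}

/// Block scheme: `<non-zero count> <zero count> <non-zero values...>`, where
/// the block means "this many zeros, then these values". A lone 0 ends the data.
fn encode_blocks(regs: &[i32]) -> Vec<i32> {
    let mut out = Vec::new();
    let mut i = 0;
    loop {
        let zeros_start = i;
        while i < regs.len() && regs[i] == 0 {
            i += 1;
        }
        let values_start = i;
        while i < regs.len() && regs[i] != 0 {
            i += 1;
        }
        let n_values = i - values_start;
        if n_values == 0 {
            out.push(0); // terminator: only trailing zeros (if any) remain
            return out;
        }
        out.push(n_values as i32);
        out.push((values_start - zeros_start) as i32);
        out.extend_from_slice(&regs[values_start..i]);
    }
}

fn decode_blocks(data: &[i32], len: usize) -> Vec<i32> {
    let mut regs = Vec::with_capacity(len);
    let mut i = 0;
    while data[i] != 0 {
        let n_values = data[i] as usize;
        let n_zeros = data[i + 1] as usize;
        regs.resize(regs.len() + n_zeros, 0);
        regs.extend_from_slice(&data[i + 2..i + 2 + n_values]);
        i += 2 + n_values;
    }
    regs.resize(len, 0); // restore the implied trailing zeros
    regs
}
```

For `[0, 0, 5, 7, 0, 0, 0, 3, 0, 0]`, the simple scheme produces `[0, 2, 5, 7, 0, 3, 3, 0, 2]` and the block scheme produces `[2, 2, 5, 7, 1, 3, 3, 0]`.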
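
For the `ends` idea, here is a sketch that delta-encodes an increasing vector and measures the result with unsigned LEB128 varints, one common varint encoding. The `usize` element type, the function names and the byte counting are illustrative assumptions; an actual serde format with varint integers would do the byte-level encoding itself.

```rust
// Hypothetical sketch: delta-encode the increasing `ends` vector and compare
// varint sizes of the raw values and the deltas.

/// Replace each element with its difference from the previous one.
fn to_deltas(ends: &[usize]) -> Vec<usize> {
    let mut prev = 0;
    ends.iter()
        .map(|&e| {
            let d = e - prev; // cannot underflow because `ends` is increasing
            prev = e;
            d
        })
        .collect()
}

/// Undo `to_deltas` with a running sum.
fn from_deltas(deltas: &[usize]) -> Vec<usize> {
    let mut sum = 0;
    deltas
        .iter()
        .map(|&d| {
            sum += d;
            sum
        })
        .collect()
}

/// Unsigned LEB128: 7 bits per byte, high bit set on all but the last byte.
fn write_varint(mut n: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (n & 0x7f) as u8;
        n >>= 7;
        if n == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Total number of bytes needed to varint-encode every value.
fn varint_len(values: &[usize]) -> usize {
    let mut buf = Vec::new();
    for &v in values {
        write_varint(v as u64, &mut buf);
    }
    buf.len()
}
```

With `ends = [200, 205, 211, 300]`, the raw values take 8 varint bytes while the deltas `[200, 5, 6, 89]` take 5, since every delta after the first fits in a single byte.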
Task list for making the serializable VMs feature-complete:

- `\countdef`
- `\dump` primitive