This project is a sourcecode transpiler for commutative diagrams.
The aim is to be able to translate from and to any format, most of which are LaTeX DSLs.
Here is the progress on the planned ones:

Target | Import | Export |
---|---|---|
amscd | ██████████ | ██████████ |
amscdx | ██████░░░░ | ███████░░░ |
CoDi | ░░░░░░░░░░ | ░░░░░░░░░░ |
quiver | ████████░░ | ███████░░░ |
tikz-cd | ░░░░░░░░░░ | ░░░░░░░░░░ |
xymatrix | ██░░░░░░░░ | ░░░░░░░░░░ |
... | | |

Private repo: https://github.com/paolobrasolin/ouroboros
A transpiler like this could be realized with many technologies. I have a few end goals:
TypeScript therefore looks like the best choice. On top of it, two outstanding libraries that trivialize a lot of groundwork are nearley.js for grammar-based parsing and Superstruct for data validation and coercion.
Freely transpiling among many DSLs requires a transpilation procedure for each ordered source/target language pair we want to connect.
How many transpilers do we need in total?

- If we connect $n$ DSLs directly, then we need $2 \cdot \frac{n(n-1)}{2} = n(n-1)$ transpilers (i.e. twice the number of edges of the complete graph $K_n$).
- If we connect $n$ DSLs through an artificial Universal Language, then we need $2n$ transpilers (i.e. twice the number of edges of the star graph $S_n$).

Implementing a Universal Language (UL for short) clearly is the winning strategy, since $n(n-1) > 2n$ as soon as $n > 3$.
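As a quick sanity check of the two counts, here is a tiny TypeScript helper (the function names are illustrative only). With the six DSLs listed in the progress table, connecting them directly takes 30 transpilers while going through the UL takes 12; the direct approach is cheaper only for $n = 2$ and breaks even at $n = 3$.

```typescript
// Direct approach: one transpiler per ordered pair of DSLs, i.e. 2 * n(n-1)/2.
function directTranspilers(n: number): number {
  return 2 * ((n * (n - 1)) / 2);
}

// Hub approach: one transpiler into and one out of the UL per DSL, i.e. 2 * n.
function hubTranspilers(n: number): number {
  return 2 * n;
}

for (const n of [2, 3, 6]) {
  console.log(`n = ${n}: direct ${directTranspilers(n)}, via UL ${hubTranspilers(n)}`);
}
// n = 2: direct 2, via UL 4
// n = 3: direct 6, via UL 6
// n = 6: direct 30, via UL 12
```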

Each DSL will have a dedicated folder. It will contain some components allowing it to be transpiled back and forth from the UL.

- `schema` describes the AST with `superstruct` structures.
  - … `optional`.
  - Anything which is valid for the original processor must validate.
  - … `defaulted`.
  - The schema must be the single source of truth about the DSL defaults: consumers of the AST must simply trust coercion (e.g. via `create`) to make the defaults explicit.
  - `defaulted`s must be on children `Struct`s, while `optional`s should be on the parent `Struct`s.
    This allows `assert`s to be a simple way (after coercion) to get rid of the `... | undefined` from the signatures of `optional` parts when processing the AST.
- `grammar` describes the DSL with a `nearley` grammar.
  - … `parser`.
- `parser` implements a `parse` function to transform sourcecode into an AST.
  - `parse` is responsible for performing any extra necessary decoding/deserialization on the input.
  - `parse` outputs a bona fide object respecting `schema`, meaning that the signatures are correct but no explicit validation (and especially no coercion) is done at this time.
  - `parse` may output an array to account for ambiguity and simplify testing, but it should only contain a single object, as we ban ambiguous grammars.
- `injector` implements an `inject` function mapping the DSL AST into the UL AST.
  - `inject` must assume schema coercion has been done, so it needs no knowledge of the DSL defaults and can simply perform a few `assert`s to check for presence and circumvent the inconvenient `* | undefined` signatures.
  - `inject` must `create` its output, so it only has to reason about the features actually being used.
    This allows targeted testing with `toMatchObject` and avoids the need for backtracking when adding new features to the UL, all while keeping the UL fully explicit.
- `projector` implements a `project` function mapping the UL AST onto the DSL AST.
  - `project` maps only features available in the target DSL.
- `renderer` implements a `render` function to transform an AST to sourcecode.
  - `render` should include a minification process to produce the minimal code leveraging implicit defaults of the DSL.
    Maybe avoiding coercion is enough, but I haven't made up my mind yet.
- `index` ties together all components into a simple API:
  - `read = inject ∘ coerce ∘ parse`, which translates DSL source into its representation in the universal language;
  - `write = render ∘ project ∘ coerce`, which translates a universal language representation into DSL source.
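
The per-DSL components above can be illustrated with a minimal end-to-end sketch. Everything in it is a hypothetical assumption made for illustration: the toy arrow syntax (`A --> B`, `A -f-> B`, `A -f^-> B`, `A -f_-> B`), the names `Side`, `Label`, `DslArrow`, `UlArrow` and `SIDE_TO_UL`, and the mapping of label sides; a regular expression stands in for the `nearley`-generated parser, and `parse` returns a single object rather than an array. Only the Superstruct calls (`object`, `optional`, `defaulted`, `enums`, `create`, `assert`) are the real library API. The point is to show how placing `defaulted` on the child and `optional` on the parent lets `create` fill defaults and `assert` drop `| undefined`.

```typescript
import {
  assert, create, defaulted, enums, object, optional, string, type Infer,
} from 'superstruct';

// UL schema (hypothetical): fully explicit, every non-topological field is defaulted.
const UlArrow = object({
  source: string(),
  target: string(),
  label: defaulted(string(), ''),
  labelSide: defaulted(enums(['left', 'right']), 'left'),
});
type UlArrow = Infer<typeof UlArrow>;

// DSL schema for a hypothetical toy syntax: "A --> B", "A -f-> B", "A -f^-> B", "A -f_-> B".
// The child struct owns the default (`defaulted`)...
const Side = defaulted(enums(['above', 'below']), 'above');
// ...while the parent decides whether the part may be absent (`optional`).
const Label = object({ text: string(), side: optional(Side) });
const DslArrow = object({ source: string(), target: string(), label: optional(Label) });
type DslArrow = Infer<typeof DslArrow>;

// parse: a regex standing in for a nearley-generated parser. The output respects
// the DSL signatures, but it is neither validated nor coerced.
function parse(src: string): DslArrow {
  const m = /^\s*([A-Za-z0-9]+)\s*-(?:([A-Za-z0-9]+)([_^])?)?->\s*([A-Za-z0-9]+)\s*$/.exec(src);
  if (m === null) throw new SyntaxError(`Cannot parse: ${src}`);
  const [, source, text, mark, target] = m;
  const arrow: DslArrow = { source, target };
  if (text !== undefined) {
    // Record `side` only when it is written explicitly: the schema owns the default.
    arrow.label = mark === undefined ? { text } : { text, side: mark === '_' ? 'below' : 'above' };
  }
  return arrow;
}

// inject: assumes the DSL AST has been coerced, so DSL defaults are never restated here.
const SIDE_TO_UL = { above: 'left', below: 'right' } as const;
function inject(dsl: DslArrow): UlArrow {
  const ul: Partial<UlArrow> = { source: dsl.source, target: dsl.target };
  if (dsl.label !== undefined) {
    assert(dsl.label.side, Side); // after coercion this only drops `| undefined` from the type
    ul.label = dsl.label.text;
    ul.labelSide = SIDE_TO_UL[dsl.label.side];
  }
  // `create` lets the UL schema fill in every feature this injector did not mention.
  return create(ul, UlArrow);
}

// project: the UL is fully explicit, so its input can be destructured right away.
function project({ source, target, label, labelSide }: UlArrow): DslArrow {
  if (label === '') return { source, target };
  return { source, target, label: { text: label, side: labelSide === 'left' ? 'above' : 'below' } };
}

// render: minifies by omitting the side marker when it equals the DSL default ("above").
function render({ source, target, label }: DslArrow): string {
  if (label === undefined) return `${source} --> ${target}`;
  const mark = label.side === 'below' ? '_' : '';
  return `${source} -${label.text}${mark}-> ${target}`;
}

// index: read = inject ∘ coerce ∘ parse, write = render ∘ project ∘ coerce.
const read = (src: string): UlArrow => inject(create(parse(src), DslArrow));
const write = (ul: UlArrow): string => render(project(create(ul, UlArrow)));
```

For instance, `write(read('A -f^-> B'))` gives back `A -f-> B`: the explicit `^` restates the DSL default, so the renderer drops it, which is the kind of minification `render` is meant to perform.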

(NOTE: in `write`, coercion can be omitted as long as we keep the UL completely explicit.)

The UL will also have its own folder. It will contain much less than the DSL folders, since it's used only for internal representation.

- `schema` describes the AST with `superstruct` structures.

A few more words should be spent on the design of the UL, as two very different approaches can be followed for the usage of `optional` structures:

- Everything is optional (except topology).
  - The UL AST must be `assert`ed to circumvent partial signatures (after the input has been coerced externally, of course).
- Everything is mandatory (and has reasonable defaults).
  - The injector output must be `create`d, as the injector must not know about defaults and all properties are mandatory; this also avoids breakage on UL extensions.
  - The projector input has full signatures (no `* | undefined`) and can be destructured right away, while simply ignoring unsupported features of the target DSL.

It's a matter of balance, but ultimately the latter alternative has slightly better ergonomics, and a fully explicit UL schema should be simpler to reason about.
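
The two alternatives can be contrasted with a small Superstruct sketch. It assumes a hypothetical UL arrow that carries only a label and its side besides topology; the names `UlArrowOptional`, `UlArrowExplicit` and `LabelSide` are made up for illustration. It shows what each alternative hands to a consumer when an injector only mentions topology.

```typescript
import { create, defaulted, enums, object, optional, string, type Infer } from 'superstruct';

const LabelSide = enums(['left', 'right']);

// Alternative 1 (hypothetical): everything optional except topology.
const UlArrowOptional = object({
  source: string(),
  target: string(),
  label: optional(string()),
  labelSide: optional(LabelSide),
});

// Alternative 2 (hypothetical): everything mandatory, with reasonable defaults.
const UlArrowExplicit = object({
  source: string(),
  target: string(),
  label: defaulted(string(), ''),
  labelSide: defaulted(LabelSide, 'left'),
});

// An injector that only knows about topology hands the same partial object to both.
const injected = { source: 'A', target: 'B' };

// Alternative 1: the signature keeps `| undefined`, so every consumer has to narrow
// (here with a fallback, which means restating a default outside the schema).
const ul1: Infer<typeof UlArrowOptional> = create(injected, UlArrowOptional);
const side1: 'left' | 'right' = ul1.labelSide ?? 'left';

// Alternative 2: `create` fills in the defaults, so the result is fully explicit
// and can be destructured right away.
const { label, labelSide }: Infer<typeof UlArrowExplicit> = create(injected, UlArrowExplicit);
const side2: 'left' | 'right' = labelSide; // no narrowing needed
```

Under the first alternative a consumer either restates defaults, as above, or relies on some external coercion followed by `assert`; under the second, the defaults live only in the UL schema.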