Tips for writing bytecode -> <something> or <something> -> bytecode tools for Move

This is a list of helpful reference materials and code pointers for someone interested in writing a compiler from Move bytecode[1] to a different format, or more generally a tool that consumes Move bytecode and outputs something else (warnings, optimized bytecode, generated frontend code, ...). This list is also marginally helpful for <something> -> bytecode tools, but probably needs some additional pointers from @tnowacki.

Code for the Move bytecode format, including instructions
List of bytecode instructions and types. Up-to-date.
Spec for the binary format. Not fully up-to-date, but includes some details absent in the list above
The Move bytecode model is an artifact that can be produced from source or bytecode, and contains lots of useful functionality and metadata for a linter, static analyzer, or compiler.
Code for stackless bytecode, a representation of the Move bytecode that compiles the operand stack into registers. This is a very convenient representation for writing static analyzers, or for targeting another representation that does not have an operand stack. It is used by the Move prover. The stackless bytecode compiler is also worth a look if you are writing a compiler directly from bytecode.
An example of a simple, classic analysis pass (reaching definitions) that operates over the stackless bytecode
An example of a compositional, interprocedural analysis pass (read/write set analysis) that operates over the stackless bytecode. This leverages built-in functionality for constructing a call graph, topologically sorting it, and analyzing bottom-up starting from the leaves. Recursive cycles are handled via SCC decomposition.
Paper formalizing the semantics of many core instructions. The memory model in this paper might be useful to look at if you are compiling Move to a representation with a byte-addressable memory (e.g., Miden bytecode).
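
To make the "compiles the operand stack into registers" idea from the stackless bytecode item above concrete, here is a minimal sketch in Rust. It is not the actual stackless bytecode generator: `StackOp`, `RegOp`, and `to_stackless` are hypothetical names over a made-up five-instruction subset, and it only handles straight-line code (no branches, no type information). The trick is to simulate the operand stack at translation time, pushing register numbers instead of values.

```rust
/// Toy stack-machine instructions (hypothetical, NOT the real Move instruction set).
#[derive(Debug, Clone, PartialEq)]
pub enum StackOp {
    LdU64(u64),     // push a constant
    CopyLoc(usize), // push a copy of local `idx`
    StLoc(usize),   // pop the top of the stack into local `idx`
    Add,            // pop two values, push their sum
    Ret,            // pop one value and return it
}

/// Toy register-based instructions: every operand is an explicit register.
#[derive(Debug, Clone, PartialEq)]
pub enum RegOp {
    Const { dst: usize, value: u64 },
    Move { dst: usize, src: usize },
    Add { dst: usize, lhs: usize, rhs: usize },
    Ret { src: usize },
}

fn pop(stack: &mut Vec<usize>, pc: usize) -> Result<usize, String> {
    stack
        .pop()
        .ok_or_else(|| format!("operand stack underflow at pc {}", pc))
}

/// Locals occupy registers 0..num_locals; each pushed value gets a fresh
/// temporary register after that. Assumes straight-line code and does not
/// check that local indices are in range.
pub fn to_stackless(code: &[StackOp], num_locals: usize) -> Result<Vec<RegOp>, String> {
    let mut next_temp = num_locals;
    let mut stack: Vec<usize> = Vec::new(); // holds register numbers, not values
    let mut out = Vec::new();
    for (pc, op) in code.iter().enumerate() {
        match op {
            StackOp::LdU64(value) => {
                let dst = next_temp;
                next_temp += 1;
                out.push(RegOp::Const { dst, value: *value });
                stack.push(dst);
            }
            StackOp::CopyLoc(idx) => {
                let dst = next_temp;
                next_temp += 1;
                out.push(RegOp::Move { dst, src: *idx });
                stack.push(dst);
            }
            StackOp::StLoc(idx) => {
                let src = pop(&mut stack, pc)?;
                out.push(RegOp::Move { dst: *idx, src });
            }
            StackOp::Add => {
                // The right operand was pushed last, so it is popped first.
                let rhs = pop(&mut stack, pc)?;
                let lhs = pop(&mut stack, pc)?;
                let dst = next_temp;
                next_temp += 1;
                out.push(RegOp::Add { dst, lhs, rhs });
                stack.push(dst);
            }
            StackOp::Ret => {
                let src = pop(&mut stack, pc)?;
                out.push(RegOp::Ret { src });
            }
        }
    }
    Ok(out)
}
```

With two locals, `CopyLoc(0); LdU64(1); Add; StLoc(1)` turns into a move into temporary 2, a constant into temporary 3, an add into temporary 4, and a move from 4 into local 1. A real converter also has to deal with control flow, where the stack contents at a join point must agree on every incoming path.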
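
For readers who have not written a reaching definitions pass before, the following is a rough, self-contained sketch of the classic worklist formulation over a register-based program. It does not use the Move prover's dataflow framework: `Instr` and `reaching_definitions` are hypothetical names, and each instruction is reduced to "which register does it write" and "which instructions can run next".

```rust
use std::collections::BTreeSet;

/// One instruction in a toy register-based program (hypothetical shape).
pub struct Instr {
    /// Register written by this instruction, if any.
    pub def: Option<usize>,
    /// Indices of the instructions that may execute next.
    pub succs: Vec<usize>,
}

/// For each instruction, the set of definition sites (instruction indices)
/// that may reach the point just before it executes.
pub fn reaching_definitions(code: &[Instr]) -> Vec<BTreeSet<usize>> {
    let n = code.len();
    let mut in_sets: Vec<BTreeSet<usize>> = vec![BTreeSet::new(); n];
    // Start with every instruction on the worklist; instruction 0 is popped first.
    let mut worklist: Vec<usize> = (0..n).rev().collect();
    while let Some(i) = worklist.pop() {
        // Transfer function: OUT = GEN union (IN minus KILL).
        let mut out = in_sets[i].clone();
        if let Some(reg) = code[i].def {
            // KILL: every other definition of the same register.
            out.retain(|&d| code[d].def != Some(reg));
            // GEN: this instruction is now the live definition of `reg`.
            out.insert(i);
        }
        // Join is set union; re-queue a successor only if its IN set grew.
        for &s in &code[i].succs {
            let before = in_sets[s].len();
            in_sets[s].extend(out.iter().copied());
            if in_sets[s].len() != before && !worklist.contains(&s) {
                worklist.push(s);
            }
        }
    }
    in_sets
}
```

Because the sets only ever grow and are bounded by the number of instructions, the loop terminates. On a loop such as `r0 = 0; L: r1 = r0 < 10; br r1; r0 = r0 + 1; goto L`, both definitions of `r0` reach the loop header, which is exactly the information a later pass needs to know that `r0` is not a constant there.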
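
The bottom-up scheme mentioned in the read/write set item above (build a call graph, order it so callees come before callers, handle recursion via SCCs) can be sketched as follows. This is an illustration only and assumes nothing about the real implementation: `Func`, `sccs`, and `write_summaries` are hypothetical names, a "summary" is just the set of global names a function may write, and Tarjan's algorithm is used because it happens to emit SCCs callees-first.

```rust
use std::collections::BTreeSet;

/// A function in a toy program (hypothetical shape, not a Move model type).
pub struct Func {
    /// Names of the globals this function writes directly.
    pub writes: Vec<&'static str>,
    /// Indices of the functions it calls.
    pub callees: Vec<usize>,
}

/// Tarjan's algorithm over caller -> callee edges. SCCs are emitted in
/// reverse topological order, i.e. callees before callers (bottom-up).
pub fn sccs(funcs: &[Func]) -> Vec<Vec<usize>> {
    struct State<'a> {
        funcs: &'a [Func],
        index: Vec<Option<usize>>,
        low: Vec<usize>,
        on_stack: Vec<bool>,
        stack: Vec<usize>,
        next: usize,
        out: Vec<Vec<usize>>,
    }
    fn visit(s: &mut State<'_>, v: usize) {
        s.index[v] = Some(s.next);
        s.low[v] = s.next;
        s.next += 1;
        s.stack.push(v);
        s.on_stack[v] = true;
        for i in 0..s.funcs[v].callees.len() {
            let w = s.funcs[v].callees[i];
            match s.index[w] {
                None => {
                    visit(s, w);
                    s.low[v] = s.low[v].min(s.low[w]);
                }
                Some(iw) if s.on_stack[w] => s.low[v] = s.low[v].min(iw),
                _ => {}
            }
        }
        // `v` is the root of an SCC: pop its members off the stack.
        if Some(s.low[v]) == s.index[v] {
            let mut comp = Vec::new();
            loop {
                let w = s.stack.pop().unwrap();
                s.on_stack[w] = false;
                comp.push(w);
                if w == v {
                    break;
                }
            }
            s.out.push(comp);
        }
    }
    let n = funcs.len();
    let mut s = State {
        funcs,
        index: vec![None; n],
        low: vec![0; n],
        on_stack: vec![false; n],
        stack: Vec::new(),
        next: 0,
        out: Vec::new(),
    };
    for v in 0..n {
        if s.index[v].is_none() {
            visit(&mut s, v);
        }
    }
    s.out
}

/// A function's summary is its own writes plus the summaries of everything
/// it calls. Each SCC is iterated to a fixed point, which handles recursion.
pub fn write_summaries(funcs: &[Func]) -> Vec<BTreeSet<&'static str>> {
    let mut summary: Vec<BTreeSet<&'static str>> = vec![BTreeSet::new(); funcs.len()];
    for comp in sccs(funcs) {
        let mut changed = true;
        while changed {
            changed = false;
            for &f in &comp {
                let mut s: BTreeSet<&'static str> = funcs[f].writes.iter().copied().collect();
                for &c in &funcs[f].callees {
                    s.extend(summary[c].iter().copied());
                }
                if s != summary[f] {
                    summary[f] = s;
                    changed = true;
                }
            }
        }
    }
    summary
}
```

The payoff of the bottom-up order is that non-recursive functions are analyzed exactly once, with all callee summaries already final. Only the members of a recursive cycle need to be revisited.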

[1] Someone writing such a tool will often reach for Move IR, with the assumption that it is like (e.g.) LLVM IR (i.e., a well-specified, carefully designed intermediate language intended for transformations and analysis, and a good compilation target). This is not the case--Move IR is used primarily for testing the bytecode verifier, is unstable, and is under-documented. Bytecode or stackless bytecode is the best choice for such tools at the moment.
