Uniform Reflective Language

We now have something like eight different variants of reflected syntax: Input.expr and Output.expr in https://github.com/mit-plv/fiat-crypto/blob/jgross-phoas/src/Experiments/FancyMachineMontgomery256.v (mine) , https://github.com/mit-plv/fiat-crypto/blob/rsloan-phoas/src/Experiments/SpecificCurve25519.v (@varomodt 's), and https://github.com/mit-plv/fiat-crypto/blob/phoas/src/Experiments/SpecificCurve25519.v (@andres-erbsen 's), my untyped symbolic (SymbolicExpr) in https://github.com/mit-plv/fiat-crypto/blob/jgross-phoas/src/Experiments/FancyMachineMontgomery256.v (I also have Syntax, which I use only for notations), and possibly another one that is the input to @varomodt 's assembly pipeline. We should probably unify them, to some extend, so that we're not duplicating work and copy-pasting.

I propose that we settle on how to design the inductive syntax here. There are some high-level questions:

How much do we want to enforce by construction?
- Should the input and output languages for linearization be the same?
- Should the output language enforce things like that you don't have wrongly-nested lets, or that you don't have trees of operations?
- Should we enforce that we only have variables of type bool and of type word?

There are some questions of what constructions we should support. I need the following features:

Kinds of operators:
- 1-to-1 (ldi)
- 2-to-1 (e.g., mulhwll)
- 3-to-1 (e.g., selc)
- 3-to-2 (e.g., adc, subc)
types:
- booleans (for the carry bit)
- words (or whatever type is carrying bounded integers)
- literal bool constants (for adc)
- literal Z constants (for ldi, shl, shr)
- abstract Z constants (things whose identity should be fixed at reification time, which are of type Z, but which Z_beq will block on)
generic operations:
- let ... in ... (and a variant for handling n-to-2 operators)
- possibly: function abstraction, for things like parameterizing multiplication over its word arguments

The things I want to be able to do with the language are:

reification & interpretation
linearization of lets and of function application
common-subexpression-elimination (including combining a fst of an adc and a snd of the same adc into a single "carry-let-binding")
- note that cse for PHOAS seems to require a variant of the syntax with decidable equality, to be used for keys / indexing of variables
notations that let it look like assembly (since I'm too lazy to write stringification at the moment)

What do we need elsewhere?

There are also some design questions:

How should we represent abstract constants?
Should the let ... in ... be different for different numbers of arguments?
Should we fuse operator application and let ... in ...?

It might be a good idea to keep the Input (reified gallina with the correct operations) and Output (linear chain of lets) languages separate for these reasons

I have a fairly good idea what the output code looks like, and have been making up the input language as I go.
Writing transformations/analysis passes on the linear language is easier. The reason I separated out the output language in the first place was because I had a hard time figuring out how to handle already flattened subexpressions in the input language during the flattening.

Here are some things that I think we want to be parameters of the language:

the word type, or at least its size argument. we want to target architectures with different register widths.
the primitive operations of all allowed type signatures. Bit shifts, rotations, multiplication overflow, etc differ between the architectures we are targeting. Different arities could be supported by hardcoding the allowed type signatues or maybe creating a language for operation declarations (I don't know how whether this would work. @achlipala?).

I do not need cse, and I'm not sure whether the reflective syntax is the best place to do it. Are you sure it would not be easier to just not duplicate subexpressions in the first place? Note that being way too liberal with duplicating would lead to exponential blow-up, so some care is needed anyway.

Another question: should literal constants should have the same type as variables in the source code? The two are not interchangable in assembly.

Parameterization by available primitive operators sounds reasonable. I think our natural path for just the original at-MIT project would save that level of generalization until after we were happy with our first full verified implementation. But it probably isn't such a big deal to add the extra generality now. (However, given @varomodt's time constraints for finishing his thesis, it might be worth adding such generality in a separate branch until he's done.)

At first I was thinking that it would be easiest to include separate Type parameters for unary and binary operators. Unfortunately, that wouldn't accommodate different bit-width combinations, so maybe instead we use list nat -> nat -> Type. That is, each operator's type is indexed with the bit widths of its inputs and output.

I agree that literal constants and free variables should get different AST constructors, because they're compiled differently. Re: discussion on our mailing list, I expect that free variables should be annotated with constants bounding their values (to support compile-time bounds checking), while obviously such a thing is redundant for literals.

Indexing operations by width isn't sufficient; some operations (ldi, shl, shr) take in a non-register constant, and other operations make use of the boolean carry flag (selc) and/or return the flag (adc, subc). I'm currently leaning towards supporting 1-to-n operators, which take a syntactic tuple as input, and return a Gallina tuple of outputs. This would allow us to avoid pair projections in the output language while still treating all operators as 1-to-1 when we don't care about the details of their arguments. The biggest downside that I see is that this will leave a bunch of unfolded Gallina projections lying around. Another alternative is to separately support 1-to-1 and 1-to-2 operators. I plan to spend today and tomorrow working on this, as part of cleaning up my montgomery assembly pipeline.

mit-plv / fiat-crypto

Uniform Reflective Language #58