Big-step: support datatypes ADTS

jaccokrijnen commented 2 years ago

Current idea (see post below for considerations)

Additionally, I think that both language constructs need to be annotated with the ADT they are constructors of, otherwise I don't see how to define the typing rules for them.

jaccokrijnen commented 1 year ago

After some brainstorming with Wouter, we had the following three options

Include an ADT environment in the big-step semantics with information about defined constructors and eliminators
- Con: not really "substitution-like", lot of changes in big-step semantics and theorems
- Con: Overlapping rules for application: function position can now be variable, and we need a very specific rule for applying the match function
- Con: Constructors and match functions share namespace with regular term variables (which use substitution), what if a lambda shadows a constructor?
Substitute in the Scott encoding for each constructor and eliminator in the big-step rules
- Con: Not the usual way to define ADT semantics, uses object language instead of metatheory to implement concept
- Would make that Scott encoding transformation trivial to prove correct
Add two new syntactic constructs to PIR for constructors and eliminators, do not appear in the surface language, only used in rules of big-step (substitute them in for variables).
- This solves the Cons of option 1 and 2

The last option seems the nicest. We substitute in Constr and Match nodes for the introduced variables (or eta-expanded versions) in the big-step rules. And add rules like:

t [ (\x y z -> Constr 1 [x,y,z]) / C1] ... [.../C_n] [(\e b1 ... bn -> Match e [b1 ... bn] /T_match]
-----------------------------------------------------------
let data T = C_1 T_1_1 T_1_2 T_1_3 | ... | C_n with T_match in t

t => Constr i vs     bs_i vs => v
---------------------------------------- [Match-eval]
Match t [b_1 ... b_n] => v

t_i => v_i   (for all i)
--------------------------------------- [Constr-value]
Constr i [t_1 ... t_n] => Constr i [v_1 ... v_n]

jaccokrijnen commented 1 year ago

Phil recommended to look at existing literature and reuse any of those. This will probably look like an environment-based semantics for the ADTs. Michael was also in favor of environment based, even if PIR is to include sums of products. The compiler may choose how to compile ADTs (either scott, or sums-of-products), and just picking either is not really appropriate for the semantics at PIR's level.

A constructor as value only has a meaning under a datatype environment (a bit like a closure).

What will the relation look like?

ADTS |- t => (ADTS, v)

where the first ADTS are in scope and the second are part of the closure?

jaccokrijnen commented 1 year ago

Compatibility lemma of `let data`

A challenging bit is to prove the compatibility lemma for Data binds. In this case, we need to prove that substituting in encoding $e$ is semantically the same as substituting in another encoding $e'$. This comes down to showing that all corresponding encodings of constructors and match-functions logically approximate each other.

Parametricity

The type introduced by a $let data$ is represented as a type variable, which we could see as a polymorphic type + its operations (constructors and match-function). An encodings instantiates the datatype to a concrete representation and may of course use different types to do that (e.g. sum-of-products or scott-encoding). In our case, by definition of a valid encoding, we know that the operational semantics should behave identically.

Our logical relation is equipped with a notion of parametricity, because the substitution of type variables contains two syntactic substitutions (i.e. the possibly different instantiations of the type variable), as well as a semantic substitution, relating two types.

TODO

[ ] Go through Amal's lectures/notes to see how a parametricity proof works in the context of LRs
[ ] Implement proof for valid encoding
[ ] [Harper (2012), chapter 47-49] (http://profs.sci.univr.it/~merro/files/harper.pdf)

Discussion on parametricity for proving adequacy of a different encoding

Lecture by Leroy on parametricity

[TAPL 2, Chapter 7]

jaccokrijnen commented 1 year ago

For now: fix encoding in logical relation

We can fix the encoding to be equal between pre- and post semantics.

Alternative to parametricity proof?

Wouter and I discussed the possibility of proving it in a different way, if we can prove encoding independence:

$$ t \Downarrowe v \Rightarrow t \Downarrow{e'} v' \wedge \mathcal{RV}[[ \tau ]]_\rho(k, v, v') \hspace{10em} \text{(Encoding independence)} $$

And use a compatibility lemma that has fixed encodings.

During the discussion we were not really clear on the relation that should hold between v and v', but I think now that it should just be the logical relation $\mathcal{RV}[[ \tau ]]_\rho(k, v, v')$, since this is part of the definition of the approximation.

jaccokrijnen / plutus-cert