janpaulpl opened 1 year ago
For the sake of hopefully advancing the speed of this implementation, I'm going to describe what the current inferencing algorithm looks like and what the main challenges are!
Currently, the algorithm for type inference is based on unification, in which we view type signatures for terms as equations; each equation contains variables that have to be solved for, i.e. unknown type placeholders.
These type equations (as they're defined in the Issue) are called Constraints, and for our full program we'll have a Constraint Set which has to go through Unification, as in, we unify every constraint to a concrete type.
The algorithm for calculating the constraint set is made up of judgment rules; for example, the constraint set for if statements comes from the following judgment:
```
C' = C1 U C2 U C3 U {T1 = Bool, T2 = T3}
-------------------------------------------------
G |- if t1 then t2 else t3 : T2 | X1 U X2 U X3, C'
```
A quick breakdown of what these symbols mean: `G |- x : T1` means that `x` has type `T1` under the context `G`. `G` is, practically, the symbol table that records the defined variables and their types. `X` and `C` are also context symbols: `X` is the set of type placeholders generated during the algorithm, and `C` is the already-calculated set of constraints. So `G |- x : T1 | X C` means that `x` has type `T1` under the symbol table `G`, with the already-calculated constraint set `C` and the additional type placeholders generated by our algorithm stored in `X`. Because Scala is a fun functional language, we can view this algorithm as a function that, given a lookup table `G`, takes `x` as an input and produces `X` and `C` as a tuple output.
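Viewed that way, the judgment can be sketched as a Scala function (a sketch only; `Ty`, `Term`, and `infer` are illustrative stand-ins for the enums defined later, with just enough cases to be runnable):

```scala
// Sketch: the judgment `G |- t : T | X C` as a function from a symbol
// table G and a term to a (type, placeholders X, constraints C) tuple.
enum Ty:
  case Base(n: String)   // a concrete monotype like Int
  case Forall(n: String) // a generated type placeholder

enum Term:
  case Lit(value: Int)
  case Ref(name: String)

def infer(g: Map[String, Ty], t: Term): (Ty, Set[String], Set[(Ty, Ty)]) =
  t match
    // literals have a known type and generate no placeholders or constraints
    case Term.Lit(_) => (Ty.Base("Int"), Set.empty, Set.empty)
    // variables are looked up in the symbol table G
    case Term.Ref(name) => (g(name), Set.empty, Set.empty)
```

For instance, `infer(Map("x" -> Ty.Base("Bool")), Term.Ref("x"))` produces the tuple `(Ty.Base("Bool"), Set(), Set())`.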
For the implementation, the canonical way of introducing unification in Scala seems to be using an enum. I'll explain later why this is awful for our case, but I think the re-formatting to fit our needs shouldn't be that difficult.
We need the following information in our `enum Type`:

- `Type.Base` represents monotypes: types that can be represented by a single symbol. In our case, these would include `Int`, `Bool`, and `Bytes`. `Bool` is included because, although PyTEAL doesn't seem to use it as a primitive, we have to convert to a C0 IR which does use `Bool`s.
- `Type.Constructor` represents function types, from one type to another.
- `Type.Forall` represents the placeholders.

The format would look something like the following:
```scala
enum Type:
  case Base(n: String) extends Type
  case Constructor(from: Type, to: Type) extends Type
  case Forall(n: String) extends Type
end Type

object TypePrimitives:
  val Int: Type = Type.Base("Int")
  val Bool: Type = Type.Base("Bool")
  val Bytes: Type = Type.Base("Bytes")
end TypePrimitives
```
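As a quick sanity check on this encoding, a curried two-argument function type like `Int -> Int -> Int` (the type of `Add` later in this post) nests `Constructor` to the right (repeating the enum here so the snippet stands alone):

```scala
enum Type:
  case Base(n: String) extends Type
  case Constructor(from: Type, to: Type) extends Type
  case Forall(n: String) extends Type
end Type

object TypePrimitives:
  val Int: Type = Type.Base("Int")
  val Bool: Type = Type.Base("Bool")
  val Bytes: Type = Type.Base("Bytes")
end TypePrimitives

// Int -> Int -> Int is Int -> (Int -> Int): the arrow associates to the right.
val addType: Type =
  Type.Constructor(
    TypePrimitives.Int,
    Type.Constructor(TypePrimitives.Int, TypePrimitives.Int)
  )
```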
Now, this is where it gets tricky: we also have to define our AST as an enum! I'm not really sure if there's a way to extend our untyped AST to our typed AST `enum`, or if we have to change the structure of the `enum` to fit an `object`. Either way, this is what it would look like:
```scala
enum SyntaxNode:
  case If(condition: SyntaxNode, trueClause: SyntaxNode, falseClause: SyntaxNode)
```
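To check that the enums line up with the if-statement judgment above, the constraint-generation case for `If` might look like this (a sketch; the `IntLit` case and the `constraints` function are hypothetical additions just to make the snippet self-contained):

```scala
enum Type:
  case Base(n: String)
  case Forall(n: String)

enum SyntaxNode:
  case If(condition: SyntaxNode, trueClause: SyntaxNode, falseClause: SyntaxNode)
  case IntLit(value: Int) // hypothetical extra case so the example runs

// G |- if t1 then t2 else t3 : T2 | X1 u X2 u X3, C'
// where C' = C1 u C2 u C3 u {T1 = Bool, T2 = T3}
def constraints(
    g: Map[String, Type],
    t: SyntaxNode
): (Type, Set[String], Set[(Type, Type)]) =
  t match
    case SyntaxNode.IntLit(_) => (Type.Base("Int"), Set.empty, Set.empty)
    case SyntaxNode.If(t1, t2, t3) =>
      val (ty1, x1, c1) = constraints(g, t1)
      val (ty2, x2, c2) = constraints(g, t2)
      val (ty3, x3, c3) = constraints(g, t3)
      // the two new constraints: condition is Bool, branches agree
      val cNew = Set((ty1, Type.Base("Bool")), (ty2, ty3))
      (ty2, x1 ++ x2 ++ x3, c1 ++ c2 ++ c3 ++ cNew)
```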
Now we have to define the Constraint Set. It has three members: first, the calculated type signature; second, the type placeholders (whatever unknown `t#` we get); and third, the constraints themselves (for example, the information that `0 -> Int`). Finally, we implement the symbol table as a mutable map from `String -> Type`:
```scala
import scala.collection.mutable

opaque type Binding = mutable.Map[String, Type]
```
I'm not sure if we need to implement a `freeVariable` identifier, because we don't have globals (I think???), so everything can be identified in scope. We can discuss the details of this in person, but hopefully the pipeline is clear!
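For the unification step itself, a minimal substitution-based solver over the constraint set could look like the following (a textbook-style sketch, not existing gvc0 code; the enum is repeated so the snippet stands alone, and an occurs check rejects infinite types):

```scala
enum Type:
  case Base(n: String)
  case Constructor(from: Type, to: Type)
  case Forall(n: String) // a type placeholder / variable

// Does placeholder v occur inside t? Needed to reject infinite types.
def occurs(v: String, t: Type): Boolean = t match
  case Type.Forall(n)         => n == v
  case Type.Constructor(a, b) => occurs(v, a) || occurs(v, b)
  case Type.Base(_)           => false

// Replace every occurrence of placeholder v with `to` inside `in`.
def subst(v: String, to: Type, in: Type): Type = in match
  case Type.Forall(n) if n == v => to
  case Type.Constructor(a, b)   => Type.Constructor(subst(v, to, a), subst(v, to, b))
  case other                    => other

// Solve a constraint list by substitution; returns a map from placeholder
// names to types, or None if the constraints are unsatisfiable.
def unify(cs: List[(Type, Type)]): Option[Map[String, Type]] = cs match
  case Nil                      => Some(Map.empty)
  case (s, t) :: rest if s == t => unify(rest)
  case (Type.Forall(v), t) :: rest if !occurs(v, t) =>
    val rest2 = rest.map { case (a, b) => (subst(v, t, a), subst(v, t, b)) }
    unify(rest2).map { sol =>
      sol + (v -> sol.foldLeft(t) { case (acc, (n, ty)) => subst(n, ty, acc) })
    }
  case (s, Type.Forall(v)) :: rest => unify((Type.Forall(v), s) :: rest)
  case (Type.Constructor(a1, b1), Type.Constructor(a2, b2)) :: rest =>
    unify((a1, a2) :: (b1, b2) :: rest)
  case _ => None // base-type mismatch or occurs-check failure

val example = unify(List((Type.Forall("t1"), Type.Base("Int"))))
// example == Some(Map("t1" -> Type.Base("Int")))
```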
Concerning the data representation from the untyped AST: we will keep the untyped/typed AST in a single file. We just have temporary type values in the untyped AST, which are then "hardcoded" in the typed enum, and this information is extended back into the "untyped" AST, gradually typing it.
Issue
Currently, the gvc0 pipeline assumes static typing for the AST. PyTEAL uses a mix of static typing for TEAL types and dynamic typing for pythonic declarations. The static typing for TEAL types such as
a = Int(2)
can be directly translated as `int a = 2;` for our untyped AST in the parser. However, for pythonic declarations such as
add_expr = Add(a, b)
we have to take the node from the untyped AST to a typed AST. The type inference algorithm runs through the following steps:
[ ] 1. Assign a type or type variable to the expression and each subexpression. For known expressions, like +, - and so on, use the types known for these expressions. Otherwise, use type variables: placeholders for unknown types.
[ ] 2. Generate a set of constraints on types, using the parse tree of the expression. These constraints are something like this: if a function is applied to an integer, then the type of its argument is integer.
[ ] 3. Solve these constraints by unification — an algorithm for solving equations based on substitutions.
Here's an example. We have information from the untyped AST for an incrementing function `def`; we get the resulting tree (where the `@` stands for function application), and then the new tree resulting from our first step.
The set of constraints on types can now be generated:

(i) `t3: Int` from `2.instanceOf(Int)`
(ii) `t2: Int -> Int -> Int` because `Add.instanceOf(Int -> Int -> Int)`
(iii) `x` stays type `t1`
(iv) We can now type the function calls. `def` has a type based on its parameters and return calls: `def:: t0: t1 -> t5`, `t5:: t4: t1 -> t5`, and `t4:: t2: t3 -> t4`.

The final step lists the constraints from the previous step:
We can now infer a 6th constraint, and finally, from constraints 6 and 2, we can infer the remaining types. The final typed AST for `def inc(x): Add(x, 1)` types `inc` as `Int -> Int`.
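The constraints listed above can also be checked mechanically. Below, a minimal substitution-based unifier (a textbook-style sketch, repeated in full so the snippet is self-contained; `unify`, `tv`, `fn`, etc. are illustrative names, not gvc0 code) is applied to the five constraints for `inc`:

```scala
enum Type:
  case Base(n: String)
  case Constructor(from: Type, to: Type)
  case Forall(n: String)

def occurs(v: String, t: Type): Boolean = t match
  case Type.Forall(n)         => n == v
  case Type.Constructor(a, b) => occurs(v, a) || occurs(v, b)
  case Type.Base(_)           => false

def subst(v: String, to: Type, in: Type): Type = in match
  case Type.Forall(n) if n == v => to
  case Type.Constructor(a, b)   => Type.Constructor(subst(v, to, a), subst(v, to, b))
  case other                    => other

def unify(cs: List[(Type, Type)]): Option[Map[String, Type]] = cs match
  case Nil                      => Some(Map.empty)
  case (s, t) :: rest if s == t => unify(rest)
  case (Type.Forall(v), t) :: rest if !occurs(v, t) =>
    val rest2 = rest.map { case (a, b) => (subst(v, t, a), subst(v, t, b)) }
    unify(rest2).map { sol =>
      sol + (v -> sol.foldLeft(t) { case (acc, (n, ty)) => subst(n, ty, acc) })
    }
  case (s, Type.Forall(v)) :: rest => unify((Type.Forall(v), s) :: rest)
  case (Type.Constructor(a1, b1), Type.Constructor(a2, b2)) :: rest =>
    unify((a1, a2) :: (b1, b2) :: rest)
  case _ => None

// Shorthand: tv for placeholders, fn for the arrow constructor.
def tv(n: String): Type        = Type.Forall(n)
def fn(a: Type, b: Type): Type = Type.Constructor(a, b)
val IntT: Type                 = Type.Base("Int")

// The constraints listed above for `def inc(x): Add(x, 1)`:
val incConstraints = List(
  (tv("t2"), fn(tv("t3"), tv("t4"))),  // t4:: t2: t3 -> t4
  (tv("t4"), fn(tv("t1"), tv("t5"))),  // t5:: t4: t1 -> t5
  (tv("t0"), fn(tv("t1"), tv("t5"))),  // def:: t0: t1 -> t5
  (tv("t3"), IntT),                    // t3: Int
  (tv("t2"), fn(IntT, fn(IntT, IntT))) // t2: Int -> Int -> Int
)

val solution = unify(incConstraints)
// Solving gives t0 = Int -> Int, i.e. inc : Int -> Int
```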