grin-compiler / grin

GRIN is a compiler back-end for lazy and strict functional languages with whole program optimization support.
https://grin-compiler.github.io/
1.02k stars 36 forks source link

Syntactical extensions for GRIN #32

Open Anabra opened 5 years ago

Anabra commented 5 years ago

Syntactical extensions for GRIN

This document proposes new syntactic constructs for the GRIN IR, as well as modifications to some of the already existing ones.

Motivation

The current syntax of GRIN can pose some difficulties for analyses. As an example, the created-by analysis requires each binding to have a variable pattern. That is, each binding must have name. Also, analyzing programs where every intermediate value has an explicit name is much easier.

Furthermore, the current Haskell definition of the syntax allows certain erroneous programs. Currently, we rely on the linter to reject these programs, but a more rigid definition of the syntax could prevent errors statically.

Datalog based analysis

Currently, the analyses are implemented via compiled abstract interpretation, where the abstarct program is generated from Haskell and it can be run in both Haskell and C++. However, in the future, we plan to transition to Soufflé Datalog based analyses. In order to build a Datalog model for the GRIN AST, every single instruction must have a name. It is not a question of convenince, the model requires it. This is yet another reason to give an explicit name to everything in the AST.

Note: Soufflé has many advantages over our current implementation, but this is out of the scope of this proposal.

Naming

These syntactic constraints mainly deal with the naming of intermediate values. They make sure, that most of the currently available values can always be referred to by an explicit name. In other words, constants could only be introduced through the pure function. Everywhere else, we would be passing variables around. All of these constraints can be implemented trivially. Namely, we only have to change the below constructions so that they require a Name instead of a value.

An observant reader might have noticed, that the pattern matching construct for bindings isn't addressed above. That is because we will deal with them separately. Also, note that fetch and update already only accept variable names as arguments.

New

These syntactic constructs are completely new to the GRIN language.

@patterns

As mentioned earlier, each binding should have a name, the variable it binds. Currently, the syntax allows for pattern bindings, which do not bind a variable to the left-hand side computation. This problem could be fixed by introducing @patterns.

(CInt k)@v <- pure n
<both v and k can be referenced here>

@patterns combine the flexibility of value patterns with the rigidness of the naming convention. By removing value patterns from the language, and introducing @patterns, we could achieve the same expressive power as before, meanwhile making sure that everything can be referenced by an explicit name.

Note that unlike the @pattern in Haskell, here the variable name and the pattern are swapped. This is to keep the syntax consistent with named case alternatives.

Furthermore, we should restrict the available patterns only to nodes. Currently literals and even variable names are allowed. This is too general.

Function application (scrapped)

Currently, normal functions and externals share the same syntactic node for application. Externals are stored together with GRIN programs, and they are differentiated from normal functions by looking up the applied function's name in the stored external list. This makes analyzing function applications quite inconvenient.

We could introduce different syntactic nodes for normal functions and externals, but that would pose an unnecessary overhead on analyses and transformations in certain use cases. Instead, we will introduce different types of names for functions and externals. The application node will differentiate functions from externals by wrapping their names in different data constructors.

data AppName
  = Fun { appName :: Name }
  | Ext { appName :: Name }

Exp = ... | SApp AppName [Name] | ...

Named case alternatives

To model the GRIN AST in Datalog, each syntactic construct must have an identifier. Currently, the alternatives in case expression don't have unique identifiers. We can solve this issue by naming each alternative.

case v of
  (CNil)       @ v1 -> <code>
  (CCons x xs) @ v2 -> <code>

The syntax would be similar to that of the @patterns, but the variable name and the pattern would be swapped. This is to improve the readabilityo of the code: with long variable names, the patterns can get lost. Also, there could be an arbitrary number of spaces between the @ symbol and the pattern/variable. This is also to improve readability though proper indentation. Readability is only important for unit tests, not for generated GRIN code.

Semantics

The names of the case alternatives would be the scrutinee restricted to the given alternative. For example, in the above example, we would know that v1 must be a node costructed with the CNil tag. Similarly, v2 is also a node, but is constructed with the CCons tag.

Named alternatives would make dataflow more explicit for case expressions. Currently, the analyses restrict the scrutinee when they are interpreting a given alternative (they are path sensitive), but with these changes, that kind of dataflow would made visible.

Structural

These modifications impose certain structural constraints on GRIN programs.

Last pure

Binding sequences should always end in a pure. This will make the control flow a little bit more explicit. However, this change could be impacted by the introduction of basic blocks. It might be wiser to delay this change.

Program, function, definition

The above-mentioned notions should be different syntactic constructs. They should form a hierarchy, so it is always clear whether a transformation/analysis works on an entire GRIN program, a single function, or only on an expression.

Currently available, but has to be more precise

These modifications would turn currently linter-checked properties into static constraints. For example, only real expressions could appear on the left-hand side of a binding, or the alternatives of a case expression could really only be alternatives.

No longer needed

By introducing the above mentioned syntactic modification, some currently available constraints become redundant: LPat, SimpleVal.

Also, some of the low-level GRIN constructs will be removed from the AST: VarTag node and indexed Fetch. These constructs were originally needed for RISC code generation. Since then, GRIN transitioned to LLVM, and over the years these constructs proved to be unnecessary.

Questions

Parsing function applications (scrapped)

The current syntax does not differentiate normal function application from external function application. The new syntax will differentiate them. This means, we must decide whether a name corresponds to a normal function or to an external while parsing. Two solutions that might work are the following.

Use previoulsy parsed information (scrapped)

We could add some sort of state to the parsers that keeps track of the available externals. Since externals can be added through the PrimOpsPrelude, we would also need to pass those as an extra argument to the parser.

Naming convention for externals (scrapped)

We could introduce a new naming convention for externals. They would always begin with an undersocre.

Implicit parsing of unit patterns

Currently, the parser can implictly parse bindings that have a unit pattern. The most common example for this is update. The string "update p v" is parsed as if it was "() <- update p v". The new syntax does not allow unit patterns. We must think of an alternative way to express this. Maybe a wild card patterns?

_ <- <lhs>
<rhs>

Also, wild card patterns could be parsed implicitly as well:

<lhs>
<rhs>

Answer

Every instruction must have a name, even function calls that only return a unit. Binding their result to a variable is necessary for unique identification, hence, the idea of unit and wildcard patterns are scrapped.

Future

Basic blocks

As for the future, we plan to introduce basic blocks into the language. This will bring its own syntactic modifications, but they will be mostly independent of the above-discussed changes.

GADT-style AST

Also, GRIN will have a GADT-style based AST as well. This is for the front end users of GRIN who want to generate well-structured GRIN programs. It will make all syntactic restrictions explicit, hence it will force the user to build a syntactically sound AST.

Represesnting the AST with GADTs has a serious drawback though: recursion schemes don't support GADTs at the moment. This means that neither the currently implemented analyses, nor the transformations will work on the GADT-style AST. This is the reason why we will only use it for the GRIN code generation. The front end will generate a GADT-style GRIN AST, then we will transform it into a plain old ADT, so that we continue working with it.

Prototype AST

The below examples only include the changes regarding to the naming convention.

New constructs

data BPat
  = VarPat { bPatVar :: Name }
  | AsPat  { bpatVar :: Name
           , bPatVal :: Val  -- TODO: This will be restricted in the future.
           }
  | WildCard

data AppName
  = Fun { appName :: Name }
  | Ext { appName :: Name }

Changes

data Val
  -- CHANGE: Name
  = ConstTagNode  Tag  [Name]
  -- CHANGE: Name
  | VarTagNode    Name [Name]
  | ValTag        Tag
  | Unit
  -- simple val
  | Lit Lit
  | Var Name
  | Undefined     Type

data Exp
  = Program     [External] [Def]
  | Def         Name [Name] Exp
  -- Exp
  -- CHANGE: BPat
  | EBind       SimpleExp BPat Exp
  -- CHANGE: Name
  | ECase       Name [Alt]
  -- Simple Exp
  -- CHANGE: Name
  | SApp        AppName [Name]
  | SReturn     Val
  -- CHANGE: Name
  | SStore      Name
  | SFetchI     Name (Maybe Int)
  -- CHANGE: Name
  | SUpdate     Name Name
  | SBlock      Exp
  -- Alt
  | Alt CPat Name Exp
andorp commented 5 years ago
named case scrutinees
named node fields
name function arguments
named store argument

Please could you give examples how they are now and what will be the intended result? We have named function arguments, probably you meant at the call side.

andorp commented 5 years ago
data FName
  = F { fName :: Name }
  | E { fname :: Name }

How FName will fit to the current analyses? Currently we have unified Names everywhere. When projecting back to source code, type env, chould this introduce difficulties?

andorp commented 5 years ago

Last return

Lets use the pure as it is already established in the syntax, as also doesn't give the false sense of return point of a function.

andorp commented 5 years ago
Program, function, definition
The above mentioned notions should be different syntactic constructs.

This design decision was taken to be compatible with the recursion schemes library, which already had the tradeoff. Combining base functors would leave the intermix of the different levels. Using GADTs and indexed recursion schemes would mean reimplementing the recursion-schemes library supporting indexes.

I was considering introduce a type safe layer which would use a GADT to construct only meaningful programs, but I rejected the idea out of pure laziness (pun intended). Maybe it is a good idea to have that layer for the future frontends, forcing them (another pun intended) to create a correct syntax. But for us GRIN contributors I think it is overkill to create such a type-safety when implementing transformations, analyses, at least for now, as we plan to refactor the IR. If we want to do such a type safe internal representation, I would suggest to implement it when we plan to do the first official release.

Anabra commented 5 years ago
named case scrutinees
named node fields
name function arguments
named store argument

Please could you give examples how they are now and what will be the intended result? We have named function arguments, probably you meant at the call side.

The arguments to the respective data constructors would require Names instead of values. Just like update and fetch do.

Anabra commented 5 years ago
data FName
  = F { fName :: Name }
  | E { fname :: Name }

How FName will fit to the current analyses? Currently we have unified Names everywhere. When projecting back to source code, type env, chould this introduce difficulties?

FName would only be present in the application node, nowhere else. This way, if the analysis/transformation does not want to differentiate normal functio apps from external apps, they could just pattern patch on App and get the stored name via the getter. If someone do want to differentiate them, they will be able to pattern patch on the respective FName data ctors.

During pretty printing, I don't think it would introduce any difficulties. These are just wrappers.

Anabra commented 5 years ago

Last return

Lets use the pure as it is already established in the syntax, as also doesn't give the false sense of return point of a function.

Alright.

Anabra commented 5 years ago
Program, function, definition
The above mentioned notions should be different syntactic constructs.

This design decision was taken to be compatible with the recursion schemes library, which already had the tradeoff. Combining base functors would leave the intermix of the different levels. Using GADTs and indexed recursion schemes would mean reimplementing the recursion-schemes library supporting indexes.

I was considering introduce a type safe layer which would use a GADT to construct only meaningful programs, but I rejected the idea out of pure laziness (pun intended). Maybe it is a good idea to have that layer for the future frontends, forcing them (another pun intended) to create a correct syntax. But for us GRIN contributors I think it is overkill to create such a type-safety when implementing transformations, analyses, at least for now, as we plan to refactor the IR. If we want to do such a type safe internal representation, I would suggest to implement it when we plan to do the first official release.

Makes sense. I did not consider this.

andorp commented 5 years ago

Naming: I see, basically you suggest to introduce constants using the pure construction. v <- pure 1 .

andorp commented 5 years ago

Please could you update the description of the proposal?

Anabra commented 5 years ago

Exactly. Constants could only be introduced in the above form. Everywhere else, we would be just passing variables around.

Anabra commented 5 years ago

The separation of external and ordinary function application has been scrapped.

The reason being is that it makes writing code for function application cumbersome. Whenever we need the fact that the function is an external, we will usually have to look it up from the external mapping anyway. This lookup will determine whether the function is an external or not. This means, that the pattern match on the type of the function application is redundant.

Anabra commented 5 years ago

Extended the proposal with the removal of low-level GRIN construct.

VarTag node and indexed Fetch will be removed. These constructs were originally needed for RISC code generation. Since then, GRIN transitioned to LLVM, and over the years these constructs proved to be unnecessary.

andorp commented 4 years ago

As part of the ICFP tutorial there is a GADT, which helps to create syntactically expected GRIN programs. The rule indexed expressions show what can be combined how.

This approach will live in the Syntax.GADT module. This change can be found in the https://github.com/grin-compiler/grin/pull/44 PR.

Anabra commented 4 years ago

I added a brief description for the GADT-style AST, and the new datalog based model for the AST. The DL model will need named case alternatives.

Anabra commented 4 years ago

Added a detailed explanation for named case alternatives.

andorp commented 4 years ago

Looking at the code from the latest PR, we realized that it is confusing to have two different forms for the @as pattern

case v of
  (CNil)       @ v1 -> <code>
  (CCons x xs) @ v2 -> <code>

v@(CInt k) <- pure nis 
<both v and k can be referenced here>

This should be:

case v of
  (CNil)       @ v1 -> <code>
  (CCons x xs) @ v2 -> <code>

(CInt k)@v <- pure nis 
<both v and k can be referenced here>
andorp commented 4 years ago

In the long run, we should remove the double variable @as pattern construction.

var1@var2 <- lhs

This should be only

var1 <- lhs
Anabra commented 4 years ago

I wrote a few lines about these in the as-pattern section.