AST datatypes don’t support algebraic traversals

robrix commented 5 years ago

Precise ASTs are currently of kind *, and thus don’t support swapping out subterms with some other value, which in turn means we can’t really compute programs over ASTs using recursion schemes. This is made trickier by the fact that subterms aren’t all the same type, i.e. we can’t do e.g. Functor/Foldable instances over the subterm positions.

This also means that semantic’s current approach for representing diffs doesn’t apply, since it represents them as the fixpoint of a flat syntax type of functors, and therefore every recursive position has the same type.

robrix commented 5 years ago

Some possibilities:

Something using syb or similar, maybe? I don’t know much about it.
Use higher-kinded data to represent subterm positions wrapped in some other constructor. This would make pattern matching quite inconvenient but would be reasonably direct: T Identity would represent an AST, while T (Const a) would represent one step in an algebraic fold to/unfold from a.
Use the GHC.Generics types to construct a higher-kinded functor from a given AST datatype, either generically with type families or using TH metaprogramming, with appropriate project-/embed-like functions to map back and forth between ast and Rep' ast ast.

(With either of the latter two approaches we would want to be careful not to wrap leaf fields, i.e. Text, in the functor, since it doesn’t represent a subterm.)

robrix commented 5 years ago

As an example of what I mean by the second suggestion, we currently generate AST datatypes broadly like this:

data Expr
  = LamExpr Lam
  | AppExpr App
  | VarExpr Var
  deriving (Eq, Ord, Show)

data Lam = Lam Identifier Expr
  deriving (Eq, Ord, Show)

data App = App Expr Expr
  deriving (Eq, Ord, Show)

newtype Var = Var Identifier
  deriving (Eq, Ord, Show)

newtype Identifier = Identifier String
  deriving (Eq, Ord, Show)

where by adding a parameter of kind * -> *:

data Expr f
  = LamExpr (Lam f)
  | AppExpr (App f)
  | VarExpr (Var f)
  deriving (Eq, Ord, Show)

data Lam f = Lam (Identifier f) (Expr f)
  deriving (Eq, Ord, Show)

data App f = App (Expr f) (Expr f)
  deriving (Eq, Ord, Show)

newtype Var f = Var (Identifier f)
  deriving (Eq, Ord, Show)

newtype Identifier (f :: * -> *) = Identifier String
  deriving (Eq, Ord, Show)

we could instantiate f to Identity to represent an ordinary AST (i.e. Expr Identity with this reformulation is equivalent to Expr in the original one above), or to Const a to replace all the subterms with values of type a representing them (e.g. during a fold or unfold).

robrix commented 5 years ago

I guess this wouldn’t suffice to compute recursion schemes over this structure on its own since there’s still no (higher-order) fixpoint that we’d use for all of these datatypes, but it would at least allow us to represent the intermediate states.

robrix commented 4 years ago

I’ve implemented a Traversable1 (1 indicating higher-kindedness, à la Show1 or Generic1) class in semantic which implements this sort of thing by allowing you to conveniently map subterms to e.g. Monoids. It’s not full recursion schemes, but it’s a good enough start that I’m going to close this out and open a more specific issue about representing diffs.

patrickt commented 4 years ago

N.B. that deriving Data or Plated instances (cf https://github.com/tree-sitter/haskell-tree-sitter/issues/186) would give us something like catamorphisms and paramorphisms, as seen here.

tree-sitter / haskell-tree-sitter

AST datatypes don’t support algebraic traversals #143