kztk-m / EbU

An implementation of the ICFP 2023 paper "Embedding by Unembedding", extracted from its artifact
BSD 3-Clause "New" or "Revised" License
1 stars 0 forks source link

Language with Multiple Syntactic (Semantic?) Categories #3

Open kztk-m opened 2 weeks ago

kztk-m commented 2 weeks ago

Currently, EbU (and the original unembedding) assumes a single syntactic category. However, actual programming languages have multiple ones, such as patterns, statements, and (syntactic) values. In such situations, we often want to have different semantic domains for different syntactic categories, and semantic functions must refer to multiple domains instead of a single one. (A notable exception is patterns because we do not want to make the typing environments implicit in patterns.)

A workaround (see below) exists, but doing so for each DSL is tedious. So, I prefer to have a general solution addressed at the library level, but I want to see more DSLs of this sort to design a good interface effectively.

Workaround (requires no changes to EbU)

Consider an imperative language, which has two syntactic categories, namely expressions and statements. Since the current EbU assumed a single sorted language, a naive remedy is to prepare a unified type as below, where GType represents a kind of guest types.

data UType = ExpType GType |  StatementType GType 

However, in the current EbU, the semantic domain sem is assumed to have kind [k] -> k -> Type, meaning that the context must also be [UType], while the semantic domains for the syntactic categories are naturally SemE, SemS :: [GType] -> GType -> Type. For this particular example, we just can ignore tags in UType, but in general, different syntactic categories may even be indexed by different sorts of types. (For example, a semantic domain for statements may also be indexed by their effects.) To reflect the situation, we change the definition of UType as below

data UType = ExpType GType |  StatementType GTypeS 
SemE :: [GType] -> GType -> Type 
SemS :: [GType] -> GTypeS -> Type 

while keeping the context type.

Unified Semantic Domain

An observation is that every binder must bound variables of a type of kindGType, suggested by the semantic domain types SemExp and SemStatement. Thus, an idea is to pass around the witness that the types in UType in the context must come from types in GType.

data ComesFromGType uenv env where 
  NilOk :: ComesFromGType '[] '[] 
  ConsOk :: ComesFromGType uenv env -> ComesFromGType (ExpType t : uenv) (t : env) 

data USemImpl (env :: [GType]) (u :: UType) where 
  FromE :: SemE env t -> USemImpl env (ExpType t) 
  FromS :: SemS env t -> USemImpl env (StatementType t)

unFromE :: USemImpl env (ExpType t) -> SemE env t 
unFromE (FromE e) = e 

unFromS :: USemImpl env (StatementType t) -> SemS env t 
unFromS (FromS s) = s 

newtype USem uenv u = USem { runUSem :: forall env. ComesFromGType uenv env -> USemImpl env u } 

We can lift the original semantic domains into the unified domain, when wrapped with EnvI

liftE :: EnvI SemE t -> EnvI USem (ExpType t)
liftE (EnvI e) = EnvI $ \tenv -> USem $ \wit -> e (convTEnv tenv wit)
  where 
    convTEnv :: TEnv us -> ComesFromGType as us -> TEnv as 
    ...

unliftE :: EnvI USem (ExpType t) -> EnvI SemE t 
unliftE (EnvI e) = EnvI $ \tenv -> genTEnv $ \wit tenv' -> 
  let FromE e' = runUSem (e tenv') wit in e' 
  where 
    genTEnv :: TEnv as -> (forall us. ComesFromGType us as -> TEnv us -> r) -> r 
    ...

-- We are not able to use EnvI for statements due to its kind EnvI :: ([k] -> k -> Type) -> k -> Type, 
-- but we can do similarly. 

Unified Semantic Functions

For first-order language constructs, we just pass around ComesFromGType.

-- lifting of (+)
addU :: EnvI USem (ExpType Int) -> EnvI USem (ExpType Int) -> EnvI USem (ExpType Int)
addU = UE.liftFO2 addSemU
  where 
    addSemU :: USem uenv (ExpType Int) -> USem uenv (ExpType Int) -> USem uenv (ExpType Int) 
    addSemU (USem u1) (USem u2) = USem $ \wit -> FromE $ addSem (unFromE $ u1 wit) (unFromE $ u2 wit)

    -- addSem :: SemE env Int -> SemE env Int -> SemE env Int 

Manipulation of witness is required for second-order constructs, but this must be doable by leveraging the fact that it must bound variables of typeExpType t.

letU :: EnvI USem a -> (EnvI USem a -> EnvI USem b) -> EnvI USem b 
letU = UE.liftSOn (ol0 :. ol1 :. End) letSemU 
  where 
    letSemU :: USem uenv (ExpType a') -> USem (ExpType a' : uenv) b -> USem uenv b 
    letSemU (USem u1) (USem u2) = USem $ \wit -> FromE $ letSem (unFromE (u1 wit)) (unFromE $ u2 (ConsOk wit)) 

    -- letSem :: SemE env a -> SemE (a : env) b -> SemE env b 

[Editted, Oct 28] The above is not true as we will not obtain the constraint a ~ ExpType a' to use letSemU. So, we either need to change the type of letU to be:

letU :: EnvI USem (ExpType a) -> (EnvI USem (ExpType a) -> EnvI USem (ExpType b)) -> EnvI USem (ExpType b)

Or, use SemE/SemD externally and convert them to USem and back when we implement the semantic function via lifting. This would be necessary if we keep the surface HOAS syntax as:

class Exp e where 
  let_ :: e a -> (e a -> e r) -> e r
  ...

class Exp e => Stm s e | s -> e where   
  bind :: e a -> (e a -> s r) -> s r 
  ...
kztk-m commented 3 days ago

I implemented the idea. Now, lifting functions can take semantic functions involving different semantic domains. A restriction is that the current system only supports variables that live in expressions (semExp in the type of liftSOn').

ghci> :set -XPolyKinds
ghci> :t liftFO2'
liftFO2'
  :: forall {k1} {k2} {k3} {k4} (sem1 :: [k1] -> k2 -> *) (a :: k2)
            (sem2 :: [k1] -> k3 -> *) (b :: k3) (sem3 :: [k1] -> k4 -> *)
            (c :: k4).
     (forall (env :: [k1]). sem1 env a -> sem2 env b -> sem3 env c)
     -> EnvI sem1 a -> EnvI sem2 b -> EnvI sem3 c
ghci> :t liftSOn' (ol1 ::. End')
liftSOn' (ol1 ::. End')
  :: forall {k} {kR} {k'} {semExp :: [k] -> k -> *}
            {proxy :: ([k] -> k -> *) -> *} {semR :: [k] -> kR -> *}
            {sem :: [k] -> k' -> *} {a1 :: k} {a2 :: k'} {r :: kR}.
     LiftVariables semExp =>
     proxy semExp
     -> (forall (env :: [k]). sem (a1 : env) a2 -> semR env r)
     -> (EnvI semExp a1 -> EnvI sem a2)
     -> EnvI semR r

Notice proxy semExp in the type, which is required to determine the type semExp; liftSOn' in general can be applied to non-binding constructs and thus liftSOn d may not refer semExp other than the argument, such as:

ghci> :t liftSOn' (ol0 ::. End')
liftSOn' (ol0 ::. End')
  :: forall {k} {kR} {k'} {semExp :: [k] -> k -> *}
            {proxy :: ([k] -> k -> *) -> *} {semR :: [k] -> kR -> *}
            {sem :: [k] -> k' -> *} {a :: k'} {r :: kR}.
     LiftVariables semExp =>
     proxy semExp
     -> (forall (env :: [k]). sem env a -> semR env r)
     -> EnvI sem a
     -> EnvI semR r

I will close the issue if there are no issues with the new lifting functions.