unsplash / intlc

Compile ICU messages into code. Supports TypeScript and JSX. No runtime.
MIT License
56 stars 2 forks source link

Add pattern functor and utilise recursion schemes/catamorphisms #171

Closed samhh closed 1 year ago

samhh commented 1 year ago

Requires #170, else the catamorphism can no longer handle recursion in its entirety. Closes #48.

This is a pretty substantial PR conceptually. Aside from being fun and interesting, I think it also makes bugs like #101 less likely. Recursion schemes enable us to no longer explicitly recurse and instead, in the case of catamorphisms, merely handle the gathered data at each layer. It's essentially abstracting out the explicit act of recursing, so instead of in that sense saying how data will be transformed, we instead say what we'll do with said data.

NodeF a is a pattern functor rewrite of Node. Wherever Node references itself recursively, the recursive references are replaced with a reference to type argument a. This allows us to derive base typeclass instances like Functor and, more relevantly, Recursive and Corecursive instances which come from the recursion-schemes package. NodeF could be generated by makeBaseFunctor, but I think it's helpful to see how the types relate without the magic of metaprogramming.

If you're wondering how NodeF a can ever become Node and not just end up as NodeF (NodeF (NodeF ..)), check out Fix. It's a bit trippy.

The major winners of this PR are plural expansion and lint rules. Lots of error-prone boilerplate has been removed.

As with lots of other things Haskell, when there are new problems to solve we can do them in a generalised way. Knowledge of recursion schemes is reusable.

I haven't figured out flattening via recursion schemes yet. It's on the list alongside utilising the pattern functor to hold additional information (see https://github.com/unsplash/intlc/issues/48#issuecomment-1199563996).

OliverJAsh commented 1 year ago

@samhh Do you still have the code for the demo you showed earlier? I would love to play with it!

samhh commented 1 year ago
@OliverJAsh Sure thing!: ```diff commit c11a444439d80605273d289f3c16abbb8461154a Author: Sam A. Horvath-Hunt Date: Tue Oct 18 17:18:01 2022 +0100 Demo diff --git a/intlc.cabal b/intlc.cabal index 97ceeb2..c3a8b49 100644 --- a/intlc.cabal +++ b/intlc.cabal @@ -21,8 +21,10 @@ common common build-depends: base ^>=4.15 , bytestring ^>=0.11 + , comonad ^>=5.0 , containers ^>=0.6 , extra ^>=1.7 + , free ^>=5.1 , mtl ^>=2.2 , optics ^>=0.4 , recursion-schemes ^>=5.2 @@ -69,6 +71,7 @@ library Intlc.Backend.TypeScript.Language Intlc.Backend.TypeScript.Compiler Intlc.Core + Intlc.Demo Intlc.ICU Intlc.Linter Intlc.Parser diff --git a/lib/Intlc/Demo.hs b/lib/Intlc/Demo.hs new file mode 100644 index 0000000..8af3f10 --- /dev/null +++ b/lib/Intlc/Demo.hs @@ -0,0 +1,160 @@ +{-# LANGUAGE DeriveAnyClass #-} +{-# LANGUAGE DeriveTraversable #-} +{-# LANGUAGE DerivingStrategies #-} +{-# LANGUAGE PatternSynonyms #-} +{-# LANGUAGE ScopedTypeVariables #-} +{-# LANGUAGE TypeFamilies #-} + +module Intlc.Demo where + +import Control.Comonad +import Control.Comonad.Cofree +import Data.Char (toUpper) +import Data.Functor.Foldable hiding (fold) +import qualified Data.Text as T +import Prelude + +-- Our (reduced) AST represented as a single sum type. +data Node + = Fin + | Char { value :: Char, next :: Node } + | String { argName :: Text, next :: Node } + | Bool { argName :: Text, trueChild :: Node, falseChild :: Node, next :: Node } + | Callback { argName :: Text, child :: Node, next :: Node } + deriving (Show, Eq, Generic, Recursive, Corecursive) + +-- | A "pattern functor" representation of `Node`. +data NodeF a + = FinF + | CharF { valueF :: Char, nextF :: a } + | StringF { argNameF :: Text, nextF :: a } + | BoolF { argNameF :: Text, trueChildF :: a, falseChildF :: a, nextF :: a } + | CallbackF { argNameF :: Text, childF :: a, nextF :: a } + deriving (Show, Eq, Functor, Foldable, Traversable, Generic) + +-- Link the two types. I haven't researched how this works yet. :see-no-evil: +type instance Base Node = NodeF + +-- Char 'a' Fin <> Char 'b' (Char 'c' Fin) = Char 'a' (Char 'b' (Char 'c' Fin)) +instance Semigroup Node where + l <> r = case l of + Fin -> r + Char c l' -> Char c (l' <> r) + Bool n t f l' -> Bool n t f (l' <> r) + String n l' -> String n (l' <> r) + Callback n c l' -> Callback n c (l' <> r) + +instance Monoid Node where + mempty = Fin + +-- "abc" = Char 'a' (Char 'b' (Char 'c' Fin)) +instance IsString Node where + fromString = foldr Char Fin + -- -- Long-form: + -- fromString (xs :: String) = foldr (\c n -> Char c n) Fin xs + +-- +-- Catamorphisms! -- +-- + +-- An example AST representing: +-- Hello {isAdmin, boolean, true {your lordship} false {{name}}}! +ast :: Node +ast = mconcat + [ "Hello " + , Callback + { argName = "bold" + -- The `"!"` could also go here. With the semigroup concat under `mconcat` + -- they're equivalent. + , next = Fin + , child = Bool + { argName = "isAdmin" + , next = Fin + , trueChild = "your lordship" + , falseChild = String + { argName = "name" + , next = Fin + } + } + } + , "!" + ] + +-- Let's make all our text uppercase for some reason. This looks a bit like +-- plural expansion. +toUpper' :: Node -> Node +toUpper' = cata $ \case + CharF c x -> Char (toUpper c) x + x -> embed x + +-- Catamorphisms can fold a tree down to anything! This looks a bit like a lint +-- rule. +allArgNames :: Node -> [Text] +allArgNames = cata $ \case + StringF { argNameF, nextF } -> argNameF : nextF + BoolF { argNameF, trueChildF, falseChildF, nextF } -> argNameF : trueChildF <> falseChildF <> nextF + CallbackF { argNameF, childF, nextF } -> argNameF : childF <> nextF + -- `fold` here specialises to: NodeF [Text] -> [Text] + -- + -- Because `NodeF` is foldable (derived above) and `[Text]` forms a monoid. + -- + -- It could be used more to reduce the need to even directly reference `nextF` + -- et al above. + x -> fold x + +-- How about compilation? This tiny function, which is guaranteed to terminate, +-- can compile any `Node` AST to an ICU message. +compile :: Node -> Text +compile = cata $ \case + FinF -> mempty + CharF c x -> T.singleton c <> x + StringF n x -> mconcat [ "{", n, "}", x ] + BoolF n t f x -> mconcat [ "{", n, ", boolean, true {", t, "} false {", f, "}}", x ] + CallbackF n c x -> mconcat [ "<", n, ">", c, "", x ] + +-- What about something effectful? Here we have a `Reader` which tracks how +-- many interpolation layers deep it is and appends it to the argument name. +layerCounts :: Node -> Node +layerCounts x' = runReader (cata go x') (0 :: Int) where + go :: NodeF (Reader Int Node) -> Reader Int Node + go FinF = pure Fin + go (CharF c x) = Char c <$> x + go (StringF n x) = String <$> appended n <*> incremented x + go (BoolF n t f x) = Bool <$> appended n <*> incremented t <*> incremented f <*> incremented x + go (CallbackF n c x) = Callback <$> appended n <*> incremented c <*> incremented x + appended n = (n <>) . show <$> ask + incremented = local (+1) + +-- +-- Other recursion schemes -- +-- + +-- Histomorphisms allow us to pattern match against previous recursions. Here +-- we'll replace any instance of "Hello" with "Hi". (I don't really understand +-- `Cofree` or comonads more generally yet.) +informal :: Node -> Node +informal = histo go where + go :: NodeF (Cofree NodeF Node) -> Node + go (CharF 'H' (_ :< CharF 'e' (_ :< CharF 'l' (_ :< CharF 'l' (_ :< CharF 'o' x))))) = + "Hi" <> extract x + go x = embed (extract <$> x) + +-- In theory you can write a parser as an anamorphism. (This merely implements +-- plaintext.) +-- +-- Note the reversal of the shape of `f` compared to a catamorphism. This is a +-- coalgegra (`a -> f a`) where catamorphism takes an algebra (`f a -> a`). +parse :: Text -> Node +parse = ana f where + f :: Text -> NodeF Text + f x = case T.uncons x of + Nothing -> FinF + Just (y, zs) -> CharF y zs + +-- A hylomorphism is just an anamorphism followed by a catamorphism, so this +-- could borrow the algebras from `compile` and `parse`, efficiently composing +-- the two. +parseAndCompile :: Text -> Text +parseAndCompile = hylo compileF parseF where + parseF :: Text -> NodeF Text; parseF = undefined + compileF :: NodeF Text -> Text; compileF = undefined ```