tomjaguarpaw / Arrows2

Ideas for the next generation of Haskell's Arrow notation

Refactoring the Arrow hierarchy #8

Open evincarofautumn opened 4 years ago

evincarofautumn commented 4 years ago

This is just to get a conversation going—I plan to add examples and detail as I have time.

I often run into types that are almost, but not quite, an Arrow—usually because they have no implementation of arr, sometimes because they lack notions of copying, dropping, or swapping (that is, contraction, weakening, and exchange), and most recently because Arrow isn’t polykinded (a consequence of containing arr). Sometimes I can work around this with a free arrow over a bifunctor—essentially building a graph with the free arrow, where the nodes are annotated with my effects:

data FreeArrow a i o where

  -- | Lift a function into the arrow.
  Pure
    :: (i -> o)
    -> FreeArrow a i o

  -- | Embed an effect in the graph.
  Eff
    :: a i o
    -> FreeArrow a i o

  -- | Sequential composition.
  Seq
    :: FreeArrow a i x
    -> FreeArrow a x o
    -> FreeArrow a i o

  -- | Parallel composition.
  Par
    :: FreeArrow a i1 o1
    -> FreeArrow a i2 o2
    -> FreeArrow a (i1, i2) (o1, o2)

instance Category (FreeArrow a) where
  id = Pure id
  (.) = flip Seq

instance Arrow (FreeArrow a) where
  arr = Pure
  first f = Par f id
  second f = Par id f
  (***) = Par

runFreeArrow
  :: forall f a i o
  . (Arrow a)
  => (forall x y. f x y -> a x y)
  -- ^ Embed an effect in the target arrow.
  -> FreeArrow f i o
  -- ^ Free arrow computation.
  -> a i o
  -- ^ Evaluated arrow (e.g. ‘Kleisli m’ for some ‘Monad m’)
runFreeArrow handle = run
  where
    run :: forall m n. FreeArrow f m n -> a m n
    run = \case
      Pure f     -> arr f
      Seq f1 f2  -> run f2 . run f1
      Par f1 f2  -> run f1 *** run f2
      Eff effect -> handle effect
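
To make this concrete, here is a self-contained sketch of the free arrow being interpreted into the plain function arrow. The effect type Calc, the program prog, and the handler toFn are invented for illustration; only FreeArrow and runFreeArrow come from the comment above:

```haskell
{-# LANGUAGE GADTs, LambdaCase, RankNTypes #-}

import Prelude hiding (id, (.))
import Control.Arrow (Arrow (..))
import Control.Category

-- Condensed from the definitions above.
data FreeArrow a i o where
  Pure :: (i -> o) -> FreeArrow a i o
  Eff  :: a i o -> FreeArrow a i o
  Seq  :: FreeArrow a i x -> FreeArrow a x o -> FreeArrow a i o
  Par  :: FreeArrow a i1 o1 -> FreeArrow a i2 o2
       -> FreeArrow a (i1, i2) (o1, o2)

instance Category (FreeArrow a) where
  id = Pure id
  (.) = flip Seq

instance Arrow (FreeArrow a) where
  arr = Pure
  first f = Par f id
  second = Par id
  (***) = Par

runFreeArrow :: Arrow a => (forall x y. f x y -> a x y) -> FreeArrow f i o -> a i o
runFreeArrow handle = \case
  Pure f    -> arr f
  Eff e     -> handle e
  Seq f1 f2 -> runFreeArrow handle f2 . runFreeArrow handle f1
  Par f1 f2 -> runFreeArrow handle f1 *** runFreeArrow handle f2

-- A toy effect bifunctor: the graph nodes are arithmetic steps.
data Calc i o where
  Add :: Int -> Calc Int Int
  Mul :: Int -> Calc Int Int

-- Duplicate the input, then run one effect on each copy in parallel.
prog :: FreeArrow Calc Int (Int, Int)
prog = Pure (\x -> (x, x)) >>> Par (Eff (Add 1)) (Eff (Mul 3))

-- Interpret the effects into the plain function arrow (a = (->)).
toFn :: Calc i o -> (i -> o)
toFn = \case
  Add n -> (+ n)
  Mul n -> (* n)

result :: (Int, Int)
result = runFreeArrow toFn prog 10  -- (11, 30)
```

The same prog could instead be handled into Kleisli IO to log each node, since runFreeArrow only assumes Arrow for the target.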

But often I can’t get away with this formulation. That’s frustrating, because I really like proc notation and the structural guarantees I can get from it. So I’d like to start a discussion about how we want to refactor the class hierarchy going forward, with an eye to increasing flexibility and preserving backward compatibility as much as possible, with a suitable deprecation plan—e.g. adding a language extension to desugar proc notation in terms of the new hierarchy.

I think a good starting point is the monoidal category formulation that @cgibbard brought up in the Constraint-Based Arrow Notation proposal (rephrased slightly):

assocL :: a ((x, y), z) (x, (y, z))
assocR :: a (x, (y, z)) ((x, y), z)
unitL :: a x ((), x)
unitR :: a x (x, ())
counitL :: a ((), x) x
counitR :: a (x, ()) x
swap :: a (x, y) (y, x)

I think if I were doing this today, I would start from the class hierarchy defined in http://hackage.haskell.org/package/categories-1.0.7 -- from Bifunctor (which gets us (***), first, and second), Associative/Monoidal/Braided/Symmetric (which give the isomorphisms I listed above), and then Cartesian (which gives (&&&) as well as projections and the diagonal).

In particular, we need to figure out how to divide that up into a hierarchy that’s within a surmountable distance of the current state of affairs. For example, without abstracting over the tensor product and sum types, and just using (,) (as above) and Either:

-- Parallel composition:
class Bifunctor{-ish-} a where
  (***) :: a x y -> a x' y' -> a (x, x') (y, y')
  first :: a x y -> a (x, z) (y, z)
  second :: a x y -> a (z, x) (z, y)

-- Sequential composition:
-- (category)
class Category a where
  id :: a x x
  (.) :: a y z -> a x y -> a x z

-- Shuffling of the “stack”:
-- (monoidal category)
class (Bifunctor a) => Associative a where
  assocL :: a ((x, y), z) (x, (y, z))
  assocR :: a (x, (y, z)) ((x, y), z)
class (Associative a) => Monoidal a where
  counitL :: a ((), x) x
  counitR :: a (x, ()) x
  unitL :: a x ((), x)
  unitR :: a x (x, ())

-- Exchange:
-- (symmetric/braided monoidal category)
class (Associative a) => Braided a where
  swap :: a (x, y) (y, x)
class (Braided a) => Symmetric a
  -- swap >>> swap = id

-- Weakening:
-- (semicartesian monoidal category)
class (Symmetric a, Monoidal a) => Semicartesian a where
  fst :: a (x, y) x
  snd :: a (x, y) y

-- Contraction:
-- (relevance monoidal category)
class (Symmetric a, Monoidal a) => Relevance a where
  dup :: a x (x, x)
  (&&&) :: a x y -> a x z -> a x (y, z)

class (Semicartesian a, Relevance a) => Cartesian a

-- Choice (cf. ArrowChoice):
-- (co-Cartesian monoidal category)
class (Monoidal a, Symmetric a) => CoCartesian a where
  inl :: a x (Either x y)
  inr :: a y (Either x y)
  codup :: a (Either x x) x
  (|||) :: a x z -> a y z -> a (Either x y) z

-- Higher-order arrows (cf. ArrowApply):
-- (Cartesian closed category)
class (Cartesian a) => CartesianClosed a where
  apply :: a (a x y, x) y
  curry :: a (x, y) z -> a x (a y z)
  uncurry :: a x (a y z) -> a (x, y) z

-- Knot-tying (cf. ArrowLoop):
-- (traced monoidal category)
class (Arrow a) => Traced a where
  loop :: a (x, z) (y, z) -> a x y

-- Lifting (cf. Arrow):
class PreArrow a where
  arr :: (x -> y) -> a x y
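
One motivating example for this finer slicing (my own illustration; Par, Swappable, Circuit, and prim are placeholder names, not proposed API): a static circuit description can implement everything above except arr, because wiring is pure bookkeeping but an arbitrary Haskell function cannot be drawn as a node:

```haskell
import Prelude hiding (id, (.))
import Control.Category

-- Cut-down versions of the classes above (illustrative names).
class Category a => Par a where
  (***) :: a x y -> a x' y' -> a (x, x') (y, y')

class Par a => Swappable a where
  swap :: a (x, y) (y, x)

-- A static dataflow description: the phantom types track the wiring,
-- and the payload just counts primitive nodes.
newtype Circuit i o = Circuit { nodeCount :: Int }

-- One opaque primitive node.
prim :: Circuit i o
prim = Circuit 1

instance Category Circuit where
  id = Circuit 0
  Circuit g . Circuit f = Circuit (f + g)

instance Par Circuit where
  Circuit f *** Circuit g = Circuit (f + g)

instance Swappable Circuit where
  swap = Circuit 0  -- pure wiring: no nodes

-- Two primitives in parallel, crossed over, then one more primitive.
example :: Circuit (Int, Int) (Int, Int)
example = (prim *** prim) >>> swap >>> (id *** prim)
```

There is no sensible arr for Circuit, so under the current hierarchy it cannot use proc notation at all, even though it supports sequencing, parallelism, and exchange.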

If anyone has examples of cases that are not Arrows but where proc notation would be really useful, please share them here, as that’s exactly the underserved market I’m hoping to address—and with luck, popularise arrow notation more broadly by making it more useful.

ivanperez-keera commented 4 years ago

Examples where I've needed transformers and arrows would not do:

As a general note to the people participating: following discussions in the GHC repo, I can see that a lot of this work is rooted in mathematical background I do not have or simply haven't mastered. Although it may slow everyone else down (and I really hope I'm not a drag), I'd appreciate it, and think I could be a better asset in this line of work, if I understood the discussion well. A lot of this responsibility rests on me (I have to study and read), but I'd appreciate it if the concepts were explained for an audience that is not necessarily well-versed in category theory, and potentially if we had online meetings to discuss some technical details.

One more thing: I've always found the need for proc and do pretty annoying. If we are changing syntax, maybe we can change that too?

evincarofautumn commented 4 years ago

I can see that a lot of this work is rooted in mathematical background I do not have or simply haven't mastered. Although it may slow everyone else down (and I really hope I'm not a drag), I'd appreciate it, and think I could be a better asset in this line of work, if I understood the discussion well.

Not at all, in fact I think it’s crucial that we take things slowly and make sure everyone is on the same page in terms of both theory and practical implementation within GHC—moving slowly and deliberately, sharing many examples, is how we achieve that. As a happy side effect, we might even get some high-quality documentation out of it, something that’s sadly lacking about arrows generally.

I've always found the need for proc and do pretty annoying. If we are changing syntax, maybe we can change that too?

Imagine we could magically implement whatever notation you wanted—what would it look like?

The syntax seems to need a few parts:

But there are a few possible avenues for improvement:

  1. Allowing a proc to take multiple arguments in the usual curried fashion like a lambda, e.g.: proc x y -> do … instead of proc (x, y) -> do … (even if the former is desugared to the latter internally somehow)

  2. Replacing proc with \, generalising lambdas to Arrow if they use arrow command notation (-< and -<<), e.g.: \ x -> do …

The former feels tractable to me, but probably has some subtleties that @lexi-lambda would understand better than me, being in the thick of it right now.

The latter seems like a benign surface-level change, but I get the sense it could have a lot of follow-on effects, and it’s not clear to me where to stop if we give that mouse that cookie. For example, to disambiguate lambda abstractions from arrow abstractions, we might have to figure out a consistent semantics of -< / -<< in isolation, then deal with the complicated and fiddly issues of environment scoping and the command analogue of eta-equivalence (f versus proc x -> do f -< x), then maybe want the addition of row types, then maybe move toward deeply integrating the Arrow and Monad hierarchies—certainly an admirable goal, but a daunting one!

evincarofautumn commented 4 years ago

I’ve been thinking about how to arrive at a robust implementation of desugaring using something like the above class hierarchy. I think the general structure of proc expressions can be handled straightforwardly like so:

  1. Construct the dataflow graph corresponding to the proc expression

  2. Topologically sort the graph into “stages” ([[Component]]) where the outer dimension is sequential composition and the inner dimension is parallel composition

  3. Between each parallel component, emit (***) (or the equivalent with first and second)

  4. Between each sequential component, emit (>>>)

  5. Where a stage requires a different association of tuple elements, permute them with assocl, assocr, and swap as needed

  6. Where a value is copied, insert dup (or (&&&))

  7. Where a value is dropped, insert fst or snd
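
Steps 2–4 amount to a fold over the staged graph. As a rough sketch (Component and render are placeholder names of mine, and components are just strings here), emitting combinators from the [[Component]] stages could look like:

```haskell
import Data.List (intercalate)

-- Placeholder: a component is just its rendered name.
type Component = String

-- Outer list: sequential stages, joined with (>>>).
-- Inner list: parallel components within a stage, joined with (***).
render :: [[Component]] -> String
render = intercalate " >>> " . map stage
  where
    stage [c] = c
    stage cs  = "(" ++ intercalate " *** " cs ++ ")"
```

For example, render [["dup"], ["id", "dup"]] produces "dup >>> (id *** dup)".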

Here’s a worked example borrowed from the GHC user’s guide:

proc x -> do
  y <- f -< x + 1
  g -< 2 * y
  let z = x + y
  t <- h -< x * z
  returnA -< t + z

I find the dataflow of such programs easier to visualise with a notation I devised for concatenative programs, a “dataflow table”, in which:

(Unfortunately, this example requires odd overlapping colspans that HTML table renderers don’t support very well, so ASCII art it is…)

Below I’ve also included some additional info:

Associativity is left implicit here (i.e., the input and output of assocl and assocr look the same in the table) partly to avoid even more clutter, and partly because in a concatenative setting, associativity of the stack is ordinarily left implicit, since it always associates downward, and all operations take place at the top of the stack (or are explicitly “dipped” under the top elements, analogous to first).

┌─x─────────────────────┐ x
│                       │
│ dup                   │ dup
│                       │
├─x─────────┼─x─────────┤ (x, x)
│           │           │
│           │ dup       │ id *** dup
│           │           │
├─x─────────┼─x─┼─x─────┤ (x, (x, x))
│           │   │       │
│ (+ 1)     │   │       │ arr (+ 1) *** (id *** id)
│           │   │       │
├───────────┤ x │ x     │ (x + 1, (x, x))
│           │   │       │
│ f         │   │       │ f *** (id *** id)
│           │   │       │
├─y─────────┤ x │ x     │ (y, (x, x)) = (f (x + 1), (x, x))
│           │   │       │
│ dup       │   │       │ dup *** (id *** id)
│           │   │       │
├─y─────┼─y─┼─x─┤ x     │ ((y, y), (x, x)) = ( (f (x + 1), f (x + 1)), (x, x) )
│       │       │       │
.       .       .       . assocl
.       .       .       .
. y     . y   x . x     . (y, (y, (x, x))) = ( f (x + 1), (f (x + 1), (x, x)) )
.       .       .       .
.       .       .       . id *** assocr
.       .       .       .
. y     . y   x . x     . (y, ((y, x), x)) = ( f (x + 1), ( (f (x + 1), x), x ) )
│       │       │       │
│ (2 *) │ swap  │       │ arr (2 *) *** (swap *** id)
│       │       │       │
├───────┼─x─┼─y─┤ x     │ (_, ((x, y), x)) = ( 2 * f (x + 1), ( (x, f (x + 1)), x ) )
│       │       │       │
│ g     │ (+)   │       │ g *** (arr (uncurry (+)) *** id)
│       │       │       │
├───────┼─z─────┼─x─────┤ (_, (z, x)) = ( 2 * f (x + 1), ( x + f (x + 1), x ) )
│       .               .
│ drop  .               . snd
│       .               .
└───────┤ z       x     │ (x + f (x + 1), x)
        │               │
        │ swap          │ swap
        │               │
        ├─x─────┼─z─────┤ (x, z) = (x, x + f (x + 1))
        │       │       │
        │       │ dup   │ id *** dup
        │       │       │
        ├─x─────┼─z─┼─z─┤ (x, (z, z)) = ( x, ( x + f (x + 1), x + f (x + 1) ) )
        .           .   .
        .           .   . assocr
        .           .   .
        . x       z . z . ((x, z), z) = ( ( x, x + f (x + 1) ), x + f (x + 1) )
        │           │   │
        │ (*)       │   │ arr (uncurry (*)) *** id
        │           │   │
        ├───────────┤ z │ ( x * (x + f (x + 1)), x + f (x + 1) )
        │           │   │
        │ h         │   │ h *** id
        │           │   │
        ├─t─────────┼─z─┤ (t, z) = ( h (x * (x + f (x + 1))), x + f (x + 1) )
        │               │
        │ (+)           │ arr (uncurry (+))
        │               │
        ├───────────────┤ h (x * (x + f (x + 1))) + (x + f (x + 1))
        │               │
        │ returnA       │ id
        │               │
        └───────────────┘

Removing the table and stack state, leaving just the combinators, that becomes:

dup
  >>> (id *** dup)
  >>> (arr (+ 1) *** (id *** id))
  >>> (f *** (id *** id))
  >>> (dup *** (id *** id))
  >>> assocl
  >>> (id *** assocr)
  >>> (arr (2 *) *** (swap *** id))
  >>> (g *** (arr (uncurry (+)) *** id))
  >>> snd
  >>> swap
  >>> (id *** dup)
  >>> assocr
  >>> (arr (uncurry (*)) *** id)
  >>> (h *** id)
  >>> arr (uncurry (+))
  >>> id
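
As a sanity check of my own (not part of the original comment), the whole chain can be evaluated at the plain function arrow, where dup, swap, assocl, assocr, and snd all have the obvious definitions, and compared against the direct meaning of the proc block. Here f, g, and h are arbitrary pure stand-ins for the effects, and swap *** id is the same arrow as first swap:

```haskell
import Prelude hiding (id, (.))
import Control.Arrow ((***), arr)
import Control.Category

-- Structural combinators at a = (->):
dup :: x -> (x, x)
dup x = (x, x)

swap :: (x, y) -> (y, x)
swap (x, y) = (y, x)

assocl :: ((x, y), z) -> (x, (y, z))
assocl ((x, y), z) = (x, (y, z))

assocr :: (x, (y, z)) -> ((x, y), z)
assocr (x, (y, z)) = ((x, y), z)

-- Arbitrary pure stand-ins for the effects:
f, g, h :: Int -> Int
f = (* 3)
g = subtract 1  -- result discarded by snd below, as in the table
h = (+ 7)

-- The combinator chain, transcribed verbatim:
pipeline :: Int -> Int
pipeline =
  dup
    >>> (id *** dup)
    >>> (arr (+ 1) *** (id *** id))
    >>> (f *** (id *** id))
    >>> (dup *** (id *** id))
    >>> assocl
    >>> (id *** assocr)
    >>> (arr (2 *) *** (swap *** id))
    >>> (g *** (arr (uncurry (+)) *** id))
    >>> snd
    >>> swap
    >>> (id *** dup)
    >>> assocr
    >>> (arr (uncurry (*)) *** id)
    >>> (h *** id)
    >>> arr (uncurry (+))

-- Direct meaning of the proc block:
direct :: Int -> Int
direct x =
  let y = f (x + 1)
      z = x + y
      t = h (x * z)
  in t + z
```

With these definitions, pipeline and direct agree on every input, e.g. both give 40 at x = 2.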

Which can be simplified by using first and second instead of (***) and applying the arrow laws, particularly those that extract more parallelism, e.g. (a *** b) >>> (c *** d) = (a >>> c) *** (b >>> d):

dup
  >>> ((arr (+ 1) >>> f >>> dup) *** dup)
  >>> assocl
  >>> second assocr
  >>> (arr (2 *) *** first (swap >>> arr (uncurry (+))))
  >>> first g
  >>> snd
  >>> swap
  >>> second dup
  >>> assocr
  >>> first (arr (uncurry (*)) >>> h)
  >>> arr (uncurry (+))

Unfortunately, I believe the path leading to the dead-end g can’t be erased entirely because we can’t discard its effects; but if we could, it would open further simplifications:

dup
  >>> ((arr (+ 1) >>> f) *** dup)
  >>> assocr
  >>> first (swap >>> arr (uncurry (+)) >>> dup)
  >>> swap
  >>> assocr
  >>> first (arr (uncurry (*)) >>> h)
  >>> arr (uncurry (+))

Critically, through this entire process, we never had to reason “under” arr. The goal is that arr is always a black box that only appears when lifting Haskell functions into an arrow, so a proc expression that only connects other arrows together with no intervening computations won’t require PreArrow.

All of the “shuffling” operations that previously required arr are now expressed in terms of the generalised assocl/assocr, swap, dup, and fst/snd, which work on any arrow with the appropriate instances for Associative, Symmetric, Relevance, and Semicartesian, respectively. Any expression that does not use such operations will also not require the corresponding classes.


Next up, I’m still trying to work out how to desugar the more complex components of proc notation.

The first order of business is desugaring if and case without the needless use of arr. Ideally we would only need arr to lift an expression into the arrow for the if condition or case scrutinee, but I haven’t yet found a general method of doing so. Maybe we can draw some inspiration from selective applicative functors? In any case, I think we’ll need to extend CoCartesian slightly to account for it.

The second order (ha) is the desugaring of CartesianClosed and command abstractions.

Ericson2314 commented 4 years ago

I think the existing https://hackage.haskell.org/package/profunctor and https://hackage.haskell.org/package/category classes are all that we need.

@ivanperez-keera the way I think about it is that anyone can, and should, cargo-cult the category theory textbooks on this stuff. Expertise is required to justify deviating from those definitions. The problem with arrows today, essentially, is that they deviate without a mathematically rigorous argument for why.

(e.g. a lot of the command stack stuff I don't think deserves the sugar and ad-hoc rules it already has, and would be better served with just a Reader arrow transformer.)

evincarofautumn commented 4 years ago

@Ericson2314: Could you expand a bit on how those classes would provide the features we need? In particular, it seems like Profunctor has the same problem as arr. Also, can you sketch out what a Reader arrow transformer might look like?

I think some finer-grained structure is warranted because arr isn’t the only reason I’ve been unable to implement Arrow for types where proc notation would’ve been valuable. Breaking up Cartesian in particular lets you make DSLs with different substructural rules, which is currently exceedingly painful in Haskell—linear types will help somewhat with that.

Each of the classes I proposed has a direct categorical analogue, while the existing extension either doesn’t call this out explicitly, or deviates from it. We should certainly depend on existing classes like Profunctor where applicable, but also maybe retain existing names like ArrowLoop instead of Traced for backward compatibility. What I care about here is the simplest and most flexible structure that gets us something appreciably nicer to use.

If there’s a simpler way of handling command abstractions, I’m all for it, because the whole feature seems extremely delicate, and in fact I’m still learning how it all comes together—I figured I would just sit with it for a while and see if my right brain can find a nice pattern.

lexi-lambda commented 4 years ago

As @evincarofautumn notes, arr is easily definable given both Category and Profunctor. In fact, you can get away with just Category and Functor:

arr f = fmap f id
dimap f g h = arr g . h . arr f

It would definitely be nice to get Profunctor and Choice in base so they can be made superclasses of Arrow. Then arr could be given a default implementation. But if you want to get rid of arr, you need to do something far more radical.
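
To spell out that remark with a runnable sketch of mine (Fn and arrFn are illustrative names): any type with Category and Functor instances recovers arr as fmap applied to the identity arrow:

```haskell
import Prelude hiding (id, (.))
import Control.Category

-- A wrapper around functions, pretending we only know that it has
-- Category and Functor instances.
newtype Fn a b = Fn { runFn :: a -> b }

instance Category Fn where
  id = Fn (\x -> x)
  Fn g . Fn f = Fn (g . f)

instance Functor (Fn a) where
  fmap g (Fn f) = Fn (g . f)

-- arr for free, exactly as in the comment above:
arrFn :: (x -> y) -> Fn x y
arrFn f = fmap f id
```

So any hierarchy that includes Category plus a Functor on the output side already smuggles in arr, which is why removing arr requires something more radical.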

lexi-lambda commented 4 years ago

Also, can you sketch out what a Reader arrow transformer might look like?

For what it’s worth, here’s the definition we use:

class Arrow p => ArrowReader r p | p -> r where
  askA :: p a r
  localA :: p a b -> p (a, r) b

newtype ReaderA r p a b = ReaderA { runReaderA :: p (a, r) b }

instance Arrow p => ArrowReader r (ReaderA r p) where
  askA = ReaderA (arr snd)
  localA (ReaderA f) = ReaderA (f . arr fst)
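
A small usage sketch of mine, instantiating p to the plain function arrow (readEnv and withInner are invented examples; localA is written with (>>>) here, which is the same as the f . arr fst above):

```haskell
{-# LANGUAGE FlexibleInstances, FunctionalDependencies, MultiParamTypeClasses #-}

import Control.Arrow

class Arrow p => ArrowReader r p | p -> r where
  askA   :: p a r
  localA :: p a b -> p (a, r) b

newtype ReaderA r p a b = ReaderA { runReaderA :: p (a, r) b }

instance Arrow p => ArrowReader r (ReaderA r p) where
  askA = ReaderA (arr snd)
  localA (ReaderA f) = ReaderA (arr fst >>> f)

-- Reading the ambient environment, with p = (->):
readEnv :: (Int, String) -> String
readEnv = runReaderA askA

-- localA runs a reader computation against an explicitly supplied
-- environment, ignoring the ambient one:
withInner :: ((Int, String), String) -> String
withInner = runReaderA (localA askA)
```
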

It’s useful, though I can’t say I know what @Ericson2314 has in mind when he says

(e.g. a lot of the command stack stuff I don't think deserves the sugar and ad-hoc rules it already has, and would be better served with just a Reader arrow transformer.)

as you generally want to use control operator syntax when calling runReaderA, i.e. (| runReaderA cmd |) env. And that needs the command stack.

More generally, when you write

the way I think about it is that anyone can, and should, cargo-cult the category theory textbooks on this stuff. Expertise is required to justify deviating from those definitions. The problem with arrows today, essentially, is that they deviate without a mathematically rigorous argument for why.

I am not sure what you are alluding to. I suppose it’s true that all the environment/stack-threading business is bespoke, but that’s just part of arrow notation, not arrows proper.

Ericson2314 commented 4 years ago

@evincarofautumn

Yes, as @lexi-lambda says, the Category classes avoid arr. Profunctor doesn't, but it gives one something more restrictive than arr and id, which could be useful. She also has the definition of ArrowReader I had in mind.


@lexi-lambda

I am not sure what you are alluding to. I suppose it’s true that all the environment/stack-threading business is bespoke, but that’s just part of arrow notation, not arrows proper.

Sorry, I was conflating the arrow class and arrow notation. (The arrow class as a grab-bag of axioms is also suspect, but that's already been discussed.)

I still think Arrow/Arrow notation today is an overly-complex dead end and a new arrows-like thing should start from scratch (though I do think your proposal improves arrows as they exist today).

as you generally want to use control operator syntax when calling runReaderA, i.e. (| runReaderA cmd |) env. And that needs the command stack.

So, from your tutorial (and that's the first I've really heard of these banana brackets!), it seems like their original use case was "smuggling" data into the arguments of functions that transform arrows, since closures are not available. But one can just use ReaderA to do that.

Perhaps there are other use cases, developed by those who have actually used banana brackets, that I am ignorant of.

lexi-lambda commented 4 years ago

It seems like their original use-case was "smuggling" data into the arguments of functions that transform arrows since closures are not available. But one can just use ReaderA to do that.

Yes, and one can also do it by passing more arguments to functions. Why have abstractions at all?

The usefulness of arrow notation is that it provides a pointful, lexically-scoped embedded language that compiles to the arrow operators, the latter of which are user-specified. You can get lots of useful behaviors by swapping the arrow operators with different ones, without completely giving up on a familiar, intelligible programming model.

But arrows are not always cartesian closed, so arrow notation is distressingly first-order. The illusion breaks down as soon as you need to apply a transformation to an arrow itself: if you exit the notation and re-enter it, none of the “bindings” are in scope. So banana brackets are a syntax to specifically accommodate this “quick exiting and re-entering” so the arrow notation compiler knows to treat the sub-computation as an extension of the enclosing computation.

You can’t really get around this. You can argue the merits of the particular details of the design that was eventually decided upon, but any notation for arrows (or arrow-like things) will face this challenge. You can’t solve this inadequacy of the embedded language with any amount of cleverness in the host language because you aren’t in the host language. Banana brackets are the escape hatch back into the host language.

Ericson2314 commented 4 years ago

Yes, and one can also do it by passing more arguments to functions. Why have abstractions at all?

ReaderA allows:

do a  <- foo -< ()
   xs <- bar -< ()
   runReaderA (mapA (ReaderA $ proc (a, x) -> do
     b <- baz -< x
     qux -< (a, b))) -< (xs, a)

Yeah, it's not the same a, so that's less slick, but crucially it is the same mapA.

I think the arrow notation's stack is not unlike a stack of reader effects. I bet somebody could make an HReaderA taking a type-level list and get something closer still.

evincarofautumn commented 4 years ago

I’ve been a bit stuck on the desugaring of case expressions with as few constraints as possible, and especially without arr. Just if over Bool is easy to handle: we can desugar to x -> x -> a Bool x or a (Bool, (x, x)) x (cf. Conal Elliott’s IfCat), avoiding the current Arrow requirement of arr in arr (\ … -> if condition then Left … else Right …) >>> (trueBranch ||| falseBranch). It seems like case necessarily depends on the structure of the type, though—given a data type:

data T = Ctor_1 Field_1_1 … Field_1_n | … | Ctor_m Field_m_1 … Field_m_n

A case expression over that type:

case x :: T of { Ctor_1 … -> …; …; Ctor_m … -> … }

Needs to desugar to something that selects one of a series of continuations to apply:

choose_T
  :: a (Field_1_1, …, Field_1_n) r
  -> …
  -> a (Field_m_1, …, Field_m_n) r
  -> a T r

(Note that this takes the continuations as arguments so that it doesn’t require cartesian closure, which is needlessly strong for this.)
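
For instance (my own illustration, not from the thread), instantiating the schema at a plain Maybe with a = (->), the generated combinator would be:

```haskell
-- choose for Maybe, following the schema above: one continuation per
-- constructor, each taking that constructor's fields.
chooseMaybe :: (() -> r) -> (x -> r) -> (Maybe x -> r)
chooseMaybe onNothing _ Nothing  = onNothing ()
chooseMaybe _ onJust  (Just x)  = onJust x
```

In the generalised form this would be a () r -> a x r -> a (Maybe x) r, with no arr or cartesian closure required.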

Maybe the data type itself could be desugared into sums and products (à la Generic), and we could use some uniform combinator to handle it, but that might not account for GADTs properly. I think the above combinator could support GADTs relatively easily—add a forall for any existentially quantified variables, and constraints for packaged-up dictionaries:

data T t where
  Ctor_1 :: forall a_1_1 … a_1_x. (C_1_1, …, C_1_k) => Field_1_1 -> … -> Field_1_n -> T t_1
  …
  Ctor_m :: forall a_m_1 … a_m_x. (C_m_1, …, C_m_k) => Field_m_1 -> … -> Field_m_n -> T t_m

choose_T
  :: (forall a_1_1 … a_1_x. (C_1_1, …, C_1_k) => a (Field_1_1, …, Field_1_n) r)
  -> …
  -> (forall a_m_1 … a_m_x. (C_m_1, …, C_m_k) => a (Field_m_1, …, Field_m_n) r)
  -> a (T t) r

(Of course, I’m sure there are subtleties I’m completely missing.)

I’m also assuming that each continuation takes all of the fields of the corresponding constructor, and then explicitly drops those that are unused, rather than having the case desugar to something that only passes the matched fields, to be explicit about substructural rules.

Also, this only handles matching each constructor once, with no guards; it’s possible to desugar more complicated case expressions to this form, but that may have problems with efficiency or other things I haven’t considered yet.