typelevel / spotted-leopards

Proof of concept for a cats-like library built using Dotty features
Apache License 2.0
118 stars 11 forks source link

Can we enable serialisation of a Free Monad? #3

Closed bblfish closed 2 years ago

bblfish commented 5 years ago

I have an interesting use case of Free Monads/Free applicatives that would when scalability becomes an issue require Free Monads and Free Applicatives, or equivalent framework (using perhaps effects) to make FreeMonads serialisable. I have kept some pointers of the discussion which lead me here in this Twitter thread.

Usually Free Monads are explained to be useful as they allow one to change interpreter. But it is also possible to allow them to be have many interpreters, one for each executed command.

I have implemented this on the rww-play server following an initial implementation sketch written up by @betehess. The algebra is given in LDPCommands.scala. Note that every LDPCommand has a URI. This allows the Akka interpreter of a FreeMonad constructed in this test example to route the FreeMonad datastructure depending on what the next instruction is as in one of these Router Actors. If the URI is local to the server the structure is sent as a message to the actor that deals with that resource, or else to a web actor that fetches it on the web (could also be a cache actor).

This allows one to write generic scripts to fetch data on the web and act on the data received, without having to bother about implementation details. But if one were to extend this to an actor system that spanned multiple actor instances potentially living on different servers, one would need I suppose to serialise these Free Monad/Applicative based scripts. This is not possible with cats now.

Would this be possible in dotty? Or should I consider a completely different framework that could give me the same level of abstraction?

tpolecat commented 5 years ago

If you define case class Free[F[_], A](resume: Either[A, F[Free[F, A]]]) then you have something you can serialize (if you can serialize A) and it will work just like cats.free.Free … but at the expense of laziness and stack-safety which come from reifying flatMap, which is the root of your serialization problem. I don't know of a means to have it both ways.

bblfish commented 5 years ago

Btw. In the cats gitter channel I was pointed to the unison haskell like programming language which allows "all values including functions" to be serialisable. See this discussion on their gitter channel. I am not sure how they do that, but it indicates that it can be done.

That language is only 4 years old, and would be quite a shift and risk to move to.

bblfish commented 5 years ago

I am trying to understand how the FreeMonad creates the problem.

In one of the test examples I build up my Free structure in the usual way:

val createContainerScript = rwwAgent.execute {
      for {
        ldpcUri <- createContainer(baseLdpc, Some("bertails"), Graph.empty)
        ldpc <- getResource(ldpcUri)
        _ <- putLDPR(ldpc.acl.get, graphCollectionACL)
        acl <- getLDPR(ldpc.acl.get)
      } yield {
        assert(acl isIsomorphicWith graphCollectionACL)
      }
}

This creates a free data structure that is tied together with functions. It could be more complicated by taking a returned value say ldpc, and searching the graph contents before proceeding, using perhaps some information from that graph.

Why would that data structure tied together with functions be more serialisable if on the stack?

bblfish commented 5 years ago

ok, so it looks like functions are serialisable in Scala as they are objects.

Spark makes a lot of this feature apparently (see Serialization challenges with Spark and Scala). Scala proposed SIP-21 on spores to help with serialisation issues, but it was not much loved by Spark. Still Spores got revived in nov 2016 but has not moved much since then. It actually looks very useful. Perhaps dotty will help spores solve issues that made its adoption difficult?

So I guess the reification of FlatMap, makes it impossible to know what needs to be serialised next and so would require the whole JVM to be serialised?

bblfish commented 5 years ago

So it looks that some upcoming cool features may be able to solve the Free stack problem.

A coroutine instance does not execute on the same program stack as the caller. Instead, it maintains a separate highly optimized stack, used to store its execution state,

Could these constructs help resolve the problem of a Stackless Free Monad?

Note that cooperative continuations are available in Kotlin.

SethTisue commented 5 years ago

This is not possible with cats now

are you sure? as you point out,

functions are serialisable in Scala

and also, I don't think this argument makes sense:

I guess the reification of FlatMap, makes it impossible to know what needs to be serialised next

as being able to serialize the function should cover it afaics. (both ends need to have the same classes on the classpath, but that's a usual constraint for this kind of thing.)

it's an interesting question/discussion, but \this repo doesn't seem like the place for it\</topic police> 👮‍♂️, as any connection to Dotty seems almost entirely speculative

bblfish commented 5 years ago

I guess the reification of FlatMap, makes it impossible to know what needs to be serialised next

as being able to serialize the function should cover it afaics. (both ends need to have the same classes on the classpath, but that's a usual constraint for this kind of thing.)

I was going on what @tpolecat wrote above

but at the expense of laziness and stack-safety which come from reifying flatMap

So the problem seems to be that with current cats Free one either has a non serialisable structure or one rolls one's own and has one that risks running out of stack space. It seems that if one is looking to see what a future improved version of cats could be it makes sense to investigate what would be needed. I can't tell in advance of asking the question if it is dotty that solves the problem or something else.

bblfish commented 5 years ago

Actually I just realised that a stack un-safe version of Free as mentioned by @tpolecat

If you define case class Free[F[_], A](resume: Either[A, F[Free[F, A]]]) then you have something you can serialize (if you can serialize A) and it will work just like cats.free.Free

would actually work for my use case as each frame of the stack is executed by a different actor on a different stack. So as long as the functions linking the Free Monads commands don't use up a lot of stack, which is something one can warn developers about, then things should be fine.

Lesson: stack safety in Free is not always needed. It depends on the interpreter :-)

jdegoes commented 5 years ago

If you define case class Free[F[_], A](resume: Either[A, F[Free[F, A]]]) then you have something you can serialize (if you can serialize A) and it will work just like cats.free.Free

Unfortunately this won't work. Cats Free doesn't require that F[_] be a functor, whereas the above (classic) definition of Free does have this requirement. So when you have F[_] terms that produce information, you'll end up embedding continuations in those operations:

e.g.:

sealed trait ConsoleF[+A]
final case class ReadLine[+A](next: String => A) extends ConsoleF[A]
final case class WriteLine[+A](line: String, next: A) extends ConsoleF[A]

Any structure with the power of Monad will end up storing a continuation, in one form or another: in this case, either in the F[_] or in the Free.

That said, you can serialize functions "well enough" to make a billion dollar industry (Spark), and fundamentally, serializing Free is all about serializing functions. The main problems (aside from instability of the protocol) are the random objects that will be accidentally captured in the closures, some of which won't be sanely serializable (e.g. a thread pool, an http connection pool, a local cache).

If your code accesses all resources (i.e. non-data) through a reader monad (like ZIO Environment encourages), however, then you can shift these "non-serializable dependencies" onto the remote node, which can satisfy them after deserialization. Now the compiler can't help you do the right thing, but it provides a path toward reasonable serialization.

Even without that, despite its pains, Spark shows serialization can be useful (i.e. solve business problems), with a level of pain that comes with the territory.

tpolecat commented 5 years ago

That's a very good point. Your underlying functor must be serializable. So it would work for Free[List, ?] for instance but won't work in general. In particular the GADT-to-DSL pattern won't work because you'll need to encode the continuation as @jdegoes noted, or you'll need to use a free functor which does the same thing.