leostera / reactor

🚀 Native Actors for Reason and OCaml

Type-safe message passing across worker boundaries #5

Closed leostera closed 5 years ago

leostera commented 5 years ago

This attempt at type-safe message passing across worker boundaries ran into a few problems.

Unfortunately, the Message.t extensible variant used previously loses type information across process boundaries due to a design decision in the Marshal module. Messages type-checked as valid to send and arrived intact at the workers, but they had lost the information that makes pattern-matching on them possible, so they could not be consumed.
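A minimal sketch of the problem, assuming a hypothetical `message` type standing in for Message.t. It round-trips a value through Marshal inside a single process, which is enough to show the effect: the OCaml standard library documentation warns that unmarshalled values of extensible variant types should not be pattern-matched, because the information needed to identify their constructors is not preserved.

```ocaml
(* Hypothetical stand-in for the Message.t extensible variant. *)
type message = ..
type message += Ping of int

(* Serialize and deserialize in one process. Sending to a worker uses the
   same Marshal encoding, so the same loss applies there. *)
let roundtrip (m : message) : message =
  Marshal.from_string (Marshal.to_string m []) 0

(* Constructors of an extensible variant are matched by identity. *)
let is_ping = function
  | Ping _ -> true
  | _ -> false

let () =
  Printf.printf "before marshalling: %b\n" (is_ping (Ping 1));
  Printf.printf "after marshalling:  %b\n" (is_ping (roundtrip (Ping 1)))
```

The payload itself survives the round trip; what is lost is the identity of the `Ping` constructor, so the value falls through to the wildcard case.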

Removing this variant meant parametrizing Pid.t and Process.t with the type of messages that the process will receive, so that <- can check at the call site that a message is only sent to a process that will actually handle it.
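Roughly, the parametrized version could look like the sketch below. The record layout, `spawn`, and `send` are hypothetical and not reactor's actual API; `send` stands in for the `<-` operator, which cannot be defined as an infix operator in plain OCaml syntax.

```ocaml
(* A pid that carries the type of messages its process accepts. *)
type 'm pid = { id : int; mailbox : 'm Queue.t }

let next_id = ref 0

(* Hypothetical spawn: allocates an empty mailbox for messages of type 'm. *)
let spawn () : 'm pid =
  incr next_id;
  { id = !next_id; mailbox = Queue.create () }

(* The message type must agree with the pid's parameter. *)
let send (pid : 'm pid) (msg : 'm) : unit = Queue.push msg pid.mailbox

let counter : int pid = spawn ()

let () = send counter 42
(* send counter "oops" would be rejected by the type checker. *)
```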

This meant refactoring the Registry.re module, that is, the Process Registry, to become a heterogeneous collection that can keep several Process.t('m) together, where each 'm may be a different message type.

The refactoring process tried out:

  1. CCHet.Tbl,
  2. Hmap, and
  3. Gmap

All of these have their drawbacks. The first two were able to encode the uniqueness of all types and thus allow reactor to have type-safe message passing on a per-process basis, while the third relied on a bounded type universe of messages known upfront.

Because reactor as a library does not know in advance the types of all the messages that are being sent in it, the 3rd option was originally discarded.

Options 1 and 2 worked perfectly fine in the context of a single process, but since they rely on first-class modules and extensible variants, they suffer from the same problem as the original solution.
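To make the dependency on extensible variants concrete, here is a sketch of the key technique commonly used by Hmap-style heterogeneous maps. All names are hypothetical, and the registry is reduced to a plain list. Each key mints a fresh constructor of an extensible GADT, and lookup compares constructors to recover a type equality.

```ocaml
type (_, _) teq = Teq : ('a, 'a) teq

(* Every key adds one constructor to this extensible GADT. *)
type _ witness = ..

module type KEY = sig
  type t
  type _ witness += W : t witness
end

type 'a key = (module KEY with type t = 'a)

let new_key (type s) () : s key =
  let module M = struct
    type t = s
    type _ witness += W : t witness
  end in
  (module M : KEY with type t = s)

(* Two keys are equal only if they carry the very same constructor. *)
let eq : type r s. r key -> s key -> (r, s) teq option =
 fun r s ->
  let module R = (val r : KEY with type t = r) in
  let module S = (val s : KEY with type t = s) in
  match R.W with
  | S.W -> Some Teq
  | _ -> None

(* A registry entry hides the message type of each mailbox. *)
type entry = Entry : 'm key * 'm Queue.t -> entry

let rec find : type m. m key -> entry list -> m Queue.t option =
 fun k -> function
  | [] -> None
  | Entry (k', q) :: rest -> (
      match eq k' k with
      | Some Teq -> Some q
      | None -> find k rest)

let int_key : int key = new_key ()
let str_key : string key = new_key ()
let int_box : int Queue.t = Queue.create ()
let registry = [ Entry (int_key, int_box) ]
```

The comparison in `eq` is exactly the kind of match on an extensible constructor that stops working once a key has been marshalled to another process.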

leostera commented 5 years ago

Ultimately going down the path of the 3rd option would mean functorizing the Process registry to ask the user for the universe of types that are valid message types to be stored.

In the small this isn't that big of a problem, but if the system grows big (and it can quickly do so, given actors are cheap), we're looking at a single GADT definition that pretty much everything depends on, and that depends on pretty much everything.

That is, the module messages.re, defined by the user of the library, would have a single GADT that depends on the module of every actor type, and every module that needs to send a message would depend on messages.re. This will very quickly get quite messy.
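A sketch of what the functorized registry might look like, under the assumption that the user supplies a closed GADT of message tags plus an equality function. `UNIVERSE`, `Make_registry`, and `Messages` are hypothetical names, not part of reactor.

```ocaml
type (_, _) eq = Refl : ('a, 'a) eq

(* What the library would ask the user to provide. *)
module type UNIVERSE = sig
  type _ tag
  val equal : 'a tag -> 'b tag -> ('a, 'b) eq option
end

module Make_registry (U : UNIVERSE) = struct
  type entry = Entry : 'm U.tag * 'm Queue.t -> entry

  let entries : entry list ref = ref []

  let register tag mailbox = entries := Entry (tag, mailbox) :: !entries

  let lookup : type m. m U.tag -> m Queue.t option =
   fun tag ->
    let rec go : entry list -> m Queue.t option = function
      | [] -> None
      | Entry (t, q) :: rest -> (
          match U.equal t tag with
          | Some Refl -> Some q
          | None -> go rest)
    in
    go !entries
end

(* The user's messages module: one constructor per actor message type,
   so it has to know about every actor in the system. *)
module Messages = struct
  type _ tag = Counter : int tag | Logger : string tag

  let equal : type a b. a tag -> b tag -> (a, b) eq option =
   fun a b ->
    match (a, b) with
    | Counter, Counter -> Some Refl
    | Logger, Logger -> Some Refl
    | _ -> None
end

module Registry = Make_registry (Messages)

let counter_box : int Queue.t = Queue.create ()
let () = Registry.register Messages.Counter counter_box
```

Because the tags form an ordinary closed variant, they do not depend on constructor identity the way extensible variants do. The price is visible in `Messages`: adding an actor means touching this one module, which everything else imports.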

I will instead pursue dynamically-typed messages and encoding them with msgpack in the meantime.
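As a rough illustration of the dynamically-typed direction, the sketch below uses a hypothetical closed `value` type and Marshal as a stand-in for the wire format. It does not use msgpack or any msgpack library; it only shows that an ordinary closed variant can still be pattern-matched after serialization, with shape checks moved to the receiver at runtime.

```ocaml
(* Hypothetical dynamically-typed message representation. *)
type value =
  | Int of int
  | Str of string
  | List of value list

(* Stand-in for encoding and decoding across a worker boundary. *)
let roundtrip (v : value) : value =
  Marshal.from_string (Marshal.to_string v []) 0

(* The receiver inspects the shape of the message at runtime. *)
let describe = function
  | Int n -> Printf.sprintf "int %d" n
  | Str s -> Printf.sprintf "string %S" s
  | List vs -> Printf.sprintf "list of %d" (List.length vs)

let () = print_endline (describe (roundtrip (List [ Int 1; Str "a" ])))
```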