ocaml-multicore / eio

Effects-based direct-style IO for multicore OCaml

Eio design considerations #459

Closed by rand00 1 year ago

rand00 commented 1 year ago

I just saw the introduction to Eio, and took some notes underway concerning issues I'm expecting to have using Eio. I have a bunch of experience with Lwt, so that is my baseline for comparison. I think Eio is great in many respects, so these points are just intended to be food for thought for how these problems maybe could be overcome.

  1. In comparison with Lwt, in Eio the yields are implicit, as they are not in the types. This means that the developer needs to think more about if/when there is a need to yield; and, to avoid unnecessary manual insertions of yields, needs to know whether any called procedure yields.
  2. Since it is the yield that raises the exception in the lambda given to Fiber.both, the possibility of cancellation depends on if/when a yield is present in the code running in the lambda. This underlines the first point.
    • This is also a problem with Fiber.first
  3. With Switch, the developer needs to pass switches explicitly, which means that one hopefully doesn't pass the wrong switch when multiple are in scope. This breaks the structured concurrency if the developer makes a mistake. In this way Lwt is structured concurrency as well, as long as one remembers to bind on a Lwt.t and not use Lwt.async.
  4. OS security features like seccomp etc. were mentioned - but if the ideal of Eio is to run across any backend (as MirageOS is a goal too), then I guess the most cross-platform way would be to ensure these things on the OCaml side instead.

Again, I find Eio compelling in several ways, and hope these considerations turn out to be useful.

talex5 commented 1 year ago

These are good questions :-)

Yielding

Outside of tutorials and tests, yielding is mostly unnecessary. For example, gemini-eio has no yields, and neither does the HTTP test server. IO operations usually yield implicitly, so it's only CPU-intensive code that needs to yield (and it might be better to use a separate batch domain for that anyway).

More generally, Lwt's types tell you whether a function can switch threads or not. However (surprisingly), this doesn't seem to be all that useful. I think this is because either:

  1. You know what the function you're calling does, including whether it will suspend. e.g. with incr, List.length, Hashtbl.add, etc, you know it won't; or
  2. You don't know what it does (e.g. because you were passed the function as an argument). In that case it might switch threads, allowing another thread to do something unsafe (such as mutating some state you are using). However, in that case you also have to assume that the unknown function itself might perform the dangerous action, so it doesn't really change anything.

It is possible to contrive cases where it matters. e.g.

let run f =
  let finished = ref false in
  Fiber.both
    (fun () -> f (); assert (not !finished))
    (fun () -> finished := true)

Here, f can't set finished directly, but it can do it indirectly by yielding. This doesn't seem to be a problem in practice.
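
To make the contrived case concrete without depending on Eio, here is a minimal sketch using only the OCaml 5 standard library. The both and yield functions below are hypothetical toy stand-ins for Fiber.both and a yield operation, written with plain effect handlers; this is not how Eio is implemented. The run function records what the first fiber observes instead of asserting, so both outcomes can be printed.

```ocaml
open Effect
open Effect.Deep

(* Toy effect standing in for a fiber yield (an assumption, not Eio's API). *)
type _ Effect.t += Yield : unit Effect.t

let yield () = perform Yield

(* Toy stand-in for Fiber.both: starts f, then g, on a single thread.
   Another fiber only gets to run when the current one yields or finishes. *)
let both f g =
  let q : (unit -> unit) Queue.t = Queue.create () in
  let spawn fn =
    match_with fn ()
      { retc = (fun () -> ());
        exnc = raise;
        effc = (fun (type a) (eff : a Effect.t) ->
          match eff with
          | Yield ->
            (* Suspend the fiber and queue it to be resumed later. *)
            Some (fun (k : (a, unit) continuation) ->
              Queue.add (fun () -> continue k ()) q)
          | _ -> None) }
  in
  spawn f;
  spawn g;
  while not (Queue.is_empty q) do (Queue.take q) () done

(* Same shape as the example above, but returning what the first fiber
   saw rather than asserting on it. *)
let run f =
  let finished = ref false in
  let saw_finished = ref false in
  both
    (fun () -> f (); saw_finished := !finished)
    (fun () -> finished := true);
  !saw_finished

let () =
  (* prints "false true" *)
  Printf.printf "%b %b\n" (run ignore) (run yield)
```

With a non-yielding f, the first fiber checks finished before the second fiber has run. With f = yield, the second fiber runs in between, so the first fiber sees finished set even though f never touched it.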

Cancellation

I'm not sure what the problem is here. Fiber.first is similar to Lwt.pick, where one branch finishing will cause an exception (Lwt.Canceled) to be raised in the other.

Ideally, code shouldn't care whether it gets cancelled or not and can just let the switches clean things up, but you can use Eio.Cancel.protect to prevent a block of code from getting cancelled from outside.

Switches

This breaks the structured concurrency if the developer makes a mistake.

If I call Foo.run 5 then structured concurrency says that when it returns, all fibers it spawned have finished and any file descriptors it opened have been closed. If run creates multiple switches and uses the wrong one for something then that might cause problems internally for run, but we can still be sure everything is finished when it returns.

In this way Lwt is as well structured concurrency, as long as one remembers to bind on a Lwt.t and not use Lwt.async.

You need to go further than that. e.g.

let run () =
  let a = Lwt_unix.sleep 1.0 >|= fun () -> print_endline "a" in
  let b = Lwt_unix.sleep 2.0 >|= fun () -> print_endline "b" in
  Lwt.choose [a; b]

Here, run () prints "a" and then returns. Later, the other fiber prints "b".

OS security

if the ideal of Eio is to run across any backend (as MirageOS is a goal too), then I guess the most crossplatform way would be to ensure these things on the OCaml side instead.

Eio does intend to ensure that, but OCaml doesn't currently enforce it. Take the hello world example:

let () =
  Eio_main.run @@ fun env ->
  main ~stdout:(Eio.Stdenv.stdout env)

Since we didn't take net from env, we should be able to assume that main won't use the network. However, main could just ignore Eio and call Unix.connect directly. But if we know that only the Eio APIs are being used, then we know that we can safely drop access to the network. e.g.

let () =
  Eio_main.run @@ fun env ->
  let stdout = Eio.Stdenv.stdout env in
  Eio.Stdenv.drop_privileges env;
  main ~stdout

Here, drop_privileges knows that it can ask the OS to disable connect and bind (this isn't implemented yet and needs a bit of refining). This gives a bit of extra protection against C code too.

It's less of a concern in unikernels because there usually isn't much to escape to. If you take over a Unix process then you might find other useful stuff on the same system to attack, but usually a VM doesn't have access to unrelated things.

rand00 commented 1 year ago

For example, gemini-eio has no yields, and neither does the HTTP test server.

The problem relative to Lwt here is that for Lwt you know from the types which procedures yield. With Eio, if you pass a lambda that doesn't ever yield to Fiber.both, then cancelling Fiber.both will take as long as it takes to finish the non-yielding lambda.

With Fiber.first it's more of a problem, as the developer will expect it to return right away after the first fiber returns. That is not the case if the lambda to be cancelled never yields: it will then take as long to return as the non-yielding lambda takes to finish.

I wonder if with Eio we are in a similar situation as with Lwt - that a library used with Eio e.g. needs to be customized to yield once in a while to make cancellation timely.
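
The point that cancellation can only land at a yield can be sketched with a toy scheduler built on OCaml 5 effects. Everything here (first, yield, Cancelled, busy, count) is a hypothetical stand-in written for illustration; it is not Eio's Fiber.first and makes no claim about Eio's internals. In this sketch, a fiber that has lost the race is cancelled the next time it yields, so a loop that never yields runs to completion.

```ocaml
open Effect
open Effect.Deep

(* Toy yield effect and cancellation exception (assumptions, not Eio's API). *)
type _ Effect.t += Yield : unit Effect.t
exception Cancelled

let yield () = perform Yield

(* Toy stand-in for Fiber.first: once one fiber has finished, the other
   is cancelled, but only at the point where it next yields. *)
let first f g =
  let q = Queue.create () in
  let winner_done = ref false in
  let spawn fn =
    match_with fn ()
      { retc = (fun () -> winner_done := true);
        exnc = (function Cancelled -> () | e -> raise e);
        effc = (fun (type a) (eff : a Effect.t) ->
          match eff with
          | Yield ->
            Some (fun (k : (a, unit) continuation) ->
              Queue.add
                (fun () ->
                  (* Cancellation is delivered here, at the yield point. *)
                  if !winner_done then discontinue k Cancelled
                  else continue k ())
                q)
          | _ -> None) }
  in
  spawn f;
  spawn g;
  while not (Queue.is_empty q) do (Queue.take q) () done

let steps = ref 0

(* Five units of "work", optionally yielding after each one. *)
let busy ~yielding () =
  for _i = 1 to 5 do
    incr steps;
    if yielding then yield ()
  done

(* Race a quick fiber against [busy] and report how much work was done. *)
let count ~yielding =
  steps := 0;
  first (fun () -> yield ()) (busy ~yielding);
  !steps

let () =
  (* prints "1 5" *)
  Printf.printf "%d %d\n" (count ~yielding:true) (count ~yielding:false)
```

The yielding worker is stopped after one step, while the non-yielding worker completes all five steps before anything else gets a chance to run.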

Is the pie-in-the-sky best semantics maybe that OCaml makes any code yield once in a while (unless protected by some construct)?

Nice with the drop_privileges!

talex5 commented 1 year ago

The problem relative to Lwt here is that for Lwt you know from the types which procedures yield. With Eio, if you pass a lambda that doesn't ever yield to Fiber.both, then cancelling Fiber.both will take as long as it takes to finish the non-yielding lambda.

Lwt doesn't help here: a Lwt.t return type says it might yield, but doesn't ensure that it won't also hog the CPU for a long time. e.g. in

Lwt.pick [f (); Lwt_unix.sleep 10.0]

f can spin on the CPU for an unbounded amount of time.

In some ways, Lwt makes it worse, because a function without a Lwt.t type can't yield, even if it wants to. e.g. a Map.iter f x in Lwt will never be able to yield. And of course, Eio lets you move CPU-intensive jobs to another domain.
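
As a rough illustration of moving a CPU-intensive job to another domain, here is a sketch using the OCaml 5 standard library's Domain module directly. Eio has its own API for running work on other domains, which is not shown here; the fib function is just a hypothetical stand-in for expensive work.

```ocaml
(* Deliberately slow recursive Fibonacci, standing in for CPU-heavy work. *)
let rec fib n = if n < 2 then n else fib (n - 1) + fib (n - 2)

let () =
  (* Run the computation on a separate domain (stdlib, not Eio's API). *)
  let worker = Domain.spawn (fun () -> fib 20) in
  (* The main domain is free to do other things here. *)
  let result = Domain.join worker in
  (* prints "fib 20 = 6765" *)
  Printf.printf "fib 20 = %d\n" result
```

Because the work runs on its own domain, it does not need to yield in order for code on the main domain to keep making progress.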

Is the pie-in-the-sky best semantics maybe that OCaml makes any code yield once in a while (unless protected by some construct)?

We want to avoid that, because if anything can yield then you get all the problems of parallel programming, even in simple concurrent code.

In summary: if functions spinning on the CPU hasn't been a problem for you in Lwt, then there's no reason why it should be any different in Eio.

rand00 commented 1 year ago

Lwt doesn't help here: a Lwt.t return type says it might yield, but doesn't ensure that it won't also hog the CPU for a long time

There is no guarantee, but I would say that there is a very high probability that it will yield - the "unknowing developer" passing the lambda will at least know from the types that this is expected.

If one makes a library that uses Eio internally, it will be up to the library's documentation, rather than its types, to warn the developer to call yield inside passed lambdas.

In some ways, Lwt makes it worse, because a function without a Lwt.t type can't yield

Yes Eio is cool in this way!

Related: you mentioned List.iter is equivalent to Lwt_list.iter_s - though to get the semantics of Lwt_list.iter_p, I guess one needs to manually use Switch and fork. I then wonder if someone in the end will create e.g. an Eio_list module anyway?

talex5 commented 1 year ago

I then wonder if someone in the end will create e.g. an Eio_list module anyway?

There's one already: https://ocaml-multicore.github.io/eio/eio/Eio/Fiber/List/index.html

rand00 commented 1 year ago

Okay thanks, it was an enlightening discussion (: