ocaml-multicore / eio

Effects-based direct-style IO for multicore OCaml

Switches documentation #548

Open patricoferris opened 1 year ago

patricoferris commented 1 year ago

It was brought up in the Eio developer meeting that more documentation around switches would be useful (cc @avsm). At the moment we have the README documentation, which is great. Having programmed a bit with switches now, I think there are a few common areas of confusion.

How switches interact with different Eio resources

Will it always be the case that an Eio resource that can be closed will be closed when the switch body returns? I always get a little confused because fibers are waited upon, rather than cancelled, when a switch is ready to be "turned off". After programming a bit with functions like Path.with_open_in this became a lot clearer. Perhaps some more examples of functions like this in the documentation would make sense.

When to create a switch and where to put it?

This is quite like the first question, but more from an end-user perspective. Perhaps some "common mistakes" with switches would help with this, for example:

open Eio

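let make_txt dir =
  (* Mistake: this switch finishes when make_txt returns, so the file is
     already closed by the time the caller receives it. *)
  Switch.run @@ fun sw ->
  Path.(open_out ~create:(`If_missing 0o644) ~sw (dir / "hello.txt"))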

let () =
  Eio_main.run @@ fun env ->
  let cwd = Stdenv.cwd env in
  let readme = make_txt cwd in
  Flow.write readme [ Cstruct.of_string "Hello World" ]

I think a lot of people's initial response (like mine) is "Ahhh! This function needs a switch! Quickly, put one at the top of the function that's calling the function that needs a switch".
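
The snippet above goes wrong because the switch created inside make_txt has already finished, and closed the file, by the time the caller tries to write to it. Here is a minimal sketch of two possible fixes: take the switch from the caller, or scope the file to a callback with Path.with_open_out. The helper name with_txt and the `Or_truncate mode are illustrative choices, not part of the original example.

```ocaml
open Eio

(* Fix 1: the caller passes in the switch, so the caller controls how long
   the file stays open. *)
let make_txt ~sw dir =
  Path.(open_out ~sw ~create:(`Or_truncate 0o644) (dir / "hello.txt"))

(* Fix 2 (hypothetical helper): the file is only valid inside [fn] and is
   closed as soon as [fn] returns. *)
let with_txt dir fn =
  Path.with_open_out ~create:(`Or_truncate 0o644) Path.(dir / "hello.txt") fn

let () =
  Eio_main.run @@ fun env ->
  let cwd = Stdenv.cwd env in
  (* The file lives exactly as long as this switch. *)
  Switch.run (fun sw ->
      let file = make_txt ~sw cwd in
      Flow.copy_string "Hello World" file);
  (* No explicit switch needed: the callback defines the lifetime. *)
  with_txt cwd (fun file -> Flow.copy_string "Hello again" file)
```

In both versions the code that uses the file runs while the file's switch is still alive, which is the property the broken version lacks.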

How to use switches for manual cancellation

This is more of a problem that I stumbled across when using switches in the real world. I ported OCurrent to Eio (and just updated to eio.0.10.0 and replaced Current.Switch.t with Eio.Switch.t): https://github.com/ocurrent/ocurrent/pull/388. In OCurrent, the switch was very much like an Lwt_switch. One nice thing about such switches is that they don't force the user to use a function's scope as the switch's lifetime, which is relied on in this test, for example: https://github.com/ocurrent/ocurrent/blob/842667b5aec4ada5f15cb42ba78ab113641aa941/test/test_job.ml#L123-L146. To do something like that with Eio's switches, I think you have to invert things and wrap them in a function; see https://github.com/patricoferris/ocurrent/blob/41899443891a71364e4c407f57fbfd7f246fc183/test/test_job.ml#L153-L192.

Anyway, the main thing is that this makes Eio switches a little cumbersome to use when you really do envisage using them to cancel things. I ended up creating a little helper function:

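(* Private exception used only to signal a deliberate cancellation. *)
exception Test_cancel

(* Runs [fn] with a fresh switch and a function that cancels that switch. *)
let with_cancellable_switch fn =
  try
    Eio.Switch.run @@ fun sw ->
    fn (sw, fun () -> Eio.Switch.fail ~bt:(Printexc.get_callstack 5) sw Test_cancel)
  with Test_cancel -> ()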

There's probably a nicer way to do this, but I can imagine people finding themselves in a similar position and wanting something like this.

talex5 commented 1 year ago

Will it always be the case that an Eio resource that can be closed will be closed when the switch body returns.

A resource attached to a switch will be closed when the switch finishes (which happens when the main switch body and any attached fibers have all finished). However, note that closing a resource might not immediately close e.g. the underlying file descriptor if it's still being used by another operation (which must be running in some other switch). I might add a way of transferring resources between switches in the future.
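
To make that ordering concrete, here is a small sketch that uses Switch.on_release as a stand-in for a closable resource. The function name events and the log messages are made up for illustration. It records what happens when the switch body returns while a forked fiber is still running.

```ocaml
open Eio

(* Returns the recorded events in the order they happened. *)
let events () =
  let log = ref [] in
  let record msg = log := msg :: !log in
  Eio_main.run (fun _env ->
      Switch.run (fun sw ->
          (* Stand-in for a resource attached to [sw]. *)
          Switch.on_release sw (fun () -> record "resource released");
          (* This fiber is attached to [sw] and is still running when the
             body below returns. *)
          Fiber.fork ~sw (fun () ->
              Fiber.yield ();
              record "fiber finished");
          record "body returned"));
  List.rev !log

let () = List.iter print_endline (events ())
```

The release hook runs last: the switch waits for the forked fiber rather than cancelling it, and only then releases what is attached to it.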

Perhaps some "common mistakes" with switches would help with this, for example

There's a bit more about that at the end of the README: https://github.com/ocaml-multicore/eio#switches-1.

How to use switches for manual cancellation

Yeah, this can be awkward. Though in that example I think you could just call Job.cancel instead of using the switches. I've thought about having a Fiber.fork_with_cancel or something for that (https://github.com/ocaml-multicore/lwt_eio/blob/220dfd5c57b3d8a620f908daa58dad993719e005/lib/lwt_eio.ml#L21-L32). You can also use Cancel.sub instead of using a whole switch. But perhaps these cases are better handled by taking a ?stop argument with a promise, as Net.run_server does?
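
As a rough sketch of the promise-based idea, a function can race its work against a stop promise using Fiber.first. The helper run_until and the function demo are hypothetical names for illustration; this is not how Net.run_server is implemented, only one way to get a similar shape.

```ocaml
open Eio

(* Hypothetical helper: run [fn] until it returns or [stop] is resolved,
   whichever happens first. [Fiber.first] cancels the losing branch. *)
let run_until ~stop fn =
  Fiber.first
    (fun () -> `Finished (fn ()))
    (fun () ->
      let () = Promise.await stop in
      `Stopped)

let demo () =
  Eio_main.run @@ fun _env ->
  let stop, set_stop = Promise.create () in
  Switch.run @@ fun sw ->
  (* Some other fiber decides when the job should stop. *)
  Fiber.fork ~sw (fun () ->
      Fiber.yield ();
      Promise.resolve set_stop ());
  (* A job that would otherwise wait forever. *)
  run_until ~stop (fun () -> Fiber.await_cancel ())
```

The caller keeps the resolver and can trigger cancellation from anywhere, without needing the job's switch to be in scope.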

SGrondin commented 1 year ago

I've found that a good analogy to help explain Switches to others is the Unix process.

Your running code keeps the process alive and unhandled exceptions or signals kill it, but no matter what happens, all dangling file descriptors (resources) are closed by the OS when the process terminates.

Switch ~~ process
Fibers ~~ threads/code ~~ keeps the process running
Resources ~~ file descriptors ~~ get cleaned up automatically and they don't keep the process alive on their own
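
The "crash" half of the analogy can be sketched in the same way. In this illustrative example (the function name and the Failure message are made up), the switch body raises, which plays the role of the process being killed, and a Switch.on_release hook stands in for a file descriptor that gets cleaned up regardless.

```ocaml
(* Returns [true] if the release hook ran even though the body raised. *)
let released_after_failure () =
  let released = ref false in
  Eio_main.run (fun _env ->
      try
        Eio.Switch.run (fun sw ->
            (* Stand-in for a file descriptor owned by the "process". *)
            Eio.Switch.on_release sw (fun () -> released := true);
            (* The "process" dies with an unhandled exception. *)
            failwith "simulated crash")
      with Failure _ -> ());
  !released

let () = assert (released_after_failure ())
```

The exception still propagates out of Switch.run, just as a crash still terminates the process, but the attached resource is released on the way out.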