robur-coop / miou

A simple scheduler for OCaml 5
https://docs.osau.re/miou/index.html
MIT License

Missing Unix bindings #42

Status: Open. kit-ty-kate opened this issue 1 week ago.

kit-ty-kate commented 1 week ago

While trying to use Miou, I encountered some roadblocks. These functions are missing from Miou_unix:

I feel like having these functions in Miou_unix would be highly valuable.

kit-ty-kate commented 1 week ago

After looking around a little more, I'm starting to understand that syscalls not associated with file descriptors may not be suitable for Miou_unix, and that Lwt seems to use a trick based on a special job queue for this kind of function. (cc @raphael-proust in case you're interested in the discussion, or if I'm saying total garbage, which is probably the case)

If my understanding is correct, I would argue that having these functions in Miou_unix could still be useful, even if they are defined as follows:

(* Yield first so other pending tasks get a turn, then make the
   potentially slow call directly. *)
let readdir dir =
  Miou.yield ();
  Unix.readdir dir

This way, even if these functions block for a little while, we can be fairly sure that the other pending tasks have had a chance to run recently.
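A slightly fuller sketch of the same idea, using only the stdlib Unix module: it lists a whole directory and calls a yield hook after each entry. `readdir_all` is a hypothetical helper, not part of Miou_unix. Under Miou one would presumably pass `~yield:Miou.yield`, while the no-op default lets the function run outside any scheduler.

```ocaml
(* Hypothetical helper: list every entry of [path], calling [yield]
   after each one so that, under a cooperative scheduler, other tasks
   get a turn. Passing [~yield:Miou.yield] is an assumed usage. *)
let readdir_all ?(yield = fun () -> ()) path =
  let dir = Unix.opendir path in
  let rec loop acc =
    match Unix.readdir dir with
    | entry ->
        yield ();                          (* cooperation point *)
        loop (entry :: acc)
    | exception End_of_file -> List.rev acc   (* no more entries *)
  in
  (* Always close the directory handle, even if [yield] raises. *)
  Fun.protect ~finally:(fun () -> Unix.closedir dir) (fun () -> loop [])
```

Note that the entries include `.` and `..`, as with `Unix.readdir` itself.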

dinosaure commented 1 week ago

The issue is in fact more subtle than that, and the choices made can lead to unwanted behaviour in Miou. The idea behind Miou_unix.file_descr is centred on sockets (and in particular on the non-blocking flag), for which a task may need to be suspended as soon as it reads or writes, because these operations can block the process.

The same applies, for example, to the file descriptors created by Unix.pipe: reading from one end only becomes possible (without blocking) once something has been written to the other.
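To make the pipe case concrete, here is a small sketch using only the stdlib Unix module (not Miou): once the read end is put in non-blocking mode, reading before anything has been written fails with EAGAIN instead of blocking. `try_read` and `demo` are illustrative names. The point where EAGAIN appears is presumably where a scheduler such as Miou would suspend the task until the descriptor becomes readable.

```ocaml
(* Try a non-blocking read: [Some n] if [n] bytes were read, [None] if
   the read would have blocked (EAGAIN / EWOULDBLOCK). *)
let try_read fd buf =
  match Unix.read fd buf 0 (Bytes.length buf) with
  | n -> Some n
  | exception Unix.Unix_error ((Unix.EAGAIN | Unix.EWOULDBLOCK), _, _) -> None

let demo () =
  let r, w = Unix.pipe () in
  Unix.set_nonblock r;
  let buf = Bytes.create 16 in
  let before = try_read r buf in        (* nothing written yet *)
  ignore (Unix.write_substring w "hello" 0 5);
  let after = try_read r buf in         (* data is now available *)
  Unix.close r;
  Unix.close w;
  (before, after)
```

Calling `demo ()` yields `(None, Some 5)`.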

What is certain here is that we are talking about a possible suspension because the Unix.read and Unix.write (or Unix.connect and Unix.accept) functions can block.

This is where files differ notably. For files (which are also represented by a file descriptor), we can generally be sure that Unix.read/Unix.write will never block! So the suspend-and-resume mechanism via select() is largely useless for them (and may even degrade performance). Worse, Miou's task management could create a sort of openfile barrier and try to open trillions of files at once, which would lead to an EMFILE error (this is currently a problem with Eio for this kind of code, e.g. when you have to open lots of files to compute the Merkle tree of a folder).
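The claim about select() can be checked with a short sketch, again using only the stdlib Unix module and assuming a POSIX system (on Windows, Unix.select only supports sockets). POSIX specifies that regular files always select as ready, so select() reports the file as both readable and writable immediately, even with a zero timeout. A select-based suspend/resume step therefore never actually waits on a file. `file_is_always_ready` is a hypothetical helper.

```ocaml
(* Open (or create) [path] and ask select() whether it is ready, with a
   zero timeout so the call never waits. Assumes a POSIX system. *)
let file_is_always_ready path =
  let fd = Unix.openfile path [ Unix.O_RDWR; Unix.O_CREAT ] 0o600 in
  Fun.protect ~finally:(fun () -> Unix.close fd) @@ fun () ->
  let readable, writable, _ = Unix.select [ fd ] [ fd ] [] 0.0 in
  readable = [ fd ] && writable = [ fd ]
```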

However, even if reading or writing does not block, these operations can take a long time. So the issue is not one of suspending and resuming, but of cooperation when several tasks read or write files concurrently. Note that this cooperation problem disappears when these tasks run in parallel!

As a result:

More generally, it might be a good idea (and worth documenting!) to use Unix.file_descr, Unix.read and Unix.write (or Stdlib.open_{in,out}, Stdlib.input and Stdlib.output) directly when manipulating files. The question of Miou.yield is also thorny, and the choice must take the design of the application into account (use of Miou.async, Miou.call, or both...).
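As a sketch of reading a file directly with Unix.openfile and Unix.read while staying cooperative, the function below reads in fixed-size chunks and calls a yield hook after each chunk. `read_file` and `chunk_size` are hypothetical names. Passing `~yield:Miou.yield` is an assumed usage under Miou; the default no-op makes it a plain sequential read.

```ocaml
(* Hypothetical helper: read the whole of [path] in [chunk_size]-byte
   chunks, calling [yield] between chunks so other tasks can run during
   a long read (e.g. [~yield:Miou.yield], assumed usage). *)
let read_file ?(yield = fun () -> ()) ?(chunk_size = 65536) path =
  let fd = Unix.openfile path [ Unix.O_RDONLY ] 0 in
  Fun.protect ~finally:(fun () -> Unix.close fd) @@ fun () ->
  let out = Buffer.create chunk_size in
  let buf = Bytes.create chunk_size in
  let rec loop () =
    match Unix.read fd buf 0 chunk_size with
    | 0 -> Buffer.contents out          (* end of file *)
    | n ->
        Buffer.add_subbytes out buf 0 n;
        yield ();                       (* cooperation point *)
        loop ()
  in
  loop ()
```

Whether to yield after every chunk, every few chunks, or not at all is exactly the application-level trade-off described above.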

So I would rather document all this properly in Miou_unix, explaining that this module is mainly concerned with sockets and that other operations (such as mkdir, readdir, openfile) should instead be done directly with the Unix module (while explaining why cooperation matters, how to use Miou.yield, and how these operations could be parallelized with Miou.call). WDYT?

kit-ty-kate commented 1 week ago

> will never block!

After reading this, I think having a job queue on a separate thread for IO purposes might be nice to have in Miou.

IO functions can take time (e.g. busy hardware, network file systems like NFS, a low-priority program, …), and I feel that for most applications of Miou (e.g. web servers, …) such delays can quickly become a huge problem.

Miou.yield would alleviate some of that issue, but not for longer IO delays. Miou.call is a bad idea because it relies on the existence of more than one domain and would fail on single-core processors, so I think using an IO queue based on Thread (maybe one per domain) for non-blocking IO is probably the best solution for this. However, this adds complexity, so maybe Miou.yield is a reasonable middle ground for now. I've updated miou_io to reflect that.

In any case, having some sort of documentation explaining all this in Miou_unix would be highly appreciated. I don't think I'm going to be the only one with this sort of question when porting programs from Lwt.

dinosaure commented 1 week ago

> I think using an IO queue based on Thread (maybe one per domain) for non-blocking IO is probably the best solution for this.

That's exactly the design I have in mind, and it's probably (judging also from discussions with other people) the most interesting one. At this stage, we could improve Miou_unix (Miou itself doesn't need to change) and, indeed, spawn one thread per domain to handle these potentially long syscalls. But this would require quite a lot of work. Moreover, I'd rather propose a new module like Miou_thread than modify Miou_unix, which has the advantage of being very simple.
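The "IO queue on a separate thread" design can be sketched with only the stdlib Thread, Mutex, Condition, Queue, Atomic and Domain.DLS modules (link with the threads library). This is not Miou's API: `create_worker`, `submit`, `await` and `submit_here` are hypothetical names. Blocking calls run on a dedicated system thread, and the caller polls a result cell, calling a yield hook while it waits. Under Miou one could presumably pass `~yield:Miou.yield` so other tasks keep running; a real integration (e.g. the proposed Miou_thread module) would more likely suspend the task and wake it up instead of polling. `Domain.DLS` gives the "one thread per domain" part.

```ocaml
(* Hypothetical IO worker: jobs run one at a time on a system thread. *)
type worker = {
  jobs : (unit -> unit) Queue.t;  (* pending jobs, protected by [lock] *)
  lock : Mutex.t;
  nonempty : Condition.t;         (* signalled when a job is pushed *)
}

(* Start a worker thread that runs queued jobs forever. *)
let create_worker () =
  let w =
    { jobs = Queue.create (); lock = Mutex.create (); nonempty = Condition.create () }
  in
  let rec run () =
    Mutex.lock w.lock;
    while Queue.is_empty w.jobs do
      Condition.wait w.nonempty w.lock
    done;
    let job = Queue.pop w.jobs in
    Mutex.unlock w.lock;
    job ();  (* the potentially slow syscall runs outside the lock *)
    run ()
  in
  ignore (Thread.create run ());
  w

(* Queue [f] on the worker; the returned cell is filled with its result
   or the exception it raised. *)
let submit w f =
  let cell = Atomic.make None in
  let job () = Atomic.set cell (Some (try Ok (f ()) with e -> Error e)) in
  Mutex.lock w.lock;
  Queue.push job w.jobs;
  Condition.signal w.nonempty;
  Mutex.unlock w.lock;
  cell

(* Poll the cell, calling [yield] until the job has finished, then
   return its result or re-raise its exception. *)
let rec await ?(yield = Thread.yield) cell =
  match Atomic.get cell with
  | Some (Ok v) -> v
  | Some (Error e) -> raise e
  | None ->
      yield ();
      await ~yield cell

(* One lazily created worker per domain, via domain-local storage. *)
let worker_key = Domain.DLS.new_key create_worker
let submit_here f = submit (Domain.DLS.get worker_key) f
```

For example, `await (submit_here (fun () -> Unix.readdir dir))` would run the call on the current domain's worker thread.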

> In any case, having some sort of documentation explaining all this in Miou_unix would be highly appreciated.

The documentation is available here: #43. Let me know if you have any comments or improvements.