ocaml-multicore / kcas

Software Transactional Memory for OCaml
https://ocaml-multicore.github.io/kcas/
ISC License
109 stars 11 forks source link

provide an implementation of Stdlib.Semaphore.Counting and Stdlib.Semaphore.Binary #190

Closed UnixJunkie closed 2 months ago

UnixJunkie commented 9 months ago

If you stick to the same interface, then people can easily switch between the two implementations.

https://v2.ocaml.org/api/Semaphore.Counting.html

https://v2.ocaml.org/api/Semaphore.Binary.html

You said you can be much more efficient, right?

polytypic commented 9 months ago

You said you can be much more efficient, right?

No. I don't think I said quite that.

What I did say is that the Stdlib semaphores aren't very fast, and that they don't work nicely with effects based schedulers, and that we already have lock-free queues written in OCaml that are faster than the Stdlib semaphores.

I also gave an example of how one could implement a semaphore using Kcas. Here is the example for future reference:

module Binary_semaphore : sig
  type t
  val make : bool -> t
  val release : t -> unit
  val acquire : ?timeoutf:float -> t -> unit
  val try_acquire : t -> bool
end = struct
  type t = [ `Available | `Unavailable ] Loc.t
  let make available =
    Loc.make ~padded:true (if available then `Available else `Unavailable)
  let release semaphore = Loc.set semaphore `Available
  let acquire ?timeoutf semaphore =
    Loc.modify ?timeoutf semaphore @@ fun state ->
    if state == `Available then `Unavailable else Retry.later ()
  let try_acquire semaphore =
    Loc.compare_and_set semaphore `Available `Unavailable
end

I gave that example to demonstrate that you can use Kcas for synchronization like you use traditional synchronization primitives. I also explicitly said that the semaphore implementation might not be the best possible. I said that because I know that there are many trade-offs related to the implementation of such synchronization primitives. The simple implementation I gave likely isn't too bad under low contention, but it will definitely be possible to implement semaphores with (both) lower overhead and better contention handling.

However, I didn't give that example of a semaphore to argue that you should implement semaphores using Kcas. That would basically be an example of abstraction inversion. That was not my point. The point was:

STM makes it possible to implement the communication and synchronization patterns that the problem at hand requires.

There aren't (m)any problems where a semaphore, in particular, is required. Yes, you can solve many problems with semaphores, but you can also solve those same problems using different abstractions. As an example, see The dining philosophers problem solved using Mvars instead of (binary) semaphores. And, no, I'm not suggesting that you should use Mvars to solve every problem. As another example, see The sleeping barbers problem that is currently not in main as it awaits a proper bounded queue implementation. And, TBH, I generally am not a big fan of toy examples like that, but I can't write examples based on actual problems people haven't yet encountered and asked about.

More to the point, while you definitely can implement replacements for traditional synchronization primitives with Kcas, I would not recommend that. Also, I would not recommend adding such replacements to the Kcas or Kcas_data libraries. The main reason for that is that traditional synchronization primitives do not generally compose in a way that would allow them to work nicely with other transactional abstractions. I talk about fundamentally the same issue in my blog post on Kcas here.

OTOH, providing examples of traditional synchronization primitives implemented using Kcas is fine, IMO, as long as one mentions the caveats (e.g. they are not solving an actual application level problem, they do not compose, you can implement them better without the abstraction inversion).

polytypic commented 2 months ago

Picos comes with replacements for the Stdlib Mutex, Condition, and Semaphore modules (amond some other Stdlib replacements). Those implementations are currently optimized for memory use (i.e. minimize memory use) and low contention use cases and in those cases they outperform the Stdlib versions. The implementations are also specifically designed for effects based schedulers. In high contention scenarios the current implementations in Picos will not perform ideally. For such use cases I would recommend A) rethinking the architecture to avoid contention (locks will not give you parallel scalability — at best you won't degrade as badly under contention), B) using high performance lock-free data structures (including Kcas) instead of locks, C) implementing separate locking primitives optimized for high contention.