rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
98.61k stars 12.74k forks source link

Why doesn't sync_channel use a more efficient implementation? #41021

Closed ghost closed 6 years ago

ghost commented 7 years ago

The implementation of sync_channel locks the mutex on every send and recv. Is there a reason why Dmitry Vyukov's bounded MPMC queue is not used instead? It should have noticeably better performance than a simple buffer protected by one big mutex.

Dmitry himself experimented with this algorithm for use in Go. The proposal to replace Go's channels with this algorithm ultimately didn't pan out - I believe because they didn't manage to fulfill certain fairness gurantees Go promised without resorting back to mutexes.

This brings me to an interesting safety problem... Dmitry's MPMC queue is actually used in several crates (and I bet @carllerche implemented it multiple times). One of those is tokio-timer. I have a small concern with this algorithm. Please bear with me - you don't have to understand the nitty-gritty details. I just want to show a safety hazard.

The algorithm may run into data races.

(You can skip this part:) The danger lies in this compare-and-swap. Suppose that a thread wants to push a value. Just before the CAS other threads push and pop 2^64 (or 2^32 on 32-bit systems) elements into/from the queue (extremely unlikely!). Then our CAS succeeds and we start writing our value while some other thread may still be writing it's own value to the same slot! The gist of the story is that lower bits in pos are used to index into the buffer, while the upper bits in are used to distinguish between different "laps" around the ring buffer. The number of laps we're keeping track of is limited, of course. We're unlikely to run into a lap conflict, but if we do, data races may happen.

The likelihood of running into a data race is probably comparable to the likelihood of a random bit being flipped in RAM and corrupting your program. I guess. It also depends on the OS scheduler.

While data races may happen in theory, they're unlikely to ever happen in practice. It's really really unlikely. But does that mean we're allowed to expose a safe interface around it? Do such algorithms qualify as "safe" under Rust's safety standards?

As a side note, hazard pointers are usually tagged with version numbers (a similar idea to laps) in order to get around the ABA problem. However, version numbers suffer from the same safety hazard. But they only suffer in theory, never in practice. :)

My questions are:

  1. Why is Dmitry's MPMC queue not used instead of the buffer-behind-mutex approach?
  2. Is there an interest in replacing our sync_channel with Dmitry's MPMC queue?
  3. Can we consider the MPMC implementation in tokio-timer to be safe, even though it's just extremely likely to be safe in practice?
  4. Would it be acceptable to have an algorithm in libstd that is extremely likely to be safe, but not exactly 100% safe in theory?

cc @alexcrichton

alexcrichton commented 7 years ago

I'd personally be totally on board with a different implementation, the current one is just the first one that existed. Using the mpmc queue you mentioned seems great to me!

I think the safety issue is fine in this case, it's similar to the overflowing story for Arc where it's technically unsound but practically impossible to trigger

arthurprs commented 7 years ago

At some point I wanted to improve the current implementation using a VecDeque instead of the Vec<Option> and a use the Two Lock queue algorithm. Your proposal could be faster but complications may arise trying to make it "blocking" and supporting the experimental select macro though.

Regarding the safety of the algorithm I don't have a strong opinion, it's a hard call. In the case of Arc it's possible to detect and abort() at least, I'm unsure we can do the same here.

ghost commented 6 years ago

The idea for faster channels I proposed in April has today become reality. :) https://docs.rs/crossbeam-channel

Comparison of std::sync::mpsc and crossbeam-epoch: https://github.com/crossbeam-rs/crossbeam-channel#comparison-with-stdsyncmpsc

I'm not sure what we should do about std::sync::mpsc, if anything. We could replace the implementation of bounded channels to make them faster, but it'd be a lot of effort for questionable benefit.

Another alternative might be to move the whole crossbeam-channel crate into the standard library, but that would have to pull crossbeam-epoch in as well, since it's a dependency. But this option goes against the principles of the standard library, which strives to be lean and simple.

Closing the issue.