Incompatibility of Rust's stdlib with Coroutines

rust-lang / rust

Incompatibility of Rust's stdlib with Coroutines #33368

Closed lhecker closed 4 years ago

lhecker commented 8 years ago

The issue

thread_local! is used in the stdlib, which does not work well with coroutines. When a method accesses a TLS variable repeatedly, LLVM may cache the access and optimize the later ones away. If a coroutine is then transferred from one thread to another, it can keep using the cached reference to the TLS storage of the previous thread.

What is not the issue

TLS being incompatible with coroutines for the most part (e.g. here) is well known and not an issue per se. You want to use rand::thread_rng() with coroutines? Just use rand::StdRng::new() instead! Most of the time it's quite easy to circumvent the TLS by simply using something different. This is not true for the stdlib though. One way or another, you're probably using it somewhere.

Possible solutions

Add an option akin to RFC 1513 to replace the stdlib with one with a builtin "libgreen". This might actually be much more practicable than it sounds at first, since the overhead of implementing this is not much larger than that of the other options.
Add an option akin to RFC 1513 to control inlining of TLS access at compile time.
Make it possible to hook into thread_local!. I think that this could be hard to achieve in a performant way though.
Reduce the usage of TLS inside the stdlib and instead let crates use it as they please. Panic and unwind semantics could for instance be changed to match C++. This would obviate PANIC_COUNT and its wonky implementation and still make entirely sure that a stack is never unwound twice. Other uses of TLS inside the stdlib could be wrapped inside inline(never) functions without causing large overheads.
Possibly some other way? I read that TLS variables are rendered as globals inside LLVM. If we mark the suspension function as "can write to any memory location", we could make LLVM stop caching the TLS access...
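
To make the inline(never) idea from the list above concrete, here is a minimal sketch. The `COUNTER` variable and the `bump_counter` function are made up for illustration and are not part of the stdlib. Note that `#[inline(never)]` is only a request to the compiler, so this shows the shape of the proposal rather than a guarantee that no TLS address is ever cached.

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    // Hypothetical per-thread counter, standing in for a stdlib TLS value.
    static COUNTER: Cell<u32> = Cell::new(0);
}

// All TLS access is confined to this function. Asking the compiler not to
// inline it is meant to keep callers from holding on to the TLS address
// across calls (assumption: the compiler honors the request).
#[inline(never)]
fn bump_counter() -> u32 {
    COUNTER.with(|c| {
        let next = c.get() + 1;
        c.set(next);
        next
    })
}

fn main() {
    // Two bumps on the main thread.
    bump_counter();
    let on_main = bump_counter();
    // A fresh thread starts from its own, separate copy of COUNTER.
    let on_other = thread::spawn(bump_counter).join().unwrap();
    println!("main thread: {on_main}, other thread: {on_other}");
}
```

Every caller pays for a real function call here, which is the overhead the proposal argues is small.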

I hope we can find a solution for this as this is really a huge problem for using stackful coroutines with Rust and who doesn't want "Go" but with Rust's syntax, eh? :wink:

gnzlbg commented 8 years ago

So I just checked and in C++'s Coroutines Technical Specification reading a thread_local from a coroutine returns the value that the thread_local has on the thread currently running the coroutine. It is defined behavior.

And no, thread_local variables are not volatile. What happens is the following.

You initiate the coroutine on a particular thread. The coroutine runs on that thread until its first suspension point. Then it gets suspended.

When you resume the coroutine (in whatever thread you decide to do so), resuming the coroutine is just a function call that calls the system scheduler. The system scheduler then "possibly" migrates the coroutine to a different thread, which resumes the coroutine by calling a function that continues after the suspension point of the coroutine. When the coroutine after this point reads a thread_local variable, it reads the variable from the current thread running the coroutine (in C++ that value might not be initialized in that thread, so doing so might introduce undefined behavior due to a read of uninitialized memory, but not due to a read of a thread_local variable per se).

For this to work, the compiler only needs to avoid caching or reordering reads of thread_local variables across the suspension points of a coroutine; volatile is not needed. C++ coroutines are actually implemented as state machines internally.
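
The behavior described above can be imitated in Rust without any coroutine support. The sketch below is only an illustration: `WORKER_ID`, `first_half` and `run_migrating_task` are hypothetical names, and the "coroutine" is just a function that returns its continuation as a closure. Each half performs a fresh read of the thread-local, so each half sees the value of the thread it runs on.

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    // Hypothetical per-thread identifier; every thread has its own copy.
    static WORKER_ID: Cell<u32> = Cell::new(0);
}

// Stand-in for a coroutine with one suspension point: runs the part before
// the suspension and returns the part that runs after it.
fn first_half() -> (u32, impl FnOnce() -> u32 + Send) {
    let seen_before = WORKER_ID.with(|id| id.get());
    // The continuation reads the thread-local again instead of reusing
    // anything obtained before the suspension point.
    let second_half = || WORKER_ID.with(|id| id.get());
    (seen_before, second_half)
}

// Starts the task on one thread and resumes it on another.
fn run_migrating_task() -> (u32, u32) {
    let (before, resume) = thread::spawn(|| {
        WORKER_ID.with(|id| id.set(1));
        first_half()
    })
    .join()
    .unwrap();

    let after = thread::spawn(move || {
        WORKER_ID.with(|id| id.set(2));
        resume()
    })
    .join()
    .unwrap();

    (before, after)
}

fn main() {
    let (before, after) = run_migrating_task();
    println!("before suspension: {before}, after resumption: {after}");
}
```

With a stackful coroutine the two reads would live in the same function body, which is exactly where a cached TLS address could survive the migration.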

zonyitoo commented 8 years ago

It seems that someone is going to add coroutine support directly to LLVM, which means that it is possible to tell LLVM not to inline TLS calls between context swaps.

https://internals.rust-lang.org/t/llvm-coroutines-to-bring-awarness

pcwalton commented 7 years ago

I think that Rust should never support M:N goroutines as Go implements them. This was decided a long time ago.

Kernel-assisted UMS-style solutions should be fine, however.

lhecker commented 7 years ago

@pcwalton Your comment literally left @zonyitoo and me speechless... First the stdlib is deliberately crippled to make it impossible to implement userland coroutines and then it's made impossible to add the functionality back. Oh man...

P.S.: Just call them suspend-down coroutines. P.P.S.: x64 userland context switching takes consistently about 7ns. Not 179ns like LPC. Or even on the order of µs. Rust's implementation back then was laughable at best and bears no comparison.

zonyitoo commented 7 years ago

@pcwalton I can totally understand the reason Rust's team does not want to support coroutines officially. But could you please open a door for us to give it a try? Or could you please give us a chance to implement anything like tokio for comparison?

I admit futures-rs is a big step in the Rust world for asynchronous programming, but I am sure you can see that not everyone prefers this callback style. Both coroutines and futures require lots of work for building the supporting libraries, such as tokio. Why not just give it a shot? Is it because the last libgreen implementation made you feel coroutines will run very slowly? Could you please take a glance at coio-rs? It is now stuck in a very early stage due to this issue, but works fine for some benchmarks.

Manishearth commented 7 years ago

To be clear, the door isn't closed to non-callback style. A lot of folks do want JS-style generators in the language. That, and/or async/await syntax, may happen to make this easier. Given that the ecosystem is converging on tokio and futures, that's probably the direction that we'll take.

@pcwalton's comment was about the Go model specifically. The Go model does have plenty of costs associated with it that all programs will have to pay. Rust can solve the same problems without implementing the Go model. I personally am hoping for generator syntax or async/await to clean this up.

The door is open here; you need a better proposal than "stop using TLS in the standard library" (or "mark all TLS as !Send", which is not backwards compatible). This is very similar to the issues we had with libgreen in the first place; folks had to pay an extra cost for it even if they didn't need green threads, which is antithetical to Rust's zero-cost-abstraction philosophy. Something like Brian's proposal would work (https://github.com/rust-lang/rust/issues/33368#issuecomment-224134991). There are other proposals in this thread (some of them yours) that might work as well. I suggest making a comment listing all the viable proposals with their pros and cons, and perhaps making a discussion post on internals.rust-lang.org to figure out what folks like best. Then, make an RFC.

(Discussion of proposals on Rust issues rarely gets anywhere; Rust issues do not have that kind of visibility. This issue tracker is for tracking implementation work that needs to be done on rustc itself, where the user-facing design decisions have already been made.)

lhecker commented 7 years ago

> @pcwalton's comment was about the Go model specifically. The Go model does have plenty of costs associated with it that all programs will have to pay.

That's the point I don't get. You can't implement 1:1 scheduling on top of N:M anyways, so the discussion of whether Go's model is fit for Rust is out the window anyways. This is only about N:M scheduling and coroutines specifically, which can be implemented as a library on top of 1:1 scheduling without hurting the performance of anyone else whatsoever.
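
As a rough illustration of N:M scheduling living in a library on top of ordinary 1:1 threads, here is a minimal cooperative sketch. It is not how coio-rs or any real runtime works: tasks are plain closures that are stepped until they report completion, there are no separate stacks, and `Task`, `run_tasks` and `count_steps` are made-up names.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

// A task is stepped repeatedly and returns `true` once it has finished.
type Task = Box<dyn FnMut() -> bool + Send>;

// Multiplexes the given tasks over `workers` OS threads.
// Assumes `workers >= 1`; with zero workers nothing would run.
fn run_tasks(tasks: Vec<Task>, workers: usize) {
    let queue = Arc::new(Mutex::new(VecDeque::from(tasks)));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let queue = Arc::clone(&queue);
            thread::spawn(move || loop {
                // Hold the lock only long enough to pop one task.
                let next = queue.lock().unwrap().pop_front();
                let Some(mut task) = next else { break };
                // Unfinished tasks go back into the queue, where any
                // worker (possibly on another thread) may pick them up.
                if !task() {
                    queue.lock().unwrap().push_back(task);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
}

// Runs `num_tasks` tasks of `steps_each` steps and counts all steps taken.
fn count_steps(num_tasks: usize, steps_each: usize, workers: usize) -> usize {
    let total = Arc::new(AtomicUsize::new(0));
    let tasks: Vec<Task> = (0..num_tasks)
        .map(|_| {
            let total = Arc::clone(&total);
            let mut remaining = steps_each;
            Box::new(move || {
                if remaining > 0 {
                    total.fetch_add(1, Ordering::SeqCst);
                    remaining -= 1;
                }
                remaining == 0
            }) as Task
        })
        .collect();
    run_tasks(tasks, workers);
    total.load(Ordering::SeqCst)
}

fn main() {
    println!("steps executed: {}", count_steps(8, 5, 2));
}
```

Programs that never call `run_tasks` pay nothing for it, which is the point being argued. The TLS problem from this issue appears as soon as a task body touches a thread-local between steps.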

> The door is open here; you need a better proposal than "stop using TLS in the standard library" (or "mark all TLS as !Send", which is not backwards compatible).

@Manishearth Can you give me an idea what that might be? The Rust stdlib comes prebuilt, which makes it impossible to fix the TLS problem in a way that's comfortable for Rust users. Literally the only thing I can imagine is to make the TLS part pluggable like the allocator has been made (which is one thing Brian recommended, right?). I mean, what else is possible? Nothing, right? But if you make TLS usage in the stdlib pluggable you might as well provide an entire alternative stdlib, providing a "Go-like" environment, because that'd be ironically easier to implement and to use. And in that case I already got you covered here.

Is there possibly anything I missed (seriously)?

P.S.: I'm sorry, but I have to insist on it, because I don't want to deal with the negative connotation associated with Go here… "Go's model" is commonly called the suspend-down version of coroutines. This can be identified by methods not having an explicit declaration of them being async. Suspension also happens seemingly automatically inside the called method. Furthermore it's possible to have specialized user-land schedulers here. The opposite - "async/await" - is called suspend-up, because here you explicitly suspend the callee (await) before executing the async action. This can be implemented as "zero overhead", because suspend-up coroutines can be 1:1 translated by the compiler to simple state machines (which allows function inlining). Since it's zero-overhead I believe this has a proper right to be officially supported by Rust. The "Goroutine" on the other hand is really just a marketing term and nothing new.
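
To show what "translated to a simple state machine" can look like, here is a hand-written sketch of the kind of type a compiler might generate for a small suspend-up coroutine that yields the numbers below a limit. The names `CountUp`, `resume` and `drain` are invented for this example, and real compiler output differs in detail.

```rust
// Everything that has to survive a suspension point is stored in the
// enum itself, so no separate stack is needed.
enum CountUp {
    Running { next: u32, limit: u32 },
    Finished,
}

impl CountUp {
    fn new(limit: u32) -> Self {
        CountUp::Running { next: 0, limit }
    }

    // Runs until the next suspension point. Returns the yielded value,
    // or `None` once the coroutine has finished.
    fn resume(&mut self) -> Option<u32> {
        match *self {
            CountUp::Running { next, limit } if next < limit => {
                *self = CountUp::Running { next: next + 1, limit };
                Some(next)
            }
            _ => {
                *self = CountUp::Finished;
                None
            }
        }
    }
}

// Resumes the coroutine until it finishes and collects what it yielded.
fn drain(limit: u32) -> Vec<u32> {
    let mut coroutine = CountUp::new(limit);
    let mut yielded = Vec::new();
    while let Some(value) = coroutine.resume() {
        yielded.push(value);
    }
    yielded
}

fn main() {
    println!("{:?}", drain(3));
}
```

Because `resume` is an ordinary method on an ordinary value, the optimizer can inline it like any other call.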

Manishearth commented 7 years ago

> Can you give me an idea what that might be?

You could possibly have your own marker trait that works similarly to Send but bounds the closures used in your coroutine library, and make sure it's not implemented on the TLS keys. This marker trait could be part of the stdlib. It's restrictive, but it is a solution (I'm not sure if it's actually feasible, just an idea).

A pluggable TLS is a viable solution, and you can try to flesh that out into a pre-RFC. The problem with an "alternate" stdlib is that it can end up being incompatible with large parts of the ecosystem, which we don't want.

Even a flag for volatile TLS sounds like it could work here, though I'm not sure.

The change has to be one that can't affect existing libraries, and if it's a flag or pluggable solution the only effect it can have on existing libraries is a performance difference. That's the standard anything of this form has to go through.

There are probably other solutions this thread hasn't explored.

steveklabnik commented 5 years ago

Triage: we have generators as an experimental RFC, implemented in nightly.

lhecker commented 4 years ago

Should we close this issue for now?

There haven't been any comments for a long while
async/await is stable (and it's hella nice 😍)
I don't want to clutter this poor project with my poorly crafted issues 😄

What do you think?