rmosolgo / graphql-ruby

Ruby implementation of GraphQL
http://graphql-ruby.org
MIT License

Prefer `Fiber#storage` over copying thread locals #5173

Open ioquatix opened 4 days ago

ioquatix commented 4 days ago

I believe this code is extremely risky.

https://github.com/rmosolgo/graphql-ruby/blob/8a21eb17d58902b20867f18b4c25937b75baa830/lib/graphql/dataloader.rb#L80

rmosolgo commented 1 day ago

Hey, thanks for taking a look and sharing your concerns. The goal here is to retain compatibility with code that uses Thread.current.

I just started using Fiber[...] for library internals in https://github.com/rmosolgo/graphql-ruby/pull/5034, and I'm open to migrating GraphQL-Ruby's usage to Fiber storage instead.

I'm also open to making this Thread.current behavior opt-in somehow (instead of default). But could you help me understand the risk you see with it now?

ioquatix commented 1 day ago

Risks:

  1. Context Misalignment: Thread-local variables are tied to a thread's specific operations, and copying them can lead to incorrect behavior in the new thread's context.

  2. Shared Mutable State Risks: Copying mutable thread-local variables can introduce race conditions and data corruption.

  3. Resource Mismanagement: Thread-local variables managing resources like database connections or file handles may be improperly shared or closed.

  4. Framework/Library Assumptions: Frameworks relying on Thread.current for logs, tracing, or error propagation may break or produce incorrect results.
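As a rough illustration of risk 2, here is a minimal sketch of the copy-into-a-new-fiber pattern. The helper `fiber_with_copied_locals` and the `:request_tags` key are hypothetical names made up for this example; this is not GraphQL-Ruby's actual implementation. Copying the entries copies references, so in-place mutations in the child are visible to the parent, while reassignments are not.

```ruby
# Hypothetical helper for illustration only (not GraphQL-Ruby's code):
# snapshot the current fiber's Thread.current[...] entries and replay
# them inside a newly created fiber.
def fiber_with_copied_locals(&block)
  snapshot = Thread.current.keys.to_h { |key| [key, Thread.current[key]] }
  Fiber.new do
    snapshot.each { |key, value| Thread.current[key] = value }
    block.call
  end
end

Thread.current[:request_tags] = ["parent"]

child = fiber_with_copied_locals do
  # Same Array object as in the parent, so this mutation leaks back.
  Thread.current[:request_tags] << "child"
  # Rebinding the key only affects the child fiber's own locals.
  Thread.current[:request_tags] = ["replaced"]
end
child.resume

parent_tags = Thread.current[:request_tags]
# parent_tags => ["parent", "child"]
```

Whether this sharing is harmful depends on what the stored objects are; for plain immutable values it makes no difference.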

rmosolgo commented 8 hours ago

Oh, I see, thanks for laying those out. In practice, copying the entries from Thread.current.keys has fixed context-related issues (#3366, #3449, #3461, #4993), mostly because other libraries are already using Thread.current[...] for Fiber-scoped variables.

Maybe that's the catch here: Thread.current[...] is actually Fiber-scoped, right? So GraphQL-Ruby spins up new Fibers based on the parent fiber and runs everything on the same Thread. Those new Fibers are managed by GraphQL-Ruby as "children," so logically, passing along context makes sense (at least, it has so far).
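A small sketch of the scoping in question, using only core Ruby: values set with `Thread.current[...]` are local to the current fiber, whereas `thread_variable_set` / `thread_variable_get` are shared by every fiber on the thread. The keys `:locale` and `:worker_id` are made up for the example.

```ruby
# Thread#[]= stores a fiber-local value, despite living on Thread.
Thread.current[:locale] = "en"
# Thread#thread_variable_set stores a value shared by all fibers on this thread.
Thread.current.thread_variable_set(:worker_id, 7)

seen = Fiber.new do
  # A new fiber starts with no fiber-locals, but sees thread variables.
  [Thread.current[:locale], Thread.current.thread_variable_get(:worker_id)]
end.resume

# seen => [nil, 7]
```

This is why a new fiber sees none of the parent's `Thread.current[...]` entries unless something copies them over explicitly.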

Database connections (etc.) are an interesting case. Rails is the elephant in the room, and so far, the best approach has been to manually implement context sharing: https://graphql-ruby.org/dataloader/async_dataloader#activerecord-connections

Have you run into real-world issues with copying context like this, in GraphQL-Ruby or elsewhere?

ioquatix commented 5 hours ago

> Maybe that's the catch here: Thread.current[...] is actually Fiber-scoped, right?

Yes, and the problems are the same.

> Have you run into real-world issues with copying context like this, in GraphQL-Ruby or elsewhere?

It's hard for me to pinpoint exact failures since they are often soft and/or transient (some requests may fail or behave incorrectly).

While not directly related, an example of how context sharing can lead to incorrect execution: https://github.blog/security/vulnerability-research/how-we-found-and-fixed-a-rare-race-condition-in-our-session-handling/

Since it's entirely possible for things like RequestStore to use Thread.current and so on, there are situations where it can become problematic or behave unexpectedly.

There are probably situations where what you are doing is useful (as you've said for compatibility). If you are sure you control all fibers within a given thread, it might be safe.

However, this code makes me extremely uncomfortable, so I strongly advise you to encourage Fiber storage for inheritable state.
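For comparison, a minimal sketch of inheritable state with Fiber storage, assuming Ruby 3.2 or newer (where `Fiber[]`, `Fiber[]=`, and the `storage:` keyword of `Fiber.new` are available). The `:request_id` key is made up for the example.

```ruby
# Requires Ruby 3.2+ (Fiber storage).
Fiber[:request_id] = "req-1"

child = Fiber.new do
  inherited = Fiber[:request_id]   # copied from the parent at creation
  Fiber[:request_id] = "req-child" # only changes the child's storage
  inherited
end

inherited    = child.resume
parent_value = Fiber[:request_id]

# Opting out of inheritance: start the fiber with empty storage.
isolated = Fiber.new(storage: {}) { Fiber[:request_id] }.resume

# inherited    => "req-1"
# parent_value => "req-1"
# isolated     => nil
```

Inheritance is declared by the code that stores the value rather than imposed by whoever creates the fiber. The storage hash itself is copied, but the stored objects are still shared by reference, so mutable values need the same care as before.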

rmosolgo commented 4 hours ago

Moving GraphQL-Ruby's own Thread.current usage to Fiber[...] was easy enough, and seems to work out-of-the-box: https://github.com/rmosolgo/graphql-ruby/pull/5176

I'm open to removing this default behavior from GraphQL-Ruby, so I'll keep this issue open until I get to try it out.