graphiti-api / graphiti

Stylish Graph APIs
https://www.graphiti.dev/
MIT License
960 stars 138 forks source link

Add before_sideload hook #371

Closed jhnvz closed 3 months ago

jhnvz commented 2 years ago

This PR adds a configurable before_sideload hook. This gives us the ability to restore the global state in sideload threads. We use a gem called ActsAsTenant (https://github.com/ErwinM/acts_as_tenant) to make sure we scope queries by a tenant, not having global state available breaks all our sideloads. This is how you can restore global state for existing application code or third-party dependencies:

Graphiti.configure do |c|
  c.before_sideload = proc do |context|
    ActsAsTenant.current_tenant = context[:company]
  end
end

I was not sure how to properly write a test for this new behavior (in the current setup) as I'm not seeing how to perform actual sideloads in a different thread.

jhnvz commented 2 years ago

I just stumbled upon another issue. We use the new multiple database feature of rails 6. In the following example all queries are pointed to the primary database (instead of the replica):

Graphiti.configure do |c|
  c.before_sideload = proc do |context|
    puts ActiveRecord::Base.current_role
  end
end

ActiveRecord::Base.connected_to(role: :reading) do
  SomeResource.all(include: 'other_resource')
end

# => :writing

We could solve this by implementing something that's being called around sideloading (middleware). An interface could look something like this:

connection_handler = proc do |context, &block|
  ActiveRecord::Base.connected_to(role: context[:database_role]) { block.call }
end

Graphiti.configure do |c|
  c.sideload_middleware do |chain|
    chain.add(connection_handler)
  end
end

Graphiti::Rails can add the middleware by default so we stay compatible with Rails out of the box. I'm still not sure how and when to store connection data in Graphiti::Context and I still need to look at the internals of ActiveRecord as they also support sharding and defining/nesting model-specific roles. Before doing so, I'd love to hear your guys' thoughts/take on this.

richmolj commented 2 years ago

Thanks for this @jhnvz ❤️ ! I certainly agree with your middleware proposal. My biggest thought is this type of connection stuff is probably already happening in Rails (or maybe puma) middleware. Maybe take a look and we'll copy what they do? Would be a huge help!

jhnvz commented 2 years ago

@richmolj Thanks for the fast response ✌️ and advice! After doing some research I figured that adding middleware is overkill, and the huge downside of using a middleware strategy is that you have to implement middleware for application code and each third-party library that depends on state in threads (https://github.com/jhnvz/graphiti/commit/846b9b192c4f16debbb74823a2e67d026667f12b Yuck..).

This is what Puma does (single mode with multiple threads):

preload application code
\_ AR sets up connection pool (stored on class instead of thread)
   \_ thread 1 (clean locals)
      \_ We call `.connected_to(role: :reading)`, state gets stored in thread
         \_ Calling `connection.execute` connects with correct connection from handler
   \_ thread 2 (clean locals)
      \_ ...

Puma in clustered mode takes a fork from application code, reloading application code, thus setting up a new connection pool in the ActiveRecord class (instead of a thread).

This is what's happening in our case:

preload application code
\_ AR sets up connection pool (stored on class instead of thread)
   \_ thread 1 (clean locals)
      \_ We call `.connected_to(role: :reading)`, state gets stored in thread
         \_ Sideloading occurs \
     __________________________/
    /
    \_ Thread 3 (clean locals)
       \_ Calling `connection.execute` connects with incorrect connection (due to empty state)
    \_ Thread 4 (clean locals)
       \_ ...                       
   \_ thread 2 (clean locals)
      \_ ...

We can fix this by restoring thread local state of the parent thread. Implementation looks like this:

@query.sideloads.each_pair do |name, q|
  sideload = @resource.class.sideload(name)
  next if sideload.nil? || sideload.shared_remote?
  parent_resource = @resource
  graphiti_context = Graphiti.context
+ thread = {}.tap do |hash|
+   Thread.current.keys.each { |key| hash[key] = Thread.current[key] }
+ end
+ thread_variables = {}.tap do |hash|
+   Thread.current.thread_variables.each { |var| hash[var] = Thread.current.thread_variable_get(var) }
+ end  
  resolve_sideload = -> {
+   thread.each_pair { |key, value| Thread.current[key] = value }
+   thread_variables.each_pair { |key, value| Thread.current.thread_variable_set(key, value) }
    Graphiti.context = graphiti_context
    sideload.resolve(results, q, parent_resource)
    @resource.adapter.close if concurrent
  }
  if concurrent
    promises << Concurrent::Promise.execute(&resolve_sideload)
  else
    resolve_sideload.call
  end
end

This is what I was able to find in Thread.current.keys in our app:

[
    [ 0] :_rollbar_notifier,
    [ 1] :puma_server,
    [ 2] :with_force_shutdown,
    [ 3] :current_attributes_instances,
    [ 4] :"attr_ActionText::Content_renderer",
    [ 5] :"ActiveSupport::Cache::Strategy::LocalCache::LocalCacheRegistry",
    [ 6] :request_store_active,
    [ 7] :"activesupport_tagged_logging_tags:94600",
    [ 8] :"ActiveSupport::Notifications::InstrumentationRegistry",
    [ 9] :"ActiveSupport::SubscriberQueueRegistry",
    [10] :_timestack,
    [11] :"ActiveRecord::RuntimeRegistry",
    [12] :"ActiveRecord::ExplainRegistry",
    [13] :request_store,
    [14] :ar_prepared_statements_disabled_cache,
    [15] :"ActiveRecord::Scoping::ScopeRegistry",
    [27] :i18n_config,
    [28] :rescue_registry_context,
    [29] :searchkick_runtime,
    [30] :context,
    [31] :prevent_writes,
    [32] :time_zone
]

And this in Thread.current.thread_variables:

[
    [ 0] : ar_connection_handler
]

As you can see, currently there might be a lot more broken than active record connection handling (timezones, i18n, activesupport notifications). The great advantage of this solution is that we gain compatibility with all code that uses thread state out of the box. When we implement this solution, as far as I understand Graphiti.context, we can deprecate it or solely use it to make controller context available in resources.

I'm still figuring out a way to properly spec/test this new behavior. Any ideas are welcome. Shall I close this PR and start working on a new one?

richmolj commented 2 years ago

Thanks so much @jhnvz ❤️ That was a great breakdown. I agree with your solution and something similar has come up before. I think Graphiti.context should still stick around (controller/action, backwards compat, .with_context, etc). But otherwise makes sense. I wonder if concurrent-ruby has pointers on testing?

jkeen commented 4 months ago

@jhnvz Thanks for your work on this! This will solve an issue I have in a multi-tenant app I have using graphiti.

I didn't quite follow the conversation re: Threads and middleware and wanted to make sure: as it stands now, is the code in this PR ready for merging?

github-actions[bot] commented 3 months ago

:tada: This PR is included in version 1.5.0 :tada:

The release is available on GitHub release

Your semantic-release bot :package::rocket:

Japestrale commented 1 month ago

@jhnvz / @richmolj I'm currently looking to solve the same issue you outlined above with regard to dealing with multiple databases I.e ensuring that sideloads are sent to the right database, rather than always being sent to the primary.

Before I go digging into writing trying to write tests I just wanted to check that no more work was done on the topic?