ruby-concurrency / concurrent-ruby

Modern concurrency tools including agents, futures, promises, thread pools, supervisors, and more. Inspired by Erlang, Clojure, Scala, Go, Java, JavaScript, and classic concurrency patterns.
https://ruby-concurrency.github.io/concurrent-ruby/
Other
5.68k stars 418 forks source link

Officially support requiring subparts of concurrent-ruby, e.g., require 'concurrent/map' #934

Closed eregon closed 1 year ago

eregon commented 2 years ago

Currently, concurrent-ruby loads every file of the gem eagerly (like most gems actually, this is more reliable than manually setting up autoloads everywhere which is error-prone). It would be nice footprint-wise if only the files needed for the usages are loaded.

The best way to do that seems to adopt https://github.com/fxn/zeitwerk. That would mean dropping support for Ruby < 2.5, which should probably be discussed in a separate issue first.

From a discussion with Petr and @chrisseaton.

sambostock commented 2 years ago

I notice adding a runtime dependency on Zeitwerk would require breaking

The design goals of this gem are:

  • ...
  • Remain free of external gem dependencies
eregon commented 2 years ago

That's true, it might be worth adapting that goal though.

byroot commented 2 years ago

The problem with this idea is that you can easily end up in a situation where one of the concurrent-ruby classes is only used at runtime. Assuming we're talking about a web application, that means the code will be loaded during the first request cycle that uses it, defeating Copy on Write (for most MRI setups) and also busting the global constant cache (big deal for MRI right now, may change in the future).

eregon commented 2 years ago

If the goal is to optimize COW fork / avoiding autoloading then one should eagerly load, Zeitwerk has a mode for that: https://github.com/fxn/zeitwerk#global-eager-load Of course that means everything (e.g., all concurrent-ruby classes) will be loaded eagerly, but that seems generally the case for eager loading. So, this should just work if Rails uses Zeitwerk::Loader.eager_load_all in production, isn't it?

byroot commented 2 years ago

My point is that right now, when you use one of the concurrent-ruby component, you explictly require it, which means you get both:

If you were to migrate to Zeitwerk, you'd have to choose between loading things that are not used, vs maybe not eagerloading things.

So I don't think autoloading (be it with zeitwerk or Kernel#autoload) is a good fit for libraries such as concurrent-ruby which brings lots of code but where most people only pick a couple classes in it.

eregon commented 2 years ago

when you use one of the concurrent-ruby component, you explictly require it

That's what is explicitly not supported currently: https://github.com/ruby-concurrency/concurrent-ruby#usage And the reason for that AFAIK is it's pretty hard to load the necessary dependencies and test it for all possibilities (i.e., test that each file can be loaded on its own in a separate process, and all methods defined in the loaded files work).

In my understanding Zeitwerk solves this issue of loading the necessary dependencies automatically. I thought Zeitwerk would still allow e.g. require 'concurrent/map', but I guess that actually doesn't work with Zeitwerk (i.e., it won't take care of loading file dependencies).

I think the autoloading/constant invalidation/COW issue is far more general, it seems very likely an application will load extra classes on the first request, unless everything is eager loaded.

byroot commented 2 years ago

That's what is explicitly not supported currently

Well, somehow it works because I've never seen any code requiring it all.

In my understanding Zeitwerk solves this issue of loading the necessary dependencies automatically.

What Zeitwerk does it to automatically setup Kernel#autoload for all the files inside your lib/ directory, and does so recursively when files are loaded. So yes you can mix it with require.

eregon commented 1 year ago

I guess a common way to support specific requires is to run each of the spec in isolation, i.e., in a separate process, like here, and make the specs require only the files they need. That should help to catch missing requires. It's not as safe or convenient as zeitwerk but it does avoid the dependency.

eregon commented 1 year ago

I started work on this: https://github.com/ruby-concurrency/concurrent-ruby/pull/980 It's not trivial, I already found a cycle:

Currently:

A cycle: AtomicMarkableReference -> Synchronization::Object

Synchronization::Object (synchronization/object.rb) -> AtomicReference (through generated code for #initialize) AtomicReference -> MutexAtomicReference (but only if non-cruby-ext/non-jruby/non-truffleruby) MutexAtomicReference -> Synchronization::LockableObject Synchronization::LockableObject -> MutexLockableObject, JRubyLockableObject, MonitorLockableObject MutexLockableObject -> AbstractLockableObject AbstractLockableObject -> Synchronization::Object