Support Multithread Execution in RSpec

KarthikDot commented 10 years ago

RSpec today does not support parallel execution of tests in the core. There are bunch of gems that allow for parallel execution, but all of these are process based and not thread based.

With JRuby/Rubinius support proper threading today, and MRI being the only major Ruby with a GIL, it might make more sense for RSpec to start supporting threads now. I understand this a major change, and it might just make sense to do it in the timeframes for RSpec 3.0.

Integration Tests are mostly IO Bound, and the process model though works is a needless use of system resources. Threads here could work really well. With support for parallelization out of the box, and the fact that parallel tests will need tests to run in random order, will give a good incentive for people to write good isolated tests and run them much faster.

Here's the barebones API that I am looking at...

describe TestMeInParallel
  run_order :parallel

  it "does something" do 
  end

  it "does something else" do 
  end
end

Each describe block should allow for setting the order, so that folks can slowly start moving parts of their test-suite to parallel execution. In terms of over-all RSpec run order, I'd expect RSpec to do the same thing here, that Minitest does, that is run all the non parallel tests first in the event there's some shared state, and then run the tests that can execute in parallel.

P.S I decided to open this issue, after a quick chat with @myronmarston on Twitter. https://twitter.com/KarthikDot/status/424608731285176320

JonRowe commented 10 years ago

I'm not sure this is something that can happen in the time frame of RSpec 3, as we're mostly finalised on features and I think this would be a bigger job than you realise. We'd have to double check everything is thread safe, which up to now hasn't been a priority. We could certainly pen this in for RSpec 4.

Incidentally you can already specify a run order per describe e.g describe "Thing", order: defined, and there has been work on building custom runners outside RSpec that do this sort of thing, prehaps this could start as an extension gem and then be pulled in if there's enough support. A bit like how rspec-fire has been mostly incorporated into rspec-mocks.

Also as much as MRI is the only interpreter with a GIL, it's also still the most common Ruby to encounter and from my experience the most trouble free. JRuby is fantastic for long running server processes, less so for short lived testing processes.

myronmarston commented 10 years ago

Thanks for keeping this conversation going, @KarthikDot. As @JonRowe said, I don't see us getting this in the 3.0 release but I don't think we'd have to wait for RSpec 4 to do this -- RSpec has always added lots of nice new features in minor releases and I expect this could be done done in some 3.x release.

We don't have the capacity to consider this now (since we're trying to get the 3.0.0.beta2 out, and then on to RC), but feel free to ping us after 3.0 is out if you haven't seen any movement on this front.

KarthikDot commented 10 years ago

@myronmarston @JonRowe Thanks guys! Looking forward to 3.0 :)

rubemz commented 10 years ago

+1 on thread safety

brodock commented 9 years ago

:+1:

shideneyu commented 9 years ago

+1 :+1:

aprescott commented 9 years ago

Since 3.0 is now out, are there more concrete plans to have this be implemented for a 3.x release? Is there any kind of technical summary somewhere outlining the necessary changes for this to be done properly? Just thinking about lowering the barrier to iterating on the implementation.

JonRowe commented 9 years ago

Well someone has to go over things to find out what isn't thread safe first... Off the top of my head the biggest headache is probably going to be the global state sort of expected at RSpec.configuration

myronmarston commented 9 years ago

Since 3.0 is now out, are there more concrete plans to have this be implemented for a 3.x release? Is there any kind of technical summary somewhere outlining the necessary changes for this to be done properly? Just thinking about lowering the barrier to iterating on the implementation.

All that has been discussed is in this thread. But now is a great time to make more concrete plans, now that you've brought it up!

Well someone has to go over things to find out what isn't thread safe first... Off the top of my head the biggest headache is probably going to be the global state sort of expected at RSpec.configuration

Honestly, I'm not concerned about RSpec.configuration at all. Config is generally set before the first example runs, and then remains unchanged for the rest of the process. Mutating config while examples are running is pretty non-standard, and I don't think we need to make RSpec.configuration mutation thread safe. I think it's sufficient to document the multi-threaded mode with a list of caveats/unsupported things and mention RSpec.configuration mutation as one of them.

More generally, I think we need to define what the goal is. Is it to get RSpec in a state where all public RSpec APIs can be used in any fashion while multithreaded example execution is taking place? I think that's a hard goal to achieve, and is unnecessary to ship something useful we can iterate on. Besides; I don't think it's ever fully achievable -- consider that if you use rspec-mocks to stub a class method, it's going to interfere with any other examples running at the same time that expect the method to be unstubbed. This is simply how stubbing a globally accessible object works...it's not solvable by us; it's solvable by the user using better practices (such as dependency injection).

I think a better goal/plan would be:

Ship support for multi-threaded execution. Announce it as an experimental feature in the initial release (much like refinements in Ruby 2.0). Document it with known caveats (such as "RSpec configuration mutation is not supported from examples that are running concurrently").
Audit rspec-core, rspec-mocks and rspec-expectations for thread safety issues. Solve the ones that have easy solutions, but I think it's OK to document the others in our caveat list. (This is much like our :all/:context hooks -- they are documented as not supporting certain things). Ideally, this audit would happen before the initial release of the multi-threaded execution support, but as long as our initial release is marked as experimental I think it's OK to do this after we first ship -- it would just have to be done before we remove the experimental label from the feature.

IMO, what we need to figure out first is:

How we want the multithreaded execution to work.
What API(s) we want to expose for controlling it.
How multithreaded execution interacts with formatters.

I think it's useful to be able to apply multithreaded execution to only part of your spec suite, so I think it's something that we'll want to support via example group metadata to opt-in to it. A big question in my mind is: should it piggy back on :order metadata (e.g. :order => :parallel)? Or be a separate metadata key?

aprescott commented 9 years ago

Besides; I don't think it's ever fully achievable -- consider that if you use rspec-mocks to stub a class method, it's going to interfere with any other examples running at the same time that expect the method to be unstubbed. This is simply how stubbing a globally accessible object works...it's not solvable by us

This might be a totally idiotic thought that doesn't work, but could method stubbing come with some kind of locking? If a thread stubs Post.published, then if a subsequent thread tries to interact with Post by stubbing or calling Post.published (or any other method on Post, potentially?), it would have to first wait for the first thread/example to finish. I'm worried about the amount of sheer complexity this entails, though, assuming it's even realistic.

I think it's useful to be able to apply multithreaded execution to only part of your spec suite, so I think it's something that we'll want to support via example group metadata to opt-in to it.

I'd guess this is the safest way to get something functional out there and tried out.

How we want the multithreaded execution to work. [...] A big question in my mind is: should it piggy back on :order metadata (e.g. :order => :parallel)? Or be a separate metadata key?

How do you see order: :parallel (or any equivalent, like parallel: true) example groups affecting test ordering in relation to the whole suite? For example, if I have random test ordering for my test suite, and there are 100 example groups with 10 of them marked as running in parallel, do the parallel groups run first, together? Or does marking an example as parallel indicate that it can run concurrently with any other group?

chanks commented 9 years ago

Regarding stubbing, another possibility is to opt-in to parallelization for specific specs (or one could set the default for their spec suite as parallelizable and then opt-out for specific specs), and then raise an error if a spec marked as parallelizable tries to stub anything.

myronmarston commented 9 years ago

This might be a totally idiotic thought that doesn't work, but could method stubbing come with some kind of locking? If a thread stubs Post.published, then if a subsequent thread tries to interact with Post by stubbing or calling Post.published (or any other method on Post, potentially?), it would have to first wait for the first thread/example to finish. I'm worried about the amount of sheer complexity this entails, though, assuming it's even realistic.

That's an interesting idea that I hadn't considered. It does sound complicated, though, and I suspect it would affect perf for non-multihreaded cases. On top of that, we can't assume that another thread is rspec-core running another example. Sometimes users write specs that spawn threads, and the spawned thread can't be forced to block while waiting for the example thread to finish -- that would be a deadlock situation.

Regarding stubbing, another possibility is to opt-in to parallelization for specific specs (or one could set the default for their spec suite as parallelizable and then opt-out for specific specs), and then raise an error if a spec marked as parallelizable tries to stub anything.

It's not stubbing per-se that's the problem. Consider that if you instantiate an object (such as an rspec-mocks double) and stub it and use it within that example, there are no parallelism conflicts, since no other examples have access to that locally created object.

The problem is that when you stub a globally accessible object, you're mutating a global object. The more broad problem is global mutation (of which stubbing a globally accessible object is just one form). Consider, for example, that if I'm writing a spec in rspec-core's suite, and I need RSpec.configuration.color? to return false for a particular example, I can either stub it or set RSpec.configuration.color = false (and then reset it after the example). The effect is the same: RSpec.configuration.color? will return false, and any other example that depends upon it returning true will not pass while being run in parallel.

I'm convinced that the broader problem of global mutation in specs is not solvable or preventable by us. Thus, I don't think we should even attempt to do anything for the "stubbing a global object" issue.

How do you see order: :parallel (or any equivalent, like parallel: true) example groups affecting test ordering in relation to the whole suite? For example, if I have random test ordering for my test suite, and there are 100 example groups with 10 of them marked as running in parallel, do the parallel groups run first, together? Or does marking an example as parallel indicate that it can run concurrently with any other group?

These are great questions. First off, I think we would only support marking example groups as parallel, not individual examples. The way RSpec runs the suite is to delegate to teach of the top-level example groups, which in turn run their direct examples and also delegate to nested example groups, etc. Given that...there's really no way to run a single example in parallel with anything else when the group itself is not parallel.

Here's how I'm imagining this working:

At the top level, RSpec would partition its top-level groups into the parallel and non-parallel ones. It would then run all the parallel groups concurrently.
Once all the parallel groups are done, it would run the non-parallel ones. Alternately, we could run the non-parallel first but on non-GILd rubies (e.g. JRuby and RBX) the parallelism could provide a speed up on multi-core machines (which is pretty much everyone's machines these days...) so it seems desirable to run the parallel specs first.

One consequence of this is that once a group has been marked as parallel, we can't support anything nested under that to be marked as non-parallel, because there may be other global groups running at the same time. Ideally, whatever API we have for this would make that clear, but doing so with metadata (which is desirable for consistency with the rest of RSpec) is tricky. I'll have to think more about it.

One other thing: I imagine we're going to want to have a pool of N worker threads, where N is either configurable or inferred based on how many cores you have, or both. And then each parallel group would put "jobs" -- nested example or groups -- onto the worker queues and the threads would pop jobs and process them. That way we keep the number of worker threads constant even as we fan out recursively to all the groups.

e2 commented 9 years ago

TL;DR - sadly, I believe it's too detrimental to consider at all (list of issues/alternatives given)

Just my 2 cents:

There will usually be hundreds of tests (and tens of files at least), while there will only be a few cores available - so running simultaneously a handful of rspec processes (each with a portion of all the tests) is as fast as you can get anyway - by contrast, "parallel" specs running under one roof (many threads in one process) will always be slower (if we ignore startup time).
If tests are really slow, that's often due to IO and heavy resource usage (e.g. databases, file systems, bandwidth, RAM) - and those resources may be more limited than cores anyway (e.g. database in Rails). The best solution is mocks (especially with mock verification) and/or adding points of indirection to avoid consuming those resources. Otherwise, just as you'd manage threads and worker queues, you'd have to manage every other resource. Since slow tests are mostly happening "outside" Ruby, it may make sense to only have those slow tests running in a "side" thread. E.g. each full-stack acceptance spec could run remotely on the first available warmed-up docker container from a pool.
Tests and threads and locks are hard. Add mocks and metaprogramming and random failures (--seed won't help and some failures will now become "hardware-dependent") and bugs in JRuby and Rubinius ... and it's probably not worth all dealing within one framework. The extra time spent debugging will quickly nullify any gains from parallel threads. The more locks are added, the slower the tests will be, while "not enough locks" means random errors. (Not to mention how many Ruby platform, interpreter and version specific patches may be needed to get things working). Turn on parallel execution and getting a few failures may mean spending lots of hours just complicating the tests (no benefits related to the app itself) to make them work. This means debugging tests instead of debugging the code being tested. Logging and tracing can also get messed up or become a lot more complicated than necessary.
"Smart workflows", "selective testing" and breaking up projects into smaller gems will pay off faster. Travis builds take long not because there are unused cores - but because there's simply so much work to do anyway. I'd guess statistically most runs involve at least one failure - and since human beings can't fix multiple issues in parallel - then having the RSpec "intelligently" discover the error by postponing tests "likely to succeed" will provide a much bigger productivity boost. And of course such features are much easier to implement without having to worry about breaking parallel testing.
RSpec maintenance costs - if people are going to have problems with this, it's going to be quite hard to understand and reproduce issues, forget fixing them - especially with things going wrong in before/after blocks.
For thread safety, you need exclusivity or independence. If you have exclusivity, there's no gain from parallelism. If you have independence - there's no need to run all the tests (unless you change all the methods in the codebase or global states/names).
Parallelizing code may actually get in the way of JRuby's current and future optimizations (I'm not sure about this, though).
Often times tests in projects out there won't work if you run them one-by-one (e.g. for i in spec/**/*_spec.rb; do rspec $i; done).

I feel like the feature is exciting, but not really as practical as I'd wish it was. As an analogy - faster bandwidth does not necessarily make people more productive (no matter how true that seems at first).

I fear this would actually be detrimental (the number of test cases may just explode - while handling the corner cases and race conditions in the tests framework). And there would be a false sense of security - that having tests running in parallel is at all related to the tested code being thread safe.

I'd argue that instead of faster tests we need better libraries/foundations. And better libraries are going to happen as a result of better tools, not "faster tools".

On the flipside, I'd say that there's a common denominator between managing jobs for worker queues and workflow-related features ("smart selective testing", minimal-reruns based on coverage, realtime failure reporting, etc.).

JonRowe commented 9 years ago

This is mostly about being thread safe so we can be used in a threaded environment, not necessarily about parallel execution ourselves. Theres no harm in general in being thread safe with our code even if we don't go down the threaded runner route.

e2 commented 9 years ago

@JonRowe -

being thread safe so we can be used in a threaded environment

I have no idea what that means in practice. If you take away parallel execution, where's the need for thread safety? Single thread = thread safe.

Even if e.g. a user uses RSpec functionality (expect, etc.) from within different threads (which they created within a single test) - that seems like testing RSpec's internal thread safety for no reason (instead of testing their own code). It's up to the user to synchronize/join threads/queue calls - so those called from the main thread.

I don't see a reason for RSpec to be internally thread safe if we take parallel execution out of the picture.

JonRowe commented 9 years ago

We're designed to be extensible, so we should be good citizens, being thread safe if it doesn't effect our performance unduly is just being nice.

dogweather commented 8 years ago

being thread safe so we can be used in a threaded environment

I have no idea what that means in practice. If you take away parallel execution, where's the need for thread safety? Single thread = thread safe.

I have a concrete use case for thread safety: I run custom matchers which test web apps via the network. I use Sidekiq (thread-based queuing) to run many tests in parallel. This is slow-I/O-bound, and so I run as many workers as I can, constrained only by RAM in the machine — not CPU cores.

Now, since RSpec isn't thread-safe, each of these threaded workers must start up and shell out to a Ruby/RSpec process. And so I can only run about 10 workers concurrently on a typical 512MB Heroku Dyno. But if RSpec were thread-safe, I could increase the concurrency many times over.

I realize this is not a common use case. It is, though, one more datapoint in favor of 'thread-safety'. IMO heading towards thread-safety would be a good goal: avoiding global state and favoring immutability and idempotence are good practices.

@e2 @JonRowe

thedrow commented 8 years ago

@e2 Regarding parallelizing tests using threads: Since in MRI Ruby at least parallel execution is only available when the GIL is released (that is when I/O happens) having a setup of N processes and M threads is very useful for test suites that are I/O intensive. It's pointless to wait for database queries to complete while other tests could be run.

e2 commented 8 years ago

@thedrow - you can simply use parallel_tests or something. This is better, since each process can have it's own database. You're not worried by RAM here, so multiple processes is not your constraint.

@dogweather - you want threads because of RAM savings. Fair enough. It's a valid use case for me. The only reason it's not typical is because usually integration tests are frequently run on powerful desktops.

The two "must have threads" cases for me are: not enough RAM and process startup overhead.

thedrow commented 8 years ago

I still think that the tests will run faster. We can try to compare with MiniTest.

dogweather commented 8 years ago

@thedrow I just happened to read this, Exploring Minitest Concurrency, which found pretty substantial gains in JRuby and Rubinius.

JonRowe commented 8 years ago

Please note our goal here is to be thread safe in order to enable other people to parallelise RSpec for the moment, were not talking about making RSpec run parallel itself just yet.

dogweather commented 8 years ago

Right. So if I understand it, RSpec has the makings of being able to run multiple times w/in one Ruby VM. runner.rb creates a new Runner instance to handle each run.

def self.run(args, err=$stderr, out=$stdout)
  ...
  new(options).run(err, out)
end

Maybe I actually could get it working for my scenario. I'll give it another try.

manther commented 8 years ago

So whats the status on becoming thread safe?

JonRowe commented 8 years ago

No change, if you'd like to take up the mantle go for it!

xaviershay commented 7 years ago

We're designed to be extensible, so we should be good citizens, being thread safe if it doesn't effect our performance unduly is just being nice.

I agree with this in theory, though suspect that in practice @e2's concerns may be more pragmatic - the cost of retrofitting thread safety on such a large project may be prohibitive. I also worry about stuff like #2064.

Leaving open because I'd love someone to prove me wrong :)

zoras commented 5 years ago

Back linking https://github.com/rspec/rspec-rails/issues/2104

Add support for Rails 6 built-in Parallel Testing

Use Rails 6 built-in parallelizer API for running parallel RSpec executors

https://edgeguides.rubyonrails.org/testing.html#parallel-testing

https://github.com/rails/rails/pull/31900#issuecomment-374299402

fdr commented 3 years ago

Going to bump this, mostly because the "Ractor" features in Ruby 3 might make such a feature considerably cheaper.

rspec / rspec-core

Support Multithread Execution in RSpec #1254

Add support for Rails 6 built-in Parallel Testing