Open KarthikDot opened 10 years ago
I'm not sure this is something that can happen in the time frame of RSpec 3, as we're mostly finalised on features and I think this would be a bigger job than you realise. We'd have to double check everything is thread safe, which up to now hasn't been a priority. We could certainly pen this in for RSpec 4.
Incidentally you can already specify a run order per describe
e.g describe "Thing", order: defined
, and there has been work on building custom runners outside RSpec that do this sort of thing, prehaps this could start as an extension gem and then be pulled in if there's enough support. A bit like how rspec-fire
has been mostly incorporated into rspec-mocks
.
Also as much as MRI is the only interpreter with a GIL, it's also still the most common Ruby to encounter and from my experience the most trouble free. JRuby is fantastic for long running server processes, less so for short lived testing processes.
Thanks for keeping this conversation going, @KarthikDot. As @JonRowe said, I don't see us getting this in the 3.0 release but I don't think we'd have to wait for RSpec 4 to do this -- RSpec has always added lots of nice new features in minor releases and I expect this could be done done in some 3.x release.
We don't have the capacity to consider this now (since we're trying to get the 3.0.0.beta2 out, and then on to RC), but feel free to ping us after 3.0 is out if you haven't seen any movement on this front.
@myronmarston @JonRowe Thanks guys! Looking forward to 3.0 :)
+1 on thread safety
:+1:
+1 :+1:
Since 3.0 is now out, are there more concrete plans to have this be implemented for a 3.x release? Is there any kind of technical summary somewhere outlining the necessary changes for this to be done properly? Just thinking about lowering the barrier to iterating on the implementation.
Well someone has to go over things to find out what isn't thread safe first... Off the top of my head the biggest headache is probably going to be the global state sort of expected at RSpec.configuration
Since 3.0 is now out, are there more concrete plans to have this be implemented for a 3.x release? Is there any kind of technical summary somewhere outlining the necessary changes for this to be done properly? Just thinking about lowering the barrier to iterating on the implementation.
All that has been discussed is in this thread. But now is a great time to make more concrete plans, now that you've brought it up!
Well someone has to go over things to find out what isn't thread safe first... Off the top of my head the biggest headache is probably going to be the global state sort of expected at
RSpec.configuration
Honestly, I'm not concerned about RSpec.configuration
at all. Config is generally set before the first example runs, and then remains unchanged for the rest of the process. Mutating config while examples are running is pretty non-standard, and I don't think we need to make RSpec.configuration
mutation thread safe. I think it's sufficient to document the multi-threaded mode with a list of caveats/unsupported things and mention RSpec.configuration
mutation as one of them.
More generally, I think we need to define what the goal is. Is it to get RSpec in a state where all public RSpec APIs can be used in any fashion while multithreaded example execution is taking place? I think that's a hard goal to achieve, and is unnecessary to ship something useful we can iterate on. Besides; I don't think it's ever fully achievable -- consider that if you use rspec-mocks to stub a class method, it's going to interfere with any other examples running at the same time that expect the method to be unstubbed. This is simply how stubbing a globally accessible object works...it's not solvable by us; it's solvable by the user using better practices (such as dependency injection).
I think a better goal/plan would be:
:all
/:context
hooks -- they are documented as not supporting certain things). Ideally, this audit would happen before the initial release of the multi-threaded execution support, but as long as our initial release is marked as experimental I think it's OK to do this after we first ship -- it would just have to be done before we remove the experimental label from the feature.IMO, what we need to figure out first is:
I think it's useful to be able to apply multithreaded execution to only part of your spec suite, so I think it's something that we'll want to support via example group metadata to opt-in to it. A big question in my mind is: should it piggy back on :order
metadata (e.g. :order => :parallel
)? Or be a separate metadata key?
Besides; I don't think it's ever fully achievable -- consider that if you use rspec-mocks to stub a class method, it's going to interfere with any other examples running at the same time that expect the method to be unstubbed. This is simply how stubbing a globally accessible object works...it's not solvable by us
This might be a totally idiotic thought that doesn't work, but could method stubbing come with some kind of locking? If a thread stubs Post.published
, then if a subsequent thread tries to interact with Post
by stubbing or calling Post.published
(or any other method on Post
, potentially?), it would have to first wait for the first thread/example to finish. I'm worried about the amount of sheer complexity this entails, though, assuming it's even realistic.
I think it's useful to be able to apply multithreaded execution to only part of your spec suite, so I think it's something that we'll want to support via example group metadata to opt-in to it.
I'd guess this is the safest way to get something functional out there and tried out.
How we want the multithreaded execution to work. [...] A big question in my mind is: should it piggy back on
:order
metadata (e.g.:order => :parallel
)? Or be a separate metadata key?
How do you see order: :parallel
(or any equivalent, like parallel: true
) example groups affecting test ordering in relation to the whole suite? For example, if I have random test ordering for my test suite, and there are 100 example groups with 10 of them marked as running in parallel, do the parallel groups run first, together? Or does marking an example as parallel indicate that it can run concurrently with any other group?
Regarding stubbing, another possibility is to opt-in to parallelization for specific specs (or one could set the default for their spec suite as parallelizable and then opt-out for specific specs), and then raise an error if a spec marked as parallelizable tries to stub anything.
This might be a totally idiotic thought that doesn't work, but could method stubbing come with some kind of locking? If a thread stubs
Post.published
, then if a subsequent thread tries to interact with Post by stubbing or callingPost.published
(or any other method onPost
, potentially?), it would have to first wait for the first thread/example to finish. I'm worried about the amount of sheer complexity this entails, though, assuming it's even realistic.
That's an interesting idea that I hadn't considered. It does sound complicated, though, and I suspect it would affect perf for non-multihreaded cases. On top of that, we can't assume that another thread is rspec-core running another example. Sometimes users write specs that spawn threads, and the spawned thread can't be forced to block while waiting for the example thread to finish -- that would be a deadlock situation.
Regarding stubbing, another possibility is to opt-in to parallelization for specific specs (or one could set the default for their spec suite as parallelizable and then opt-out for specific specs), and then raise an error if a spec marked as parallelizable tries to stub anything.
It's not stubbing per-se that's the problem. Consider that if you instantiate an object (such as an rspec-mocks double
) and stub it and use it within that example, there are no parallelism conflicts, since no other examples have access to that locally created object.
The problem is that when you stub a globally accessible object, you're mutating a global object. The more broad problem is global mutation (of which stubbing a globally accessible object is just one form). Consider, for example, that if I'm writing a spec in rspec-core's suite, and I need RSpec.configuration.color?
to return false
for a particular example, I can either stub it or set RSpec.configuration.color = false
(and then reset it after the example). The effect is the same: RSpec.configuration.color?
will return false
, and any other example that depends upon it returning true
will not pass while being run in parallel.
I'm convinced that the broader problem of global mutation in specs is not solvable or preventable by us. Thus, I don't think we should even attempt to do anything for the "stubbing a global object" issue.
How do you see
order: :parallel
(or any equivalent, likeparallel: true
) example groups affecting test ordering in relation to the whole suite? For example, if I have random test ordering for my test suite, and there are 100 example groups with 10 of them marked as running in parallel, do the parallel groups run first, together? Or does marking an example as parallel indicate that it can run concurrently with any other group?
These are great questions. First off, I think we would only support marking example groups as parallel, not individual examples. The way RSpec runs the suite is to delegate to teach of the top-level example groups, which in turn run their direct examples and also delegate to nested example groups, etc. Given that...there's really no way to run a single example in parallel with anything else when the group itself is not parallel.
Here's how I'm imagining this working:
One consequence of this is that once a group has been marked as parallel, we can't support anything nested under that to be marked as non-parallel, because there may be other global groups running at the same time. Ideally, whatever API we have for this would make that clear, but doing so with metadata (which is desirable for consistency with the rest of RSpec) is tricky. I'll have to think more about it.
One other thing: I imagine we're going to want to have a pool of N worker threads, where N is either configurable or inferred based on how many cores you have, or both. And then each parallel group would put "jobs" -- nested example or groups -- onto the worker queues and the threads would pop jobs and process them. That way we keep the number of worker threads constant even as we fan out recursively to all the groups.
TL;DR - sadly, I believe it's too detrimental to consider at all (list of issues/alternatives given)
Just my 2 cents:
rspec
processes (each with a portion of all the tests) is as fast as you can get anyway - by contrast, "parallel" specs running under one roof (many threads in one process) will always be slower (if we ignore startup time).--seed
won't help and some failures will now become "hardware-dependent") and bugs in JRuby and Rubinius ... and it's probably not worth all dealing within one framework. The extra time spent debugging will quickly nullify any gains from parallel threads. The more locks are added, the slower the tests will be, while "not enough locks" means random errors. (Not to mention how many Ruby platform, interpreter and version specific patches may be needed to get things working). Turn on parallel execution and getting a few failures may mean spending lots of hours just complicating the tests (no benefits related to the app itself) to make them work. This means debugging tests instead of debugging the code being tested. Logging and tracing can also get messed up or become a lot more complicated than necessary.for i in spec/**/*_spec.rb; do rspec $i; done
).I feel like the feature is exciting, but not really as practical as I'd wish it was. As an analogy - faster bandwidth does not necessarily make people more productive (no matter how true that seems at first).
I fear this would actually be detrimental (the number of test cases may just explode - while handling the corner cases and race conditions in the tests framework). And there would be a false sense of security - that having tests running in parallel is at all related to the tested code being thread safe.
I'd argue that instead of faster tests we need better libraries/foundations. And better libraries are going to happen as a result of better tools, not "faster tools".
On the flipside, I'd say that there's a common denominator between managing jobs for worker queues and workflow-related features ("smart selective testing", minimal-reruns based on coverage, realtime failure reporting, etc.).
This is mostly about being thread safe so we can be used in a threaded environment, not necessarily about parallel execution ourselves. Theres no harm in general in being thread safe with our code even if we don't go down the threaded runner route.
@JonRowe -
being thread safe so we can be used in a threaded environment
I have no idea what that means in practice. If you take away parallel execution, where's the need for thread safety? Single thread = thread safe.
Even if e.g. a user uses RSpec functionality (expect
, etc.) from within different threads (which they created within a single test) - that seems like testing RSpec's internal thread safety for no reason (instead of testing their own code). It's up to the user to synchronize/join threads/queue calls - so those called from the main thread.
I don't see a reason for RSpec to be internally thread safe if we take parallel execution out of the picture.
We're designed to be extensible, so we should be good citizens, being thread safe if it doesn't effect our performance unduly is just being nice.
being thread safe so we can be used in a threaded environment
I have no idea what that means in practice. If you take away parallel execution, where's the need for thread safety? Single thread = thread safe.
I have a concrete use case for thread safety: I run custom matchers which test web apps via the network. I use Sidekiq (thread-based queuing) to run many tests in parallel. This is slow-I/O-bound, and so I run as many workers as I can, constrained only by RAM in the machine — not CPU cores.
Now, since RSpec isn't thread-safe, each of these threaded workers must start up and shell out to a Ruby/RSpec process. And so I can only run about 10 workers concurrently on a typical 512MB Heroku Dyno. But if RSpec were thread-safe, I could increase the concurrency many times over.
I realize this is not a common use case. It is, though, one more datapoint in favor of 'thread-safety'. IMO heading towards thread-safety would be a good goal: avoiding global state and favoring immutability and idempotence are good practices.
@e2 @JonRowe
@e2 Regarding parallelizing tests using threads: Since in MRI Ruby at least parallel execution is only available when the GIL is released (that is when I/O happens) having a setup of N processes and M threads is very useful for test suites that are I/O intensive. It's pointless to wait for database queries to complete while other tests could be run.
@thedrow - you can simply use parallel_tests
or something. This is better, since each process can have it's own database. You're not worried by RAM here, so multiple processes is not your constraint.
@dogweather - you want threads because of RAM savings. Fair enough. It's a valid use case for me. The only reason it's not typical is because usually integration tests are frequently run on powerful desktops.
The two "must have threads" cases for me are: not enough RAM and process startup overhead.
I still think that the tests will run faster. We can try to compare with MiniTest.
@thedrow I just happened to read this, Exploring Minitest Concurrency, which found pretty substantial gains in JRuby and Rubinius.
Please note our goal here is to be thread safe in order to enable other people to parallelise RSpec for the moment, were not talking about making RSpec run parallel itself just yet.
Right. So if I understand it, RSpec has the makings of being able to run multiple times w/in one Ruby VM. runner.rb
creates a new Runner instance to handle each run.
def self.run(args, err=$stderr, out=$stdout)
...
new(options).run(err, out)
end
Maybe I actually could get it working for my scenario. I'll give it another try.
So whats the status on becoming thread safe?
No change, if you'd like to take up the mantle go for it!
We're designed to be extensible, so we should be good citizens, being thread safe if it doesn't effect our performance unduly is just being nice.
I agree with this in theory, though suspect that in practice @e2's concerns may be more pragmatic - the cost of retrofitting thread safety on such a large project may be prohibitive. I also worry about stuff like #2064.
Leaving open because I'd love someone to prove me wrong :)
Back linking https://github.com/rspec/rspec-rails/issues/2104
Add support for Rails 6 built-in Parallel Testing
Use Rails 6 built-in parallelizer API for running parallel RSpec executors
https://edgeguides.rubyonrails.org/testing.html#parallel-testing
https://github.com/rails/rails/pull/31900#issuecomment-374299402
Going to bump this, mostly because the "Ractor" features in Ruby 3 might make such a feature considerably cheaper.
RSpec today does not support parallel execution of tests in the core. There are bunch of gems that allow for parallel execution, but all of these are process based and not thread based.
With JRuby/Rubinius support proper threading today, and MRI being the only major Ruby with a GIL, it might make more sense for RSpec to start supporting threads now. I understand this a major change, and it might just make sense to do it in the timeframes for RSpec 3.0.
Integration Tests are mostly IO Bound, and the process model though works is a needless use of system resources. Threads here could work really well. With support for parallelization out of the box, and the fact that parallel tests will need tests to run in random order, will give a good incentive for people to write good isolated tests and run them much faster.
Here's the barebones API that I am looking at...
Each
describe
block should allow for setting the order, so that folks can slowly start moving parts of their test-suite to parallel execution. In terms of over-all RSpec run order, I'd expect RSpec to do the same thing here, that Minitest does, that is run all the non parallel tests first in the event there's some shared state, and then run the tests that can execute in parallel.P.S I decided to open this issue, after a quick chat with @myronmarston on Twitter. https://twitter.com/KarthikDot/status/424608731285176320