skroutz / rspecq

Distribute and run RSpec suites among parallel workers; for faster CI builds
https://rubygems.org/gems/rspecq
MIT License

High redis commands/second on parallelization > 1 #67

Open siassaj opened 3 years ago

siassaj commented 3 years ago

Hi all

I'm seeing strange Redis thrashing that makes the test suite run for a very long time and saturates Redis:

1 worker: image

2 workers: image

8 workers: image

Further information: 1030 tests in the spec suite; redis 3.2 running in a docker container; ruby 2.7.4 running in a docker container.

gems: rspec 3.9.0, rspecq 0.7.1, redis 4.1.2, redis-rails 5.0.2

Interestingly, the cmd/s rate is fine for 1 worker, but with 2 or more workers it jumps by orders of magnitude, into the tens or hundreds of thousands, and saturates the redis instance, slowing everything down.

I am running the commands with the following:

RSPECQ_BUILD=<8 digit randomly generated string>
RSPECQ_REDIS_URL=redis://redis:6379/8
RSPECQ_MAX_REQUEUES=3

for i in $sequence; do
    echo "Testing $i"
    TEST_ENV_NUMBER=$i bundle exec rspecq --worker=$i "$FILES" > /tmp/rspecq_${RSPECQ_BUILD}_${i} 2>&1 &
    pids[${i}]=$!
done

fragoulis commented 3 years ago

Hello @siassaj.

That is indeed very interesting. Unfortunately, we do not currently have the resources to catch up on rspecq, so this is going to have to wait. Thank you for the report, however. It will be on our radar.

In the meantime, please post your rspecq & redis configurations so we can cross-check them with ours, in case there is something obvious we can point you to.

siassaj commented 3 years ago

Figured it out. It's partly my own fault.

Calling an rspecq worker with an empty file string is what causes it.

My script looks a bit like the one below. When $FILES is empty, it passes "" to rspecq, which doesn't handle it well.

for i in $range; do
  bundle exec rspecq --worker=$i "$FILES"
done

This causes most of the rspecq workers to loop endlessly without any pause or sleep, while (I think) one worker actually runs the test suite. I'm not sure why, as I didn't dive in, but I suspect redis isn't being populated properly. I can take a look later. At a glance, there appear to be no guards against empty ARGV members. I noticed:

# bin/rspecq

...

worker.files_or_dirs_to_run = ARGV if ARGV.any?

...

so [""] would be passed along.
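A guard against empty arguments could look roughly like the sketch below. This is only an illustration, not rspecq's actual code: `sanitize_paths` is a hypothetical helper name, and where such filtering would belong in bin/rspecq is an assumption.

```ruby
# Hypothetical helper, not part of rspecq: drop nil, empty, or
# whitespace-only arguments before they are treated as paths.
def sanitize_paths(args)
  args.reject { |arg| arg.nil? || arg.strip.empty? }
end

# A quoted but empty shell variable ("$FILES") arrives as [""].
puts sanitize_paths([""]).inspect
# prints []

# Real paths pass through unchanged; the empty member is dropped.
puts sanitize_paths(["spec/models", "", "spec/lib"]).inspect
# prints ["spec/models", "spec/lib"]

# Assumed usage in a CLI entry point (hypothetical):
#   paths = sanitize_paths(ARGV)
#   worker.files_or_dirs_to_run = paths if paths.any?
```

With a filter like this, [""] would be treated the same as passing no files at all, so the `ARGV.any?` style check would fall through to the default behaviour.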

EDIT: To add, when timings aren't accurate and some workers are "done" while others still have work, the "done" workers may start to show the same behaviour, so a defensive return or sleep may be necessary in the loop.
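The defensive sleep could be sketched as a polling loop that backs off whenever the queue is empty. Everything here is hypothetical and not taken from rspecq: `poll_with_backoff`, `fetch_job`, `finished`, and the delay values are illustrative names and numbers only.

```ruby
# Hypothetical polling loop, not rspecq's actual worker code.
# fetch_job: callable returning a job, or nil when the queue is empty.
# finished:  callable returning true once the whole build is complete.
# sleeper:   injectable so the loop can be exercised without real sleeping.
def poll_with_backoff(fetch_job:, finished:, base: 0.05, max: 1.0,
                      sleeper: ->(seconds) { sleep(seconds) })
  delay = base
  processed = []
  until finished.call
    job = fetch_job.call
    if job.nil?
      # Nothing to do right now: pause instead of hammering redis,
      # doubling the pause up to a ceiling on each empty poll.
      sleeper.call(delay)
      delay = [delay * 2, max].min
      next
    end
    delay = base # work arrived, so reset the backoff
    processed << job
  end
  processed
end

# Usage with an in-memory stand-in for the queue (nil = empty poll).
queue = ["spec/a_spec.rb", nil, nil, "spec/b_spec.rb"]
pauses = []
done = poll_with_backoff(
  fetch_job: -> { queue.shift },
  finished: -> { queue.empty? },
  sleeper: ->(seconds) { pauses << seconds } # record instead of sleeping
)
puts done.inspect
# prints ["spec/a_spec.rb", "spec/b_spec.rb"]
puts pauses.length
# prints 2
```

The point of the design is that an idle worker issues a bounded number of redis commands per second, rather than as many as the loop can produce.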

I can put together a PR if you're likely to review it?