ParallelTests hangs with 0.19.0 if custom code includes `wait_for_other_processes_to_finish`

simplecov-ruby / simplecov

Code coverage for Ruby with a powerful configuration library and automatic merging of coverage across test suites

MIT License

4.77k stars 552 forks source link

ParallelTests hangs with 0.19.0 if custom code includes `wait_for_other_processes_to_finish` #922

Open MichaelHoste opened 4 years ago

MichaelHoste commented 4 years ago

We've been using simplecov with rspec and parallel_tests for years (thanks!) and we had some issues when migrating from version 0.18.5 to 0.19.0.

On 0.19.0, after all the tests were executed, it just hangs/freezes and never finishes the job.

We noticed that 2 processes were still active and certainly waiting for each others. So we found out that the issue was this code in our application:

  config.after(:suite) do
    if ParallelTests.first_process?
      ParallelTests.wait_for_other_processes_to_finish

      # some stuff...
    end
  end

If we remove it, it works again.

We need it to execute some code after all the processes finishes, like it's documented here: https://github.com/grosser/parallel_tests#running-things-once

I guess other people, or gems, are using the same snippet and will have the same issue during migration. Do you know if there is a way to fix it, either on simplecov, parallel_tests or locally?

Our Ruby is ruby 2.6.5p114 (2019-10-01 revision 67812) [x86_64-darwin19] but it also doesn't work on 2.7.1, and we're on rails 6.0.3.2 but that doesn't seem the issue here.

MichaelHoste commented 4 years ago

Update:

It works if we replace first_process? with last_process? in our code.

So the freeze is not only triggered by wait_for_other_processes_to_finish but by the combination of first_process? and wait_for_other_processes_to_finish

PragTob commented 4 years ago

:sob: :sob: :sob:

Eh... hi thanks for reporting the issue! :wave: :green_heart:

:sob: :sob: :sob: :sob: :sob: :sob: :sob: :sob: :sob:

Ok I'm good now.

I knew this would break things for some people and I got hopeful that it didn't.

So yeah... the problem is that we wait in the last process for all processes to finish and you wait in the first process for all processes to finish. Hence this problem. Which definitely sucks.

https://github.com/simplecov-ruby/simplecov/blob/9cd3a46169943f2c79d688d96e93871cd7febe2d/lib/simplecov.rb#L276-L280

I did it based on this comment: https://github.com/grosser/parallel_tests/issues/772#issuecomment-672010347

Maybe we could petition parallel_tests to include a method that combines both to decrease fragmentation? One way or another we'd need to add it to the docs unless we come up with a better solution... which I can't right now :(

MichaelHoste commented 4 years ago

I know the feeling, sorry about that 😄

I have a suggestion but maybe it's nonsense because I don't know your codebase very well...

What if on this method, you use, first_process? instead of last_process?.

https://github.com/simplecov-ruby/simplecov/blob/9cd3a46169943f2c79d688d96e93871cd7febe2d/lib/simplecov.rb#L268-L271

Wouldn't the behaviour still be the same? It doesn't matter which process we select since it will wait for the others and always finish last.

Since first_process? is explicitly used in the ParallelTests README, it should be more commonly used by the community, and it will not freeze.

Also, don't forget that last_process? doesn't mean "last finished process", but "last started process" (see comment here: https://github.com/grosser/parallel_tests#running-things-once). So you don't really know which process will finish last anyway.

This method name should be more explicit to avoid confusion (last_started_process?).

PragTob commented 4 years ago

:wave:

Yeah, thanks we're well aware that last_process? is the last started process or well I'm since a couple of weeks :grin:

I guess we could change it and it would improve compatibility, it still won't solve the core issue that basically every gem and every code base has to wait for the same process or we're experiencing some sort of lock. I'd rather have a fuller solution, albeit this might be as full as it gets :)

MichaelHoste commented 4 years ago

Do you know some gems/applications that have incompatibilities with last_process?.

Since first_process? is explicitely used in the ParallelTests README, it should be more commonly used and cause less issues when upgrading to 0.19.0.

Maybe we could petition parallel_tests to include a method that combines both to decrease fragmentation?

A solution could be to ask them to add new methods like these:

ParallelTests.when_started do
  # do_something
end

ParallelTests.when_finished do
  # do_something
end

That would do:

def when_started(max_execution_time = 1.0)
  if ParallelTests.first_process?
    yield
  else
    sleep(max_execution_time)
  end
end

and

def when_finished    
  if ParallelTests.first_process?
    ParallelTests.wait_for_other_processes_to_finish
    yield
  end
end

It would certainly decrease the fragmentation between first_process? and last_process?