simplecov-ruby / simplecov

Code coverage for Ruby with a powerful configuration library and automatic merging of coverage across test suites
MIT License
4.77k stars 553 forks source link

ParallelTests - Coverage Report Overwritten #1019

Open sundarraja opened 2 years ago

sundarraja commented 2 years ago

Test Frameworks: Minitest and ParallelTests SimpleCov Version: 0.21.2 Ruby Description: ruby 2.7.4p191 (2021-07-07 revision a21a3b7d23) [x86_64-linux] Rails Version: 6.0.3.7 Tests Invocation: COVERAGE=true parallel_test -n 3 <space-separated-test-file-paths>

Issue: In the screenshot below, the highlighted part should have been 'JSON Coverage Report generated for (1/3), (2/3), (3/3)'. But the coverage result from the first 2 processes got overwritten somehow.

Issue Frequency: Not reproducible (Happens Randomly)

coverage-report

lluis commented 2 years ago

it happens the same to me, with rspec + parallel_tests

run where report 12/12 overrides coverage:

lluis@laptop:~/git/vz (1685-test-coverage)$ rm -r coverage/ ; bundle exec rake parallel:spec
# (skipped non-coverage lines)
Coverage report generated for (3/12) to /home/lluis/git/vz/coverage. 5755 / 24443 LOC (23.54%) covered.
Coverage report generated for (1/12), (3/12) to /home/lluis/git/vz/coverage. 6570 / 23897 LOC (27.49%) covered.
Coverage report generated for (1/12), (3/12), (9/12) to /home/lluis/git/vz/coverage. 6861 / 23729 LOC (28.91%) covered.
Coverage report generated for (1/12), (11/12), (3/12), (9/12) to /home/lluis/git/vz/coverage. 7244 / 23523 LOC (30.8%) covered.
Coverage report generated for (1/12), (11/12), (3/12), (7/12), (9/12) to /home/lluis/git/vz/coverage. 7963 / 23049 LOC (34.55%) covered.
Coverage report generated for (1/12), (11/12), (3/12), (5/12), (7/12), (9/12) to /home/lluis/git/vz/coverage. 8551 / 22627 LOC (37.79%) covered.
Coverage report generated for (1/12), (11/12), (3/12), (5/12), (7/12), (8/12), (9/12) to /home/lluis/git/vz/coverage. 8941 / 22428 LOC (39.87%) covered.
Coverage report generated for (1/12), (11/12), (3/12), (4/12), (5/12), (7/12), (8/12), (9/12) to /home/lluis/git/vz/coverage. 9276 / 22230 LOC (41.73%) covered.
Coverage report generated for (1/12), (11/12), (3/12), (4/12), (5/12), (6/12), (7/12), (8/12), (9/12) to /home/lluis/git/vz/coverage. 9753 / 21953 LOC (44.43%) covered.
Coverage report generated for (1/12), (11/12), (2/12), (3/12), (4/12), (5/12), (6/12), (7/12), (8/12), (9/12) to /home/lluis/git/vz/coverage. 9920 / 21898 LOC (45.3%) covered.
Coverage report generated for (1/12), (10/12), (11/12), (2/12), (3/12), (4/12), (5/12), (6/12), (7/12), (8/12), (9/12) to /home/lluis/git/vz/coverage. 10042 / 21782 LOC (46.1%) covered.
Coverage report generated for (12/12) to /home/lluis/git/vz/coverage. 5295 / 24832 LOC (21.32%) covered.

correct run:

lluis@laptop:~/git/vz (1685-test-coverage)$ rm -r coverage ; bundle exec rake parallel:spec
Coverage report generated for (1/12) to /home/lluis/git/vz/coverage. 4994 / 25407 LOC (19.66%) covered.
Coverage report generated for (1/12), (3/12) to /home/lluis/git/vz/coverage. 5683 / 24936 LOC (22.79%) covered.
Coverage report generated for (1/12), (3/12), (9/12) to /home/lluis/git/vz/coverage. 6480 / 24503 LOC (26.45%) covered.
Coverage report generated for (1/12), (11/12), (3/12), (9/12) to /home/lluis/git/vz/coverage. 6762 / 24331 LOC (27.79%) covered.
Coverage report generated for (1/12), (11/12), (3/12), (6/12), (9/12) to /home/lluis/git/vz/coverage. 7274 / 24074 LOC (30.22%) covered.
Coverage report generated for (1/12), (11/12), (3/12), (6/12), (7/12), (9/12) to /home/lluis/git/vz/coverage. 8642 / 22574 LOC (38.28%) covered.
Coverage report generated for (1/12), (11/12), (3/12), (5/12), (6/12), (7/12), (9/12) to /home/lluis/git/vz/coverage. 9135 / 22266 LOC (41.03%) covered.
Coverage report generated for (1/12), (11/12), (3/12), (4/12), (5/12), (6/12), (7/12), (9/12) to /home/lluis/git/vz/coverage. 9329 / 22197 LOC (42.03%) covered.
Coverage report generated for (1/12), (11/12), (2/12), (3/12), (4/12), (5/12), (6/12), (7/12), (9/12) to /home/lluis/git/vz/coverage. 9584 / 22086 LOC (43.39%) covered.
Coverage report generated for (1/12), (11/12), (2/12), (3/12), (4/12), (5/12), (6/12), (7/12), (8/12), (9/12) to /home/lluis/git/vz/coverage. 9798 / 21954 LOC (44.63%) covered.
Coverage report generated for (1/12), (10/12), (11/12), (2/12), (3/12), (4/12), (5/12), (6/12), (7/12), (8/12), (9/12) to /home/lluis/git/vz/coverage. 9961 / 21875 LOC (45.54%) covered.
Coverage report generated for (1/12), (10/12), (11/12), (12/12), (2/12), (3/12), (4/12), (5/12), (6/12), (7/12), (8/12), (9/12) to /home/lluis/git/vz/coverage. 10166 / 21722 LOC (46.8%) covered.

gem versions:

lluis@laptop:~/git/vz (1685-test-coverage)$ ruby -e "puts RUBY_DESCRIPTION"
ruby 2.6.2p47 (2019-03-13 revision 67232) [x86_64-linux]
lluis@laptop:~/git/vz (1685-test-coverage)$ bundle show simplecov
/home/lluis/.rbenv/versions/2.6.2/lib/ruby/gems/2.6.0/gems/simplecov-0.21.2
lluis@laptop:~/git/vz (1685-test-coverage)$ bundle show parallel_tests
/home/lluis/.rbenv/versions/2.6.2/lib/ruby/gems/2.6.0/gems/parallel_tests-3.11.0
lluis@laptop:~/git/vz (1685-test-coverage)$ bundle show rspec-rails
/home/lluis/.rbenv/versions/2.6.2/lib/ruby/gems/2.6.0/gems/rspec-rails-3.8.3
lluis@laptop:~/git/vz (1685-test-coverage)$ bundle show capybara
/home/lluis/.rbenv/versions/2.6.2/lib/ruby/gems/2.6.0/gems/capybara-3.35.3
lluis@laptop:~/git/vz (1685-test-coverage)$ bundle show rails
/home/lluis/.rbenv/versions/2.6.2/lib/ruby/gems/2.6.0/gems/rails-4.2.11.3
Canertasan commented 2 years ago

Same issue here with parallel test and rspec, what i discover, if you compare resultset.json between bundle exec parallel_test and bundle exec rspec in parallel test i have 100k lines but in normal test i have 10k lines(reason is i have 10 cpu so 10 database is running). So it is not basically overriding, just adding at the end of the file. So if you check file names it will be database count which in this case i have 10 duplicate resultset for 1 file. Somehow we need to merge those. Maybe this is possible with JsonFormatter or SimpleFormatter If i have find a solution i will let you know

tjallingvanderwal commented 2 years ago

Observation

Also using parallel_test. Running on CircleCI. Using SimpleCov::Formatter::CoberturaFormatter to generate coverage.xml before uploading results to codecov.io.

Caught the following output at the end of a test job:

Coverage report generated for (1/8), (2/8), (3/8), (4/8), (5/8), (6/8), (7/8), (8/8) to /home/circleci/********/coverage/coverage.xml. 19217 / 25717 LOC (74.72%) covered.
Coverage report generated for (1/8), (2/8), (3/8), (4/8), (5/8), (6/8), (8/8) to /home/circleci/********/coverage/coverage.xml. 18241 / 25690 LOC (71.0%) covered.

This is the same as what others have reported here.

Hypothesis

Looking through the code, my hypothesis is that .resultset.json is protected by a file lock when reading/merging results. But anything that happens after that (anything any Formatter does), is unprotected.

process A calls `store_result`                          # `.resultset.json` protected by file lock
process A calls `merged_result`                         # `.resultset.json` protected by file lock
process A holds @results in memory
process B calls `store_result`                          # `.resultset.json` protected by file lock
process B calls `merged_result`                         # `.resultset.json` protected by file lock
process B holds @results in memory
process B calls `at_exit { SimpleCov.result.format! }`  # `coverage.xml` not protected
process A calls `at_exit { SimpleCov.result.format! }`  # `coverage.xml` not protected

ref: https://github.com/simplecov-ruby/simplecov/blob/efdb08db63b35577b3b0d79db893ead1d848d8dd/lib/simplecov.rb#L107-L115

Workaround / Our Solution

Let every parallel job write to a different directory:

if ENV['CI']
  SimpleCov.coverage_dir "coverage/#{ENV['CI_NODE_INDEX']}/"
end

If the tooling (or cloud service) you use to track coverage can handle multiple coverage reports, then this is already enough.

Otherwise, use a custom script to aggregate everything using SimpleCov.collate.

ahukkanen commented 2 years ago

I am not very familiar with the inner workings of Simplecov but we are also experiencing this issue at decidim/decidim and based on this conversation I found the file locking at the ResultMerger suspicious: https://github.com/simplecov-ruby/simplecov/blob/efdb08db63b35577b3b0d79db893ead1d848d8dd/lib/simplecov/result_merger.rb#L153-L166

It sure locks the file for the running process but it is not checking the file lock for any other processes which to me seems that if two processes are writing their results at the same time, it would get mixed.

As said, I am no expert with the Simplecov codebase but this just looked suspicious to me so I thought I'd mention it here.

Our run environment is basically the same as already reported here:

knovoselic commented 1 year ago

As far as I can see, there are no bugs related to the locking logic. I've had the same issue, and the problem turned out to be in the SimpleCov.merge_timeout option. By default, its set to 10 minutes, meaning that any runs/code coverage results which are older than 10 minutes will not get merged into the final report. Maybe it helps someone :)

rangerscience commented 9 months ago

I think I've run into this - we're using parallel_test, and when I run bundle exec rake parallel:spec I reliably get what seem to be unmerged coverage reports.

I dug into SimpleCov just a wee bit (in the ReadMe, it shows you the commit that's supposed to support parallel test), and this bit jumped out at me:

if ENV['PARALLEL_TEST_GROUPS'] && ENV['TEST_ENV_NUMBER']

I poked around inside of parallel_test, and it does set that env var at some point (it also unsets it), so I suspect that this is a gem ordering issue. Regardless, running the tests with the following seems to have resolved the issue:

PARALLEL_TEST_GROUPS='' bundle exec rake parallel:spec

Edit: This worked to fix thing locally, but didn't fix things in the CI/CD system (github actions)