onsi / ginkgo

A Modern Testing Framework for Go
http://onsi.github.io/ginkgo/
MIT License
8.22k stars 650 forks source link

Same test executed multiple times in parallel on different processes and some of them not at all #1242

Open bartam1 opened 1 year ago

bartam1 commented 1 year ago

Dear Community!

Im working on an E2E test using ginkgo. I do the same that @onsi advise to run tests with different parameters in parallel (https://github.com/onsi/ginkgo/issues/561#issuecomment-459727212)

Currently my setup looks like:

Describe("root", func() {
   Describe(fmt.Sprintf("PID: %d:", GinkgoParallelProcess()), Ordered, func() {
          Describe(test, Ordered, func() {
                 It(...)
                 It(...)
                 It(...)
          })
    })
    Describe(fmt.Sprintf("PID: %d:", GinkgoParallelProcess()), Ordered, func() {
          Describe(test, Ordered, func() {
                 It(...)
                 It(...)
                 It(...)
          })
    }) 
    ...
})

When I run this in serial everyting is good but when I go with parallel I have different test results every time.

In the following example I run 5 specs: MockTest1 has MockTest1-1 and MockTest1-2 MockTest2 has MockTest2-1 MockTest2-2 and MockTest2-3

Expected: (what we have in serial):

(all tests executed in order)

Will run 5 of 5 specs
------------------------------
 PID: 1: MockTest1(testContextName3) MockTest1-1 [clusterID:clusterID3, testID:9112f3d1ae]
• [1.002 seconds]
------------------------------
 PID: 1: MockTest1(testContextName3) MockTest1-2 [clusterID:clusterID3, testID:9112f3d1ae]
• [1.001 seconds]
------------------------------
 PID: 1: MockTest2(testContextName2) MockTest2-1 [clusterID:clusterID2, testID:005f598d5b]
• [1.001 seconds]
------------------------------
 PID: 1: MockTest2(testContextName2) MockTest2-2 [clusterID:clusterID2, testID:005f598d5b]
• [1.001 seconds]
------------------------------
 PID: 1: MockTest2(testContextName2) MockTest2-3 [clusterID:clusterID2, testID:005f598d5b]
• [1.001 seconds]
------------------------------
Ran 5 of 5 Specs in 5.009 seconds
SUCCESS! -- 5 Passed | 0 Failed | 0 Pending | 0 Skipped

Actual what we have in parallel:

Will run 5 of 5 specs
Running in parallel across 7 processes
------------------------------
• [1.001 seconds]
 PID: 4: MockTest1(testContextName3) MockTest1-1 [clusterID:clusterID3, testID:9112f3d1ae]
------------------------------
• [1.001 seconds]
 PID: 3: MockTest1(testContextName1) MockTest1-1 [clusterID:clusterID1, testID:0aa3b528cc]
------------------------------
• [1.001 seconds]
 PID: 3: MockTest1(testContextName1) MockTest1-2 [clusterID:clusterID1, testID:0aa3b528cc]
------------------------------
• [1.001 seconds]
 PID: 4: MockTest1(testContextName3) MockTest1-2 [clusterID:clusterID3, testID:9112f3d1ae]
------------------------------
Ran 4 of 5 Specs in 2.015 seconds
SUCCESS! -- 4 Passed | 0 Failed | 0 Pending | 0 Skipped

This is also a buggy ran for second time:

Will run 5 of 5 specs
Running in parallel across 7 processes
------------------------------
• [1.001 seconds]
 PID: 5: MockTest2(testContextName1) MockTest2-1 [clusterID:clusterID1, testID:4050bf3235]
------------------------------
• [1.001 seconds]
 PID: 1: MockTest2(testContextName2) MockTest2-1 [clusterID:clusterID2, testID:005f598d5b]
------------------------------
• [1.001 seconds]
 PID: 5: MockTest2(testContextName1) MockTest2-2 [clusterID:clusterID1, testID:4050bf3235]
------------------------------
• [1.001 seconds]
 PID: 1: MockTest2(testContextName2) MockTest2-2 [clusterID:clusterID2, testID:005f598d5b]
------------------------------
• [1.001 seconds]
 PID: 1: MockTest2(testContextName2) MockTest2-3 [clusterID:clusterID2, testID:005f598d5b]
------------------------------
• [1.002 seconds]
 PID: 5: MockTest2(testContextName1) MockTest2-3 [clusterID:clusterID1, testID:4050bf3235]
------------------------------

Ran 6 of 5 Specs in 3.020 seconds
SUCCESS! -- 6 Passed | 0 Failed | 0 Pending | 0 Skipped

Can you give me please some explanation why is this happen? I cannot figure out :(

onsi commented 1 year ago

hey @bartam1 is there any chance you can share the actual code so I can see how the clusterId and testId parameters are being injected - and how the test is being constructed? Sometimes what you are seeing can happen if different test processes construct different trees (Ginkgo's parallelisation model requires that the constructed tree is 100% consistent between among test runners.)

bartam1 commented 1 year ago

Hello @onsi! Thank you for the fast reply!!!

This is the PR that Im working on. https://github.com/banzaicloud/koperator/pull/1020 Important part: https://github.com/banzaicloud/koperator/blob/61a0815f099ae7409b828eba2ba795acca119958/tests/e2e/pkg/tests/tests.go#L358 https://github.com/banzaicloud/koperator/blob/61a0815f099ae7409b828eba2ba795acca119958/tests/e2e/pkg/tests/tests.go#L92 Currently it is configured to do the same output that I wrote in the issue

bartam1 commented 1 year ago

hey @bartam1 is there any chance you can share the actual code so I can see how the clusterId and testId parameters are being injected - and how the test is being constructed? Sometimes what you are seeing can happen if different test processes construct different trees (Ginkgo's parallelisation model requires that the constructed tree is 100% consistent between among test runners.)

This can be the problem. Im looking into it. Thank you for the clue!

onsi commented 1 year ago

hey there - i think the issue might be on line 92-95 of tests.go. the closure captures tests which will be changed as you loop through testsByClusterID. you'll want tests := tests between lines 92 and 93. give that a try.

bartam1 commented 1 year ago

hey there - i think the issue might be on line 92-95 of tests.go. the closure captures tests which will be changed as you loop through testsByClusterID. you'll want tests := tests between lines 92 and 93. give that a try.

Thanks that one helped. I also had a random mix between runs which caused specs tree inconsistency. (https://github.com/banzaicloud/koperator/blob/61a0815f099ae7409b828eba2ba795acca119958/tests/e2e/pkg/tests/tests.go#L175) I still could produce an issue and probably because iterating through the map keys the order can be different in each run thus different tree can be constructed.

onsi commented 1 year ago

hey - that makes sense. the map issue should have been fixed in v2.7.0 - since that release ginkgo breaks ties for specs in the same location using the spec name. but you do need to make sure the spec names are identical between processes (eg that random test id you are generating needs to be the same for each process).

bartam1 commented 1 year ago

hey - that makes sense. the map issue should have been fixed in v2.7.0 - since that release ginkgo breaks ties for specs in the same location using the spec name. but you do need to make sure the spec names are identical between processes (eg that random test id you are generating needs to be the same for each process).

testID is fixed. Is this can be a problem: https://github.com/banzaicloud/koperator/blob/61a0815f099ae7409b828eba2ba795acca119958/tests/e2e/pkg/tests/tests.go#L92 Here I using GinkgoParallelProcess() in each Describe container name so this will be different in each process

onsi commented 1 year ago

oh yes good catch - that’ll be a problem too. sorry this is so confusing :( there’s not a great way to ensure consistent order when a loop like this is in play.

you can attach the process index as metadata via AddReportEntry or as a Label if you’d like

onsi commented 1 year ago

i really should add a checksum to validate that all trees are the same and give the user feedback. i’ll add that to the backlog.

bartam1 commented 1 year ago

Thank you for the helps! It looks like works fine so far. I could fix every problem with your help. Thank you for the fast replies!

bartam1 commented 1 year ago

i really should add a checksum to validate that all trees are the same and give the user feedback. i’ll add that to the backlog.

It is a good idea. Please let me know when the PR is up :)