Closed ddelnano closed 9 months ago
Until this is fixed, here is my solution:
$ comm -2 -3 <(git grep TestAcc | cut -d ':' -f2 | awk '{print $2}' | grep '(t' | grep -Po '.*(?=..$)' | sort) <(grep 'PASS:' output.txt | awk '{print $3}' | sort ) | tr '\n' '\|'
Repeat with each successive test run.
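For readers who find the one-liner hard to follow, the same idea can be sketched in Go: collect the names on `--- PASS:` lines from plain `go test -v` output, subtract them from the full test list, and join what is left with `|`. The function names `passedTests` and `remainingPattern` and the sample test names are hypothetical, chosen only for illustration; this is a minimal sketch, not the code used in this repository.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strings"
)

// passedTests scans plain `go test -v` output and returns the set of test
// names that appear on a "--- PASS:" line. (Hypothetical helper.)
func passedTests(output string) map[string]bool {
	passed := make(map[string]bool)
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// A passing line looks like: "--- PASS: TestAccFoo (1.23s)".
		if len(fields) >= 3 && fields[0] == "---" && fields[1] == "PASS:" {
			passed[fields[2]] = true
		}
	}
	return passed
}

// remainingPattern returns the tests in all that have not passed yet,
// sorted and joined with "|" so the result can be used as a test filter.
// (Hypothetical helper.)
func remainingPattern(all []string, passed map[string]bool) string {
	var remaining []string
	for _, name := range all {
		if !passed[name] {
			remaining = append(remaining, name)
		}
	}
	sort.Strings(remaining)
	return strings.Join(remaining, "|")
}

func main() {
	// Illustrative test names and output, not taken from the real suite.
	all := []string{"TestAccA", "TestAccB", "TestAccC"}
	output := "=== RUN   TestAccA\n--- PASS: TestAccA (1.00s)\n" +
		"=== RUN   TestAccB\n--- FAIL: TestAccB (2.00s)\n"
	fmt.Println(remainingPattern(all, passedTests(output)))
}
```

Because `go test -run` treats its argument as an unanchored regular expression, a name such as `TestAccA` would also select `TestAccAB`; wrapping each name as `^Name$` avoids that if test names share prefixes.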
I've evaluated gotestsum. Unfortunately it doesn't provide the functionality I need. It does allow re-running failed tests, but we need terraform's test sweeping to run before each additional retry (to purge VMs and other failed resources).
From reading its source code and rakyll/go-test-trace, I believe writing a custom test runner is a good option. Essentially, it will keep track of all tests, run go test ... -sweep=true -json and parse the streamed output. Once the test timeout is reached, it will remove the successful tests from the test list and run go test ... -sweep=true -json again with the smaller subset of tests. Separately, I'd like to better understand these test failures, but I'd like to attempt this test runner first since my past attempts at understanding the tests' performance haven't been successful.
I have a work in progress branch that I believe should be close to an initial prototype.
I took another look at gotestsum as I was debugging my custom test runner and my initial assessment isn't quite correct. It should work with Terraform's test sweeping (re-running go test after each set of failures). However, when it encounters a test panic (exactly the situation I'd like to re-run in), it simply gives up.
My branch mentioned above has a github.com/ddelnano/terraform-provider-xenorchestra/testing/parallel test suite that passes with -timeout=32s but fails with -timeout=22s, so that I can better acceptance test whatever solution I use. Here is the behavior of gotestsum against those tests.
ddelnano@ddelnano-desktop:~/go/src/github.com/ddelnano/terraform-provider-xenorchestra$ ~/code/gotestsum/gotestsum --rerun-fails --packages="./..." -- -count=1 -timeout=22s -run="TestParallel" -v
∅ .
∅ cmd/testing (3ms)
∅ xoa/internal (11ms)
∅ xoa (12ms)
✖ cmd/testing/parallel (22.015s)
=== Failed
=== FAIL: cmd/testing/parallel TestParallelHangging (unknown)
DONE 3 tests, 1 failure in 22.558s
ERROR rerun aborted because previous run had a suspected panic and some test may not have run
ddelnano@ddelnano-desktop:~/go/src/github.com/ddelnano/terraform-provider-xenorchestra$ ~/code/gotestsum/gotestsum --rerun-fails --packages="./..." -- -count=1 -timeout=32s -run="TestParallel" -v
∅ .
∅ cmd/testing (8ms)
∅ xoa/internal (12ms)
∅ xoa (13ms)
✓ cmd/testing/parallel (30.022s)
DONE 3 tests in 30.561s
So it seems gotestsum may be able to be extended to not error in this situation, but I need to better understand what is more feasible.
I've also created a post on the HashiCorp Discuss forums. I'm hoping to receive more guidance from other provider developers / HashiCorp on other things I haven't considered.
While gotestsum does not handle go test timeouts that result in panics, I've determined that I can extend it easily to support the behavior I need (https://github.com/gotestyourself/gotestsum/commit/8c70ff05e83bc93e5e042151a4d25343f1dfe4f8). My sample test package that panics if a timeout occurs within 30 seconds proves that my patch works.
I will attempt to upstream this behavior, but for now I believe that gotestsum is the correct approach to easily run the test suite.
I have to modify how the Terraform test setup code works, so the solution isn't finished yet. However, I'm confident at this point that it will be workable. My previous work to write my own test runner ran into issues, and I think it's better to build on top of an existing project than to piece together a minimal one-off test runner.
As an extension of this issue, I'd like to set up continuous integration so PRs can be tested automatically. Once I have this issue resolved, I will be looking into the CI job further.
I have confirmed that the changes in #217 result in a build that succeeds even when a subset of tests hit the timeout and panic. This significantly reduces the manual work to run the test suite and will drastically improve development and release time.
This is no longer an issue as there is build infrastructure within the Vates lab now.
The current acceptance test suite has grown since the project started. It now comprises 30 xenorchestra_vm resource tests, which often cause the go test command to time out (running TIMEOUT=XXXm make testacc never succeeds on its own). This is becoming a significant drag on development velocity. In addition, the process listed below is time consuming and requires that I correctly identify which tests failed or were skipped:
1. Run make testacc
2. Identify the tests that failed or were skipped (go test timed out before that test was run)
3. Run TEST='Test1|Test2|Test3|....' make testacc
There have been past efforts to improve the test suite quality:
#84
#149
#148
While these have improved the test suite, they do not help with managing it over time as its performance changes (the test suite becomes slower, certain tests become problematic, bad tests need to be identified).
The goal of this issue is to allow the test suite to pass by running a single bash command. This will prevent the frequent loop identified above where commands are issued until all acceptance tests have passed.
One possible idea is to explore alternative ways of running the test suite (a shell script or a more proper test runner). The gotestsum test runner seems worth exploring. It has built-in support for re-running failed tests and can also identify which tests are slow.