/label ci-cd
I am gonna look into this issue. Seems like gotestsum, which is used for running both unit and integration tests, is not propagating the exit code of `go test`, which it runs in the background, since gotestsum is just a wrapper around it. I am gonna investigate the possible options here.
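For illustration, here is a minimal sketch (not the actual gotestsum source) of the exit-code propagation being hypothesized: a wrapper that shells out to `go test` must explicitly forward the child's exit code, otherwise the CI job sees success even when tests fail.

```go
package main

import (
	"errors"
	"os"
	"os/exec"
)

func main() {
	// Run go test as a child process, mirroring what a wrapper
	// like gotestsum conceptually does.
	cmd := exec.Command("go", "test", "./...")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr

	if err := cmd.Run(); err != nil {
		var exitErr *exec.ExitError
		if errors.As(err, &exitErr) {
			// Forward the child's exit code; dropping this and
			// exiting 0 would reproduce the suspected symptom.
			os.Exit(exitErr.ExitCode())
		}
		os.Exit(1) // failed to start, signal, etc.
	}
}
```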
@odubajDT I'll assign this to you.
Feel free to assign it to me if you don't have the time to take it up. I don't have assignment permissions, sorry.
Adding some useful links for context:
- `make gointegration-test GROUP=${{ matrix.group }}` (note the related `integration-test` target, without the `go` prefix!)
- Thoughts on adding a canary to integration tests? (e.g. replace a random file from the above integration targets with `/dev/random` or a known failing file and see if the whole workflow fails? See the sketch below.)
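A hypothetical canary along those lines, assuming the repo's `integration` build tag (the file, package, and test names are illustrative, not from the repo):

```go
//go:build integration

package canarytest

import "testing"

// TestCanaryAlwaysFails exists only to verify the CI plumbing: if the
// integration-tests workflow stays green while this test is present,
// a failing exit code is being swallowed somewhere in the pipeline.
func TestCanaryAlwaysFails(t *testing.T) {
	t.Fatal("canary: this failure must turn the workflow red")
}
```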
Hey @odubajDT I'd like to take a stab at this tomorrow if you don't have a fix/root cause for it yet, wanted to give you the opportunity to share anything before I jump the gun
Hey @hughesjj, sorry for not responding, I was offline for a week. Sure, take a look if you are interested.
From what I can see, the tests actually pass, therefore the status is reported correctly:
```
running go integration test ./... in /home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/receiver/jmxreceiver
/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/.tools/gotestsum --rerun-fails=1 --packages="./..." -- -race -timeout 360s -parallel 4 -tags=integration,""
∅ internal/metadata
✖ internal/subprocess (776ms)
✖ . (28.327s)
DONE 51 tests, 2 failures in 30.608s
✓ . (1.047s)
✓ internal/subprocess (1.01s)
DONE 2 runs, 51 tests in 34.243s
```
As we can see, the second run of `internal/subprocess` passes, therefore the correct status is reported. The test is re-run due to the presence of the `--rerun-fails=1` parameter passed to gotestsum.
This PR showcases it: https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/34729
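To make the masking effect concrete, here is a contrived sketch (not from the repo; all names and the sentinel path are illustrative) of a test that fails on its first run and passes on the rerun. With `--rerun-fails=1`, gotestsum reports overall success for exactly this pattern:

```go
//go:build integration

package subprocesstest

import (
	"os"
	"path/filepath"
	"testing"
)

// TestFlakySketch fails the first time it runs and passes on a rerun:
// it fails when a sentinel file is absent, creating it as a side
// effect. gotestsum's --rerun-fails=1 retries the failed package, so
// the overall run (and thus the CI job) ends up green.
func TestFlakySketch(t *testing.T) {
	sentinel := filepath.Join(os.TempDir(), "flaky-sentinel") // illustrative path
	if _, err := os.Stat(sentinel); os.IsNotExist(err) {
		if err := os.WriteFile(sentinel, nil, 0o644); err != nil {
			t.Fatalf("writing sentinel: %v", err)
		}
		t.Fatal("first run fails; the rerun will pass")
	}
}
```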
Also, one additional (unrelated) issue I spotted: it seems that in certain cases unit tests are run as part of the integration test run (and maybe vice versa). Note that the `internal/filter` test in `receiver-1` is run in integration-tests as well as unit-tests. This is because the tests are not correctly tagged.
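For reference, the split between the two suites relies on a Go build constraint (matching the `-tags=integration` flag visible in the log above); a test file missing it compiles into the default unit-test run too. A minimal sketch, with illustrative names:

```go
//go:build integration

// Without the build constraint above, this file would also be compiled
// into the plain `go test` (unit) run, so its tests would execute in
// both the unit-tests and integration-tests jobs.
package filtertest

import "testing"

func TestFilterIntegration(t *testing.T) {
	t.Log("runs only when -tags=integration is set")
}
```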
No worries, I jinxed myself by saying I'd have time to do something 😅
Just sharing info from my side:
You're 100% correct. It's interesting how consistently this fails (at least for the jmx cases), and super annoying when using GoLand for integ tests -- they fail every time with the "default" run configuration, so breakpoints get annoying.
For jmx in particular, I was getting some complaints about a Ryuk container not terminating in GoLand.
I'll rename this issue to reflect the current understanding of things and see about removing the p1 label
Thanks! The PR should be ready for review if you have time.
Component(s)
ci/cd
What happened?
Description
Integration tests don't fail the parent workflow if an integration test fails.
Steps to Reproduce
Example job
At least two tests in this spot check fail, but the workflow succeeds. Search for `jmx` in the build logs; the output is also included below.
Expected Result
I expect the failing test to fail the workflow
Actual Result
Collector version
latest (main)
Environment information
No response
OpenTelemetry Collector configuration
No response
Log output
However, on rerun, it passes more or less immediately.
Another receiver: