open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

[exporter/clickhouse] Integration test failing due to time out #32275

Open crobert-1 opened 3 months ago

crobert-1 commented 3 months ago

Component(s)

exporter/clickhouse

Describe the issue you're reporting

CI/CD link here.

Failure output:

Running target 'mod-integration-test' in module 'exporter/clickhouseexporter' as part of group 'exporter-0'
make --no-print-directory -C exporter/clickhouseexporter mod-integration-test
running go integration test ./... in /home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/exporter/clickhouseexporter
/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/.tools/gotestsum --rerun-fails=1 --packages="./..." -- -race -timeout 360s -parallel 4 -tags=integration,"" -run=Integration
go: downloading github.com/jmoiron/sqlx v1.3.5
go: downloading github.com/testcontainers/testcontainers-go v0.29.1
go: downloading dario.cat/mergo v1.0.0
go: downloading github.com/cpuguy83/dockercfg v0.3.1
go: downloading github.com/containerd/containerd v1.7.12
go: downloading github.com/moby/patternmatcher v0.6.0
go: downloading github.com/moby/term v0.5.0
go: downloading golang.org/x/exp v0.0.0-20230711023510-fffb14384f22
go: downloading go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.45.0
go: downloading github.com/containerd/log v0.1.0
go: downloading github.com/moby/sys/sequential v0.5.0
go: downloading github.com/morikuni/aec v1.0.0
go: downloading github.com/felixge/httpsnoop v1.0.3
go: downloading github.com/moby/sys/user v0.1.0
∅  internal/metadata
∅  internal (1.018s)
✖  . (6m0.053s)

=== Failed
=== FAIL: .  (0.00s)
panic: test timed out after 6m0s
ERROR rerun aborted because previous run had a suspected panic and some test may not have run
running tests:
    TestIntegration (6m0s)
    TestIntegration/test_clickhouse_23-alpine (5m28s)

goroutine 172 [running]:
testing.(*M).startAlarm.func1()
    /opt/hostedtoolcache/go/1.21.9/x64/src/testing/testing.go:2259 +0x259
make[2]: *** [../../Makefile.Common:142: mod-integration-test] Error 3
make[1]: *** [Makefile:165: exporter/clickhouseexporter] Error 2
make: *** [Makefile:122: gointegration-test] Error 2
created by time.goFunc
    /opt/hostedtoolcache/go/1.21.9/x64/src/time/sleep.go:176 +0x45

goroutine 1 [chan receive, 6 minutes]:
testing.(*T).Run(0xc0000a2820, {0x182feec, 0xf}, 0x18f1488)
    /opt/hostedtoolcache/go/1.21.9/x64/src/testing/testing.go:1649 +0x871
testing.runTests.func1(0x0?)
    /opt/hostedtoolcache/go/1.21.9/x64/src/testing/testing.go:2054 +0x85
testing.tRunner(0xc0000a2820, 0xc0005a7998)
    /opt/hostedtoolcache/go/1.21.9/x64/src/testing/testing.go:1595 +0x262
testing.runTests(0xc00030d860?, {0x238a6e0, 0x14, 0x14}, {0xc0005a7a68?, 0x448785?, 0x2399080?})
    /opt/hostedtoolcache/go/1.21.9/x64/src/testing/testing.go:2052 +0x8ae
testing.(*M).Run(0xc00030d860)
    /opt/hostedtoolcache/go/1.21.9/x64/src/testing/testing.go:1925 +0xcd8
go.uber.org/goleak.VerifyTestMain({0x1ab2620, 0xc00030d860}, {0x0, 0x0, 0x0})
    /home/runner/go/pkg/mod/go.uber.org/goleak@v1.3.0/testmain.go:53 +0x65
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/clickhouseexporter.TestMain(...)
    /home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/exporter/clickhouseexporter/package_test.go:13
main.main()
    _testmain.go:89 +0x317

goroutine 20 [chan receive, 5 minutes]:
testing.(*T).Run(0xc0000a29c0, {0x183a982, 0x19}, 0xc0001f8a50)
    /opt/hostedtoolcache/go/1.21.9/x64/src/testing/testing.go:1649 +0x871
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/clickhouseexporter.TestIntegration(0x0?)
    /home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/exporter/clickhouseexporter/integration_test.go:44 +0x17a
testing.tRunner(0xc0000a29c0, 0x18f1488)
    /opt/hostedtoolcache/go/1.21.9/x64/src/testing/testing.go:1595 +0x262
created by testing.(*T).Run in goroutine 1
    /opt/hostedtoolcache/go/1.21.9/x64/src/testing/testing.go:1648 +0x846

goroutine 180 [IO wait]:
internal/poll.runtime_pollWait(0x7f36d170ec90, 0x72)
    /opt/hostedtoolcache/go/1.21.9/x64/src/runtime/netpoll.go:343 +0x85
internal/poll.(*pollDesc).wait(0xc000412520, 0xc000410000?, 0x0)
    /opt/hostedtoolcache/go/1.21.9/x64/src/internal/poll/fd_poll_runtime.go:84 +0xb1
internal/poll.(*pollDesc).waitRead(...)
    /opt/hostedtoolcache/go/1.21.9/x64/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc000412500, {0xc000410000, 0x1000, 0x1000})
    /opt/hostedtoolcache/go/1.21.9/x64/src/internal/poll/fd_unix.go:164 +0x405
net.(*netFD).Read(0xc000412500, {0xc000410000, 0x1000, 0x1000})
    /opt/hostedtoolcache/go/1.21.9/x64/src/net/fd_posix.go:55 +0x4b
net.(*conn).Read(0xc0003a61f0, {0xc000410000, 0x1000, 0x1000})
    /opt/hostedtoolcache/go/1.21.9/x64/src/net/net.go:179 +0xad
net/http.(*persistConn).Read(0xc0003bb0e0, {0xc000410000, 0x1000, 0x1000})
    /opt/hostedtoolcache/go/1.21.9/x64/src/net/http/transport.go:1954 +0x105
bufio.(*Reader).fill(0xc00039f020)
    /opt/hostedtoolcache/go/1.21.9/x64/src/bufio/bufio.go:113 +0x29a
bufio.(*Reader).ReadSlice(0xc00039f020, 0xd1?)
    /opt/hostedtoolcache/go/1.21.9/x64/src/bufio/bufio.go:379 +0x85
net/http/internal.readChunkLine(0xc000410000?)
    /opt/hostedtoolcache/go/1.21.9/x64/src/net/http/internal/chunked.go:156 +0x34
net/http/internal.(*chunkedReader).beginChunk(0xc00050c030)
    /opt/hostedtoolcache/go/1.21.9/x64/src/net/http/internal/chunked.go:49 +0x5a
net/http/internal.(*chunkedReader).Read(0xc00050c030, {0xc0002586a4, 0x195c, 0x195c})
    /opt/hostedtoolcache/go/1.21.9/x64/src/net/http/internal/chunked.go:125 +0x2b4
net/http.(*body).readLocked(0xc000512000, {0xc0002586a4, 0x195c, 0x195c})
    /opt/hostedtoolcache/go/1.21.9/x64/src/net/http/transfer.go:839 +0xa9
net/http.(*body).Read(0xc000512000, {0xc0002586a4, 0x195c, 0x195c})
    /opt/hostedtoolcache/go/1.21.9/x64/src/net/http/transfer.go:831 +0x185
net/http.(*bodyEOFSignal).Read(0xc000512040, {0xc0002586a4, 0x195c, 0x195c})
    /opt/hostedtoolcache/go/1.21.9/x64/src/net/http/transport.go:2810 +0x143
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.(*wrappedBody).Read(0xc000426040, {0xc0002586a4, 0x195c, 0x195c})
    /home/runner/go/pkg/mod/go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp@v0.45.0/transport.go:173 +0x6b
io.ReadAll({0x7f36d137d2a8, 0xc0002c2040})
    /opt/hostedtoolcache/go/1.21.9/x64/src/io/io.go:704 +0x87
github.com/testcontainers/testcontainers-go.(*DockerProvider).attemptToPullImage(0xc000114f70, {0x1abe768, 0x23cb500}, {0x184b098, 0x26}, {0x0, {0xc00048a0d0, 0xd0}, 0x0, {0x0, ...}})
    /home/runner/go/pkg/mod/github.com/testcontainers/testcontainers-go@v0.29.1/docker.go:1272 +0x392
github.com/testcontainers/testcontainers-go.(*DockerProvider).CreateContainer(_, {_, _}, {{{0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, ...}, ...}, ...})
    /home/runner/go/pkg/mod/github.com/testcontainers/testcontainers-go@v0.29.1/docker.go:1044 +0x1f68
github.com/testcontainers/testcontainers-go.GenericContainer({_, _}, {{{{0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, ...}, ...}, ...})
    /home/runner/go/pkg/mod/github.com/testcontainers/testcontainers-go@v0.29.1/generic.go:73 +0x350
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/clickhouseexporter.getContainer(_, {{{0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, ...}, ...}, ...})
    /home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/exporter/clickhouseexporter/integration_test.go:78 +0xd4
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/clickhouseexporter.TestIntegration.func1(0xc00040a000)
    /home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/exporter/clickhouseexporter/integration_test.go:52 +0x308
testing.tRunner(0xc00040a000, 0xc0001f8a50)
    /opt/hostedtoolcache/go/1.21.9/x64/src/testing/testing.go:1595 +0x262
created by testing.(*T).Run in goroutine 20
    /opt/hostedtoolcache/go/1.21.9/x64/src/testing/testing.go:1648 +0x846

goroutine 194 [chan receive, 5 minutes]:
github.com/testcontainers/testcontainers-go.(*Reaper).Connect.func1({0x1ac69d0?, 0xc00006e6e8})
    /home/runner/go/pkg/mod/github.com/testcontainers/testcontainers-go@v0.29.1/reaper.go:358 +0xa0e
created by github.com/testcontainers/testcontainers-go.(*Reaper).Connect in goroutine 180
    /home/runner/go/pkg/mod/github.com/testcontainers/testcontainers-go@v0.29.1/reaper.go:323 +0x2ec

goroutine 183 [select, 5 minutes]:
net/http.(*persistConn).writeLoop(0xc0003bb0e0)
    /opt/hostedtoolcache/go/1.21.9/x64/src/net/http/transport.go:2421 +0x1bc
created by net/http.(*Transport).dialConn in goroutine 181
    /opt/hostedtoolcache/go/1.21.9/x64/src/net/http/transport.go:1777 +0x266b

goroutine 182 [select, 5 minutes]:
net/http.(*persistConn).readLoop(0xc0003bb0e0)
    /opt/hostedtoolcache/go/1.21.9/x64/src/net/http/transport.go:2238 +0x153d
created by net/http.(*Transport).dialConn in goroutine 181
    /opt/hostedtoolcache/go/1.21.9/x64/src/net/http/transport.go:1776 +0x25da
FAIL    github.com/open-telemetry/opentelemetry-collector-contrib/exporter/clickhouseexporter   360.053s

=== FAIL: . TestIntegration/test_clickhouse_23-alpine (unknown)

=== FAIL: . TestIntegration (unknown)

github-actions[bot] commented 3 months ago

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

crobert-1 commented 3 months ago

This may just be a test issue, but I'm labelling as a bug until it's been investigated 👍

SpencerTorres commented 3 months ago

I might have seen this even when running locally. The tests do pass; this is likely a one-off error caused by a container failing to start in time.

Is this the first/only run for the CI? @crobert-1

Any thoughts on how we could make this more robust? @Frapschen

crobert-1 commented 3 months ago

That's the first run on that PR. I think I may have seen it before, but I can't say for certain. I'm aware of general stability issues around testcontainers-go, and from the stack trace it looks like the panic may have been hit because the Docker image itself couldn't be downloaded in time.

github.com/testcontainers/testcontainers-go.(*DockerProvider).attemptToPullImage(0xc000114f70, {0x1abe768, 0x23cb500}, {0x184b098, 0x26}, {0x0, {0xc00048a0d0, 0xd0}, 0x0, {0x0, ...}})

Just a couple of initial thoughts.

SpencerTorres commented 3 months ago

looks like the panic may have been hit because the Docker image itself couldn't be downloaded in time

This was my interpretation as well. It's what usually happens when Docker integration tests fail. If the CI has a shared image cache, we could try using an exact image hash instead of always pulling the latest image for that tag. The quickest workaround for now is to re-run and see if Docker is quicker.
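
For the image-pinning idea, a minimal sketch of what the container request could look like. The digest below is a placeholder, and the repository/tag are assumed from the test_clickhouse_23-alpine case name:

```go
package clickhouseexporter

import "github.com/testcontainers/testcontainers-go"

// Placeholder digest, for illustration only: with a shared image cache on the
// runner, a digest-pinned reference avoids re-resolving the tag on every run.
// When both a tag and a digest are present, Docker resolves by the digest.
var pinnedClickhouseReq = testcontainers.ContainerRequest{
	Image: "clickhouse/clickhouse-server:23-alpine" +
		"@sha256:0000000000000000000000000000000000000000000000000000000000000000",
}
```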

Maybe we can shorten the timeout for testcontainers-go so that it can retry instead of hanging? I don't like the solution being "just run it again"; I would like consistency in the CI.
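
As a sketch of that direction (not the repo's existing getContainer helper; the image name, port, and timeout values here are assumptions), startup could be bounded with a context plus a wait-strategy timeout so a slow pull surfaces as a normal test failure instead of hanging into the 6m go test panic:

```go
package clickhouseexporter

import (
	"context"
	"testing"
	"time"

	"github.com/testcontainers/testcontainers-go"
	"github.com/testcontainers/testcontainers-go/wait"
)

// startClickhouseContainer is a hypothetical helper: it caps container startup
// so a slow image pull fails the subtest cleanly rather than tripping the
// test binary's 6m alarm.
func startClickhouseContainer(t *testing.T) testcontainers.Container {
	t.Helper()

	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Minute)
	defer cancel()

	req := testcontainers.ContainerRequest{
		// Image and port are assumptions based on the test case name.
		Image:        "clickhouse/clickhouse-server:23-alpine",
		ExposedPorts: []string{"9000/tcp"},
		WaitingFor:   wait.ForListeningPort("9000/tcp").WithStartupTimeout(90 * time.Second),
	}
	c, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
		ContainerRequest: req,
		Started:          true,
	})
	if err != nil {
		t.Fatalf("failed to start clickhouse container: %v", err)
	}
	t.Cleanup(func() { _ = c.Terminate(context.Background()) })
	return c
}
```

With something like this, gotestsum's --rerun-fails=1 would also get a chance to retry the package instead of aborting on the suspected panic.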

crobert-1 commented 3 months ago

+1 freq: https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/8807361369/job/24174370591

crobert-1 commented 3 months ago

+1 freq: https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/8793809980/job/24132264654?pr=32481

github-actions[bot] commented 1 month ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

SpencerTorres commented 1 month ago

This should be investigated together with #32530.

As suggested in a comment over there, these could probably be fixed in a single PR.