openshift / origin

Conformance test suite for OpenShift
http://www.openshift.org
Apache License 2.0

unit test k8s.io/kubernetes/pkg/kubelet/rkt TestVersion #17757

deads2k closed this issue 6 years ago

deads2k commented 6 years ago

https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/17749/test_pull_request_origin_unit/6882/

github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/rkt TestVersion 0.01s

<autogenerated>:1: 

    Error Trace:    remote_runtime_test.go:66
    Error:          Received unexpected error:
                    rpc error: code = Unavailable desc = grpc: the connection is unavailable
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x17bcc91]

goroutine 20 [running]:
testing.tRunner.func1(0xc42060e0f0)
/usr/local/go/src/testing/testing.go:711 +0x5d9
panic(0x1904bc0, 0x24bfc70)
/usr/local/go/src/runtime/panic.go:491 +0x2a2
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/remote.TestVersion(0xc42060e0f0)
/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/remote/remote_runtime_test.go:67 +0x181
testing.tRunner(0xc42060e0f0, 0x1aec938)
/usr/local/go/src/testing/testing.go:746 +0x16d
created by testing.(*T).Run
/usr/local/go/src/testing/testing.go:789 +0x569
FAIL    github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/remote 0.646s

Seen after the 1.9 rebase.

@sjenning are flakes still automatic p1s?

bparees commented 6 years ago

evidently still happening: https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/18250/test_pull_request_origin_unit/8486/

RobertKrawitz commented 6 years ago

Two issues:

1) fakeRuntime.Start() returns before the listener has been set up, while the remote runtime test assumes that everything is ready. Whether the test passes depends on factors such as the speed of, and the load on, the system under test. Solution: fakeRuntime.Start() must not return until it has successfully created the listener. remote.createAndStartFakeRemoteRuntime() must call fakeRuntime.Start() normally (synchronously, not as a goroutine), and fakeRuntime.Start() must run RemoteRuntime.server.Serve() in a goroutine.

2) Various places in remote_runtime_test.go must use require.NoError() rather than assert.NoError(), so that the test stops at the first error (line 66 in the trace above) instead of continuing into the nil pointer dereference at line 67.
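The difference behind the second issue can be modelled without any test framework. The sketch below is only an illustration: `versionResponse`, `fetchVersion`, `checkAndContinue` and `checkAndStop` are hypothetical names, not code from remote_runtime_test.go. It mimics assert-style checking (record the error, keep going) and require-style checking (stop at the first error) when a call returns a nil response together with an error.

```go
package main

import (
	"errors"
	"fmt"
)

// versionResponse is a hypothetical stand-in for the response the test inspects.
type versionResponse struct {
	Version string
}

// fetchVersion is a hypothetical call that fails the way the flaky test did:
// it returns a nil response together with an error.
func fetchVersion() (*versionResponse, error) {
	return nil, errors.New("rpc error: the connection is unavailable")
}

// checkAndContinue models assert-style checking: the error is recorded,
// but execution continues and dereferences the nil response.
func checkAndContinue() (failures []string, panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	resp, err := fetchVersion()
	if err != nil {
		failures = append(failures, err.Error())
	}
	// resp is nil here, so reading a field panics.
	if resp.Version == "" {
		failures = append(failures, "empty version")
	}
	return failures, false
}

// checkAndStop models require-style checking: the first error ends the check,
// so the nil response is never touched.
func checkAndStop() (failures []string, panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	resp, err := fetchVersion()
	if err != nil {
		return append(failures, err.Error()), false
	}
	if resp.Version == "" {
		failures = append(failures, "empty version")
	}
	return failures, false
}

func main() {
	f, p := checkAndContinue()
	fmt.Printf("continue: failures=%d panicked=%v\n", len(f), p)
	f, p = checkAndStop()
	fmt.Printf("stop:     failures=%d panicked=%v\n", len(f), p)
}
```

Both variants report the same error, but only the continue-on-error variant goes on to panic, which matches the pair of messages in the failure log: an unexpected error followed by a nil pointer dereference.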

Preparing the PR now. I tested by introducing a 1-second sleep in util.createListener, and the test failed every time. With fix 1 applied, the problem did not reproduce in about 10 tries. I am also running a long loop and adding load to the system.

mfojtik commented 6 years ago

This flake seems to occur about once per day: https://snowstorm-origin-ci.svc.ci.openshift.org