kubeflow / arena

A CLI for Kubeflow.
Apache License 2.0

tf-job-operator CrashLoopBackOff #298

Open toyow opened 4 years ago

toyow commented 4 years ago

```
{"filename":"app/server.go:75","level":"info","msg":"EnvKubeflowNamespace not set, use default namespace","time":"2020-02-24T06:12:54Z"}
{"filename":"app/server.go:79","level":"info","msg":"Using cluster scoped operator","time":"2020-02-24T06:12:54Z"}
{"filename":"app/server.go:85","level":"info","msg":"[API Version: v1 Version: v0.1.0-alpha Git SHA: Not provided. Go Version: go1.12 Go OS/Arch: linux/amd64]","time":"2020-02-24T06:12:54Z"}
{"filename":"tf-operator.v1/main.go:40","level":"info","msg":"Setting up client for monitoring on port: 8443","time":"2020-02-24T06:12:54Z"}
W0224 06:12:54.926349 1 client_config.go:552] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
{"filename":"tensorflow/controller.go:123","level":"info","msg":"Creating TFJob controller","time":"2020-02-24T06:12:54Z"}
{"filename":"tensorflow/controller.go:130","level":"info","msg":"Creating Job controller","time":"2020-02-24T06:12:54Z"}
I0224 06:12:54.972835 1 leaderelection.go:185] attempting to acquire leader lease default/tf-operator...
E0224 06:12:54.982793 1 runtime.go:66] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
/mnt/test-data-volume/tf-operator-release-d746bde9-kunming/go/src/github.com/kubeflow/tf-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:72
/mnt/test-data-volume/tf-operator-release-d746bde9-kunming/go/src/github.com/kubeflow/tf-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/mnt/test-data-volume/tf-operator-release-d746bde9-kunming/go/src/github.com/kubeflow/tf-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/usr/local/go/src/runtime/panic.go:522
/usr/local/go/src/runtime/panic.go:82
/usr/local/go/src/runtime/signal_unix.go:390
/mnt/test-data-volume/tf-operator-release-d746bde9-kunming/go/src/github.com/kubeflow/tf-operator/pkg/apis/tensorflow/validation/validation.go:96
/mnt/test-data-volume/tf-operator-release-d746bde9-kunming/go/src/github.com/kubeflow/tf-operator/pkg/apis/tensorflow/validation/validation.go:79
/mnt/test-data-volume/tf-operator-release-d746bde9-kunming/go/src/github.com/kubeflow/tf-operator/pkg/controller.v1/tensorflow/informer.go:99
/mnt/test-data-volume/tf-operator-release-d746bde9-kunming/go/src/github.com/kubeflow/tf-operator/pkg/controller.v1/tensorflow/job.go:36
/mnt/test-data-volume/tf-operator-release-d746bde9-kunming/go/src/github.com/kubeflow/tf-operator/vendor/k8s.io/client-go/tools/cache/controller.go:195
/mnt/test-data-volume/tf-operator-release-d746bde9-kunming/go/src/github.com/kubeflow/tf-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:554
/mnt/test-data-volume/tf-operator-release-d746bde9-kunming/go/src/github.com/kubeflow/tf-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:203
/mnt/test-data-volume/tf-operator-release-d746bde9-kunming/go/src/github.com/kubeflow/tf-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:548
/mnt/test-data-volume/tf-operator-release-d746bde9-kunming/go/src/github.com/kubeflow/tf-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
/mnt/test-data-volume/tf-operator-release-d746bde9-kunming/go/src/github.com/kubeflow/tf-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
/mnt/test-data-volume/tf-operator-release-d746bde9-kunming/go/src/github.com/kubeflow/tf-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/mnt/test-data-volume/tf-operator-release-d746bde9-kunming/go/src/github.com/kubeflow/tf-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:546
/mnt/test-data-volume/tf-operator-release-d746bde9-kunming/go/src/github.com/kubeflow/tf-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:71
/usr/local/go/src/runtime/asm_amd64.s:1337
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x115d5e8]

goroutine 146 [running]:
github.com/kubeflow/tf-operator/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/mnt/test-data-volume/tf-operator-release-d746bde9-kunming/go/src/github.com/kubeflow/tf-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x105
panic(0x12e8880, 0x2317ec0)
	/usr/local/go/src/runtime/panic.go:522 +0x1b5
github.com/kubeflow/tf-operator/pkg/apis/tensorflow/validation.validateV1ReplicaSpecs(0xc0000c34a0, 0x147a19f, 0x5)
	/mnt/test-data-volume/tf-operator-release-d746bde9-kunming/go/src/github.com/kubeflow/tf-operator/pkg/apis/tensorflow/validation/validation.go:96 +0x208
github.com/kubeflow/tf-operator/pkg/apis/tensorflow/validation.ValidateV1TFJobSpec(...)
	/mnt/test-data-volume/tf-operator-release-d746bde9-kunming/go/src/github.com/kubeflow/tf-operator/pkg/apis/tensorflow/validation/validation.go:79
github.com/kubeflow/tf-operator/pkg/controller.v1/tensorflow.tfJobFromUnstructured(0x1471920, 0xc00087c088, 0x0, 0xc0007f8180, 0x0)
	/mnt/test-data-volume/tf-operator-release-d746bde9-kunming/go/src/github.com/kubeflow/tf-operator/pkg/controller.v1/tensorflow/informer.go:99 +0x1a1
github.com/kubeflow/tf-operator/pkg/controller.v1/tensorflow.(*TFController).addTFJob(0xc0001ee900, 0x1471920, 0xc00087c088)
	/mnt/test-data-volume/tf-operator-release-d746bde9-kunming/go/src/github.com/kubeflow/tf-operator/pkg/controller.v1/tensorflow/job.go:36 +0x50
github.com/kubeflow/tf-operator/vendor/k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd(...)
	/mnt/test-data-volume/tf-operator-release-d746bde9-kunming/go/src/github.com/kubeflow/tf-operator/vendor/k8s.io/client-go/tools/cache/controller.go:195
github.com/kubeflow/tf-operator/vendor/k8s.io/client-go/tools/cache.(*processorListener).run.func1.1(0x42d26d, 0xc00054ade8, 0x11934d1)
	/mnt/test-data-volume/tf-operator-release-d746bde9-kunming/go/src/github.com/kubeflow/tf-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:554 +0x26d
github.com/kubeflow/tf-operator/vendor/k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff(0x989680, 0x3ff0000000000000, 0x3fb999999999999a, 0x5, 0xc0007bde38, 0x42cdcf, 0xc0008140d0)
	/mnt/test-data-volume/tf-operator-release-d746bde9-kunming/go/src/github.com/kubeflow/tf-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:203 +0xde
github.com/kubeflow/tf-operator/vendor/k8s.io/client-go/tools/cache.(*processorListener).run.func1()
	/mnt/test-data-volume/tf-operator-release-d746bde9-kunming/go/src/github.com/kubeflow/tf-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:548 +0x89
github.com/kubeflow/tf-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc00054af68)
	/mnt/test-data-volume/tf-operator-release-d746bde9-kunming/go/src/github.com/kubeflow/tf-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x54
github.com/kubeflow/tf-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0007bdf68, 0xdf8475800, 0x0, 0x1640c01, 0xc00081a060)
	/mnt/test-data-volume/tf-operator-release-d746bde9-kunming/go/src/github.com/kubeflow/tf-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134 +0xf8
github.com/kubeflow/tf-operator/vendor/k8s.io/apimachinery/pkg/util/wait.Until(...)
	/mnt/test-data-volume/tf-operator-release-d746bde9-kunming/go/src/github.com/kubeflow/tf-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
github.com/kubeflow/tf-operator/vendor/k8s.io/client-go/tools/cache.(*processorListener).run(0xc000768980)
	/mnt/test-data-volume/tf-operator-release-d746bde9-kunming/go/src/github.com/kubeflow/tf-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:546 +0x9c
github.com/kubeflow/tf-operator/vendor/k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1(0xc0004a8bc0, 0xc000856010)
	/mnt/test-data-volume/tf-operator-release-d746bde9-kunming/go/src/github.com/kubeflow/tf-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:71 +0x4f
created by github.com/kubeflow/tf-operator/vendor/k8s.io/apimachinery/pkg/util/wait.(*Group).Start
	/mnt/test-data-volume/tf-operator-release-d746bde9-kunming/go/src/github.com/kubeflow/tf-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:69 +0x62
```
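The trace shows the panic originating in `validateV1ReplicaSpecs` (validation.go:96), reached from `tfJobFromUnstructured` while `addTFJob` handles a TFJob the informer has just observed. The log does not say which field was nil, so the following is only a hypothetical sketch of this failure mode: a validator that dereferences optional pointer fields in a map of replica specs panics on a job that omits them, while explicit nil checks turn the same input into a validation error. `ReplicaSpec`, `validateUnsafe`, `validateSafe` and `panics` are made-up names for illustration, not tf-operator's actual types or code.

```go
package main

import (
	"errors"
	"fmt"
)

// ReplicaSpec is a simplified, hypothetical stand-in for a TFJob replica spec.
// It is NOT the real tf-operator type.
type ReplicaSpec struct {
	Replicas   *int32   // optional: nil when the field is omitted from the manifest
	Containers []string // container names taken from the pod template
}

// validateUnsafe dereferences each spec's Replicas pointer without checking it,
// so an omitted replicas field causes a nil pointer dereference panic.
func validateUnsafe(specs map[string]*ReplicaSpec) error {
	for rType, spec := range specs {
		if *spec.Replicas < 0 {
			return fmt.Errorf("%s: negative replicas", rType)
		}
	}
	return nil
}

// validateSafe performs the same check but reports missing values as errors
// instead of dereferencing nil pointers.
func validateSafe(specs map[string]*ReplicaSpec) error {
	if specs == nil {
		return errors.New("replica specs are missing")
	}
	for rType, spec := range specs {
		if spec == nil {
			return fmt.Errorf("%s: replica spec is nil", rType)
		}
		if len(spec.Containers) == 0 {
			return fmt.Errorf("%s: no containers", rType)
		}
		if spec.Replicas != nil && *spec.Replicas < 0 {
			return fmt.Errorf("%s: negative replicas", rType)
		}
	}
	return nil
}

// panics reports whether f panics, recovering so the demo keeps running.
func panics(f func()) (didPanic bool) {
	defer func() {
		if recover() != nil {
			didPanic = true
		}
	}()
	f()
	return false
}

func main() {
	// A worker spec whose replicas field was left out.
	specs := map[string]*ReplicaSpec{"Worker": {Containers: []string{"tensorflow"}}}

	fmt.Println(panics(func() { _ = validateUnsafe(specs) }))     // true
	fmt.Println(validateSafe(specs))                             // <nil>
	fmt.Println(validateSafe(map[string]*ReplicaSpec{"PS": nil})) // PS: replica spec is nil
}
```

In a controller, returning an error from validation lets the operator skip or mark the bad TFJob rather than crash; since the job is re-delivered from the informer cache on restart, an unguarded panic would be expected to recur, which is consistent with (though not proof of) a CrashLoopBackOff.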

cheyang commented 3 years ago

Could you please let us know how to reproduce it? Thanks.