kubernetes-sigs / security-profiles-operator

The Kubernetes Security Profiles Operator
Apache License 2.0

The spod daemonset doesn't work; the log shows "fatal error: timer when must be positive" #891

Closed: neblen closed this issue 8 months ago

neblen commented 2 years ago

What happened:

The spod daemonset doesn't work. I deployed SPO 0.4.1 on a Kubernetes cluster running on Ubuntu, but spod does not start correctly.

What you expected to happen:

The spod daemonset works correctly.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

The error log is as follows.

I0418 05:30:48.045731 1 logr.go:252] "msg"="Set logging verbosity to 0"
I0418 05:30:48.045731 1 logr.go:252] "msg"="Profiling support enabled: false"
fatal error: timer when must be positive
goroutine 4 [running]:
runtime.throw({0x1b3a37a, 0x0})
runtime/panic.go:1198 +0x71 fp=0xc000059ee8 sp=0xc000059eb8 pc=0x438b31
runtime.modtimer(0xb418cff40fe67fd0, 0xc00004a000, 0xc00004a000, 0x1, {0x0, 0x0}, 0x1a43301)
runtime/time.go:425 +0x4a7 fp=0xc000059f30 sp=0xc000059ee8 pc=0x458407
runtime.resettimer(...)
runtime/time.go:539
time.resetTimer(...)
runtime/time.go:229
runtime.scavengeSleep(0x1e8480)
runtime/mgcscavenge.go:237 +0x73 fp=0xc000059f80 sp=0xc000059f30 pc=0x423a53
runtime.bgscavenge()
runtime/mgcscavenge.go:366 +0x205 fp=0xc000059fe0 sp=0xc000059f80 pc=0x423cc5
runtime.goexit()
runtime/asm_amd64.s:1581 +0x1 fp=0xc000059fe8 sp=0xc000059fe0 pc=0x46c9c1
created by runtime.gcenable
runtime/mgc.go:182 +0x65
goroutine 1 [runnable]:
flag.sortFlags(0x0)
flag/flag.go:353 +0x51
flag.(*FlagSet).Visit(0xc000082d80, 0xc000461b48)
flag/flag.go:407 +0x25
github.com/urfave/cli/v2.normalizeFlags({0xc000082c80, 0x3, 0xc0000b2d80}, 0xc000327200)
github.com/urfave/cli/v2@v2.3.0/context.go:174 +0xec
github.com/urfave/cli/v2.(*Command).parseFlags(0xc0000b2d80, {0x1e12ab8, 0xc0002425b8}, 0xb0)
github.com/urfave/cli/v2@v2.3.0/command.go:194 +0x1d8
github.com/urfave/cli/v2.(*Command).Run(0xc0000b2d80, 0xc000082900)
github.com/urfave/cli/v2@v2.3.0/command.go:106 +0x1ef
github.com/urfave/cli/v2.(*App).RunContext(0xc0001a0680, {0x1df3d48, 0xc000122000}, {0xc000134000, 0x2, 0x2})
github.com/urfave/cli/v2@v2.3.0/app.go:313 +0x81e
github.com/urfave/cli/v2.(*App).Run(...)
github.com/urfave/cli/v2@v2.3.0/app.go:224
main.main()
sigs.k8s.io/security-profiles-operator/cmd/security-profiles-operator/main.go:188 +0xd70
goroutine 19 [chan receive]:
k8s.io/klog/v2.(*loggingT).flushDaemon(0x0)
k8s.io/klog/v2@v2.40.1/klog.go:1283 +0x6a
created by k8s.io/klog/v2.init.0
k8s.io/klog/v2@v2.40.1/klog.go:420 +0xfb
goroutine 20 [sleep]:
time.Sleep(0x6fc23ac00)
runtime/time.go:193 +0x12e
sigs.k8s.io/controller-runtime/pkg/log.init.0.func1()
sigs.k8s.io/controller-runtime@v0.11.0/pkg/log/log.go:63 +0x38
created by sigs.k8s.io/controller-runtime/pkg/log.init.0
sigs.k8s.io/controller-runtime@v0.11.0/pkg/log/log.go:62 +0x25

Environment:

saschagrunert commented 2 years ago

This seems to be an issue with controller-runtime :thinking: Hm, never seen this before.

neblen commented 2 years ago

I deployed two worker nodes. The log from the second node is as follows.

 "msg"="Set logging verbosity to 0"
I0418 05:30:48.045731 1 logr.go:252] "msg"="Profiling support enabled: false"
I0418 05:30:48.045731 1 logr.go:252] setup "msg"="starting component: spod" "buildDate"="1980-01-01T00:00:00Z" "compiler"="gc" "gitCommit"="unknown" "gitTreeState"="clean" "goVersion"="go1.17.6" "libbpf"="0.6.1" "libseccomp"="2.5.3" "platform"="linux/amd64" "version"="0.4.1"
E0418 05:32:32.045731 1 logr.go:270] "msg"="Failed to get API Group-Resources" "error"="Get \"https://10.43.0.1:443/api?timeout=32s\": dial tcp 10.43.0.1:443: i/o timeout"
E0418 05:32:32.045731 1 logr.go:270] setup "msg"="running security-profiles-operator" "error"="create manager: Get \"https://10.43.0.1:443/api?timeout=32s\": dial tcp 10.43.0.1:443: i/o timeout"
saschagrunert commented 2 years ago

Do you have cert manager deployed and is it healthy?

neblen commented 2 years ago

@saschagrunert Thank you for your reply. I have deployed cert-manager and it runs correctly. The cert-manager version is v1.6.1.

saschagrunert commented 2 years ago

Alright, so can this be closed now? :)

neblen commented 2 years ago

I'm sorry, I was unclear: cert-manager runs correctly, but spod still can't run.

jhrozek commented 2 years ago

btw @bachradsusi was able to reproduce this issue locally. I don't know what causes it.

ccojocar commented 2 years ago

I am now also encountering this issue when running the e2e tests locally on kind:

I0907 12:15:04.002569       1 logr.go:261]  "msg"="Set logging verbosity to 0"
I0907 12:15:36.002569       1 logr.go:261]  "msg"="Profiling support enabled: false"
I0907 12:00:56.002568       1 logr.go:261] setup "msg"="starting component: spod"  "buildDate"="1980-01-01T00:00:00Z" "buildTags"="netgo,osusergo,seccomp,apparmor" "cgoldFlags"="-lseccomp -lelf -lz -lbpf" "compiler"="gc" "dependencies"="github.com/ReneKroon/ttlcache/v2 v2.11.0 ,github.com/acobaugh/osrelease v0.0.0-20181218015638-a93a0a55a249 ,github.com/aquasecurity/libbpfgo v0.2.5-libbpf-0.7.0 ,github.com/beorn7/perks v1.0.1 ,github.com/blang/semver v3.5.1+incompatible ,
github.com/cespare/xxhash/v2 v2.1.2 ,github.com/containers/common v0.47.5 ,github.com/cpuguy83/go-md2man/v2 v2.0.1 ,github.com/crossplane/crossplane-runtime v0.14.1-0.20210713194031-85b19c28ea88 ,github.com/davecgh/go-spew v1.1.1 ,github.com/evanphx/json-patch v4.12.0+incompatible ,github.com/fsnotify/fsnotify v1.5.1 ,github.com/go-logr/logr v1.2.3 ,github.com/gogo/protobuf v1.3.2 ,github.com/golang/groupcache v0.0.0-20210331224755-41bb18bfe9da ,github.com/golang/protobuf
 v1.5.2 ,github.com/google/go-cmp v0.5.6 ,github.com/google/gofuzz v1.2.0 ,github.com/google/uuid v1.3.0 ,github.com/googleapis/gnostic v0.5.5 ,github.com/imdario/mergo v0.3.12 ,github.com/jetstack/cert-manager v1.7.2 ,github.com/json-iterator/go v1.1.12 ,github.com/matttproud/golang_protobuf_extensions v1.0.2-0.20181231171920-c182affec369 ,github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd ,github.com/modern-go/reflect2 v1.0.2 ,github.com/nxadm/tail v1.4.
8 ,github.com/opencontainers/runtime-spec v1.0.3-0.20210326190908-1c3f411f0417 ,github.com/openshift/api v0.0.0-20220209124712-b632c5fc10c0 ,github.com/pjbgf/go-apparmor v0.0.7 ,github.com/pkg/errors v0.9.1 ,github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring v0.56.0 ,github.com/prometheus/client_golang v1.12.1 ,github.com/prometheus/client_model v0.2.0 ,github.com/prometheus/common v0.32.1 ,github.com/prometheus/procfs v0.7.3 ,github.com/russross/black
friday/v2 v2.1.0 ,github.com/seccomp/libseccomp-golang v0.9.2-0.20210429002308-3879420cc921 ,github.com/sirupsen/logrus v1.8.1 ,github.com/spf13/afero v1.6.0 ,github.com/spf13/pflag v1.0.5 ,github.com/urfave/cli/v2 v2.4.0 ,golang.org/x/net v0.0.0-20220225172249-27dd8689420f ,golang.org/x/oauth2 v0.0.0-20211104180415-d3ed0bb246c8 ,golang.org/x/sync v0.0.0-20210220032951-036812b2e83c ,golang.org/x/sys v0.0.0-20220315194320-039c03cc5b86 ,golang.org/x/term v0.0.0-202109272227
41-03fcf44c2211 ,golang.org/x/text v0.3.7 ,golang.org/x/time v0.0.0-20210723032227-1f47c861a9ac ,gomodules.xyz/jsonpatch/v2 v2.2.0 ,google.golang.org/genproto v0.0.0-20220118154757-00ab72f36ad5 ,google.golang.org/grpc v1.45.0 ,google.golang.org/protobuf v1.28.0 ,gopkg.in/inf.v0 v0.9.1 ,gopkg.in/tomb.v1 v1.0.0-20141024135613-dd632973f1e7 ,gopkg.in/yaml.v2 v2.4.0 ,gopkg.in/yaml.v3 v3.0.0-20210107192922-496545a6307b ,k8s.io/api v0.23.6 ,k8s.io/apiextensions-apiserver v0.23.5
 ,k8s.io/apimachinery v0.23.6 ,k8s.io/client-go v0.23.6 ,k8s.io/component-base v0.23.5 ,k8s.io/klog/v2 v2.60.1 ,k8s.io/kube-openapi v0.0.0-20211115234752-e816edb12b65 ,k8s.io/utils v0.0.0-20220210201930-3a6ce19ff2f9 ,sigs.k8s.io/controller-runtime v0.11.2 ,sigs.k8s.io/json v0.0.0-20211020170558-c049b76a60c6 ,sigs.k8s.io/release-utils v0.6.0 ,sigs.k8s.io/structured-merge-diff/v4 v4.2.1 ,sigs.k8s.io/yaml v1.3.0 " "gitCommit"="4b6bac26f9c0b151ea3183087f924a0ee6d9ed54" "gitCo
mmitDate"="2022-04-28T09:18:43Z" "gitTreeState"="dirty" "goVersion"="go1.18" "ldFlags"="-s -w -linkmode external -extldflags \"-static\" -X sigs.k8s.io/security-profiles-operator/internal/pkg/version.buildDate=1980-01-01T00:00:00Z -X sigs.k8s.io/security-profiles-operator/internal/pkg/version.version=0.4.3-dev" "libbpf"="v0.7" "libseccomp"="2.5.3" "platform"="linux/amd64" "version"="0.4.3-dev"
fatal error: timer when must be positive

goroutine 1 [running]:
runtime.throw({0x1b747fa?, 0xc000500000?})
        runtime/panic.go:992 +0x71 fp=0xc000272348 sp=0xc000272318 pc=0x43a1d1
runtime.addtimer(0x48ca01?)
        runtime/time.go:255 +0x12e fp=0xc000272390 sp=0xc000272348 pc=0x45928e
time.startTimer(0x46ae09?)
        runtime/time.go:213 +0x19 fp=0xc0002723a8 sp=0xc000272390 pc=0x469df9
time.NewTimer(0x1e46700?)
        time/sleep.go:96 +0xb7 fp=0xc0002723e8 sp=0xc0002723a8 pc=0x48cb97
net/http.setRequestCancel(0xc00018b400, {0x1e46700?, 0xc0005994a0}, {0xc000329aa0?, 0x10?, 0x2b85740?})
        net/http/client.go:394 +0x375 fp=0xc0002724c8 sp=0xc0002723e8 pc=0x689235
net/http.send(0xc00018b300, {0x1e46700, 0xc0005994a0}, {0x1aee740?, 0x48e901?, 0x2b85740?})
        net/http/client.go:250 +0x5a6 fp=0xc0002726a8 sp=0xc0002724c8 pc=0x6886e6
net/http.(*Client).send(0xc0000252c0, 0xc00018b300, {0x7fb1655323b0?, 0xf8?, 0x2b85740?})
        net/http/client.go:176 +0x9b fp=0xc000272720 sp=0xc0002726a8 pc=0x687fbb
net/http.(*Client).do(0xc0000252c0, 0xc00018b300)
        net/http/client.go:725 +0x8f5 fp=0xc000272920 sp=0xc000272720 pc=0x68a3f5
net/http.(*Client).Do(...)
        net/http/client.go:593
k8s.io/client-go/rest.(*Request).request(0xc00018b100, {0x1e5b9c8, 0xc000345c20}, 0x1?)
        k8s.io/client-go@v0.23.6/rest/request.go:980 +0x419 fp=0xc000272a20 sp=0xc000272920 pc=0x10c1b19
k8s.io/client-go/rest.(*Request).Do(0x2?, {0x1e5b990?, 0xc00011a008?})
        k8s.io/client-go@v0.23.6/rest/request.go:1038 +0xc7 fp=0xc000272ad0 sp=0xc000272a20 pc=0x10c2567
k8s.io/client-go/discovery.(*DiscoveryClient).ServerGroups(0xc0005994e0)
        k8s.io/client-go@v0.23.6/discovery/discovery_client.go:161 +0xae fp=0xc000272d38 sp=0xc000272ad0 pc=0x1232b0e
k8s.io/client-go/discovery.ServerGroupsAndResources({0x1e60dd8, 0xc0005994e0})
        k8s.io/client-go@v0.23.6/discovery/discovery_client.go:258 +0x42 fp=0xc000272ea8 sp=0xc000272d38 pc=0x12338c2
k8s.io/client-go/discovery.(*DiscoveryClient).ServerGroupsAndResources.func1()
        k8s.io/client-go@v0.23.6/discovery/discovery_client.go:223 +0x25 fp=0xc000272ec8 sp=0xc000272ea8 pc=0x12335c5
k8s.io/client-go/discovery.withRetries(0x2, 0xc000272ef0)
        k8s.io/client-go@v0.23.6/discovery/discovery_client.go:452 +0x72 fp=0xc000272ee0 sp=0xc000272ec8 pc=0x1235712
k8s.io/client-go/discovery.(*DiscoveryClient).ServerGroupsAndResources(0x0?)
        k8s.io/client-go@v0.23.6/discovery/discovery_client.go:222 +0x3a fp=0xc000272f10 sp=0xc000272ee0 pc=0x123357a
k8s.io/client-go/restmapper.GetAPIGroupResources({0x1e60dd8?, 0xc0005994e0?})
        k8s.io/client-go@v0.23.6/restmapper/discovery.go:148 +0x42 fp=0xc0002730f8 sp=0xc000272f10 pc=0x12383c2
sigs.k8s.io/controller-runtime/pkg/client/apiutil.NewDynamicRESTMapper.func1()
        sigs.k8s.io/controller-runtime@v0.11.2/pkg/client/apiutil/dynamicrestmapper.go:85 +0x25 fp=0xc000273120 sp=0xc0002730f8 pc=0x12399e5
sigs.k8s.io/controller-runtime/pkg/client/apiutil.(*dynamicRESTMapper).setStaticMapper(...)
        sigs.k8s.io/controller-runtime@v0.11.2/pkg/client/apiutil/dynamicrestmapper.go:118
sigs.k8s.io/controller-runtime/pkg/client/apiutil.NewDynamicRESTMapper(0xc0000cefc0?, {0x0, 0x0, 0x766a0abb650a7301?})
        sigs.k8s.io/controller-runtime@v0.11.2/pkg/client/apiutil/dynamicrestmapper.go:98 +0x16b fp=0xc000273168 sp=0xc000273120 pc=0x123992b
sigs.k8s.io/controller-runtime/pkg/cluster.setOptionsDefaults.func1(0x0?)
        sigs.k8s.io/controller-runtime@v0.11.2/pkg/cluster/cluster.go:216 +0x25 fp=0xc000273198 sp=0xc000273168 pc=0x1486985
sigs.k8s.io/controller-runtime/pkg/cluster.New(0xc0002d7440, {0xc0002737f0, 0x1, 0x0?})
        sigs.k8s.io/controller-runtime@v0.11.2/pkg/cluster/cluster.go:158 +0x18d fp=0xc000273440 sp=0xc000273198 pc=0x14863ad
sigs.k8s.io/controller-runtime/pkg/manager.New(_, {0x0, 0x0, 0x2b0aaf0, {{0x1e5e610, 0xc00004a0c0}, 0x0}, 0x0, {0x0, 0x0}, ...})
        sigs.k8s.io/controller-runtime@v0.11.2/pkg/manager/manager.go:313 +0xdc fp=0xc000273868 sp=0xc000273440 pc=0x14b5ffc
main.runDaemon(0xc00015f6c0, 0xc00015f6c0?)
        sigs.k8s.io/security-profiles-operator/cmd/security-profiles-operator/main.go:401 +0x1e2 fp=0xc000273bf8 sp=0xc000273868 pc=0x1618a22
main.main.func3(0xc000229680?)
        sigs.k8s.io/security-profiles-operator/cmd/security-profiles-operator/main.go:131 +0x1d fp=0xc000273c18 sp=0xc000273bf8 pc=0x161745d
github.com/urfave/cli/v2.(*Command).Run(0xc000229680, 0xc00015f4c0)
        github.com/urfave/cli/v2@v2.4.0/command.go:163 +0x5bb fp=0xc000273d28 sp=0xc000273c18 pc=0xbfbe9b
github.com/urfave/cli/v2.(*App).RunContext(0xc00058aea0, {0x1e5b990?, 0xc00011a000}, {0xc00012c000, 0x2, 0x2})
        github.com/urfave/cli/v2@v2.4.0/app.go:313 +0xb48 fp=0xc000273eb8 sp=0xc000273d28 pc=0xbf9688
github.com/urfave/cli/v2.(*App).Run(...)
        github.com/urfave/cli/v2@v2.4.0/app.go:224
main.main()
        sigs.k8s.io/security-profiles-operator/cmd/security-profiles-operator/main.go:219 +0x107b fp=0xc000273f80 sp=0xc000273eb8 pc=0x16172bb
runtime.main()
        runtime/proc.go:250 +0x212 fp=0xc000273fe0 sp=0xc000273f80 pc=0x43c8f2
runtime.goexit()
        runtime/asm_amd64.s:1571 +0x1 fp=0xc000273fe8 sp=0xc000273fe0 pc=0x46cfc1
ccojocar commented 2 years ago

My assumption is that this might be related to the system timer. It might be caused by a clock desynchronisation, but I am not totally sure.
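
For what it's worth, the trace above reaches the panic through time.NewTimer, whose deadline is computed from the Go runtime's monotonic clock, so a bogus monotonic reading on the node is one plausible way this check can trip. Below is a minimal sketch (not part of the operator, and assuming Go is installed on the suspect node) that compares the wall clock against a monotonic elapsed reading; a wildly implausible elapsed value would point at this kind of clock problem:

// A minimal sketch, assuming Go is available on the suspect node (not part of
// the operator): compare the wall clock with a monotonic elapsed reading.
package main

import (
    "fmt"
    "time"
)

func main() {
    start := time.Now() // time.Now carries a monotonic clock reading
    time.Sleep(100 * time.Millisecond)

    fmt.Println("wall clock (UTC):  ", time.Now().UTC())
    fmt.Println("monotonic elapsed: ", time.Since(start)) // expected: roughly 100ms
}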

jhrozek commented 2 years ago

I'm occasionally hitting the issue also with OCP builds:

fatal error: timer when must be positive

goroutine 19 [running]:
runtime.throw({0x1af70a3?, 0x2c6a980?})
    runtime/panic.go:992 +0x71 fp=0xc00006ae88 sp=0xc00006ae58 pc=0x4394b1
runtime.modtimer(0xb42b06ab9201df43?, 0xc000000000?, 0xc0003be000?, 0x424f00?, {0x0?, 0x0?}, 0x0?)
    runtime/time.go:426 +0x4a7 fp=0xc00006aed0 sp=0xc00006ae88 pc=0x459127
runtime.resettimer(...)
    runtime/time.go:540
time.resetTimer(...)
    runtime/time.go:230
runtime.scavengeSleep(0x4631da68f2e17000)
    runtime/mgcscavenge.go:244 +0x73 fp=0xc00006af20 sp=0xc00006aed0 pc=0x424893
runtime.bgscavenge(0x0?)
    runtime/mgcscavenge.go:394 +0x1af fp=0xc00006afc8 sp=0xc00006af20 pc=0x424aef
runtime.gcenable.func2()
    runtime/mgc.go:178 +0x26 fp=0xc00006afe0 sp=0xc00006afc8 pc=0x41c926
runtime.goexit()
    runtime/asm_amd64.s:1571 +0x1 fp=0xc00006afe8 sp=0xc00006afe0 pc=0x46c181
created by runtime.gcenable
    runtime/mgc.go:178 +0xaa

goroutine 1 [runnable]:
syscall.Syscall6(0x106, 0xffffffffffffff9c, 0xc0000463f0, 0xc000544108, 0x0, 0x0, 0x0)
    syscall/asm_linux_amd64.s:43 +0x5
syscall.fstatat(0xc000356000?, {0x7fff7a5bdac1?, 0xc000356000?}, 0xc000544108?, 0xc0000545c0?)
    syscall/zsyscall_linux_amd64.go:1450 +0x10f
syscall.Stat(...)
    syscall/syscall_linux_amd64.go:60
os.statNolog.func1(...)
    os/stat_unix.go:32
os.ignoringEINTR(...)
    os/file_posix.go:245
os.statNolog({0x7fff7a5bdac1, 0x29})
    os/stat_unix.go:31 +0x58
os.Stat({0x7fff7a5bdac1, 0x29})
    os/stat.go:13 +0x34
github.com/urfave/cli/v2.compileTime()
    github.com/urfave/cli/v2@v2.5.1/app.go:104 +0x33
github.com/urfave/cli/v2.NewApp()
    github.com/urfave/cli/v2@v2.5.1/app.go:121 +0x77
main.main()
    sigs.k8s.io/security-profiles-operator/cmd/security-profiles-operator/main.go:79 +0x2b

goroutine 21 [sleep]:
time.Sleep(0x6fc23ac00)
    runtime/time.go:194 +0x12e
sigs.k8s.io/controller-runtime/pkg/log.init.0.func1()
    sigs.k8s.io/controller-runtime@v0.11.2/pkg/log/log.go:63 +0x38
created by sigs.k8s.io/controller-runtime/pkg/log.init.0
    sigs.k8s.io/controller-runtime@v0.11.2/pkg/log/log.go:62 +0x25

We've seen this today on a 6-node cluster; the interesting thing was that only one of the six nodes was showing the issue. I didn't find anything different about that node compared to the others (for example, time was in sync and set to UTC on all nodes).

jhrozek commented 2 years ago

I really wonder whether this could be an issue with the Go version we are currently using, because all the frames in the stack trace of the crashing goroutine point into the Go standard library:

runtime.throw({0x1af70a3?, 0x2c6a980?})
    runtime/panic.go:992 +0x71 fp=0xc00006ae88 sp=0xc00006ae58 pc=0x4394b1
runtime.modtimer(0xb42b06ab9201df43?, 0xc000000000?, 0xc0003be000?, 0x424f00?, {0x0?, 0x0?}, 0x0?)
    runtime/time.go:426 +0x4a7 fp=0xc00006aed0 sp=0xc00006ae88 pc=0x459127
runtime.resettimer(...)
    runtime/time.go:540
time.resetTimer(...)
    runtime/time.go:230
runtime.scavengeSleep(0x4631da68f2e17000)
    runtime/mgcscavenge.go:244 +0x73 fp=0xc00006af20 sp=0xc00006aed0 pc=0x424893
runtime.bgscavenge(0x0?)
    runtime/mgcscavenge.go:394 +0x1af fp=0xc00006afc8 sp=0xc00006af20 pc=0x424aef
runtime.gcenable.func2()
    runtime/mgc.go:178 +0x26 fp=0xc00006afe0 sp=0xc00006afc8 pc=0x41c926
runtime.goexit()
    runtime/asm_amd64.s:1571 +0x1 fp=0xc00006afe8 sp=0xc00006afe0 pc=0x46c181
created by runtime.gcenable
    runtime/mgc.go:178 +0xaa

On the other hand, the only somewhat related issue I found is https://github.com/golang/go/issues/51341, but that one is 1) BSD-specific and 2) fails with EFAULT rather than "timer when must be positive". If this were really a golang issue, I would expect many more bug reports and the bug to have been found much earlier...

B3ns44d commented 2 years ago

I'm currently facing the same issue as well.

+ kubectl logs spod-5bjlf
Defaulted container "security-profiles-operator" out of: security-profiles-operator, metrics, non-root-enabler (init)
fatal error: timer when must be positive

goroutine 4 [running]:
runtime.throw({0x1c412ee?, 0x2cf2e00?})
    runtime/panic.go:992 +0x71 fp=0xc00005fe88 sp=0xc00005fe58 pc=0x43a031
runtime.modtimer(0x10752ed57ccfe0?, 0xc000400000?, 0xc000600000?, 0x425a80?, {0x0?, 0x0?}, 0x0?)
    runtime/time.go:426 +0x4a7 fp=0xc00005fed0 sp=0xc00005fe88 pc=0x459e27
runtime.resettimer(...)
    runtime/time.go:540
time.resetTimer(...)
    runtime/time.go:230
runtime.scavengeSleep(0x8000000000000000)
    runtime/mgcscavenge.go:244 +0x73 fp=0xc00005ff20 sp=0xc00005fed0 pc=0x425413
runtime.bgscavenge(0x0?)
    runtime/mgcscavenge.go:394 +0x1af fp=0xc00005ffc8 sp=0xc00005ff20 pc=0x42566f
runtime.gcenable.func2()
    runtime/mgc.go:178 +0x26 fp=0xc00005ffe0 sp=0xc00005ffc8 pc=0x41d4a6
runtime.goexit()
    runtime/asm_amd64.s:1571 +0x1 fp=0xc00005ffe8 sp=0xc00005ffe0 pc=0x46ce81
created by runtime.gcenable
    runtime/mgc.go:178 +0xaa

goroutine 1 [runnable, locked to thread]:
github.com/prometheus/client_golang/prometheus.(*Registry).Register(0xc0005003c0, {0x1f29828?, 0xc00018c090?})
    github.com/prometheus/client_golang@v1.12.2/prometheus/registry.go:356 +0xa1a
github.com/prometheus/client_golang/prometheus.(*Registry).MustRegister(0x0?, {0xc000420580?, 0x1, 0xc0005c6cf0?})
    github.com/prometheus/client_golang@v1.12.2/prometheus/registry.go:402 +0x53
github.com/prometheus/client_golang/prometheus.MustRegister(...)
    github.com/prometheus/client_golang@v1.12.2/prometheus/registry.go:178
github.com/prometheus/client_golang/prometheus.init.0()
    github.com/prometheus/client_golang@v1.12.2/prometheus/registry.go:63 +0xf5

goroutine 6 [sleep, 91230067 minutes]:
time.Sleep(0x6fc23ac00)
    runtime/time.go:194 +0x12e
sigs.k8s.io/controller-runtime/pkg/log.init.0.func1()
    sigs.k8s.io/controller-runtime@v0.12.2/pkg/log/log.go:63 +0x38
created by sigs.k8s.io/controller-runtime/pkg/log.init.0
    sigs.k8s.io/controller-runtime@v0.12.2/pkg/log/log.go:62 +0x25


Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Pulling    2m                   kubelet            Pulling image "gcr.io/k8s-staging-sp-operator/security-profiles-operator:latest"
  Normal   Scheduled  2m                   default-scheduler  Successfully assigned security-profiles-operator/spod-sb76b to minikube
  Normal   Pulled     118s                 kubelet            Successfully pulled image "gcr.io/k8s-staging-sp-operator/security-profiles-operator:latest" in 1.792761921s
  Normal   Created    117s                 kubelet            Created container non-root-enabler
  Normal   Started    117s                 kubelet            Started container non-root-enabler
  Normal   Pulled     114s                 kubelet            Successfully pulled image "gcr.io/k8s-staging-sp-operator/security-profiles-operator:latest" in 1.445520245s
  Normal   Started    114s                 kubelet            Started container security-profiles-operator
  Normal   Pulled     114s                 kubelet            Container image "quay.io/brancz/kube-rbac-proxy:v0.11.0" already present on machine
  Normal   Started    113s                 kubelet            Started container metrics
  Normal   Created    113s                 kubelet            Created container metrics
  Normal   Killing    84s                  kubelet            Container security-profiles-operator failed startup probe, will be restarted
  Normal   Pulling    84s (x2 over 116s)   kubelet            Pulling image "gcr.io/k8s-staging-sp-operator/security-profiles-operator:latest"
  Warning  Unhealthy  84s (x10 over 111s)  kubelet            Startup probe failed: Get "http://10.244.0.27:8085/healthz": dial tcp 10.244.0.27:8085: connect: connection refused
  Normal   Created    82s (x2 over 114s)   kubelet            Created container security-profiles-operator
  Normal   Pulled     82s                  kubelet            Successfully pulled image "gcr.io/k8s-staging-sp-operator/security-profiles-operator:latest" in 1.931630015s
B3ns44d commented 2 years ago

I think that the "fatal error: timer when must be positive" message is related to the addtimer func in the Go runtime:

// addtimer adds a timer to the current P.
// This should only be called with a newly created timer.
// That avoids the risk of changing the when field of a timer in some P's heap,
// which could cause the heap to become unsorted.
func addtimer(t *timer) {
    // when must be positive. A negative value will cause runtimer to
    // overflow during its delta calculation and never expire other runtime
    // timers. Zero will cause checkTimers to fail to notice the timer.
    if t.when <= 0 {
        throw("timer when must be positive")
    }
    if t.period < 0 {
        throw("timer period must be non-negative")
    }
    if t.status != timerNoStatus {
        throw("addtimer called with initialized timer")
    }
    t.status = timerWaiting

    when := t.when

    // Disable preemption while using pp to avoid changing another P's heap.
    mp := acquirem()

    // Get current P
    pp := getg().m.p.ptr()
    lock(&pp.timersLock)
    // Clear the timer in the header of P's timer list
    cleantimers(pp)
    // add timer to the min heap of P
    doaddtimer(pp, t)
    unlock(&pp.timersLock)
    // wake up sleeping threads in netpoller
    wakeNetPoller(when)

    releasem(mp)
}
  1. addtimer checks that the timer's wake-up time (when) is positive, that its period is non-negative, and that the timer is newly initialized (status == timerNoStatus).
  2. With the current P's timersLock held, cleantimers cleans up the head of P's timer heap, then doaddtimer adds the new timer to P's min-heap and the lock is released.
  3. Finally, wakeNetPoller is called to wake up a sleeping thread in the netpoller (see the sketch below for how the when precondition can end up violated).
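
To make the failing precondition concrete, here is a worked sketch with assumed numbers (an illustration, not the runtime's actual code): the scavenger in the traces above computes its wake-up deadline roughly as nanotime() + ns, so with the corrupted sleep argument seen earlier (scavengeSleep(0x8000000000000000), i.e. math.MinInt64) the resulting when is hugely negative, which is exactly what the if t.when <= 0 check above rejects.

// A worked sketch with an assumed nanotime() reading (illustration only): with
// the corrupted sleep argument from the trace above, the computed deadline is
// hugely negative, tripping the "timer when must be positive" check.
package main

import (
    "fmt"
    "math"
)

func main() {
    sleepNS := int64(math.MinInt64)          // 0x8000000000000000, as seen in the trace
    nanotime := int64(4_600_000_000_000_000) // assumed monotonic reading, ~53 days of uptime

    when := nanotime + sleepNS // the deadline the runtime would hand to resettimer/addtimer
    fmt.Printf("when = %d, positive? %v\n", when, when > 0)
}

The other corrupted values in this thread point the same way, for example scavengeSleep(0x4631da68f2e17000) (a sleep of roughly 160 years) and the "goroutine 6 [sleep, 91230067 minutes]" line further up.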

references:

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

jhrozek commented 2 years ago

/remove-lifecycle stale

We have a similar bug open against OCP at the moment, and while we couldn't reproduce it lately, I'd prefer to keep this issue open for one more "stale/rotten" cycle before closing.

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 1 year ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes-sigs/security-profiles-operator/issues/891#issuecomment-1454788252):

>The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
>This bot triages issues according to the following rules:
>- After 90d of inactivity, `lifecycle/stale` is applied
>- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
>- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
>You can:
>- Reopen this issue with `/reopen`
>- Mark this issue as fresh with `/remove-lifecycle rotten`
>- Offer to help out with [Issue Triage][1]
>
>Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
>/close not-planned
>
>[1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
huang195 commented 1 year ago

Does anyone have a workaround for this problem? I'm running into exactly the same error, "fatal error: timer when must be positive", when installing with Helm on a kind cluster. All the pods in the cluster are running except spod.

CoreyCook8 commented 9 months ago

/reopen

k8s-ci-robot commented 9 months ago

@CoreyCook8: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to [this](https://github.com/kubernetes-sigs/security-profiles-operator/issues/891#issuecomment-1939102119):

>/reopen

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
k8s-triage-robot commented 8 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 8 months ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes-sigs/security-profiles-operator/issues/891#issuecomment-1996902401):

>The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
>This bot triages issues according to the following rules:
>- After 90d of inactivity, `lifecycle/stale` is applied
>- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
>- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
>You can:
>- Reopen this issue with `/reopen`
>- Mark this issue as fresh with `/remove-lifecycle rotten`
>- Offer to help out with [Issue Triage][1]
>
>Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
>/close not-planned
>
>[1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.