openshift / file-integrity-operator

Operator providing OpenShift cluster node file integrity checking
Apache License 2.0

End-to-end tests broke after including the namespace in alerts #257

Open rhmdnd opened 2 years ago

rhmdnd commented 2 years ago

We recently merged support for including the namespace in the NodeHasIntegrityFailure alert [0].

This helps identify where the alert is coming from, but some assertions in the end-to-end tests appear to fail with the new format [1].

Opening this issue to track the work to get e2e tests running again.

[0] https://github.com/openshift/file-integrity-operator/commit/af58faa27382412cefe18435baf4de0b236c40f0
[1] https://github.com/openshift/file-integrity-operator/blob/master/tests/e2e/e2e_test.go#L56
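
For illustration only, here is a minimal Go sketch of one way a test assertion could tolerate an added label such as the namespace: it matches a metric sample by name and by a subset of labels rather than by the exact line. This is not the code in e2e_test.go. The metric name `example_node_failed`, the label values, and the helper `findSample` are hypothetical, and the parser assumes simple Prometheus text-format lines without timestamps or escaped quotes.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// labelRe matches one label pair such as node="worker-0".
var labelRe = regexp.MustCompile(`(\w+)="([^"]*)"`)

// findSample scans Prometheus text-format output for the first sample of
// metric whose labels include every pair in want. Extra labels on the
// sample (for example a newly added namespace) are ignored.
// Hypothetical helper; it is not part of the operator's test suite.
func findSample(output, metric string, want map[string]string) (float64, bool) {
	for _, line := range strings.Split(output, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and HELP/TYPE comments
		}
		// Split "name{labels} value" at the last space (no timestamps assumed).
		idx := strings.LastIndex(line, " ")
		if idx < 0 {
			continue
		}
		series, valueStr := line[:idx], line[idx+1:]
		name, labelPart := series, ""
		if open := strings.Index(series, "{"); open >= 0 {
			name, labelPart = series[:open], series[open:]
		}
		if name != metric {
			continue
		}
		got := map[string]string{}
		for _, m := range labelRe.FindAllStringSubmatch(labelPart, -1) {
			got[m[1]] = m[2]
		}
		matched := true
		for k, v := range want {
			if gv, ok := got[k]; !ok || gv != v {
				matched = false
				break
			}
		}
		if !matched {
			continue
		}
		value, err := strconv.ParseFloat(valueStr, 64)
		if err != nil {
			continue
		}
		return value, true
	}
	return 0, false
}

func main() {
	// Made-up metrics output; the first sample carries an extra namespace label.
	output := `# TYPE example_node_failed gauge
example_node_failed{namespace="ns1",node="worker-0"} 1
example_node_failed{node="worker-1"} 0`
	v, ok := findSample(output, "example_node_failed", map[string]string{"node": "worker-0"})
	fmt.Println(v, ok)
}
```

Matching on a label subset keeps the assertion stable when new labels are added, at the cost of not catching a label that goes missing unless it is listed in `want`.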

mrogers950 commented 2 years ago

@rhmdnd I haven't seen this in CI. Was it when running locally?

rhmdnd commented 2 years ago

I saw it in CI here:

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_file-integrity-operator/256/pull-ci-openshift-file-integrity-operator-master-e2e-aws/1544768695480356864#1:build-log.txt%3A1320

Pasting the actual output so it's persisted with the issue:

 --- PASS: TestFileIntegrityConfigurationRevert (252.27s)
=== RUN   TestFileIntegrityConfigurationStatus
I0706 20:42:19.724622   12125 request.go:665] Waited for 1.094093155s due to client-side throttling, not priority and fairness, request: GET:https://api.ci-op-r83l2b9c-eb8a9.origin-ci-int-aws.dev.rhcloud.com:6443/api/v1?timeout=32s
I0706 20:42:29.924350   12125 request.go:665] Waited for 11.293718737s due to client-side throttling, not priority and fairness, request: GET:https://api.ci-op-r83l2b9c-eb8a9.origin-ci-int-aws.dev.rhcloud.com:6443/apis/discovery.k8s.io/v1?timeout=32s
I0706 20:42:40.123236   12125 request.go:665] Waited for 8.735896089s due to client-side throttling, not priority and fairness, request: GET:https://api.ci-op-r83l2b9c-eb8a9.origin-ci-int-aws.dev.rhcloud.com:6443/apis/discovery.k8s.io/v1beta1?timeout=32s
    client.go:47: resource type ServiceAccount with namespace/name (osdk-e2e-710d083a-14f6-4632-848b-aaee5fe71704/file-integrity-daemon) created
    client.go:47: resource type ServiceAccount with namespace/name (osdk-e2e-710d083a-14f6-4632-848b-aaee5fe71704/file-integrity-operator) created
    client.go:47: resource type Role with namespace/name (osdk-e2e-710d083a-14f6-4632-848b-aaee5fe71704/file-integrity-daemon) created
    client.go:47: resource type Role with namespace/name (osdk-e2e-710d083a-14f6-4632-848b-aaee5fe71704/file-integrity-operator) created
    client.go:47: resource type Role with namespace/name (osdk-e2e-710d083a-14f6-4632-848b-aaee5fe71704/leader-election-role) created
    client.go:47: resource type ClusterRole with namespace/name (/file-integrity-operator) created
    client.go:47: resource type ClusterRole with namespace/name (/file-integrity-operator-metrics) created
    client.go:47: resource type ClusterRole with namespace/name (/fileintegrity-editor-role) created
    client.go:47: resource type ClusterRole with namespace/name (/fileintegrity-viewer-role) created
    client.go:47: resource type RoleBinding with namespace/name (osdk-e2e-710d083a-14f6-4632-848b-aaee5fe71704/file-integrity-daemon) created
    client.go:47: resource type RoleBinding with namespace/name (osdk-e2e-710d083a-14f6-4632-848b-aaee5fe71704/file-integrity-operator) created
    client.go:47: resource type RoleBinding with namespace/name (osdk-e2e-710d083a-14f6-4632-848b-aaee5fe71704/leader-election-rolebinding) created
    client.go:47: resource type RoleBinding with namespace/name (osdk-e2e-710d083a-14f6-4632-848b-aaee5fe71704/prometheus-k8s) created
    client.go:47: resource type ClusterRoleBinding with namespace/name (/file-integrity-operator) created
    client.go:47: resource type ClusterRoleBinding with namespace/name (/file-integrity-operator-metrics) created
    client.go:47: resource type Deployment with namespace/name (osdk-e2e-710d083a-14f6-4632-848b-aaee5fe71704/file-integrity-operator) created
    helpers.go:272: Initialized cluster resources
    wait_util.go:59: Deployment available (1/1)
    client.go:47: resource type  with namespace/name (osdk-e2e-710d083a-14f6-4632-848b-aaee5fe71704/e2e-test-configstatus) created
    helpers.go:362: Created FileIntegrity: &{TypeMeta:{Kind: APIVersion:} ObjectMeta:{Name:e2e-test-configstatus GenerateName: Namespace:osdk-e2e-710d083a-14f6-4632-848b-aaee5fe71704 SelfLink: UID:9d1c1f88-5d21-4425-8ff1-9ed7cb8618f8 ResourceVersion:38637 Generation:1 CreationTimestamp:2022-07-06 20:42:53 +0000 UTC DeletionTimestamp:<nil> DeletionGracePeriodSeconds:<nil> Labels:map[] Annotations:map[] OwnerReferences:[] Finalizers:[] ClusterName: ManagedFields:[{Manager:e2e.test Operation:Update APIVersion:fileintegrity.openshift.io/v1alpha1 Time:2022-07-06 20:42:53 +0000 UTC FieldsType:FieldsV1 FieldsV1:{"f:spec":{".":{},"f:config":{".":{},"f:gracePeriod":{},"f:maxBackups":{}},"f:debug":{},"f:nodeSelector":{".":{},"f:node-role.kubernetes.io/worker":{}},"f:tolerations":{}}} Subresource:}]} Spec:{NodeSelector:map[node-role.kubernetes.io/worker:] Config:{Name: Namespace: Key: GracePeriod:20 MaxBackups:5} Debug:true Tolerations:[{Key:node-role.kubernetes.io/master Operator:Exists Value: Effect:NoSchedule TolerationSeconds:<nil>}]} Status:{Phase:}}
    helpers.go:839: Got (Active) result #1 out of 0 needed.
    helpers.go:850: FileIntegrity ready (Active)
    helpers.go:398: FileIntegrity deployed successfully
    helpers.go:899: Found FileIntegrityStatus event: Active
    helpers.go:839: Got (Active) result #1 out of 0 needed.
    helpers.go:850: FileIntegrity ready (Active)
    helpers.go:899: Found FileIntegrityStatus event: Initializing
    helpers.go:1606: error getting output exit status 7
    helpers.go:1581: metrics output:
        Warning: would violate PodSecurity "restricted:v1.24": allowPrivilegeEscalation != false (container "metrics-test" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "metrics-test" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "metrics-test" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "metrics-test" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
        If you don't see a command prompt, try pressing enter.
        pod "metrics-test" deleted
        pod osdk-e2e-710d083a-14f6-4632-848b-aaee5fe71704/metrics-test terminated (Error)

    helpers.go:1629: 0
    helpers.go:1629: 1
    helpers.go:1629: 2
    helpers.go:1629: 3
    helpers.go:1629: 4
    e2e_test.go:449: unexpected metrics value
    helpers.go:1567: wrote logs for file-integrity-operator-65975b7b67-v79p6/self
time="2022-07-06T20:43:54Z" level=info msg="Skipping cleanup function since --skip-cleanup-error is true"
--- FAIL: TestFileIntegrityConfigurationStatus (96.24s)
=== RUN   TestFileIntegrityConfigurationIgnoreMissing
I0706 20:43:55.964928   12125 request.go:665] Waited for 1.029263949s due to client-side throttling, not priority and fairness, request: GET:https://api.ci-op-r83l2b9c-eb8a9.origin-ci-int-aws.dev.rhcloud.com:6443/apis/image.openshift.io/v1?timeout=32s
I0706 20:44:05.964958   12125 request.go:665] Waited for 11.028919399s due to client-side throttling, not priority and fairness, request: GET:https://api.ci-op-r83l2b9c-eb8a9.origin-ci-int-aws.dev.rhcloud.com:6443/apis/security.openshift.io/v1?timeout=32s
I0706 20:44:16.164959   12125 request.go:665] Waited for 8.536386266s due to client-side throttling, not priority and fairness, request: GET:https://api.ci-op-r83l2b9c-eb8a9.origin-ci-int-aws.dev.rhcloud.com:6443/apis/discovery.k8s.io/v1beta1?timeout=32s
    client.go:47: resource type ServiceAccount with namespace/name (osdk-e2e-c8457e75-55c1-4fef-b8a2-858fa90f5ad6/file-integrity-daemon) created
    client.go:47: resource type ServiceAccount with namespace/name (osdk-e2e-c8457e75-55c1-4fef-b8a2-858fa90f5ad6/file-integrity-operator) created
    client.go:47: resource type Role with namespace/name (osdk-e2e-c8457e75-55c1-4fef-b8a2-858fa90f5ad6/file-integrity-daemon) created
    client.go:47: resource type Role with namespace/name (osdk-e2e-c8457e75-55c1-4fef-b8a2-858fa90f5ad6/file-integrity-operator) created
    client.go:47: resource type Role with namespace/name (osdk-e2e-c8457e75-55c1-4fef-b8a2-858fa90f5ad6/leader-election-role) created
    helpers.go:264: failed to initialize cluster resources: clusterroles.rbac.authorization.k8s.io "file-integrity-operator" already exists
--- FAIL: TestFileIntegrityConfigurationIgnoreMissing (29.81s)

mrogers950 commented 2 years ago

It looks like the metrics service failed because the serving cert was not available at startup, which is odd because we detect that condition much earlier and restart.

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_file-integrity-operator/256/pull-ci-openshift-file-integrity-operator-master-e2e-aws/1544768695480356864/artifacts/e2e-aws/test/artifacts/e2e-test-configstatus_file-integrity-operator-65975b7b67-v79p6_self.log

{"level":"error","ts":1657140177.7594256,"logger":"metrics","msg":"Metrics service failed","error":"open /var/run/secrets/serving-cert/tls.crt: no such file or directory"}

Let's keep this issue open for now in case it pops up again.

mrogers950 commented 2 years ago

/lifecycle frozen