openshift / file-integrity-operator

Operator providing OpenShift cluster node file integrity checking
Apache License 2.0

OCPBUGS-18933: Fix reint-on-failed issues #436

Vincent056 closed this pull request 5 months ago

Vincent056 commented 1 year ago

This change prevents all nodes from being re-initialized when the re-init-on-failed annotation is used; only nodes with a failed status are re-initialized.

OCPBUGS-18933

openshift-ci-robot commented 1 year ago

@Vincent056: This pull request references Jira Issue OCPBUGS-18933, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug:

* bug is open, matching expected state (open)
* bug target version (4.15.0) matches configured target version for branch (4.15.0)
* bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact: /cc @xiaojiey

The bug has been updated to refer to the pull request using the external bug tracker.

In response to [this](https://github.com/openshift/file-integrity-operator/pull/436):

> This changes prevent all node reinit when reint-on-failed is being used.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

openshift-ci-robot commented 1 year ago

@Vincent056: This pull request references Jira Issue OCPBUGS-18933, which is valid.

3 validation(s) were run on this bug:

* bug is open, matching expected state (open)
* bug target version (4.15.0) matches configured target version for branch (4.15.0)
* bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact: /cc @xiaojiey

openshift-ci[bot] commented 1 year ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Vincent056

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/openshift/file-integrity-operator/blob/master/OWNERS)~~ [Vincent056]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

Vincent056 commented 1 year ago

/retest

xiaojiey commented 1 year ago

/hold for test

Vincent056 commented 1 year ago

failed to get cluster, /retest

BhargaviGudi commented 1 year ago

Verification passed on 4.14.0-0.nightly-2023-09-26-124507 + file-integrity-operator from PR #436

  1. Install FIO
  2. Create fileintegrity
  3. Trigger fileintegrity failures on two nodes
    $ oc get fileintegritynodestatuses.fileintegrity.openshift.io 
    NAME                                                              NODE                                        STATUS
    example-fileintegrity-ip-10-0-27-104.us-east-2.compute.internal   ip-10-0-27-104.us-east-2.compute.internal   Succeeded
    example-fileintegrity-ip-10-0-44-20.us-east-2.compute.internal    ip-10-0-44-20.us-east-2.compute.internal    Failed
    example-fileintegrity-ip-10-0-54-131.us-east-2.compute.internal   ip-10-0-54-131.us-east-2.compute.internal   Failed
    example-fileintegrity-ip-10-0-80-201.us-east-2.compute.internal   ip-10-0-80-201.us-east-2.compute.internal   Succeeded
    example-fileintegrity-ip-10-0-83-23.us-east-2.compute.internal    ip-10-0-83-23.us-east-2.compute.internal    Succeeded
    example-fileintegrity-ip-10-0-9-203.us-east-2.compute.internal    ip-10-0-9-203.us-east-2.compute.internal    Succeeded
  4. oc annotate fileintegrities/ file-integrity.openshift.io/re-init-on-failed=
    $ oc annotate fileintegrity example-fileintegrity file-integrity.openshift.io/re-init-on-failed=
    fileintegrity.fileintegrity.openshift.io/example-fileintegrity annotated

    The fileintegritynodestatus objects for the failed nodes are marked as Succeeded, and the AIDE database is re-initialized only on the failed nodes.

    
    $ oc get fileintegritynodestatuses.fileintegrity.openshift.io 
    NAME                                                              NODE                                        STATUS
    example-fileintegrity-ip-10-0-27-104.us-east-2.compute.internal   ip-10-0-27-104.us-east-2.compute.internal   Succeeded
    example-fileintegrity-ip-10-0-44-20.us-east-2.compute.internal    ip-10-0-44-20.us-east-2.compute.internal    Succeeded
    example-fileintegrity-ip-10-0-54-131.us-east-2.compute.internal   ip-10-0-54-131.us-east-2.compute.internal   Succeeded
    example-fileintegrity-ip-10-0-80-201.us-east-2.compute.internal   ip-10-0-80-201.us-east-2.compute.internal   Succeeded
    example-fileintegrity-ip-10-0-83-23.us-east-2.compute.internal    ip-10-0-83-23.us-east-2.compute.internal    Succeeded
    example-fileintegrity-ip-10-0-9-203.us-east-2.compute.internal    ip-10-0-9-203.us-east-2.compute.internal    Succeeded
    $ for node in `oc get node --no-headers | awk '{print $1}'`; do oc debug node/$node -- chroot /host ls -ltr /etc/kubernetes/; done
    Starting pod/ip-10-0-27-104us-east-2computeinternal-debug ...
    To use host binaries, run `chroot /host`
    total 3640
    -rw-------.  1 root root    9249 Sep 29 05:17 kubeconfig
    drwxr-xr-x.  3 root root      19 Sep 29 05:18 cni
    drwxr-xr-x.  3 root root      20 Sep 29 05:18 kubelet-plugins
    -rw-r--r--.  1 root root    8332 Sep 29 05:28 kubelet-ca.crt
    drwxr-xr-x. 16 root root    4096 Sep 29 05:34 static-pod-resources
    drwxr-xr-x.  2 root root     129 Sep 29 05:34 manifests
    -rw-r--r--.  1 root root     107 Sep 29 05:42 apiserver-url.env
    -rw-r--r--.  1 root root    1123 Sep 29 05:42 ca.crt
    -rw-r--r--.  1 root root       0 Sep 29 05:42 cloud.conf
    -rw-r--r--.  1 root root    2906 Sep 29 05:42 kubelet.conf
    -rw-------.  1 root root 1835684 Sep 29 07:20 aide.db.gz.new
    -rw-------.  1 root root 1835684 Sep 29 07:20 aide.db.gz
    -rw-------.  1 root root     777 Sep 29 08:18 aide.log.new
    -rw-------.  1 root root     777 Sep 29 08:18 aide.log

    Removing debug pod ...
    Starting pod/ip-10-0-44-20us-east-2computeinternal-debug ...
    To use host binaries, run `chroot /host`
    total 7224
    -rw-------. 1 root root    6050 Sep 29 05:22 kubeconfig
    drwxr-xr-x. 3 root root      19 Sep 29 05:23 cni
    drwxr-xr-x. 3 root root      20 Sep 29 05:23 kubelet-plugins
    drwxr-xr-x. 2 root root       6 Sep 29 05:23 manifests
    drwxr-xr-x. 3 root root      24 Sep 29 05:24 static-pod-resources
    -rw-r--r--. 1 root root    8332 Sep 29 05:28 kubelet-ca.crt
    -rw-r--r--. 1 root root    1123 Sep 29 05:42 ca.crt
    -rw-r--r--. 1 root root       0 Sep 29 05:42 cloud.conf
    -rw-r--r--. 1 root root    2906 Sep 29 05:42 kubelet.conf
    -rw-------. 1 root root    1004 Sep 29 07:28 aide.log.backup-20230929T07_28_05
    -rw-------. 1 root root 1835388 Sep 29 07:28 aide.db.gz.backup-20230929T07_28_05
    -rw-------. 1 root root 1835409 Sep 29 08:16 aide.db.gz.backup-20230929T08_16_44
    -rw-------. 1 root root    1010 Sep 29 08:16 aide.log.backup-20230929T08_16_44
    -rw-------. 1 root root 1835494 Sep 29 08:17 aide.db.gz.new
    -rw-------. 1 root root 1835494 Sep 29 08:17 aide.db.gz
    -rw-------. 1 root root     777 Sep 29 08:17 aide.log
    -rw-------. 1 root root       0 Sep 29 08:17 aide.log.new

    Removing debug pod ...
    Starting pod/ip-10-0-54-131us-east-2computeinternal-debug ...
    To use host binaries, run `chroot /host`
    total 7236
    -rw-------.  1 root root    9249 Sep 29 05:16 kubeconfig
    drwxr-xr-x.  3 root root      19 Sep 29 05:17 cni
    drwxr-xr-x.  3 root root      20 Sep 29 05:17 kubelet-plugins
    -rw-r--r--.  1 root root    8332 Sep 29 05:28 kubelet-ca.crt
    drwxr-xr-x. 17 root root    4096 Sep 29 05:35 static-pod-resources
    drwxr-xr-x.  2 root root     129 Sep 29 05:35 manifests
    -rw-r--r--.  1 root root     107 Sep 29 05:42 apiserver-url.env
    -rw-r--r--.  1 root root    1123 Sep 29 05:42 ca.crt
    -rw-r--r--.  1 root root       0 Sep 29 05:42 cloud.conf
    -rw-r--r--.  1 root root    2906 Sep 29 05:42 kubelet.conf
    -rw-------.  1 root root 1835801 Sep 29 07:27 aide.db.gz.backup-20230929T07_27_45
    -rw-------.  1 root root    1004 Sep 29 07:27 aide.log.backup-20230929T07_27_45
    -rw-------.  1 root root 1835817 Sep 29 08:16 aide.db.gz.backup-20230929T08_16_53
    -rw-------.  1 root root    1010 Sep 29 08:16 aide.log.backup-20230929T08_16_53
    -rw-------.  1 root root 1835895 Sep 29 08:17 aide.db.gz.new
    -rw-------.  1 root root 1835895 Sep 29 08:17 aide.db.gz
    -rw-------.  1 root root     777 Sep 29 08:17 aide.log
    -rw-------.  1 root root       0 Sep 29 08:18 aide.log.new

    Removing debug pod ...
    Starting pod/ip-10-0-80-201us-east-2computeinternal-debug ...
    To use host binaries, run `chroot /host`
    total 3636
    -rw-------.  1 root root    9249 Sep 29 05:15 kubeconfig
    drwxr-xr-x.  3 root root      19 Sep 29 05:16 cni
    drwxr-xr-x.  3 root root      20 Sep 29 05:16 kubelet-plugins
    -rw-r--r--.  1 root root    8332 Sep 29 05:28 kubelet-ca.crt
    drwxr-xr-x. 16 root root    4096 Sep 29 05:37 static-pod-resources
    drwxr-xr-x.  2 root root     129 Sep 29 05:37 manifests
    -rw-r--r--.  1 root root     107 Sep 29 05:43 apiserver-url.env
    -rw-r--r--.  1 root root    1123 Sep 29 05:43 ca.crt
    -rw-r--r--.  1 root root    2906 Sep 29 05:43 kubelet.conf
    -rw-r--r--.  1 root root       0 Sep 29 05:43 cloud.conf
    -rw-------.  1 root root 1835736 Sep 29 07:20 aide.db.gz.new
    -rw-------.  1 root root 1835736 Sep 29 07:20 aide.db.gz
    -rw-------.  1 root root     777 Sep 29 08:18 aide.log
    -rw-------.  1 root root       0 Sep 29 08:18 aide.log.new

    Removing debug pod ...
    Starting pod/ip-10-0-83-23us-east-2computeinternal-debug ...
    To use host binaries, run `chroot /host`
    total 3628
    -rw-------. 1 root root    6050 Sep 29 05:25 kubeconfig
    drwxr-xr-x. 3 root root      19 Sep 29 05:25 cni
    drwxr-xr-x. 3 root root      20 Sep 29 05:26 kubelet-plugins
    drwxr-xr-x. 2 root root       6 Sep 29 05:26 manifests
    drwxr-xr-x. 3 root root      24 Sep 29 05:28 static-pod-resources
    -rw-r--r--. 1 root root    8332 Sep 29 05:28 kubelet-ca.crt
    -rw-r--r--. 1 root root    1123 Sep 29 05:43 ca.crt
    -rw-r--r--. 1 root root    2906 Sep 29 05:43 kubelet.conf
    -rw-r--r--. 1 root root       0 Sep 29 05:43 cloud.conf
    -rw-------. 1 root root      67 Sep 29 07:19 aide.log.backup-20230929T07_19_55
    -rw-------. 1 root root 1835396 Sep 29 07:20 aide.db.gz.new
    -rw-------. 1 root root 1835396 Sep 29 07:20 aide.db.gz
    -rw-------. 1 root root     777 Sep 29 08:17 aide.log
    -rw-------. 1 root root       0 Sep 29 08:18 aide.log.new

    Removing debug pod ...
    Starting pod/ip-10-0-9-203us-east-2computeinternal-debug ...
    To use host binaries, run `chroot /host`
    total 3628
    -rw-------. 1 root root    6050 Sep 29 05:22 kubeconfig
    drwxr-xr-x. 3 root root      19 Sep 29 05:23 cni
    drwxr-xr-x. 3 root root      20 Sep 29 05:23 kubelet-plugins
    drwxr-xr-x. 2 root root       6 Sep 29 05:23 manifests
    drwxr-xr-x. 3 root root      24 Sep 29 05:25 static-pod-resources
    -rw-r--r--. 1 root root    8332 Sep 29 05:28 kubelet-ca.crt
    -rw-r--r--. 1 root root    1123 Sep 29 05:42 ca.crt
    -rw-r--r--. 1 root root       0 Sep 29 05:42 cloud.conf
    -rw-r--r--. 1 root root    2906 Sep 29 05:42 kubelet.conf
    -rw-------. 1 root root 1835434 Sep 29 07:20 aide.db.gz.new
    -rw-------. 1 root root 1835434 Sep 29 07:20 aide.db.gz
    -rw-------. 1 root root     777 Sep 29 08:18 aide.log.new
    -rw-------. 1 root root     777 Sep 29 08:18 aide.log

    Removing debug pod ...
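
The manual check above could be scripted along these lines. This is a minimal sketch, not part of the operator or of `oc`: the function names `failed_nodes` and `aide_db_backups` are hypothetical, and the column positions are assumed from the table and `ls -l` output shown in this comment.

```shell
#!/bin/sh
# Hypothetical helper (assumption, not an oc feature): read the table printed by
# `oc get fileintegritynodestatuses` on stdin and print the NODE column
# for every row whose STATUS column is exactly "Failed".
# The header row is skipped implicitly because its third field is "STATUS".
failed_nodes() {
  awk '$3 == "Failed" { print $2 }'
}

# Hypothetical helper (assumption): read `ls -l` output on stdin and print the
# names of AIDE database backups, i.e. entries whose last field starts with
# "aide.db.gz.backup-".
aide_db_backups() {
  awk '$NF ~ /^aide\.db\.gz\.backup-/ { print $NF }'
}
```

For example, `oc get fileintegritynodestatuses.fileintegrity.openshift.io | failed_nodes` before step 4 would list the nodes expected to be re-initialized, and piping each node's `ls -ltr /etc/kubernetes/` output through `aide_db_backups` shows which nodes hold database backups. In the listings above, only the two previously failed nodes have `aide.db.gz.backup-*` files.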

BhargaviGudi commented 1 year ago

/unhold

BhargaviGudi commented 1 year ago

/label qe-approved

Vincent056 commented 1 year ago

/retest

Vincent056 commented 1 year ago

/retest

openshift-ci[bot] commented 1 year ago

@Vincent056: all tests passed!

openshift-bot commented 10 months ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot commented 9 months ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-bot commented 5 months ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-ci-robot commented 5 months ago

@Vincent056: Jira Issue OCPBUGS-18933: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-18933 has been moved to the MODIFIED state.

In response to [this](https://github.com/openshift/file-integrity-operator/pull/436):

> This changes prevent all node reinit when reint-on-failed is being used.
>
> [OCPBUGS-18933](https://issues.redhat.com/browse/OCPBUGS-18933)

Instructions for interacting with me using PR comments are available [here](https://prow.ci.openshift.org/command-help?repo=openshift%2Ffile-integrity-operator). If you have questions or suggestions related to my behavior, please file an issue against the [openshift-eng/jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin/issues/new) repository.