openshift-kni / lifecycle-agent

Local agent for orchestration of SNO Image Based Upgrade
Apache License 2.0
6 stars, 26 forks

CNF-12656: Handle SIGTERM for stateroot setup job #498

Closed pixelsoccupied closed 1 month ago

pixelsoccupied commented 1 month ago

This PR handles the case where the stateroot setup job is in progress and an unexpected SIGTERM is received (e.g., when moving to Idle while Prep is in progress).

The solution is broken into two parts:

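k8s:

- The maximum wait time before a SIGKILL is sent is determined by TerminationGracePeriodSeconds. This is now set to 60 seconds (default 30).
- A PreStop lifecycle handler was investigated but is not useful in our case. Generally it is used to delay the container from receiving SIGTERM while k8s does other housekeeping tasks (e.g., de-registering Ingress) to avoid race conditions.

Handler in the code:

- Every app is expected to have its own signal handler. In this case, if a seed image is present, we allow the rest of the stateroot setup to go through by sleeping for at most the time set by TerminationGracePeriodSeconds before shutting down.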

/cc @donpenney @jc-rh

openshift-ci-robot commented 1 month ago

@pixelsoccupied: This pull request references CNF-12656 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

In response to [this](https://github.com/openshift-kni/lifecycle-agent/pull/498):

> This PR aims to handle the case when the stateroot setup job may be in progress and an unexpected SIGTERM is received (e.g., moving to Idle while Prep is in progress).
>
> The solution is broken into two parts:
>
> k8s:
> - The max wait time before a SIGKILL is sent is determined by TerminationGracePeriodSeconds. This is now set to 60 secs (default 30).
> - PreStop lifecycle handler was investigated but is not useful in our case. Generally it's used to delay the container from getting SIGTERM, and in the meantime k8s does other housekeeping tasks (e.g., de-registering Ingress) to avoid any race conditions.
>
> Handler in the code:
> - Every app is expected to have its own signal handler. In this case we simply cancel the context and sleep for at most the time set by TerminationGracePeriodSeconds before shutting down.
>
> /cc @donpenney @jc-rh

Instructions for interacting with me using PR comments are available [here](https://prow.ci.openshift.org/command-help?repo=openshift-kni%2Flifecycle-agent). If you have questions or suggestions related to my behavior, please file an issue against the [openshift-eng/jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin/issues/new) repository.

openshift-ci-robot commented 1 month ago

@pixelsoccupied: This pull request references CNF-12656 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

In response to [this](https://github.com/openshift-kni/lifecycle-agent/pull/498):

> This PR aims to handle the case when the stateroot setup job may be in progress and an unexpected SIGTERM is received (e.g., moving to Idle while Prep is in progress).
>
> The solution is broken into two parts:
>
> k8s:
> - The max wait time before a SIGKILL is sent is determined by TerminationGracePeriodSeconds. This is now set to 60 secs (default 30).
> - PreStop lifecycle handler was investigated but is not useful in our case. Generally it's used to delay the container from getting SIGTERM, and in the meantime k8s does other housekeeping tasks (e.g., de-registering Ingress) to avoid any race conditions.
>
> Handler in the code:
> - Every app is expected to have its own signal handler. In this case, if a seed image is present, we allow the rest of the stateroot setup to go through by sleeping for at most the time set by TerminationGracePeriodSeconds before shutting down.
>
> /cc @donpenney @jc-rh

Instructions for interacting with me using PR comments are available [here](https://prow.ci.openshift.org/command-help?repo=openshift-kni%2Flifecycle-agent). If you have questions or suggestions related to my behavior, please file an issue against the [openshift-eng/jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin/issues/new) repository.

pixelsoccupied commented 1 month ago

/hold will update the grace period to 30 mins + document this behaviour

pixelsoccupied commented 1 month ago

/unhold

pixelsoccupied commented 1 month ago

/retest

pixelsoccupied commented 1 month ago

/test ibu-e2e-flow

openshift-ci[bot] commented 1 month ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jc-rh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/openshift-kni/lifecycle-agent/blob/main/OWNERS)~~ [jc-rh]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.