Closed ka7harada closed 6 months ago
@derekpoindexter For the record, let me post the case numbers here as well.
CS3920827 is the case specific to the worker node NotReady issue caused by nondisruptive-category maintenance.
CS3918206 is the case where the customer faced an unexpectedly long worker node NotReady period caused by live migration, which also falls under nondisruptive maintenance performed without notification.
These two cases were raised by a customer who experienced worker node NotReady more than 10 times in April and May 2024.
No changes are going to be made to the documentation at this time. An issue has been opened with development to investigate the reasons behind the behavior.
@kKronstainBrown Both cases are closed. For CS3920827 in particular, can you work with Dev again to add the disclaimer to the IBM Cloud docs? IKS users frequently hit the worker node NotReady issue during nondisruptive-category master upgrade maintenance.
Responded in Slack that no changes are going to be made to the documentation about this issue at this time. Development also reported no update.
Request: Please add a disclaimer about short disruptions of ROKS to the docs.
Urgency: As soon as possible. The customer raised many support cases in the last 2 months asking about the cause of brief disruptions on worker nodes.
Reason: 1. The impact on worker nodes that the customer must accept as by design is not described in the docs, yet support cases ask the customer to accept it.
When nondisruptive live migration occurs, the virtual server experiences a brief pause of around 10 seconds, and in some cases up to 30 seconds. You are not notified in advance of nondisruptive migration. The virtual server instance is not restarted as part of this process.
The guidance is inconsistent among ACS, the docs, and each service. To resolve this, the fix needs to be made soon. If needed, please confirm the wording in the "Where to change" > "Expected" section with the IKS dev team.
Where to change Location: IBM Cloud Docs > Red Hat OpenShift on IBM Cloud > Troubleshooting worker nodes in Critical or NotReady state
Current
Expected: Update the note in the “Important” section as follows.
Important: Check the IBM Cloud health and status dashboard for any notifications or maintenance updates that might be relevant to your worker nodes. These notifications or updates might help determine the cause of the worker node failures.
When nondisruptive maintenance occurs, worker nodes experience a brief pause of up to around 60 seconds, in the same way that virtual servers do. To mitigate the impact, you can also increase high availability by distributing your app setup across multiple worker nodes and clusters.