zowe / launcher


Zowe not restarting components after catastrophic failure #84

Closed by nosrednayduj 1 year ago

nosrednayduj commented 1 year ago

We have been doing some failure-behavior testing, killing various tasks, including our own component and other Zowe components, with the SDSF C action character.

It appears that the component restart behavior no longer works. For example, I kill ZSS with C:

 C   IZPZ0126 ./zssServer --schemas /proj/izpqa/d012/z S0536287 FILE SYS KERNEL

When I look in my started task log afterwards, it says there have been too many restart failures. Note: all components started properly once, then I killed ZSS, and then I got this error.

2023-06-16 18:14:17 <ZWELS:132968> IZPSTC INFO (zwe-internal-start-component) starting component zss ...
2023-06-16 14:20:48 <ZWELNCH:16909310> IZPSTC INFO ZWEL0004I component zss(33686528) terminated, status = 0
2023-06-16 14:20:48 <ZWELNCH:16909310> IZPSTC ERROR ZWEL0038E failed to restart component zss, max retries reached

It looks like ZWELS timestamps are in UTC and ZWELNCH timestamps are in server time, but even allowing for that, the minutes are off. I started everything at 14:12 and killed ZSS around 14:20, so it's a little odd that this starting message is printed with the 18:14 timestamp. Maybe there's just a buffer that needed flushing.
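To make the timestamp comparison concrete, here is a minimal Python sketch that puts both kinds of log line on the same clock. It assumes the server runs at UTC-4, which is inferred only from the four-hour gap between the 18:xx and 14:xx stamps above. The function name `to_server_time` and the regular expression are illustrative and not part of Zowe.

```python
import re
from datetime import datetime, timedelta

# Assumption: server local time is UTC-4, inferred only from the hour gap
# between the ZWELS and ZWELNCH lines quoted above.
ASSUMED_SERVER_UTC_OFFSET = timedelta(hours=-4)

# Matches the leading "YYYY-MM-DD HH:MM:SS <SOURCE:pid>" part of a log line.
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) <(\w+):\d+>")


def to_server_time(line):
    """Return (source, timestamp in assumed server time), or None if no match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    stamp = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    source = m.group(2)
    if source == "ZWELS":
        # ZWELS lines appear to be in UTC, so shift them to server time.
        stamp += ASSUMED_SERVER_UTC_OFFSET
    return source, stamp


log_lines = [
    "2023-06-16 18:14:17 <ZWELS:132968> IZPSTC INFO (zwe-internal-start-component) starting component zss ...",
    "2023-06-16 14:20:48 <ZWELNCH:16909310> IZPSTC INFO ZWEL0004I component zss(33686528) terminated, status = 0",
]

for line in log_lines:
    source, stamp = to_server_time(line)
    print(source, stamp.strftime("%H:%M:%S"))
```

Under that assumption the ZWELS line lands at 14:14:17 server time, between the 14:12 start and the 14:20 kill, so it may be output from the original startup rather than from a restart attempt.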

The real issue is that Zowe does not restart a component after a catastrophic failure.

nosrednayduj commented 1 year ago

I tried this test again in Zowe 2.9, and Zowe is restarting my processes correctly. So something got fixed, either on purpose or by accident.

nosrednayduj commented 1 year ago

Closing because this doesn't seem to be a problem in the current code.