final test

node0101 rebooted by itself at about 7 AM EDT and was no longer accessible afterward. The node responded to ping, but we could not ssh to it. It was in a "coma" state: alive, but not accessible.
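The "coma" symptom above (ping answers, ssh does not) can be checked mechanically. This is a minimal, hypothetical sketch, assuming bash, Linux iputils `ping`, and key-based ssh access to the nodes. The function names `classify_node_state` and `check_node` are illustrative only and are not part of the appliance tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a node from the exit codes of a ping probe and an ssh probe.
# Assumption: exit code 0 means the probe succeeded; any non-zero value means it failed.
classify_node_state() {
    local ping_rc=$1 ssh_rc=$2
    if [ "$ping_rc" -eq 0 ] && [ "$ssh_rc" -eq 0 ]; then
        echo "ok"
    elif [ "$ping_rc" -eq 0 ]; then
        echo "coma"          # alive on the network, but no ssh login
    else
        echo "unreachable"   # does not answer ping at all
    fi
}

# Hypothetical helper: probe one node and print "<node> <state>".
# ping -c 1 -W 2: send one echo request and wait up to 2 seconds (Linux iputils ping).
# ssh -o BatchMode=yes fails instead of prompting for a password;
# ConnectTimeout bounds how long the connection attempt may take.
check_node() {
    local node=$1 ping_rc ssh_rc
    ping -c 1 -W 2 "$node" >/dev/null 2>&1
    ping_rc=$?
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$node" true >/dev/null 2>&1
    ssh_rc=$?
    echo "$node $(classify_node_state "$ping_rc" "$ssh_rc")"
}
```

For example, `for node in node010{1..7}; do check_node "$node"; done` would print one line per node, and a node reported as `coma` matches the state node0101 was in. Keeping the classification in its own function means it can be tested without touching the network.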

We tried to collect apdiag logs, but as node0101 was not reachable via ssh, we could not get logs from it. Wolverine was disabled.

We stopped Db2 and restarted Wolverine.

This did not help, so we stopped Wolverine again, stopped the platform, and then manually stopped the dashDB container on each of the other nodes (node0102 through node0107):

# bash brace expansion: node010{2..7} expands to node0102 node0103 ... node0107
for node in node010{2..7}; do
    echo "$node"
    ssh "$node" docker stop dashDB
done

Then we started the appliance with apstart. We could see that it was trying to enable node0101, and then the node rebooted.

What seems to have happened is that Magneto did send the reboot to the node, but the node took a very long time to go down, and the reboot finally took effect right as we started the platform. The system is now up, and node0101 is up, reachable, and accessible.

We are collecting logs again to gather everything needed for the RCA.

We need to understand:

1- What caused node0101 to go into that state?

2- Why was Wolverine (WV) reported as disabled?

Salesforce Case Number-->TS007288049
Salesforce Account Name-->Bank of America National Association
Salesforce Creator-->Salesforce GIT Bridge
Salesforce Case creation date-->10/21/2021 01:45:52
Salesforce Product Name-->Integrated Analytics Systems
Salesforce Case Severity-->Sev_1
Salesforce Case Status-->Waiting_On_Client

nicholas-kebbas / nickkebbas.com

final test #27