open-power / op-test

Testing Firmware for OpenPOWER systems
Apache License 2.0

MachineConfig.py: osconfig command fails due to no prompt #871

Closed abdhaleegit closed 1 week ago

abdhaleegit commented 1 week ago

For the following MachineConfig combination, the job fails: {"lpar":"cpu=dedicated,vpmem=1,vtpm=1","os":"hugepage=2M"}

lparconfig powers the LPAR off and on, and then osconfig does not find a prompt ready to run OS commands.

With this fix, we wait for the LPAR OS to boot to the login prompt and then close the console, so that the next osConfig opens it again, resets the prompt, and continues.
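The idea behind the fix can be illustrated with a minimal, self-contained sketch. It is not the actual op-test change: `FakeConsole`, `wait_for_login_and_close`, and `max_lines` are hypothetical names invented for illustration, and a real console would be driven with a timeout rather than a line count.

```python
import re
from collections import deque

# Matches a line that ends in a login prompt, e.g. "ltcever60-lp8 login: ".
# "Last login: Tue Nov 19 ..." does not match because of the end anchor.
LOGIN_PROMPT = re.compile(r"login:\s*$")


class FakeConsole:
    """Hypothetical stand-in for an LPAR console; not an op-test class."""

    def __init__(self, lines):
        self._lines = deque(lines)
        self.closed = False

    def readline(self):
        # Return the next line of console output, or None when exhausted.
        return self._lines.popleft() if self._lines else None

    def close(self):
        self.closed = True


def wait_for_login_and_close(console, max_lines=1000):
    """Read boot output until a login prompt appears, then close the console.

    Returns True if the login prompt was seen (console closed),
    False if the output ran out or max_lines was reached first.
    """
    for _ in range(max_lines):
        line = console.readline()
        if line is None:
            break
        if LOGIN_PROMPT.search(line):
            # The OS is up; close so the next user reopens a clean console.
            console.close()
            return True
    return False
```

In this sketch, the step that follows (osConfig in the issue) would open a new console and set up its own prompt, instead of inheriting a console that is still mid-boot.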

abdhaleegit commented 1 week ago

Before fix:

[  OK  ] Started Command Scheduler.
         Starting Hold until boot process finishes up...
         Starting Terminate Plymouth Boot Screen...
[  OK  ] Started SRC-Subsystem Resource Controller.
2024-11-19 03:16:18,232:op-test.common.OpTestUtil:try_recover:WARNING:OpTestSystem detected something, working on recovery
2024-11-19 03:16:18,232:op-test.common.OpTestHMC:connect:INFO:De-activating the console
rmvterm -m ltcever60 -p CR-ltcever60-lp8-austin-tg3
rmvterm -m ltcever60 -p CR-ltcever60-lp8-austin-tg3
Close command sent[console-expect]#echo $?
echo $?
0
[console-expect]#2024-11-19 03:16:18,617:op-test.common.OpTestHMC:connect:INFO:Opening the LPAR console
2024-11-19 03:16:34,835:op-test.common.OpTestUtil:try_recover:WARNING:OpTestSystem recovered from temporary issue, continuing
2024-11-19 03:16:34,835:op-test.common.OpTestUtil:try_sendcontrol:WARNING:OpTestSystem recovered from temporary issue, but the command output is unavailable, raised Exception CommandFailed but continuing
2024-11-19 03:16:36,837:op-test.common.OpTestUtil:run_command:INFO:

OpTestSystem detected a command issue, we will retry the command, this will be retry "02" of a total of "05"

tail /proc/cpuinfo | grep MMU
tail /proc/cpuinfo | grep MMU
Password: 2024-11-19 03:16:51,952:op-test.common.OpTestUtil:try_sendcontrol:WARNING:OpTestSystem detected something, working on recovery
2024-11-19 03:17:03,064:op-test.common.OpTestUtil:try_recover:WARNING:OpTestSystem detected something, working on recovery
2024-11-19 03:17:03,064:op-test.common.OpTestHMC:connect:INFO:De-activating the console
rmvterm -m ltcever60 -p CR-ltcever60-lp8-austin-tg3
rmvterm -m ltcever60 -p CR-ltcever60-lp8-austin-tg3
Close command sent[console-expect]#echo $?
echo $?
0
[console-expect]#2024-11-19 03:17:03,341:op-test.common.OpTestHMC:connect:INFO:Opening the LPAR console
2024-11-19 03:17:19,559:op-test.common.OpTestUtil:try_recover:WARNING:OpTestSystem recovered from temporary issue, continuing
2024-11-19 03:17:19,559:op-test.common.OpTestUtil:try_sendcontrol:WARNING:OpTestSystem recovered from temporary issue, but the command output is unavailable, raised Exception CommandFailed but continuing
2024-11-19 03:17:21,562:op-test.common.OpTestUtil:run_command:INFO:

OpTestSystem detected a command issue, we will retry the command, this will be retry "03" of a total of "05"

tail /proc/cpuinfo | grep MMU
tail /proc/cpuinfo | grep MMU

======================================================================
raised Exception CommandFailed but continuing
ERROR

======================================================================
ERROR: runTest (testcases.MachineConfig.MachineConfig)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/jenkins/workspace/sandbox/ioenlarge/op-test/testcases/MachineConfig.py", line 124, in runTest
    self.callConfig(key)
  File "/data/jenkins/workspace/sandbox/ioenlarge/op-test/testcases/MachineConfig.py", line 200, in callConfig
    status = OsConfig(self.cv_HMC, self.system_name,
  File "/data/jenkins/workspace/sandbox/ioenlarge/op-test/testcases/MachineConfig.py", line 615, in __init__
    self.mmulist = self.c.run_command("tail /proc/cpuinfo | grep MMU")
  File "/data/jenkins/workspace/sandbox/ioenlarge/op-test/common/OpTestHMC.py", line 1387, in run_command
    return self.util.run_command(self, i_cmd, timeout)
  File "/data/jenkins/workspace/sandbox/ioenlarge/op-test/common/OpTestUtil.py", line 1820, in run_command
    raise cf
  File "/data/jenkins/workspace/sandbox/ioenlarge/op-test/common/OpTestUtil.py", line 1815, in run_command
    output = self.try_command(term_obj, command, timeout)
  File "/data/jenkins/workspace/sandbox/ioenlarge/op-test/common/OpTestUtil.py", line 1883, in try_command
    output_list, echo_rc = self.try_sendcontrol(term_obj, command)
  File "/data/jenkins/workspace/sandbox/ioenlarge/op-test/common/OpTestUtil.py", line 1351, in try_sendcontrol
    raise CommandFailed(command, "run_command TIMEOUT in try_sendcontrol, we recovered the prompt,"
common.Exceptions.CommandFailed: Command 'tail /proc/cpuinfo | grep MMU' exited with '-1'.
Output
run_command TIMEOUT in try_sendcontrol, we recovered the prompt, but the command output is unavailable
----------------------------------------------------------------------
Ran 1 test in 402.439s

abdhaleegit commented 1 week ago

After fix:

[  130.024673] tg3 0010:01:00.1 enP16p1s0f1: EEE is disabled
[  130.024688] IPv6: ADDRCONF(NETDEV_CHANGE): enP16p1s0f1: link becomes ready
[  130.062966] tg3 0010:01:00.0 enP16p1s0f0: Link is up at 1000 Mbps, full duplex
[  130.062991] tg3 0010:01:00.0 enP16p1s0f0: Flow control is on for TX and on for RX
[  130.062996] tg3 0010:01:00.0 enP16p1s0f0: EEE is disabled
[  130.063008] IPv6: ADDRCONF(NETDEV_CHANGE): enP16p1s0f0: link becomes ready
[  130.778210] systemd-journald[716]: Time jumped backwards, rotating.

Red Hat Enterprise Linux 9.6 Beta (Plow)
Kernel 5.14.0-516.el9.ppc64le on an ppc64le

Activate the web console with: systemctl enable --now cockpit.socket

ltcever60-lp8 login:
ltcever60-lp8 login:

ltcever60-lp8 login: root
root
Password: xxx

Last login: Tue Nov 19 03:24:57 from 10..63
[root@ltcever60-lp8 ~]# which bash && exec bash --norc --noprofile
PS1=\[console-expect\]#
which bash && exec bash --norc --noprofile
/usr/bin/bash
bash-5.1# PS1=\[console-expect\]#
[console-expect]#which stty && stty cols 300;which stty && stty rows 30
which stty && stty cols 300;which stty && stty rows 30
/usr/bin/stty
/usr/bin/stty
[console-expect]#export LANG=C
export LANG=C
[console-expect]#

[console-expect]#
date
which whoami && whoami

[console-expect]#date
Tue Nov 19 03:31:02 CST 2024
[console-expect]#which whoami && whoami
/usr/bin/whoami
root
[console-expect]#echo $?
echo $?
0
[console-expect]#tail /proc/cpuinfo | grep MMU
tail /proc/cpuinfo | grep MMU
MMU     : Radix
[console-expect]#echo $?
echo $?
0
[console-expect]#cat /proc/cmdline
cat /proc/cmdline
BOOT_IMAGE=(ieee1275//vdevice/v-scsi@3000006c/disk@8100000000000000,msdos2)/vmlinuz-5.14.0-516.el9.ppc64le root=/dev/mapper/rhel_ltcever60--lp8-root ro crashkernel=2G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G rd.lvm.lv=rhel_ltcever60-lp8/root rd.lvm.lv=rhel_ltcever60-lp8/swap biosdevname=0
[console-expect]#echo $?
echo $?
0