hercules-390 / hyperion

Hercules 390
Other
248 stars 67 forks source link

HHC00166E - ioctl error in hercifc #179

Open fbi-ranger opened 7 years ago

fbi-ranger commented 7 years ago

Running z/VM 6.3 after doing a shutdown with REIPL option and starting the TCPIP virtual machine, the following message is issued on the Hercules console:

HHC00166E hercifc: ioctl error doing SIOCADDRT on tap0: 17 File exists

This message overlays the command line and is not written in the log area.

Environment:

Hercules version: 4.0.0 (4.0.0.8700) on openSUSE leap 42.2

jphartmann commented 7 years ago

Just to be sure: Hercules was not shut down, right? And neither was the virtual machine (virtual devices remain and all that)?

Was the tunnel defined by hercifc? (And do you really need ethernet access to the network--wouldn't an IP tunnel be just as good?)

fbi-ranger commented 7 years ago

Indeed, virtual machine was not stopped. zVM shutdown with REIPL The tunnel is setup by hercifc. LCS is currently for me the best method to integrate the OS in the network (no other subnet, no routeing).

jphartmann commented 7 years ago

And things work if you shut down VM and IPL manually after that? What about shutdown, quit, and restart?

fbi-ranger commented 7 years ago

In case a shutdown the behavior is correct. tun/tap opened and msg HHC00901I LCS: interface tap0, type TAP opened is displayed at the log area. TCPIP start tells twice: HHC00942I CTC lcs device tap0 using mac 36:58:BE:C4:CB:20

mcisho commented 7 years ago

I don't believe this is related to the REIPL, you should see the same effect if you shutdown and restart the TCPIP virtual machine.

When TCPIP starts it sends a STARTLAN command to the LCS, which normally results in one or more ioctl SIOCADDRT requests being issued to add routing table entries. Similarly, when TCPIP terminates it sends, or at least it should send, a STOPLAN command to the LCS, which normally results in one or more ioctl SIOCDELRT requests being issued to remove the routing table entries.

If the STOPLAN isn't sent the ioctl SIOCADDRT requests issued by the next STARTLAN will fail with message HHC00166E as the routing table entries already exist. The STOPLAN won't be sent if the TCPIP virtual machine is forced, which I guess is effectively what is happening with the REIPL.

Use a preconfigured LCS and you won't have any of these problems.

fbi-ranger commented 7 years ago

Could you please specify what is a "preconfigured" LCS? Does it avoid the use of hercifc?

Thank you in advance.

mcisho commented 7 years ago

A preconfigured LCS refers to a network interface that exists in the host system before Hercules starts to use that interface. I use a preconfigured LCS with a z/VM TCPIP virtual machine. Before Hercules is started, issue the following commands (as root, or with sudo if configured):-

ip tuntap add mode tap dev tap10 nmcli dev set tap10 managed no ip link set dev tap10 address C4:D6:C7:1D:40:80 ip link set dev tap10 mtu 1500 ip link set dev tap10 up ip -4 route add 192.168.1.230/32 scope link dev tap10 firewall-cmd --quiet --zone=trusted --change-interface=tap10

tap10 could be any value you desire that doesn't already exist, e.g. tap999, florian, taptcpip, etc. The 192.168.1.230 address should be replaced with whatever you have defined on your TCPIP virtual machines HOME statement. Setting the MAC address is optional.

In the Hercules config you would specify:- 1D40-1D41 LCS tap10

Your TCPIP virtual machine would attach 1D40 and 1D41 to whatever virtual device addresses it used.

To get rid of the interface after Hercules is closed, issue the following commands (as root, or with sudo if configured):- firewall-cmd --quiet --zone=trusted --remove-interface=tap10 ip tuntap del mode tap dev tap10

You may or may not need the firewall-cmd commands, I do on my Fedora host.

fbi-ranger commented 7 years ago

Thank you very much for the quick response. That sounds interesting. I will check this out. BR Florian

mcisho commented 7 years ago

Sorry, I forgot to answer your second question, does it avoid the use of hercifc? Preconfigured interfaces still use hercifc, but not for anything that requires elevated privileges, so hercifc doesn't require elevated privileges.

jphartmann commented 7 years ago

A preconfigured tunnel does not require hercifc or any other help.

fbi-ranger commented 7 years ago

Hi, thanks mcisho, the LCS as you described is working. That are the good news.

The bad news is, that this case a SHUTDOWN REIPL leads to intermittent IFI001 - Abends and a disabled wait 9026. Seems that the LCS device may has a bad state during the REIPL.

Here the extract from the log:

0:19:57 HHC00160I SCP command: shutdown system ranger reipl 20:19:57 20:19:57 SHUTDOWN SYSTEM RANGER REIPL 20:19:57 20:19:57 SYSTEM SHUTDOWN STARTED 20:19:57 HHC00809I Processor CP01: disabled wait state 0002000000000000 0000000000009003 20:19:57 HHC00814I Processor CP00: SIGP CPU reset (0C) CP01, PARM 00000000004C4AD0: CC 0 20:19:57 HHC00809I Processor CP02: disabled wait state 0002000000000000 0000000000009003 20:19:57 20:19:57 Processor 01 offline 20:19:57 HHC00814I Processor CP00: SIGP CPU reset (0C) CP02, PARM 00000000004C4AD0: CC 0 20:19:57 HHC00809I Processor CP03: disabled wait state 0002000000000000 0000000000009003 20:19:57 20:19:57 Processor 02 offline 20:19:57 HHC00814I Processor CP00: SIGP CPU reset (0C) CP03, PARM 00000000004C4AD0: CC 0 20:19:57 HHC00809I Processor CP04: disabled wait state 0002000000000000 0000000000009003 20:19:57 20:19:57 Processor 03 offline 20:19:57 HHC00814I Processor CP00: SIGP CPU reset (0C) CP04, PARM 00000000004C4AD0: CC 0 20:19:59 HHC00814I Processor CP00: SIGP Store status (0E) CP01, PARM 0000000000B65200: CC 0 20:19:59 HHC00814I Processor CP00: SIGP Store status (0E) CP02, PARM 0000000000B67200: CC 0 20:19:59 HHC00814I Processor CP00: SIGP Store status (0E) CP03, PARM 0000000000B69200: CC 0 20:19:59 HHC00814I Processor CP00: SIGP Store status (0E) CP04, PARM 0000000000B6B200: CC 0 20:19:59 20:19:59 HCPWRP963I SHUTDOWN STEP USOAC - JOURNAL USER TERMINATION 20:19:59 20:19:59 HCPWRP963I SHUTDOWN STEP MFRSD - TERMINATE HARDWARE LOADER 20:20:00 20:20:00 HCPWRP963I SHUTDOWN STEP APISD - TERMINATE OTHER PROCESSORS 20:20:00 20:20:00 HCPWRP963I SHUTDOWN STEP ENASD - DISABLE TERMINAL DEVICES 20:20:01 20:20:01 HCPWRP963I SHUTDOWN STEP KCBSD - PERFORM ISFC SHUTDOWN TASKS 20:20:01 20:20:01 HCPWRP963I SHUTDOWN STEP ISHDN - SHUT DOWN I/O SUBSYSTEM 20:20:02 20:20:02 HCPWRP963I SHUTDOWN STEP SGPST - STOP OTHER PROCESSORS 20:20:02 20:20:02 HCPWRP959I RANGER SYSTEM TERMINATION IN PROGRESS ON 2017-01-08 20:20:02 20:20:02 HCPWRP963I SHUTDOWN STEP TXTDS - TERMINATE DATA TRACES 20:20:03 20:20:03 HCPWRP963I SHUTDOWN STEP SVACV - ACTIVATE TERMINATION SAVE AREAS 20:20:03 20:20:03 HCPWRP963I SHUTDOWN STEP CHMOF - DISABLE CHANNEL MEASUREMENT 20:20:04 20:20:04 HCPWRP963I SHUTDOWN STEP ISHDA - DISABLE ALL DEVICES 20:20:05 20:20:05 HCPWRP963I SHUTDOWN STEP CKPSH - TAKE A CHECKPOINT 20:20:05 20:20:05 HCPWRP963I SHUTDOWN STEP OPRCK - SAVE OPERATOR CONSOLE LIST 20:20:06 20:20:06 HCPWRP963I SHUTDOWN STEP MCWMD - DETERMINE MACHINE CHECK STATUS 20:20:06 20:20:06 HCPWRP963I SHUTDOWN STEP SDVRS - RESET IBM DASD CU CHARACTERISTICS 20:20:07 20:20:07 HCPWRP962I VM SHUTDOWN COMPLETED IN 10 SEC 20:20:07 20:20:07 HCPWRP963I SHUTDOWN STEP SGQXX - RESET OTHER PROCESSORS 20:20:07 20:20:07 HCPWRP9277I SYSTEM TERMINATION COMPLETE, ATTEMPTING RESTART 20:20:07 20:20:07 HCPWRP9277I 2017-01-08 SYSTEM RANGER 20:20:07 20:20:07 HCPWRP963I SHUTDOWN STEP SVADV - DEACTIVATE TERMINATION SAVE AREAS 20:20:08 HHC00814I Processor CP00: SIGP Set architecture mode (12) CP00, PARM 0000000000000000: CC 0 20:20:08 HHC00811I Processor CP00: architecture mode ESA/390 20:20:08 HHC00814I Processor CP00: SIGP Set architecture mode (12) CP00, PARM 00000001: CC 0 20:20:08 HHC00811I Processor CP00: architecture mode z/Arch 20:20:08 HHC00814I Processor CP00: SIGP Unassigned (16) CP00, PARM 0000000000000000: CC 1 status 00000002 20:20:10 00:00:00 HCPWRP963I SHUTDOWN STEP SGPST - STOP OTHER PROCESSORS 20:20:11 19:20:11 HCPWRP959I SYSTEM TERMINATION IN PROGRESS ON 2017-01-08 20:20:11 00:00:00 HCPWRP963I SHUTDOWN STEP TXTDS - TERMINATE DATA TRACES 20:20:11 00:00:00 HCPWRP963I SHUTDOWN STEP SVACV - ACTIVATE TERMINATION SAVE AREAS 20:20:12 00:00:00 HCPWRP963I SHUTDOWN STEP CHMOF - DISABLE CHANNEL MEASUREMENT 20:20:12 00:00:00 HCPWRP963I SHUTDOWN STEP ISHDA - DISABLE ALL DEVICES 20:20:13 00:00:00 HCPWRP963I SHUTDOWN STEP DMPDK - TAKE DUMP 20:20:13 19:20:10 HCPDMP908I SYSTEM FAILURE ON CPU 0000, CODE - IFI001 20:20:13 19:20:10 HCPDMP908I 2017-01-08 SYSTEM 20:20:13 HCPDMP9250E SYSTEM DUMP FAILURE; NO DUMP UNIT - INSUFFICIENT SPOOL SPACE 20:20:13 00:00:00 HCPWRP963I SHUTDOWN STEP CKPSH - TAKE A CHECKPOINT 20:20:14 00:00:00 HCPWRP963I SHUTDOWN STEP OPRCK - SAVE OPERATOR CONSOLE LIST 20:20:14 00:00:00 HCPWRP963I SHUTDOWN STEP MCWMD - DETERMINE MACHINE CHECK STATUS 20:20:15 00:00:00 HCPWRP963I SHUTDOWN STEP SDVRS - RESET IBM DASD CU CHARACTERISTICS 20:20:15 00:00:00 HCPWRP963I SHUTDOWN STEP SVADV - DEACTIVATE TERMINATION SAVE AREAS 20:20:16 HCPWRP9026W SYSTEM FAILURE IFI001 OCCURRED DURING SYSTEM INITIALIZATION 20:20:20 HHC00809I Processor CP00: disabled wait state 0002000000000000 0000000000009026 20:20:24 HHC01603I quit

mcisho commented 7 years ago

With no LCS's in the Hercules config, the fourth REIPL of my z/VM 6.1 system resulted in:- 11:00:31 HCPDMP908I SYSTEM FAILURE ON CPU 0000, CODE - IFI001 which suggests, to me at least, that the IFI001 abends have nothing to do with LCS.

mcisho commented 7 years ago

Further experimentation shows:- 1) With no network devices in the Hercules config I can REIPL z/VM 6.1 until my fingers ache without an IFI001 abend. 2) With a single LCS or PTP in the config I can REIPL a dozen or so times without an IFI001 abend. 3) With a single CTCI in the config I get an IFI001 abend on the third or fourth REIPL.

I was ccw tracing the LCS/PTP/CTCI and the only difference I can see between the various types is that CTCI produces dozens and dozens of:- 13:08:17 HHC01317I 0:1C60 CHAN: scsw E0C051C8, stat 0000, count 0000, ccw 775FB280 13:08:17 HHC01318I 0:1C60 CHAN: test I/O: cc=1

Following the first successful REIPL there were the following:- 13:07:09 HHC00904I 0:1C60 CTCI: Halt or clear recognized 13:07:09 HHC01312I 0:1C60 CHAN: stat 0C00, count 8000=>A961E5D4 40D6D5D3 C9D5C540 40404040 z/VM ONLINE 13:07:09 HHC01308I 0:1C60 CHAN: clear completed 13:07:09 HHC01317I 0:1C60 CHAN: scsw 00001001, stat 0000, count 0000, ccw 00000000 13:07:09 HHC01318I 0:1C60 CHAN: test I/O: cc=0 13:07:09 HHC01315I 0:1C60 CHAN: ccw 03000000 00000000 13:07:09 HHC01312I 0:1C60 CHAN: stat 0C00, count 0000 13:07:09 HHC01317I 0:1C60 CHAN: scsw 00C04007, stat 0C00, count 0000, ccw 000E6258 13:07:09 HHC01318I 0:1C60 CHAN: test I/O: cc=0 13:07:09 HHC01317I 0:1C60 CHAN: scsw 00C00000, stat 0C00, count 0000, ccw 000E6258 13:07:09 HHC01318I 0:1C60 CHAN: test I/O: cc=1

Following the second successful REIPL there were the following:- 13:07:45 HHC00904I 0:1C60 CTCI: Halt or clear recognized 13:07:45 HHC01312I 0:1C60 CHAN: stat 0C00, count 8000=>A961E5D4 40D6D5D3 C9D5C540 40404040 z/VM ONLINE 13:07:45 HHC01308I 0:1C60 CHAN: clear completed 13:07:45 HHC01317I 0:1C60 CHAN: scsw 00001001, stat 0000, count 0000, ccw 00000000 13:07:45 HHC01318I 0:1C60 CHAN: test I/O: cc=0 13:07:45 HHC01315I 0:1C60 CHAN: ccw 03000000 00000000 13:07:45 HHC01312I 0:1C60 CHAN: stat 0C00, count 0000 13:07:45 HHC01317I 0:1C60 CHAN: scsw 00C04007, stat 0C00, count 0000, ccw 000E6258 13:07:45 HHC01318I 0:1C60 CHAN: test I/O: cc=0 13:07:45 HHC01317I 0:1C60 CHAN: scsw 00C00000, stat 0C00, count 0000, ccw 000E6258 13:07:45 HHC01318I 0:1C60 CHAN: test I/O: cc=1

Following the third unsuccessful REIPL there were the following:- 13:08:20 HHC00904I 0:1C60 CTCI: Halt or clear recognized 13:08:20 HHC01312I 0:1C60 CHAN: stat 0C00, count 8000=>A961E5D4 40D6D5D3 C9D5C540 40404040 z/VM ONLINE 13:08:20 HHC01308I 0:1C60 CHAN: clear completed 13:08:30 HHC00809I Processor CP00: disabled wait state 0002000000000000 0000000000009025

At the moment I'm not sure what this all means, other than if you want to use REIPL don't have any CTCI's in your Hercules config.

fbi-ranger commented 7 years ago

In my mind it should not be that a certain function can be used and another not. I think we should put all the effort into it, to find the corresponding bugs. Without using it, we would not encounter the bugs. If further traces are needed please let me know. At leased we know that the IFI001 is not necessarily related to zVM 6.3 and the driving OS where Hercules runs. Seems that the state of the CTCI's are not as expected by zVM. Right?