hercules-390 / hyperion

Hercules 390
Other
252 stars 68 forks source link

zVM 5.4.0: Second level install of zVM6.3.0 leads to iterrative DSP005 system errors in the guest #140

Open fbi-ranger opened 8 years ago

fbi-ranger commented 8 years ago

Error description:

Running zVM 5.4.0 as the primary OS on Hercules and performing a normal zVM 6.3.0 install as second level guest leads to iterative DSP005 system errors after restoration of the system mini disks and IPL of that newly generated system. During the first IPL the installation process is not finished yet. The system restores certain disks using DDR restore. During this phase the DSP005 happens, the guest OS takes a dump and restarts. After second DSP005 the system enters disabled wait state 9026.

The collection of that dump information is quite difficult because the system is not installed completely and therefore there is no interactive environment to use.

DSP005:

Explanation: The value of the processing unit timer is greater than it was at user dispatch. There is probably an error in switching timers.

User response: Examine the CP trace table and storage dump for possible solutions. CR12 points to the next available CP trace table entry after the failure.

ivan-w commented 8 years ago

Florian,

Do you have the latest hyperion ?

Thanks,

Ivan

fbi-ranger commented 8 years ago

Hi Ivan,

Yes indeed. Because of the information about the changes done regarding SIE, I compiled yesterday the latest committed version of Hyperion.

In first level, up to now, I didn't face the problem of the TCP/IP start any more. This is great. That bug of getting a network connection or not was a nightmare for many years. However up to now I didn't face any DSP005 abends during a second level installation. That is new.

Some weeks ago I run a first level installation of zVM 6.3.0 and more ore less it was successful. There the problem was that under some unknown conditions CMS faced a corrupted storage area in its nucleus and abended. That affected for instance the ipwizard exec and left the TCP/IP stack unconfigured. So that problem seems to be fixed with the current change.

This time I wanted to run a clean install for 6.3 to replace my old 5.4 version once and for good. Well, unfortunately, there seems to be still at leased one obstacle in front.

If I can provide you with more info please let me know.

Florian

Peter-J-Jansen commented 8 years ago

Dear Florian,

I have been unable to reproduce the DSP005 Abend during the 2nd level z/VM 6.3 (non-SSI) installation. Just to be sure, I tried it two times, but on both occasions all worked perfectly ending up with a clean z/VM 6.3 installation. Both took indeed more than 6 hours. Down below you’ll find an extract of the installation log including the two DDR’s restores which I saw happening.

I tried this under (one of the members of) my z/VM 6.3 (SSI). However, somehow I don’t think that z/VM 5.4 as a 1st level z/VM is the cause of the DSP005, but perhaps rather the fact of z/VM 5.4 having been installed or running under Hercules prior to the TLP/ALB PURGE fix might have become somehow corrupt. I know this is a wild guess. But after the TLB/ALB PURGE fix, my z/VM 6.3 system’s DIRMAINT exhibited a CRC error, which I could correct by performing a USER DIRECT recovery (as prescribed in an IBM manual). This too, I thought, was caused the TLB/ALB PURGE problem (but then subsequently, after the TLB/ALB PURGE fix, recovered).

In summary, I think the TLB/ALB PURGE fix has resulted in SIE working perfectly. It might be improved on performance wise by purging a more limited number of TLB/ALB entries, but I think the functionality now is better than it ever was. Perhaps even perfect. (As usual, until we discover the next problem of course ).

Best regards,

Peter

P.S.: The z/VM 6.3 (non-SSI) log excerpt :

   IUGILB8440I NOW LOADING 6VMRAC30 505 (8E0) DISK 199 OF 205
   IUGILB8440I NOW LOADING 6VMRAC30 599 (8E1) DISK 200 OF 205
   IUGILB8440I NOW LOADING 6VMRAC30 651 (95C) DISK 201 OF 205
   IUGILB8440I NOW LOADING 6VMRSC30 400 (856) DISK 202 OF 205
   IUGILB8440I NOW LOADING 6VMRSC30 493 (85C) DISK 203 OF 205
   IUGILB8440I NOW LOADING 6VMTCP30 491 (86B) DISK 204 OF 205
   IUGILB8440I NOW LOADING 6VMTCP30 492 (86C) DISK 205 OF 205
   IUGIDV8341I USER DIRECTORY HAS BEEN BROUGHT ONLINE SUCCESSFULLY
   IUGIDV8341I SALIPL COMMAND HAS COMPLETED SUCCESSFULLY
   IUGIWF8341I ECKDDUMP OF 2CF0 COMPLETED SUCCESSFULLY
   IUGIWF8338I NOW EXECUTING COPY OF 24CC TO 4CC STEP
   IUGIWF8341I ECKDDUMP OF 24CC COMPLETED SUCCESSFULLY
   IUGIDV8392I INSTDVD EXEC ENDED SUCCESSFULLY
   ****************************************
   *        NOW IPLing VOLUME 0630        *
   *               WITH COMMAND:          *
   *        CP SYSTEM CLEAR               *
   *        TERMINAL CONMODE 3270         *
   *        IPL  0630 CLEAR               *
   ****************************************
   Storage cleared - system reset.
   CONS 0020 DEFINED
   17:40:32 z/VM  V6 R3.0  SERVICE LEVEL 0000 (64-BIT)
   17:40:33 SYSTEM NUCLEUS CREATED ON 2013-05-29 AT 10:48:03, LOADED FROM M00RES
   17:40:33
   17:40:33 ****************************************************************
   17:40:33 * LICENSED MATERIALS - PROPERTY OF IBM*                        *
   17:40:33 *                                                              *
   17:40:33 * 5741-A07 (C) COPYRIGHT IBM CORP. 1983, 2013. ALL RIGHTS      *
   17:40:33 * RESERVED. US GOVERNMENT USERS RESTRICTED RIGHTS - USE,       *
   17:40:33 * DUPLICATION OR DISCLOSURE RESTRICTED BY GSA ADP SCHEDULE     *
   17:40:33 * CONTRACT WITH IBM CORP.                                      *
   17:40:33 *                                                              *
   17:40:33 * * TRADEMARK OF INTERNATIONAL BUSINESS MACHINES.              *
   17:40:33 ****************************************************************
   17:40:33
   17:40:33 HCPZCO6718I Using parm disk 1 on volume VMC0M1 (device 0631).
   17:40:33 HCPZCO6718I Parm disk resides on cylinders 1 through 120.
   17:40:33 The directory on volume M00RES at address 0630 has been brought online.
   17:40:34 HCPWRS2513I
   17:40:34 HCPWRS2513I Spool files available       29
   17:40:35 HCPWRS2512I Spooling initialization is complete.
   17:40:35 DASD 0635 dump unit CP IPL pages 14564
   17:40:35 HCPMLM3016I Management by the Unified Resource Manager is not available for this system.
   17:40:35 HCPAAU2700I System gateway V0 identified.
   17:40:36 HCPLNM101E DASD 0CF1 forced R/O; R/O by SYSTEM; stable by SYSTEM
   17:40:36 HCPLNM101E DASD 0CF3 forced R/O; R/O by SYSTEM; stable by SYSTEM
   17:40:36 z/VM Version 6 Release 3.0, Service Level 0000 (64-bit),
   17:40:36 built on IBM Virtualization Technology
   17:40:36 There is no logmsg data
   17:40:36 FILES:   NO RDR,   NO PRT,   NO PUN
   17:40:36 LOGON AT 17:40:36 EDT SUNDAY 08/07/16
   17:40:36 GRAF  0020 LOGON  AS  MAINT630 USERS = 1
   17:40:36 HCPIOP952I 1024M system storage
   17:40:36 FILES: 0000002 RDR, 0000001 PRT,      NO PUN
   17:40:36 HCPCRC8082I Accounting records are accumulating for userid DISKACNT.
   z/VM V6.3.0    2013-05-29 10:47
   17:40:37 AUTO LOGON  ***       OP1      USERS = 2     BY MAINT630
   17:40:37 HCPCLS6056I XAUTOLOG information for OP1: The IPL command is verified by the IPL command processor.
   z/VM DASD DUMP/RESTORE PROGRAM
   HCPDDR698I DATA DUMPED FROM 0X0100 TO BE RESTORED
   HCPDDR697I NO VOL1 LABEL FOUND
   RESTORING 0X0100
   DATA DUMPED   04/11/13 AT 18.12.19  GMT FROM 0X0100 RESTORED
   INPUT CYLINDER EXTENTS      OUTPUT CYLINDER EXTENTS
         START       STOP         START        STOP
             0       3329             0        3329
   END OF RESTORE
    BYTES RESTORED  2461676300

   END OF JOB
   z/VM DASD DUMP/RESTORE PROGRAM
   HCPDDR698I DATA DUMPED FROM 0X0100 TO BE RESTORED
   HCPDDR697I NO VOL1 LABEL FOUND
   RESTORING 0X0100
   DATA DUMPED   04/11/13 AT 18.12.19  GMT FROM 0X0100 RESTORED
   INPUT CYLINDER EXTENTS      OUTPUT CYLINDER EXTENTS
         START       STOP         START        STOP
             0       3329             0        3329
   END OF RESTORE
    BYTES RESTORED  2461676300

   END OF JOB
   IUGPLD8341I  POSTLOAD PROCESSING STARTED
   DMSACC724I 4CC replaces C (4CC)

   IUGIFP8493I ISSUING XAUTOLOG FOR VMSERVU
   AUTO LOGON  ***       VMSERVU  USERS = 3
   HCPCLS6056I XAUTOLOG information for VMSERVU: The IPL command is verified by the IPL command processor.

   IUGIFP8493I ISSUING XAUTOLOG FOR VMSERVR
   AUTO LOGON  ***       VMSERVR  USERS = 4
   HCPCLS6056I XAUTOLOG information for VMSERVR: The IPL command is verified by the IPL command processor.
jphartmann commented 8 years ago

The CRC error could also be some very old damage. The CRC algorithm used by dirmaint twenty years ago was initially implemented wrong with CMS pipelines. As a result, dirmaint took to using CRC16I. Caveat emptor.

fbi-ranger commented 8 years ago

Dear Peter,

That is quite interesting and good to hear. What I will do is to run a z/VM install under 6.3 as well. I suppose that there might be a bug in 5.4 as it is quite old that might be triggered by a slight deviation from a real zSeries specification. In case 6.3 works well, maybe I could collect the VMDumps of DSP005.

Thank you for this information. I will post my findings.

Regards, Florian

Peter-J-Jansen commented 8 years ago

Thanks for the info John.

In my case, z/VM 6.3 (SSI) installed prior to TLB/ALB PURGE fix worked fine, apart from the intermittent known anomalies (e.g. TCPIP not always starting up, sometimes PIPE errors, etc.). Subsequent to the TLB/ALB PURGE fix all these went away. But then when I tried to perform another DIRM ADD (VMINSTAL, for my 2nd level z/VM install) DirMaint complained about a CRC error, which I recovered from using an earlier USER DIRECT backup following IBM’s published procedure. So to me this looked like the TLB/ALB PURGE problem had created an incorrect / corrupted CRC. Which is what leads me to now distrust any z/VM installed or running with that Hercules problem unsolved. I.e. in my case the CRC algorithm has not changed, but perhaps with the TLB/ALB PURGE problem it may have created wrong CRC’s.

Peter

ivan-w commented 8 years ago

Florian is reporting the issue also occurs when installing z/VM 6.3 under z/VM 6.3... Investigations in progress