hercules-390 / hyperion

Hercules 390
Other
252 stars 68 forks source link

New Crash Dump with Hercules #147

Open erwinmars opened 8 years ago

erwinmars commented 8 years ago

After upgrading some products, i always get a crash dump starting that application. It has occured with 3.12, but also with 4.00. I retest it with 4.00.8597 the same result. I've downloaded the current source, but i'm not so familiar with the VC 2015 debugging environment, so i've no idea, where the issue occurs.

I've attached the log file, where the crash dump occurs and also the crash dump.

Thanks for any help.

Erwin hercules_64_400-160825-160841.txt

Hercules.dmp.zip

Fish-Git commented 8 years ago

Erwin:

I have finished analyzing your Crash Dump (Thank You!) and based on analysis it appears the crash occurred in the cckd garbage collector function where it is accessing the device's level-2 lookup tables (to try and find one to relocate).

This is strongly indicative of one or more potentially damaged dasd image files (or shadow files). Which one(s) I cannot say.

My suggested course of action would be for you to either:

a) Restore your Hercules dasd files from your backups.

Or, if you're foolish enough to not backup your system on a regular basis, to:

b) Use the Hercules cckdcdsk.exe utility to try and repair the damage.

Note: You might need to try check level 2 (extra) or 3 (maximal) if cckdcdsk level 1 (normal) fails to repair the damage.

Please let me know whether the above resolves your problem or not.

Thanks!

Microsoft's Debugging Tools for Windows analysis of your crash dump follows immediately below.

  Microsoft (R) Windows Debugger Version 6.12.0002.633 AMD64
  Copyright (c) Microsoft Corporation. All rights reserved.

  Loading Dump File [C:\Users\Fish\Documents\Visual Studio 2008\Projects\Hercules\_DUMPS\Hercules.dmp]
  User Mini Dump File: Only registers, stack and portions of memory are available

  WARNING: Minidump contains unknown stream type 0x15
  WARNING: Minidump contains unknown stream type 0x16
  Comment: '4.0.0.8597-g05652fb'
  Comment: '(C) Copyright 1999-2016 by Roger Bowler, Jan Jaeger, and others'
  Comment: 'Running on: EWH-PC (Windows-6.2.9200 Intel(R) x64) LP=8, Cores=4, CPUs=1'
  Comment: 'Built with: Microsoft Visual C 150030729 1'
  Comment: 'Build type: Windows MSVC AMD64 host architecture build'
  Comment: 'Modes: S/370 ESA/390 z/Arch'
  Comment: 'Max CPU Engines: 32'
  Comment: 'Using   Fish threads Threading Model'
  Comment: 'Using   Error-Checking Mutex Locking Model'
  Comment: 'With    Syncio support'
  Comment: 'With    Shared Devices support'
  Comment: 'With    Dynamic loading support'
  Comment: 'Using   shared libraries'
  Comment: 'With    External GUI support'
  Comment: 'With    Partial TCP keepalive support'
  Comment: 'With    IPV6 support'
  Comment: 'With    HTTP Server support'
  Comment: 'With    sqrtl support'
  Comment: 'Without SIGABEND handler'
  Comment: 'With    CCKD BZIP2 support'
  Comment: 'With    HET BZIP2 support'
  Comment: 'With    ZLIB support'
  Comment: 'With    Regular Expressions support'
  Comment: 'With    Object REXX support'
  Comment: 'Without Regina REXX support'
  Comment: 'With    Automatic Operator support'
  Comment: 'Without National Language Support'
  Comment: 'Machine dependent assists: cmpxchg1 cmpxchg4 cmpxchg8 hatomics=msvcIntrinsics'
  Symbol search path is: C:\Users\Fish\Documents\Visual Studio 2008\Projects\Hercules\_DUMPS;srv*C:\Windows\Symbols*http://msdl.microsoft.com/download/symbols
  Executable search path is: C:\Users\Fish\Documents\Visual Studio 2008\Projects\Hercules\_DUMPS
  Windows 7 Version 14393 MP (8 procs) Free x64
  Product: WinNt, suite: SingleUserTS
  Machine Name:
  Debug session time: Thu Aug 25 07:13:29.000 2016 (UTC - 7:00)
  System Uptime: not available
  Process Uptime: 0 days 0:04:48.000
  ................................................................
  ........................
  This dump file has an exception of interest stored in it.
  The stored exception information can be accessed via .ecxr.
  (bd0.3228): Access violation - code c0000005 (first/second chance not available)
  ntdll!ZwGetContextThread+0x14:
  00007ffc`ed346b04 c3              ret
  0:041> !analyze -v
  *******************************************************************************
  *                                                                             *
  *                        Exception Analysis                                   *
  *                                                                             *
  *******************************************************************************

  *** ERROR: Symbol file could not be found.  Defaulted to export symbols for user32.dll - 
  Unable to load image C:\Windows\System32\LavasoftTcpService64.dll, Win32 error 0n2
  *** WARNING: Unable to verify timestamp for LavasoftTcpService64.dll
  *** ERROR: Module load completed but symbols could not be loaded for LavasoftTcpService64.dll
  *** ERROR: Symbol file could not be found.  Defaulted to export symbols for ws2_32.dll - 
  Unable to load image C:\Windows\System32\win32u.dll, Win32 error 0n2
  *** WARNING: Unable to verify timestamp for win32u.dll
  *** ERROR: Module load completed but symbols could not be loaded for win32u.dll
  Unable to load image C:\Program Files (x86)\Trusteer\Rapport\bin\x64\rooksbas_x64.dll, Win32 error 0n2
  *** WARNING: Unable to verify timestamp for rooksbas_x64.dll
  *** ERROR: Module load completed but symbols could not be loaded for rooksbas_x64.dll
  Unable to load image U:\z\ctci-w32\x64\bin\TunTap64.dll, Win32 error 0n2
  *** WARNING: Unable to verify timestamp for TunTap64.dll
  *** ERROR: Module load completed but symbols could not be loaded for TunTap64.dll
  GetPageUrlData failed, server returned HTTP status 404
  URL requested: http://watson.microsoft.com/StageOne/hercules_exe/4_0_0_8597/57add362/hdasd_dll/4_0_0_8597/57add32d/c0000005/0000eef0.htm?Retriage=1

  FAULTING_IP: 
  hdasd!cckd_gc_l2+340 [c:\users\fish\documents\visual studio 2008\projects\hercules\_git\_enrico\hercules-0\cckddasd.c @ 5224]
  00000000`0042eef0 418b08          mov     ecx,dword ptr [r8]

  EXCEPTION_RECORD:  ffffffffffffffff -- (.exr 0xffffffffffffffff)
  ExceptionAddress: 000000000042eef0 (hdasd!cckd_gc_l2+0x0000000000000340)
     ExceptionCode: c0000005 (Access violation)
    ExceptionFlags: 00000000
  NumberParameters: 2
     Parameter[0]: 0000000000000000
     Parameter[1]: 0000000009a8d008
  Attempt to read from address 0000000009a8d008

  PROCESS_NAME:  hercules.exe

  ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.

  EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.

  EXCEPTION_PARAMETER1:  0000000000000000

  EXCEPTION_PARAMETER2:  0000000009a8d008

  READ_ADDRESS:  0000000009a8d008 

  FOLLOWUP_IP: 
  hdasd!cckd_gc_l2+340 [c:\users\fish\documents\visual studio 2008\projects\hercules\_git\_enrico\hercules-0\cckddasd.c @ 5224]
  00000000`0042eef0 418b08          mov     ecx,dword ptr [r8]

  MOD_LIST: <ANALYSIS/>

  NTGLOBALFLAG:  0

  APPLICATION_VERIFIER_FLAGS:  0

  FAULTING_THREAD:  0000000000003228

  BUGCHECK_STR:  APPLICATION_FAULT_INVALID_POINTER_READ_FILL_PATTERN_ffffffff

  PRIMARY_PROBLEM_CLASS:  INVALID_POINTER_READ_FILL_PATTERN_ffffffff

  DEFAULT_BUCKET_ID:  INVALID_POINTER_READ_FILL_PATTERN_ffffffff

  LAST_CONTROL_TRANSFER:  from 000000000042e459 to 000000000042eef0

  STACK_TEXT:  
  00000000`0b4bfc40 00000000`0042e459 : 00000000`04bcb2c0 00000000`00000000 00000000`00010000 00000000`04bcb2b0 : hdasd!cckd_gc_l2+0x340 [c:\users\fish\documents\visual studio 2008\projects\hercules\_git\_enrico\hercules-0\cckddasd.c @ 5224]
  00000000`0b4bfca0 00000000`0042dfcd : 00000000`00000040 00000000`04c1e000 00000000`0042db00 00000000`04bc8a90 : hdasd!cckd_gc_percolate+0x159 [c:\users\fish\documents\visual studio 2008\projects\hercules\_git\_enrico\hercules-0\cckddasd.c @ 4931]
  00000000`0b4ffdc0 00000000`00efe424 : 00000000`00003228 00092a7f`57befcfd 00000000`57befcfe 00000000`7312334a : hdasd!cckd_gcol+0x4cd [c:\users\fish\documents\visual studio 2008\projects\hercules\_git\_enrico\hercules-0\cckddasd.c @ 4837]
  00000000`0b4ffe80 00000000`00ef376a : 00000000`08d09b90 00000000`00000000 00000000`00000000 00000000`7312336b : hutil!hthread_func+0x34 [c:\users\fish\documents\visual studio 2008\projects\hercules\_git\_enrico\hercules-0\hthreads.c @ 778]
  00000000`0b4ffeb0 00000000`73122fdf : 00000000`09631390 00000000`00000000 00000000`00000000 00000000`00000000 : hutil!FTWin32ThreadFunc+0x5a [c:\users\fish\documents\visual studio 2008\projects\hercules\_git\_enrico\hercules-0\fthreads.c @ 787]
  00000000`0b4fff00 00000000`73123080 : 00000000`731acad0 00000000`09631390 00000000`00000000 00000000`00000000 : msvcr90!_callthreadstartex+0x17 [f:\dd\vctools\crt_bld\self_64_amd64\crt\src\threadex.c @ 348]
  00000000`0b4fff30 00007ffc`ecac8364 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : msvcr90!_threadstartex+0x84 [f:\dd\vctools\crt_bld\self_64_amd64\crt\src\threadex.c @ 326]
  00000000`0b4fff60 00007ffc`ed305e91 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : kernel32!BaseThreadInitThunk+0x14
  00000000`0b4fff90 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x21

  FAULTING_SOURCE_CODE:  
    5220:             )
    5221:            )
    5222:         {
    5223:             for (i = 0; i < cckd->cdevhdr[sfx].numl1tab; i++)
  > 5224:                 if (fpos + cckd->free[i].len == (off_t)cckd->l1[sfx][i])
    5225:                     break;
    5226:         }
    5227:         else
    5228:         {
    5229:             for (i = 0; i < cckd->cdevhdr[sfx].numl1tab; i++)

  SYMBOL_STACK_INDEX:  0

  SYMBOL_NAME:  hdasd!cckd_gc_l2+340

  FOLLOWUP_NAME:  MachineOwner

  MODULE_NAME: hdasd

  IMAGE_NAME:  hdasd.dll

  DEBUG_FLR_IMAGE_TIMESTAMP:  57add32d

  STACK_COMMAND:  ~41s; .ecxr ; kb

  FAILURE_BUCKET_ID:  INVALID_POINTER_READ_FILL_PATTERN_ffffffff_c0000005_hdasd.dll!cckd_gc_l2

  BUCKET_ID:  X64_APPLICATION_FAULT_INVALID_POINTER_READ_FILL_PATTERN_ffffffff_hdasd!cckd_gc_l2+340

  WATSON_STAGEONE_URL:  http://watson.microsoft.com/StageOne/hercules_exe/4_0_0_8597/57add362/hdasd_dll/4_0_0_8597/57add32d/c0000005/0000eef0.htm?Retriage=1

  Followup: MachineOwner
  ---------
erwinmars commented 8 years ago

Hi Fish,

thanks, i'll try it. To save time, could you see the volume name, which is involved?

Thanks in advance

Erwin

erwinmars commented 8 years ago

Hi Fish,

i found the volume, although CCKDCDSK doesn't find any issue with that volume. But there is a catalog on that volume. If i process this catalog with a utility like export catalog i get also thhe crash dump.

I'll recreate the catalog.

Erwin

ivan-w commented 8 years ago

On 8/26/2016 5:43 PM, Erwin Marschalk wrote:

Hi Fish,

i found the volume, although CCKDCDSK doesn't find any issue with that volume. But there is a catalog on that volume. If i process this catalog with a utility like export catalog i get also thhe crash dump.

I'll recreate the catalog.

Erwin

Erwin,

It shouldn't crash.. hercules (or any of its utilities) should NEVER crash !

However, it may be crashing in a library outside of hercules (zlib ?) at which point, it is outside of our purview.

--Ivan

Fish-Git commented 8 years ago

If i process this catalog with a utility like export catalog i get also thhe crash dump.

Erwin, what utility are you using? Is it a Hercules utility? If so, which one? dasdpdsu perhaps?

Like Ivan said, none of the Hercules programs -- neither Hercules itself nor any of its utilities -- should ever crash! If a Hercules utility is crashing, then we need to find out why and fix it!

If the "export" utility you are using is a z/OS export utility, then can you provide us with some additional information regarding which z/OS dataset (catalog) it is and how it was created? Is it a private catalog? Or is it one that IBM supplied with z/OS?

I realize it may not be possible but I would very much like to obtain a copy of this dasd that you are having trouble with! If I can recreate your problem on my system then I could fix the problem much easier and much faster!

Fish-Git commented 8 years ago

I'll recreate the catalog.

BEFORE YOU RECREATE YOUR CATALOG, please make a copy of that damaged volume! (Hercules dasd image file)

If we are going to eventually fix Hercules so it does not crash, we need to be able to test our change to verify that it fixes the problem. Your volume (Hercules dasd image file), Erwin, is the only one that exists that causes the problem, so it needs to be preserved. Thanks!

(Once the problem has been fixed, then you can destroy your bad volume. But please, not before! Thanks!)

erwinmars commented 8 years ago

Hi Fish, Ivan,

i checked the volume i expect, but CCKDCDSK -3 find nothing. From the LISTCAT of that volume, i select all volumes, which are related to this catalog and run also a CCKDCDSK with that volumes, but also here no errors. But with the LISTCAT on that catalog, i get a strange error message, but LISTCAT gets RC 0: 14.47.10 ADCD IGD306I UNEXPECTED ERROR DURING ?CBRXVOL PROCESSING RETURN CODE 20 REASON CODE 0 THE MODULE THAT DETECTED THE ERROR IS IGDCAT01 SMS MODULE TRACE BACK - CAT01 CAT00 SSIRT SYMPTOM RECORD CREATED, PROBLEM ID IS IGD00000 14.47.11 ADCD IGD306I UNEXPECTED ERROR DURING ?CBRXVOL PROCESSING RETURN CODE 20 REASON CODE 0 THE MODULE THAT DETECTED THE ERROR IS IGDCAT01 SMS MODULE TRACE BACK - CAT01 CAT00 SSIRT SYMPTOM RECORD CREATED, PROBLEM ID IS IGD00001

No Crash Dump with LISTCAT, but with an other IBM utility the crash dump is recreatable. From my past hercules experience i expect, its an issue with SMS controlled volumes.

Btw. if we would talk about the "other IBM utility" i prefer to do it offline.

Erwin

Fish-Git commented 8 years ago

IGD306I UNEXPECTED ERROR DURING ?CBRXVOL PROCESSING

Do either of the following make any sense to you? (I am not a z/OS person!)

Did you lock your catalog and forget to unlock it?

(even if you did though, Hyperion should not ever crash!)

Fish-Git commented 8 years ago

Btw. if we would talk about the "other IBM utility" i prefer to do it offline.

Understood.

I prefer keeping everything open and public, but I understand sometimes sensitive issues arise that require processing hidden from public view,

Feel free to contact me directly if/when you feel you must/should.

Fish-Git commented 8 years ago

Please also see pg. 464 of:


FYI: IBM's Library Server (where you can find almost any manual) is here:

Hope that helps!

erwinmars commented 8 years ago

Regarding IGD306I you are right. I used parts from my catalog recovery job, which locks the catalog. But because it was not complete, it was no unlocked. Sorry for that.

erwinmars commented 8 years ago

One note more. All related volumes are defined with cu=3990-6.

erwinmars commented 7 years ago

Hi Fish,

in the meantime i've done some more research. My conclusion is, that the crash doesn't depends on an application. It depends on the model and the size of the CCKD volume. I got this crash only on model 27 volumes. And the crash occurs only with large volumes. Below is a table with my largest model 27 volumes.

CUUSizeVolumeCU
0AB03.133.113.539DSKD19
0AAC1.709.341.690DSKD02
0AAD1.662.959.028DSKD09
0AEA1.073.056.019DSKA013990-6
0AE5890.734.070DSK0053990-6
0AE4761.210.724DSK0043990-6
0AE6739.819.109DSK0063990-6
0AE2508.524.255DSK003
0AE1365.887.585DSK002
0AE3311.177.372DSK004
128.229.866DSK001

Depending on the applications i run, the crash occurs only on model 27 with a size of above about 300.000.000.

Also, i run with all model 27 and the corresponding shadow files before and after the CCKDCDSK. No error before and after.

Would it be possible, if there is defined a variable, which is to small and overflows and switches to minus?

Erwin

erwinmars commented 7 years ago

Hi Fish, i tried your corrected version regarding the bug in the dasd cache logic, but no success. The crash occurs on the same place. What about my findings above.

I've uploaded the hercules log and the dump. [Uploading hercules_64_fish-161127- hercules_64_fish-161127-151502.zip 151502.zip…]()

Sorry for the bad news.

Erwin

Fish-Git commented 7 years ago

Since I am no longer a Hercules developer I am unable to help you any further with this issue. Please ask one of the Hercules developers to help you. I am very sorry. :(

erwinmars commented 7 years ago

Hi guys,

is there somebody out there, who can help me in the above issue. I'm a beginner regarding hercules internals. So i need really help from somebody with more knowledge.

Thanks in advance

Erwin

ivan-w commented 7 years ago

Could we get a summed up version of the actual issue. I know there is an actual issue with CCKD cache, but it usually involves list directed IPL (or at least non CCKD DASD ipl).

Greg & Fish (And possibly Mark as well) would be the most relevant people to help there.

--Ivan

On 11/28/2016 4:14 PM, Erwin Marschalk wrote:

Hi guys,

is there somebody out there, who can help me in the above issue. I'm a beginner regarding hercules internals. So i need really help from somebody with more knowledge.

Thanks in advance

Erwin

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/hercules-390/hyperion/issues/147#issuecomment-263296666, or mute the thread https://github.com/notifications/unsubscribe-auth/ABjMW-mqcCo2K64DTG-xzInE9S_KZ_-Tks5rCu_ogaJpZM4JtaGt.

ivan-w commented 7 years ago

And by the way..

Are we through with the power tantrums.... get Fish back in the loop

And let this be over with.. we are no longer teenagers...(although we wish we were)

--Ivan

On 11/28/2016 4:14 PM, Erwin Marschalk wrote:

Hi guys,

is there somebody out there, who can help me in the above issue. I'm a beginner regarding hercules internals. So i need really help from somebody with more knowledge.

Thanks in advance

Erwin

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/hercules-390/hyperion/issues/147#issuecomment-263296666, or mute the thread https://github.com/notifications/unsubscribe-auth/ABjMW-mqcCo2K64DTG-xzInE9S_KZ_-Tks5rCu_ogaJpZM4JtaGt.

erwinmars commented 7 years ago

Hi guys,

here is my summary. It needs a little bit longer, because i tested few changes in my configuration. But now, i'm pretty sure when it happens. The crash occurs if the garbage collector works on 3390 Model 27 disks, if the disk is compressed. I'm not 100% sure, if the CU is also involved. Half of the 3390-27 volumes have no CU parm, other half are defined with CU=3990-6. At this time, i've also converted all model 27 volumes to CKD type and hercules works fine.

The crash occurs in cckddasd.c at this line:

    for (i = 0; i < cckd->cdevhdr[sfx].numl1tab; i++) 
        if (fpos + cckd->free[i].len == (off_t)cckd->l1[sfx][i])      /* Crash */ 
            break; 

Because i where not sure about file errors, i run CCKDCDSK on all model 27 disks. I finds no error on any volume.

jphartmann commented 7 years ago

Not that I know anything about the device manager, but I see a cast. Get rid of it!

erwinmars commented 7 years ago

Some more infos. The crash occurs, if i'm creating/updating large files. For example if i upload a 250 MB file with IND$FILE or with unpacking large files with TRSMAIN. Also with other tools, dealing with large files are getting the crash.

Fish-Git commented 7 years ago

I suspect you are running out of room in your CCKD file.

The maximum supported size for a CCKD (compressed CKD) dasd image file is 4GB. The capacity of a 3390-27 dasd however is 27GB. I suspect your CCKD device dasd image (or shadow) files are slightly less than 4GB, and when you try adding "too much" new data to them (pushing them over the 4GB limit), the bug in CCKD's garbage collector logic ends up getting tripped and crash goes Hercules.

The 4GB limitation for CCKD file sizes has been a problem in Hercules for a very long time now. It really needs to be addressed by someone. I began looking into how hard it would be to add "64-bit" support to Hercules's CCKD logic myself, but as usual got sidetracked into other more pressing concerns.

I'm confident it'll happen eventually.

Precisely when however, I cannot say.  :(

In the mean time your only resolution (which is really just a workaround and not a fix of course) is to make frequent backups of your Hercules dasd and to closely monitor the size of your CCKD files. When you see one approaching the 4GB limit, use the sf+cuu (or sf+*) command to cause CCKD logic to begin writing updated tracks to a new (shadow) file, and then begin closely monitoring the size of that (those) file(s) for the same problem.

When you see your 8th shadow file (the maximum CCKD supports) approach the 4GB limit, you would then of course be left with no alternative but to convert your dasd at that time to normal uncompressed CKD type instead, just as you have already done.

Hope that helps!