Open erwinmars opened 8 years ago
Erwin:
I have finished analyzing your Crash Dump (Thank You!) and based on analysis it appears the crash occurred in the cckd garbage collector function where it is accessing the device's level-2 lookup tables (to try and find one to relocate).
This is strongly indicative of one or more potentially damaged dasd image files (or shadow files). Which one(s) I cannot say.
My suggested course of action would be for you to either:
a) Restore your Hercules dasd files from your backups.
Or, if you're foolish enough to not backup your system on a regular basis, to:
b) Use the Hercules cckdcdsk.exe
utility to try and repair the damage.
Note: You might need to try check level 2 (extra) or 3 (maximal) if cckdcdsk
level 1 (normal) fails to repair the damage.
Please let me know whether the above resolves your problem or not.
Thanks!
Microsoft's Debugging Tools for Windows analysis of your crash dump follows immediately below.
Microsoft (R) Windows Debugger Version 6.12.0002.633 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.
Loading Dump File [C:\Users\Fish\Documents\Visual Studio 2008\Projects\Hercules\_DUMPS\Hercules.dmp]
User Mini Dump File: Only registers, stack and portions of memory are available
WARNING: Minidump contains unknown stream type 0x15
WARNING: Minidump contains unknown stream type 0x16
Comment: '4.0.0.8597-g05652fb'
Comment: '(C) Copyright 1999-2016 by Roger Bowler, Jan Jaeger, and others'
Comment: 'Running on: EWH-PC (Windows-6.2.9200 Intel(R) x64) LP=8, Cores=4, CPUs=1'
Comment: 'Built with: Microsoft Visual C 150030729 1'
Comment: 'Build type: Windows MSVC AMD64 host architecture build'
Comment: 'Modes: S/370 ESA/390 z/Arch'
Comment: 'Max CPU Engines: 32'
Comment: 'Using Fish threads Threading Model'
Comment: 'Using Error-Checking Mutex Locking Model'
Comment: 'With Syncio support'
Comment: 'With Shared Devices support'
Comment: 'With Dynamic loading support'
Comment: 'Using shared libraries'
Comment: 'With External GUI support'
Comment: 'With Partial TCP keepalive support'
Comment: 'With IPV6 support'
Comment: 'With HTTP Server support'
Comment: 'With sqrtl support'
Comment: 'Without SIGABEND handler'
Comment: 'With CCKD BZIP2 support'
Comment: 'With HET BZIP2 support'
Comment: 'With ZLIB support'
Comment: 'With Regular Expressions support'
Comment: 'With Object REXX support'
Comment: 'Without Regina REXX support'
Comment: 'With Automatic Operator support'
Comment: 'Without National Language Support'
Comment: 'Machine dependent assists: cmpxchg1 cmpxchg4 cmpxchg8 hatomics=msvcIntrinsics'
Symbol search path is: C:\Users\Fish\Documents\Visual Studio 2008\Projects\Hercules\_DUMPS;srv*C:\Windows\Symbols*http://msdl.microsoft.com/download/symbols
Executable search path is: C:\Users\Fish\Documents\Visual Studio 2008\Projects\Hercules\_DUMPS
Windows 7 Version 14393 MP (8 procs) Free x64
Product: WinNt, suite: SingleUserTS
Machine Name:
Debug session time: Thu Aug 25 07:13:29.000 2016 (UTC - 7:00)
System Uptime: not available
Process Uptime: 0 days 0:04:48.000
................................................................
........................
This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
(bd0.3228): Access violation - code c0000005 (first/second chance not available)
ntdll!ZwGetContextThread+0x14:
00007ffc`ed346b04 c3 ret
0:041> !analyze -v
*******************************************************************************
* *
* Exception Analysis *
* *
*******************************************************************************
*** ERROR: Symbol file could not be found. Defaulted to export symbols for user32.dll -
Unable to load image C:\Windows\System32\LavasoftTcpService64.dll, Win32 error 0n2
*** WARNING: Unable to verify timestamp for LavasoftTcpService64.dll
*** ERROR: Module load completed but symbols could not be loaded for LavasoftTcpService64.dll
*** ERROR: Symbol file could not be found. Defaulted to export symbols for ws2_32.dll -
Unable to load image C:\Windows\System32\win32u.dll, Win32 error 0n2
*** WARNING: Unable to verify timestamp for win32u.dll
*** ERROR: Module load completed but symbols could not be loaded for win32u.dll
Unable to load image C:\Program Files (x86)\Trusteer\Rapport\bin\x64\rooksbas_x64.dll, Win32 error 0n2
*** WARNING: Unable to verify timestamp for rooksbas_x64.dll
*** ERROR: Module load completed but symbols could not be loaded for rooksbas_x64.dll
Unable to load image U:\z\ctci-w32\x64\bin\TunTap64.dll, Win32 error 0n2
*** WARNING: Unable to verify timestamp for TunTap64.dll
*** ERROR: Module load completed but symbols could not be loaded for TunTap64.dll
GetPageUrlData failed, server returned HTTP status 404
URL requested: http://watson.microsoft.com/StageOne/hercules_exe/4_0_0_8597/57add362/hdasd_dll/4_0_0_8597/57add32d/c0000005/0000eef0.htm?Retriage=1
FAULTING_IP:
hdasd!cckd_gc_l2+340 [c:\users\fish\documents\visual studio 2008\projects\hercules\_git\_enrico\hercules-0\cckddasd.c @ 5224]
00000000`0042eef0 418b08 mov ecx,dword ptr [r8]
EXCEPTION_RECORD: ffffffffffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 000000000042eef0 (hdasd!cckd_gc_l2+0x0000000000000340)
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 0000000000000000
Parameter[1]: 0000000009a8d008
Attempt to read from address 0000000009a8d008
PROCESS_NAME: hercules.exe
ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.
EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.
EXCEPTION_PARAMETER1: 0000000000000000
EXCEPTION_PARAMETER2: 0000000009a8d008
READ_ADDRESS: 0000000009a8d008
FOLLOWUP_IP:
hdasd!cckd_gc_l2+340 [c:\users\fish\documents\visual studio 2008\projects\hercules\_git\_enrico\hercules-0\cckddasd.c @ 5224]
00000000`0042eef0 418b08 mov ecx,dword ptr [r8]
MOD_LIST: <ANALYSIS/>
NTGLOBALFLAG: 0
APPLICATION_VERIFIER_FLAGS: 0
FAULTING_THREAD: 0000000000003228
BUGCHECK_STR: APPLICATION_FAULT_INVALID_POINTER_READ_FILL_PATTERN_ffffffff
PRIMARY_PROBLEM_CLASS: INVALID_POINTER_READ_FILL_PATTERN_ffffffff
DEFAULT_BUCKET_ID: INVALID_POINTER_READ_FILL_PATTERN_ffffffff
LAST_CONTROL_TRANSFER: from 000000000042e459 to 000000000042eef0
STACK_TEXT:
00000000`0b4bfc40 00000000`0042e459 : 00000000`04bcb2c0 00000000`00000000 00000000`00010000 00000000`04bcb2b0 : hdasd!cckd_gc_l2+0x340 [c:\users\fish\documents\visual studio 2008\projects\hercules\_git\_enrico\hercules-0\cckddasd.c @ 5224]
00000000`0b4bfca0 00000000`0042dfcd : 00000000`00000040 00000000`04c1e000 00000000`0042db00 00000000`04bc8a90 : hdasd!cckd_gc_percolate+0x159 [c:\users\fish\documents\visual studio 2008\projects\hercules\_git\_enrico\hercules-0\cckddasd.c @ 4931]
00000000`0b4ffdc0 00000000`00efe424 : 00000000`00003228 00092a7f`57befcfd 00000000`57befcfe 00000000`7312334a : hdasd!cckd_gcol+0x4cd [c:\users\fish\documents\visual studio 2008\projects\hercules\_git\_enrico\hercules-0\cckddasd.c @ 4837]
00000000`0b4ffe80 00000000`00ef376a : 00000000`08d09b90 00000000`00000000 00000000`00000000 00000000`7312336b : hutil!hthread_func+0x34 [c:\users\fish\documents\visual studio 2008\projects\hercules\_git\_enrico\hercules-0\hthreads.c @ 778]
00000000`0b4ffeb0 00000000`73122fdf : 00000000`09631390 00000000`00000000 00000000`00000000 00000000`00000000 : hutil!FTWin32ThreadFunc+0x5a [c:\users\fish\documents\visual studio 2008\projects\hercules\_git\_enrico\hercules-0\fthreads.c @ 787]
00000000`0b4fff00 00000000`73123080 : 00000000`731acad0 00000000`09631390 00000000`00000000 00000000`00000000 : msvcr90!_callthreadstartex+0x17 [f:\dd\vctools\crt_bld\self_64_amd64\crt\src\threadex.c @ 348]
00000000`0b4fff30 00007ffc`ecac8364 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : msvcr90!_threadstartex+0x84 [f:\dd\vctools\crt_bld\self_64_amd64\crt\src\threadex.c @ 326]
00000000`0b4fff60 00007ffc`ed305e91 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : kernel32!BaseThreadInitThunk+0x14
00000000`0b4fff90 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x21
FAULTING_SOURCE_CODE:
5220: )
5221: )
5222: {
5223: for (i = 0; i < cckd->cdevhdr[sfx].numl1tab; i++)
> 5224: if (fpos + cckd->free[i].len == (off_t)cckd->l1[sfx][i])
5225: break;
5226: }
5227: else
5228: {
5229: for (i = 0; i < cckd->cdevhdr[sfx].numl1tab; i++)
SYMBOL_STACK_INDEX: 0
SYMBOL_NAME: hdasd!cckd_gc_l2+340
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: hdasd
IMAGE_NAME: hdasd.dll
DEBUG_FLR_IMAGE_TIMESTAMP: 57add32d
STACK_COMMAND: ~41s; .ecxr ; kb
FAILURE_BUCKET_ID: INVALID_POINTER_READ_FILL_PATTERN_ffffffff_c0000005_hdasd.dll!cckd_gc_l2
BUCKET_ID: X64_APPLICATION_FAULT_INVALID_POINTER_READ_FILL_PATTERN_ffffffff_hdasd!cckd_gc_l2+340
WATSON_STAGEONE_URL: http://watson.microsoft.com/StageOne/hercules_exe/4_0_0_8597/57add362/hdasd_dll/4_0_0_8597/57add32d/c0000005/0000eef0.htm?Retriage=1
Followup: MachineOwner
---------
Hi Fish,
thanks, i'll try it. To save time, could you see the volume name, which is involved?
Thanks in advance
Erwin
Hi Fish,
i found the volume, although CCKDCDSK doesn't find any issue with that volume. But there is a catalog on that volume. If i process this catalog with a utility like export catalog i get also thhe crash dump.
I'll recreate the catalog.
Erwin
On 8/26/2016 5:43 PM, Erwin Marschalk wrote:
Hi Fish,
i found the volume, although CCKDCDSK doesn't find any issue with that volume. But there is a catalog on that volume. If i process this catalog with a utility like export catalog i get also thhe crash dump.
I'll recreate the catalog.
Erwin
Erwin,
It shouldn't crash.. hercules (or any of its utilities) should NEVER crash !
However, it may be crashing in a library outside of hercules (zlib ?) at which point, it is outside of our purview.
--Ivan
If i process this catalog with a utility like export catalog i get also thhe crash dump.
Erwin, what utility are you using? Is it a Hercules utility? If so, which one? dasdpdsu
perhaps?
Like Ivan said, none of the Hercules programs -- neither Hercules itself nor any of its utilities -- should ever crash! If a Hercules utility is crashing, then we need to find out why and fix it!
If the "export" utility you are using is a z/OS export utility, then can you provide us with some additional information regarding which z/OS dataset (catalog) it is and how it was created? Is it a private catalog? Or is it one that IBM supplied with z/OS?
I realize it may not be possible but I would very much like to obtain a copy of this dasd that you are having trouble with! If I can recreate your problem on my system then I could fix the problem much easier and much faster!
I'll recreate the catalog.
BEFORE YOU RECREATE YOUR CATALOG, please make a copy of that damaged volume! (Hercules dasd image file)
If we are going to eventually fix Hercules so it does not crash, we need to be able to test our change to verify that it fixes the problem. Your volume (Hercules dasd image file), Erwin, is the only one that exists that causes the problem, so it needs to be preserved. Thanks!
(Once the problem has been fixed, then you can destroy your bad volume. But please, not before! Thanks!)
Hi Fish, Ivan,
i checked the volume i expect, but CCKDCDSK -3 find nothing. From the LISTCAT of that volume, i select all volumes, which are related to this catalog and run also a CCKDCDSK with that volumes, but also here no errors. But with the LISTCAT on that catalog, i get a strange error message, but LISTCAT gets RC 0: 14.47.10 ADCD IGD306I UNEXPECTED ERROR DURING ?CBRXVOL PROCESSING RETURN CODE 20 REASON CODE 0 THE MODULE THAT DETECTED THE ERROR IS IGDCAT01 SMS MODULE TRACE BACK - CAT01 CAT00 SSIRT SYMPTOM RECORD CREATED, PROBLEM ID IS IGD00000 14.47.11 ADCD IGD306I UNEXPECTED ERROR DURING ?CBRXVOL PROCESSING RETURN CODE 20 REASON CODE 0 THE MODULE THAT DETECTED THE ERROR IS IGDCAT01 SMS MODULE TRACE BACK - CAT01 CAT00 SSIRT SYMPTOM RECORD CREATED, PROBLEM ID IS IGD00001
No Crash Dump with LISTCAT, but with an other IBM utility the crash dump is recreatable. From my past hercules experience i expect, its an issue with SMS controlled volumes.
Btw. if we would talk about the "other IBM utility" i prefer to do it offline.
Erwin
IGD306I UNEXPECTED ERROR DURING ?CBRXVOL PROCESSING
Do either of the following make any sense to you? (I am not a z/OS person!)
Did you lock your catalog and forget to unlock it?
(even if you did though, Hyperion should not ever crash!)
Btw. if we would talk about the "other IBM utility" i prefer to do it offline.
Understood.
I prefer keeping everything open and public, but I understand sometimes sensitive issues arise that require processing hidden from public view,
Feel free to contact me directly if/when you feel you must/should.
Please also see pg. 464 of:
FYI: IBM's Library Server (where you can find almost any manual) is here:
Hope that helps!
Regarding IGD306I you are right. I used parts from my catalog recovery job, which locks the catalog. But because it was not complete, it was no unlocked. Sorry for that.
One note more. All related volumes are defined with cu=3990-6.
Hi Fish,
in the meantime i've done some more research. My conclusion is, that the crash doesn't depends on an application. It depends on the model and the size of the CCKD volume. I got this crash only on model 27 volumes. And the crash occurs only with large volumes. Below is a table with my largest model 27 volumes.
CUU | Size | Volume | CU |
---|---|---|---|
0AB0 | 3.133.113.539 | DSKD19 | |
0AAC | 1.709.341.690 | DSKD02 | |
0AAD | 1.662.959.028 | DSKD09 | |
0AEA | 1.073.056.019 | DSKA01 | 3990-6 |
0AE5 | 890.734.070 | DSK005 | 3990-6 |
0AE4 | 761.210.724 | DSK004 | 3990-6 |
0AE6 | 739.819.109 | DSK006 | 3990-6 |
0AE2 | 508.524.255 | DSK003 | |
0AE1 | 365.887.585 | DSK002 | |
0AE3 | 311.177.372 | DSK004 | |
128.229.866 | DSK001 |
Depending on the applications i run, the crash occurs only on model 27 with a size of above about 300.000.000.
Also, i run with all model 27 and the corresponding shadow files before and after the CCKDCDSK. No error before and after.
Would it be possible, if there is defined a variable, which is to small and overflows and switches to minus?
Erwin
Hi Fish, i tried your corrected version regarding the bug in the dasd cache logic, but no success. The crash occurs on the same place. What about my findings above.
I've uploaded the hercules log and the dump. [Uploading hercules_64_fish-161127- hercules_64_fish-161127-151502.zip 151502.zip…]()
Sorry for the bad news.
Erwin
Since I am no longer a Hercules developer I am unable to help you any further with this issue. Please ask one of the Hercules developers to help you. I am very sorry. :(
Hi guys,
is there somebody out there, who can help me in the above issue. I'm a beginner regarding hercules internals. So i need really help from somebody with more knowledge.
Thanks in advance
Erwin
Could we get a summed up version of the actual issue. I know there is an actual issue with CCKD cache, but it usually involves list directed IPL (or at least non CCKD DASD ipl).
Greg & Fish (And possibly Mark as well) would be the most relevant people to help there.
--Ivan
On 11/28/2016 4:14 PM, Erwin Marschalk wrote:
Hi guys,
is there somebody out there, who can help me in the above issue. I'm a beginner regarding hercules internals. So i need really help from somebody with more knowledge.
Thanks in advance
Erwin
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/hercules-390/hyperion/issues/147#issuecomment-263296666, or mute the thread https://github.com/notifications/unsubscribe-auth/ABjMW-mqcCo2K64DTG-xzInE9S_KZ_-Tks5rCu_ogaJpZM4JtaGt.
And by the way..
Are we through with the power tantrums.... get Fish back in the loop
And let this be over with.. we are no longer teenagers...(although we wish we were)
--Ivan
On 11/28/2016 4:14 PM, Erwin Marschalk wrote:
Hi guys,
is there somebody out there, who can help me in the above issue. I'm a beginner regarding hercules internals. So i need really help from somebody with more knowledge.
Thanks in advance
Erwin
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/hercules-390/hyperion/issues/147#issuecomment-263296666, or mute the thread https://github.com/notifications/unsubscribe-auth/ABjMW-mqcCo2K64DTG-xzInE9S_KZ_-Tks5rCu_ogaJpZM4JtaGt.
Hi guys,
here is my summary. It needs a little bit longer, because i tested few changes in my configuration. But now, i'm pretty sure when it happens. The crash occurs if the garbage collector works on 3390 Model 27 disks, if the disk is compressed. I'm not 100% sure, if the CU is also involved. Half of the 3390-27 volumes have no CU parm, other half are defined with CU=3990-6. At this time, i've also converted all model 27 volumes to CKD type and hercules works fine.
The crash occurs in cckddasd.c at this line:
for (i = 0; i < cckd->cdevhdr[sfx].numl1tab; i++)
if (fpos + cckd->free[i].len == (off_t)cckd->l1[sfx][i]) /* Crash */
break;
Because i where not sure about file errors, i run CCKDCDSK on all model 27 disks. I finds no error on any volume.
Not that I know anything about the device manager, but I see a cast. Get rid of it!
Some more infos. The crash occurs, if i'm creating/updating large files. For example if i upload a 250 MB file with IND$FILE or with unpacking large files with TRSMAIN. Also with other tools, dealing with large files are getting the crash.
I suspect you are running out of room in your CCKD file.
The maximum supported size for a CCKD (compressed CKD) dasd image file is 4GB. The capacity of a 3390-27 dasd however is 27GB. I suspect your CCKD device dasd image (or shadow) files are slightly less than 4GB, and when you try adding "too much" new data to them (pushing them over the 4GB limit), the bug in CCKD's garbage collector logic ends up getting tripped and crash goes Hercules.
The 4GB limitation for CCKD file sizes has been a problem in Hercules for a very long time now. It really needs to be addressed by someone. I began looking into how hard it would be to add "64-bit" support to Hercules's CCKD logic myself, but as usual got sidetracked into other more pressing concerns.
I'm confident it'll happen eventually.
Precisely when however, I cannot say. :(
In the mean time your only resolution (which is really just a workaround and not a fix of course) is to make frequent backups of your Hercules dasd and to closely monitor the size of your CCKD files. When you see one approaching the 4GB limit, use the sf+cuu
(or sf+*
) command to cause CCKD logic to begin writing updated tracks to a new (shadow) file, and then begin closely monitoring the size of that (those) file(s) for the same problem.
When you see your 8th shadow file (the maximum CCKD supports) approach the 4GB limit, you would then of course be left with no alternative but to convert your dasd at that time to normal uncompressed CKD type instead, just as you have already done.
Hope that helps!
After upgrading some products, i always get a crash dump starting that application. It has occured with 3.12, but also with 4.00. I retest it with 4.00.8597 the same result. I've downloaded the current source, but i'm not so familiar with the VC 2015 debugging environment, so i've no idea, where the issue occurs.
I've attached the log file, where the crash dump occurs and also the crash dump.
Thanks for any help.
Erwin hercules_64_400-160825-160841.txt
Hercules.dmp.zip