hercules-390 / hyperion

Crash dump Hercules 4.0.0 rc0 #178

Open mgutzwiller opened 7 years ago

mgutzwiller commented 7 years ago

I'm trying to install OS/360 following the directions at www.conmicro.com/hercos360, and Hercules crashes during the HASP installation stage, I03LOAD.

Attached are the crash dump and console log: Hercules-crash-con-log.txt, Hercules-crash-dump.zip

Update - I followed the advice and did make clean before rebuilding with each of -O1 and -O2. With both options, the program crashed.

UPDATE - Commit 52f4a4a - I built and tested this with the default optimization and using configure --enable-optimization=-O3 and --enable-optimization="-O3". The OS/360 install went smoothly; no crashes occurred.

jphartmann commented 7 years ago

For the dump to be of any use, you must compile Hyperion with -g. Then you have to unravel autolib's obfuscation of which executable you are really running. Then you must ensure that you can in fact create a core dump (ulimit -c unlimited). Then the command is

gdb -q <executable> core

assuming the core dump is indeed named core; each distribution has its own idiosyncrasies. Once all of that is done, the gdb command is where.
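
For reference, the core-size limit that ulimit -c unlimited adjusts in the shell can also be raised from inside a process. The following is a minimal sketch, assuming a POSIX host; enable_core_dumps is a hypothetical helper for illustration and is not part of Hercules.

```c
#include <sys/resource.h>

/* Hypothetical helper: raise the soft core-file size limit to the hard
 * limit, which is as far as an unprivileged process may go.
 * Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;

    rl.rlim_cur = rl.rlim_max;      /* soft limit := hard limit */

    return setrlimit(RLIMIT_CORE, &rl);
}
```

If the hard limit itself is zero, no core file will be written regardless, and the limit has to be changed in the environment that launches the program.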

mgutzwiller commented 7 years ago

John, thanks for the instructions. The crash dump I uploaded was, I guess, a Windows minidump. It was produced using the binaries on the releases page for this project (64-bit, I believe). Taking a big leap, I'd guess it contains lots of useful information, if you have years to dig through it.

I'll build it on Ubuntu 16.04. I really hate having multiple executable versions on my machine, so it will be the only version on the machine. It will be the release tagged RC0.

jphartmann commented 7 years ago

You do not need to install Hercules to test it. Just do make and go to the build directory and issue ./hercules to run.

mgutzwiller commented 7 years ago

Here is the output of the gdb where command:

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `hercules -f mvt.cnf'.
Program terminated with signal SIGTRAP, Trace/breakpoint trap.
#0  0x00007f169039e269 in raise (sig=sig@entry=5)
    at ../sysdeps/unix/sysv/linux/pt-raise.c:35
35  ../sysdeps/unix/sysv/linux/pt-raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7f168e817700 (LWP 19113))]
(gdb) where
#0  0x00007f169039e269 in raise (sig=sig@entry=5)
    at ../sysdeps/unix/sysv/linux/pt-raise.c:35
#1  0x00007f1690d41538 in ScheduleIORequest (dev=0x1df3000)
    at /home/mark/source/hercules/hyperion/channel.c:2435
#2  schedule_ioq (regs=regs@entry=0x0, dev=dev@entry=0x1df3000)
    at /home/mark/source/hercules/hyperion/channel.c:2681
#3  0x00007f1690d40753 in schedule_ioq (dev=0x1df3000, regs=0x0)
    at /home/mark/source/hercules/hyperion/channel.c:2619
#4  s370_execute_ccw_chain (arg=arg@entry=0x1df3000)
    at /home/mark/source/hercules/hyperion/channel.c:4962
#5  0x00007f1690d40dd1 in call_execute_ccw_chain (arch_mode=<optimized out>, 
    pDevBlk=pDevBlk@entry=0x1df3000)
    at /home/mark/source/hercules/hyperion/channel.c:6195
#6  0x00007f1690d4136d in schedule_ioq (regs=<optimized out>, 
    dev=dev@entry=0x1df3000)
    at /home/mark/source/hercules/hyperion/channel.c:2676
#7  0x00007f1690d45502 in schedule_ioq (dev=0x1df3000, regs=<optimized out>)
    at /home/mark/source/hercules/hyperion/channel.c:2619
#8  s370_startio (regs=regs@entry=0x7f1680001000, dev=0x1df3000, 
    orb=orb@entry=0x7f168e816d10)
    at /home/mark/source/hercules/hyperion/channel.c:4061
#9  0x00007f1690f696ea in s370_start_io (inst=0x7f168de17ada "\234", 
    regs=0x7f1680001000) at /home/mark/source/hercules/hyperion/io.c:1062
#10 0x00007f1690db2619 in s370_run_cpu (cpu=<optimized out>, 
    oldregs=<optimized out>) at /home/mark/source/hercules/hyperion/cpu.c:1836
#11 0x00007f1690da82df in cpu_thread (ptr=ptr@entry=0x7ffeb56acb9c)
    at /home/mark/source/hercules/hyperion/cpu.c:1287
#12 0x00007f16907c7800 in hthread_func (arg2=0x1dddfa0)
    at /home/mark/source/hercules/hyperion/hthreads.c:777
#13 0x00007f16903946ba in start_thread (arg=0x7f168e817700)
    at pthread_create.c:333
#14 0x00007f16900ca82d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb)

ivan-w commented 7 years ago

channel.c has an explicit trap at this line (channel.c:2435, frame #1 in the backtrace). Considering the name of the macro invoked, I am wondering if this could be a leftover of someone's attempt at debugging some issue.

dasdman commented 7 years ago

The BREAK_INTO_DEBUGGER macro should NOT be raising a signal in normal operation within channel.c; it should let processing continue. That should be opened as a separate incident, though.

That said, this is a "normal" planned condition: it normally CANNOT be debugged without a debugger and a readily repeatable test, because whenever this has previously occurred the control blocks had already been munged (the original condition that led to this had already been overwritten). In two cases I tracked the problem down to host OS errors, which led to inserting the BREAK_INTO_DEBUGGER call in the first place.

Please note the comment for the block:

If DEVBLK already in queue, fail queuing of the DEVBLK.

If a repeatable scenario can be developed, it would be greatly appreciated, as at this point in the code we're well past the actual error, and that is the error that actually needs to be found.

Sidebar: BREAK_INTO_DEBUGGER calls need to be split into two types, IGNORE and TRAP, based on usage. There are parts within Hercules, such as channel.c, where the normal should be IGNORE, while others should be TRAP (such as used in the UNREACHABLE_CODE macro).
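
One way to picture the proposed split is the sketch below. The names BREAK_INTO_DEBUGGER_IGNORE, BREAK_INTO_DEBUGGER_TRAP and fail_if_same are hypothetical; the thread proposes the two types but does not define them, so this is an illustration under that assumption and not the Hercules implementation.

```c
#include <signal.h>
#include <stdio.h>

/* Hypothetical IGNORE form: note the location and let the caller's
 * normal error handling continue. */
#define BREAK_INTO_DEBUGGER_IGNORE() \
    do { fprintf(stderr, "debugger break skipped at %s:%d\n", \
                 __FILE__, __LINE__); } while (0)

/* Hypothetical TRAP form: always raise SIGTRAP. With no debugger
 * attached, the default action terminates the process. */
#define BREAK_INTO_DEBUGGER_TRAP() \
    do { raise(SIGTRAP); } while (0)

/* Example caller in the style of the "already in queue" check:
 * the failure is reported through the return code only. */
int fail_if_same(const void *queued, const void *candidate)
{
    if (queued == candidate)
    {
        BREAK_INTO_DEBUGGER_IGNORE();
        return 2;
    }
    return 0;
}
```

With the IGNORE form, a caller such as fail_if_same returns 2 to its own caller instead of ending the process; the TRAP form would stay reserved for places that must never be reached.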

ivan-w commented 7 years ago

I don't know.. looking at dbgtrace.h, BREAK_INTO_DEBUGGER unconditionally raises a SIGTRAP (unless I've missed something)

dasdman commented 7 years ago

Ivan: Did you read the LAST paragraph (the sidebar) closely? The macro BREAK_INTO_DEBUGGER needs to be split into two different forms for when the debugger is not present.

jphartmann commented 7 years ago

Sounds like it, Ivan.

        /* If DEVBLK already in queue, fail queueing of DEVBLK */
        if (ioq == dev)
        {
            rc = 2;
            BREAK_INTO_DEBUGGER();
            break;
        }

At first glance, this looks to me like we have a case of device/subchannel busy. If so, the channel should reject the channel program with appropriate device status.
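
The "already in queue" check quoted above can be illustrated in isolation. This is a minimal sketch, assuming a singly linked queue; the DEV type, its nextioq link, and enqueue_once are hypothetical stand-ins for illustration and are not copied from channel.c.

```c
#include <stddef.h>

/* Hypothetical stand-in for a device block with a queue link. */
typedef struct DEV
{
    int         devnum;
    struct DEV *nextioq;
} DEV;

/* Append dev to the queue unless it is already there.
 * Returns 0 when queued, 2 when dev was found in the queue. */
int enqueue_once(DEV **head, DEV *dev)
{
    DEV **link = head;

    /* Walk to the tail, failing if we meet dev on the way. */
    for (; *link != NULL; link = &(*link)->nextioq)
    {
        if (*link == dev)
            return 2;
    }

    dev->nextioq = NULL;
    *link = dev;
    return 0;
}
```

A second attempt to queue the same block returns 2 and leaves the queue unchanged, which is the condition the quoted code detects before it breaks into the debugger.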

Since it is there, you might as well print device_resume.

You might also print *dev, but please paste the result into a bracket of three backticks so the output is not totally garbled.

Assigning to Dasdman since he is to blame according to github.

Mark, I know you are very busy. If you can tell us what should happen, we can try to hack it.

jphartmann commented 7 years ago

Mark, it certainly raised SIGTRAP, which is what is behind assert().

BREAK_INTO_DEBUGGER sounds Windowese to me.

dasdman commented 7 years ago

1) Please see my prior notes.

2) Interim is to comment out the BREAK_INTO_DEBUGGER to let the rc flow back to the caller; printing out at this time is next to meaningless as what one needs to see has already been damaged (the original error condition) -- that which permitted the devblk to appear to be schedulable when it was not.

ivan-w commented 7 years ago

The issue is that, basically, this shouldn't happen.

If the device is busy, then SIO should have given CC=2 from the get go - and the process should never have gone this far.

This is similar to a problem that existed a couple of years ago - which was caused by a confusion between the various DEVBLK flags, including those that were used by Shared CCKD support.

dasdman commented 7 years ago

FYI. Tests need to be rerun WITHOUT SYNCIO.

dasdman commented 7 years ago

In terms of BREAK_INTO_DEBUGGER, this function was (previously) working with gdb and Eclipse for taking breaks.

jphartmann commented 7 years ago

With GDB it would require that the process was already attached to GDB for SIGTRAP to be intercepted. It will cause a dump otherwise, as we see here.
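
A guard along these lines could avoid the dump when no debugger is attached. This is a minimal sketch, assuming a Linux host where /proc/self/status carries a TracerPid field; tracer_attached and break_if_traced are hypothetical names and are not taken from dbgtrace.h.

```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Linux-only assumption: a non-zero TracerPid in /proc/self/status
 * means some tracer (gdb, for example) is attached.
 * Returns 1 if traced, 0 otherwise (including when /proc is missing). */
int tracer_attached(void)
{
    FILE *fp = fopen("/proc/self/status", "r");
    char  line[256];
    int   traced = 0;

    if (fp == NULL)
        return 0;

    while (fgets(line, sizeof line, fp) != NULL)
    {
        if (strncmp(line, "TracerPid:", 10) == 0)
        {
            traced = (strtol(line + 10, NULL, 10) != 0);
            break;
        }
    }

    fclose(fp);
    return traced;
}

/* Raise SIGTRAP only when a tracer can intercept it.
 * Returns 1 if the trap was raised, 0 if it was skipped. */
int break_if_traced(void)
{
    if (!tracer_attached())
        return 0;

    raise(SIGTRAP);
    return 1;
}
```

Run outside a debugger, break_if_traced returns 0 and execution simply continues; under gdb the SIGTRAP stops at the call site as intended.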

dasdman commented 7 years ago

Just comment out the two debugger breaks. It's not worth an argument at this point in time.

jphartmann commented 7 years ago

I think I nulled the offending macro. Please retest.

jphartmann commented 7 years ago

Mark, did you push that update or was it outside channel.c?

mgutzwiller commented 7 years ago

Okay, I just tried it, and it crashed. Attached is the output of the 'where full' command (channel-rollback.txt.tar.gz). I guess there are too many angle brackets for it to be pasted cleanly. Before I saw this, I had used "../../hyperion/configure --enable-debug --enable-optimization=-O0" and when built, it worked okay.

jphartmann commented 7 years ago

Please update the issue rather than append to the forum (if that is what you did). The mail I received is unreadable. Also, please enclose the gdb output between two lines of three backticks each. Thank you.

On 04/18/2017 03:23 PM, mgutzwiller wrote:

Okay, I have tested the update. It crashed. Here is the output from the gdb 'where full'. Before I saw this update I had built and tested using the "../../hyperion/configure --enable-debug --enable-optimization=-O0", and it ran okay.

jphartmann commented 7 years ago

If your previous optimization flag was -O3, then please try with -O2.

ivan-w commented 7 years ago

I don't understand.

Just did a git pull, and BREAK_INTO_DEBUGGER is still there in channel.c line 2435 and BREAK_INTO_DEBUGGER still raises SIGTRAP in dbgtrace.h

jphartmann commented 7 years ago

Mark said he had dealt with it at the point of definition, so I pulled my update to channel.c.

However, there is no evidence that this update was pushed, so I fixed it globally. I also replaced -O3 with -O2, since -O3 has been shown to cause all kinds of problems.

ivan-w commented 7 years ago

DO NOT remove -O3 !!! PLEASE !

jphartmann commented 7 years ago

Ivan, if you want -O3, then tell your configure to do so. I've been telling mine to do -O2 for yonks. Why the big huhu?

On 04/18/2017 04:41 PM, Ivan Warren wrote:

DO NOT remove -O3 !!! PLEASE !

ivan-w commented 7 years ago

So why change anything ?

ivan-w commented 7 years ago

Anyway, the BREAK_INTO_DEBUGGER is a red herring... We should never get there in channel.c. Disabling the effects of that macro is just hiding an existing problem.

jphartmann commented 7 years ago

For testing, please clean out old objects and redo autogen.sh. So if you are in the source directory:

cd <build directory>
make clean
cd -
./1Stop

jphartmann commented 7 years ago

Ivan, please read this issue from the top. BREAK_INTO_DEBUGGER springing a SIGTRAP is the whole reason for the issue. Red herring, my foot!

ivan-w commented 7 years ago

Red herring it is.

This should NEVER occur.

The fact that we are reaching this portion of code is the problem.

The fact that it is causing a core dump is a side effect.

If we let it through, chances are another problem will occur way later and be even harder to understand.

Hiding a problem doesn't make it go away.

--Ivan

jphartmann commented 7 years ago

Correct, Ivan, it should not occur. And it does not occur when compiled -O0, as perusal of the previous appends will show you. It does occur with -O3.

When it does occur, the intent was to enter a debugger to eyeball it and then return an error. However, on UNIX, the call that Fish employed causes an abort that looks like an assert; but this was presumably not his intent.

Please read the entire issue rather than pounce on each append in isolation. Thank you.

ivan-w commented 7 years ago

Do as you wish. Remove the trap, compile everything with -O0 and hide your head in the sand.

Blaming the compiler or the optimizer is only a last resort, and in 15 odd years working on herc, I've only seen it happen twice : once on linux with the ill fated RH 2.96 gcc compiler, and another time with MSVC which had a wrongly coded intrinsic - a bug then confirmed by MS and fixed on the next version.

jphartmann commented 7 years ago

BREAK_INTO_DEBUGGER is a lot more than a red herring.

Anyhow I reverted essentially back to my original commit.

I cannot build at the moment. You may need to wait for Steve to sort out configure et al. Otherwise give it a try.

jphartmann commented 7 years ago

Steve fixed configure and I can do make check. Please retest.

mgutzwiller commented 7 years ago

I tested with commit ca29dfa, and the install finished without any problems. I built it without any other options other than --enable-debug.