hercules-390 / hyperion

Hercules 390

Hercules segment faults on sfd or quit commands when built on certain 32-bit Linux systems without large file support. #231

Open srorso opened 6 years ago

srorso commented 6 years ago

Steps to reproduce:

1) Clone Hercules.
2) Build with ./1Stop --disable-largefile
3) Start Hercules with any operating system.
4) At the Hercules console, issue sfd <ipl-addr>

where <ipl-addr> is the cuu or subchannel of the system residence volume being IPL'd.

The result is a segmentation fault. See the following excerpt from the gdb session log:

HHC01603I ipl 190
HHC00801I Processor CP00: Addressing exception code 0005  ilc 6
HHC02324I PSW=00040005C0003044 INST=D2FFB000F9BA MVC   0(256,11),2490(15)     move_character
HHC02326I R:001FFFBD:K:06=000000 00000000 00000000 00000000 00 ................
HHC02326I R:000039BC:K:06=00000000 00000000 00000000 00000000  ................
HHC02269I GR00=00000000 GR01=00003088 GR02=00000000 GR03=00000000
HHC02269I GR04=00000000 GR05=00000000 GR06=00000000 GR07=00000000
HHC02269I GR08=00000000 GR09=00000000 GR10=00000000 GR11=001FFFBD
HHC02269I GR12=00000000 GR13=00000000 GR14=00000000 GR15=40003002
HHC00107I Starting thread cckd_ra(), active=0, started=0, max=2
HHC00100I Thread id b2fedb40, prio 2147483647, name Read-ahead thread-1 started
HHC00107I Starting thread cckd_ra() from cckd_ra(), active=1, started=1, max=2
HHC00100I Thread id b27ecb40, prio 2147483647, name Read-ahead thread-2 started
HHC01603I sfd 190
HHC00333I 0:0190           size free  nbr st   reads  writes l2reads    hits switches
herc =====>
Thread 10 "hercules" received signal SIGSEGV, Segmentation fault.                     instcnt 54,812; mips 0.000; I/O      0
[Switching to Thread 0xb1febb40 (LWP 18469)]
0xb775a383 in _IO_vfprintf_internal (s=0xb1feb020, format=<optimized out>, ap=0xb1feb244 "") at vfprintf.c:1632
1632    vfprintf.c: No such file or directory.
(gdb) bt
#0  0xb775a383 in _IO_vfprintf_internal (s=0xb1feb020, format=<optimized out>, ap=0xb1feb244 "") at vfprintf.c:1632
#1  0xb780b21c in ___vsnprintf_chk (s=0xb5a00648 "HHC00339I 0:0190 [0] 0005988552 000 % -1214836986 ",
    maxlen=<optimized out>, flags=1, slen=4294967295,
    format=0xb7972048 "HHC00339I %1d:%04X [0] %10.10lld %3.3lld %% %4.4d %s %7.7d %7.7d %7.7d\n", args=0xb1feb224 "")
    at vsnprintf_chk.c:63
#2  0xb791d895 in vsnprintf (__ap=<optimized out>,
    __fmt=0xb7972048 "HHC00339I %1d:%04X [0] %10.10lld %3.3lld %% %4.4d %s %7.7d %7.7d %7.7d\n", __n=1024,
    __s=0xb5a00648 "HHC00339I 0:0190 [0] 0005988552 000 % -1214836986 ") at /usr/include/i386-linux-gnu/bits/stdio2.h:77
#3  vfwritemsg (f=f@entry=0xb78c8d60 <_IO_2_1_stdout_>,
    filename=filename@entry=0xb7971b34 "/home/srorso/Hercules/hyperion/cckddasd.c", line=line@entry=4627,
    func=0xb7975db8 <__FUNCTION__.34883> "cckd_sf_stats",
    fmt=0xb7972048 "HHC00339I %1d:%04X [0] %10.10lld %3.3lld %% %4.4d %s %7.7d %7.7d %7.7d\n", vl=<optimized out>)
    at /home/srorso/Hercules/hyperion/logmsg.c:341
#4  0xb791ddf4 in fwritemsg (f=0xb78c8d60 <_IO_2_1_stdout_>,
    filename=0xb7971b34 "/home/srorso/Hercules/hyperion/cckddasd.c", line=4627,
    func=0xb7975db8 <__FUNCTION__.34883> "cckd_sf_stats",
    fmt=0xb7972048 "HHC00339I %1d:%04X [0] %10.10lld %3.3lld %% %4.4d %s %7.7d %7.7d %7.7d\n")
    at /home/srorso/Hercules/hyperion/logmsg.c:435
#5  0xb79401ba in cckd_sf_stats (data=0x8097000) at /home/srorso/Hercules/hyperion/cckddasd.c:4623
#6  0xb7919a53 in hthread_func (arg2=0x80a0d48) at /home/srorso/Hercules/hyperion/hthreads.c:777
#7  0xb78d2295 in start_thread (arg=0xb1febb40) at pthread_create.c:333
#8  0xb77fd05e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:114

(The addressing exception in the above log is a normal result of the operating system finding the end of installed main storage.)

The gdb console log above was created on Ubuntu 16.04 LTS; the emulated operating system is DOS/360. The working assumption is that the issue is independent of the emulated operating system and occurs when a device queried by the sfd console command has non-zero I/O counts, or when quit is issued and any compressed disk device has a non-zero I/O count.

Examining the local variables in frame 3 (vfwritemsg) above shows a partially built device compression statistics message with some very odd values (see the variable bfr):

(gdb) f 3
#3  vfwritemsg (f=f@entry=0xb78c8d60 <_IO_2_1_stdout_>,
    filename=filename@entry=0xb7971b34 "/home/srorso/Hercules/hyperion/cckddasd.c", line=line@entry=4627,
    func=0xb7975db8 <__FUNCTION__.34883> "cckd_sf_stats",
    fmt=0xb7972048 "HHC00339I %1d:%04X [0] %10.10lld %3.3lld %% %4.4d %s %7.7d %7.7d %7.7d\n", vl=<optimized out>)
    at /home/srorso/Hercules/hyperion/logmsg.c:341
341     BFR_VSNPRINTF();  // Note: uses 'vl', 'bfr', 'siz', 'fmt' and 'rc'.
(gdb) info arg
f = 0xb78c8d60 <_IO_2_1_stdout_>
filename = 0xb7971b34 "/home/srorso/Hercules/hyperion/cckddasd.c"
line = 4627
func = 0xb7975db8 <__FUNCTION__.34883> "cckd_sf_stats"
fmt = 0xb7972048 "HHC00339I %1d:%04X [0] %10.10lld %3.3lld %% %4.4d %s %7.7d %7.7d %7.7d\n"
vl = <optimized out>
(gdb) info locals
original_vl = 0xb1feb224 ""
prefix = '\000' <repeats 31 times>
bfr = 0xb5a00648 "HHC00339I 0:0190 [0] 0005988552 000 % -1214836986 "
rc = -1
siz = 1024
msgbuf = <optimized out>
msglen = <optimized out>
bufsiz = <optimized out>
__FUNCTION__ = "vfwritemsg"

Repeatable on Debian 8.6 (jessie, 32-bit), Debian 9.0 (stretch, 32-bit), and Ubuntu 16.04 LTS (Xenial Xerus, 32-bit). Hercules does not segmentation fault on sfd <dev-addr> when built on FreeBSD 11.0 (32-bit) without large file support. 64-bit systems always (?) include large file support.

This issue was first revealed because the CMake build for Hercules did not correctly set the variables needed for large file support on 32-bit systems. But the issue also exists when Hercules is built using GNU autotools with --disable-largefile.

I leave the question of how much attention this requires to others. I am uncertain how many 32-bit systems do not support LFS, nor do I know whether the segmentation fault arises on a 32-bit system that actually lacks LFS, as opposed to one where it has been turned off.

srorso commented 6 years ago

I was content to leave this issue be once I saw that it was not caused by a CMake build. I am not skilled in the ways of cckddasd.c and had no wish to be distracted from completing a working CMake build for Hercules.

But the research needed to correct the CMake build also revealed what caused this issue in the first place. It would be churlish of me not to share.

File hmacros.h at line 364 determines what kind of large file support Hercules should use. The options are full support (sizeof(off_t) > 4, which for now means sizeof(off_t) == 8); transitional support, which uses alternative functions and an alternative offset type (off64_t) for file access; or no support.

See the document http://www.unix.org/version2/whatsnew/lfs20mar.html for a description of the standards defined by the Large File Summit, in particular the macros _LFS_LARGEFILE and _LFS64_LARGEFILE. Neither macro enables large file support; they just advertise availability of that support.

Unfortunately, hmacros.h only tests whether the target system has large file support available. It does not test whether that support is enabled. On 64-bit systems, on 32-bit FreeBSD systems since FreeBSD 2.0 (ca. 1994), and (I think) on Apple Darwin systems since Mac OS X 10.0, that support is always enabled, so there is no problem.

On 32-bit GNU/Linux and Solaris systems, large file support is enabled only when _FILE_OFFSET_BITS is set to 64. Absent that define, 32-bit GNU/Linux and Solaris systems expose a 32-bit off_t and do not support large files. Header file hmacros.h assumes that such 32-bit GNU/Linux and Solaris systems do support large files when _LFS_LARGEFILE is present even though _FILE_OFFSET_BITS is missing and large file support is not enabled. This sets up Hercules to fail.

AIX uses a different macro for large file enablement, but the concept is the same.

I will craft an update to hmacros.h.