larcenists / larceny

Larceny Scheme implementation

current-utc-time segfaults on one MacOS X machine #700

Closed by WillClinger 7 years ago

WillClinger commented 9 years ago

On a Mac mini running a recently installed MacOS X 10.8.5 (Mountain Lion) with Apple Developer command-line tools installed soon afterwards:

% ./larceny
Larceny v0.98b3 (Feb  1 2015 20:19:44, precise:Posix:unified)
larceny.heap, built on Sun Feb  1 20:21:36 EST 2015

> (require 'time)
#t

> (current-utc-time)
Segmentation fault: 11

There is no segfault on this machine when the above is done with the binary distribution of v0.97, and there is no segfault on other MacOS X machines running Snow Leopard and earlier versions of MacOS X. There is no segfault on Linux machines either.

This may be related to ticket #659 (import order sensitivity when mixing r6rs and FFI), but this segfault can't have anything to do with import order. SRFI 27 and SRFI 19 both call current-utc-time.

WillClinger commented 9 years ago

The segfaulting machine is a Macmini3,1 (late 2009). According to Wikipedia, Snow Leopard boots in 32-bit mode on that machine. Mountain Lion is 64-bit only.

Of the three Macintosh systems I'm using for development, the segfaulting machine is the only one running a 64-bit OS kernel. The MacBook Pro I'm using should be capable of running a 64-bit kernel, but is running Snow Leopard (MacOS X 10.6.8) in 32-bit mode.

That doesn't necessarily explain anything, because current-utc-time works fine on 64-bit Linux. It does mean I need to test current-utc-time on more recent Macintosh hardware running more recent versions of MacOS X.

WillClinger commented 9 years ago

On the segfaulting machine, it looks as though all procedures defined by foreign-procedure (which is itself defined in "lib/Ffi/ffi-upper.sch") are segfaulting:

% ./larceny
Larceny v0.98b3 (Feb  1 2015 20:19:44, precise:Posix:unified)
larceny.heap, built on Sun Feb  1 20:21:36 EST 2015

> (require 'file-system)
#t

> (list-directory "/tmp")
Segmentation fault: 11

% ./larceny
Larceny v0.98b3 (Feb  1 2015 20:19:44, precise:Posix:unified)
larceny.heap, built on Sun Feb  1 20:21:36 EST 2015

> (require 'unix)
#t

> (temporary-file-name)
Segmentation fault: 11

On the same machine, all of those work in the v0.97 binary distribution.

WillClinger commented 9 years ago

I created a v0.98b3 binary distribution on a 32-bit Mac mini and copied it to the segfaulting Mac mini. The FFI procedures mentioned above work fine on the segfaulting Mac mini when run from that distribution, so it's definitely a problem with the build-time libraries/environment rather than the run-time libraries.

That's a clue, but it's also a possible work-around if I can't get this fixed quickly. If necessary, we could build v0.98 for MacOS X on a 32-bit machine and make that the binary distribution. We'd have to warn users that a 32-bit machine must be used when building the MacOS X version from source.

Both "lib/Base/std-ffi.sch" and "src/Rts/Sys/ffi.c" contain hard-coded assumptions that void* is 32 bits. That assumption works so long as the system is built on a 32-bit machine. (I suspect it's enough to build larceny.bin on a 32-bit machine, but I'll have to check that.)

The FFI works in systems built on 64-bit Linux machines, but we had to install some 32-bit packages before that would work. We probably need to do something similar for MacOS X.
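To make the 32-bit assumption concrete, here is a minimal C sketch of how one might check it at run time. It is not taken from "src/Rts/Sys/ffi.c"; the function names (pointer_width_bits, pointer_fits_in_32_bits, report_pointer_width) are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: width of void* in bits for this build. */
unsigned pointer_width_bits(void)
{
    return (unsigned)(sizeof(void *) * 8);
}

/* Hypothetical helper: returns 1 if the pointer survives a round trip
   through a 32-bit integer, which is what code that assumes a 32-bit
   void* effectively relies on; returns 0 if high bits would be lost. */
int pointer_fits_in_32_bits(const void *p)
{
    uintptr_t full = (uintptr_t)p;
    uint32_t narrow = (uint32_t)full;   /* keep only the low 32 bits */
    return (uintptr_t)narrow == full;
}

/* Hypothetical helper: prints what this build would see for one pointer. */
void report_pointer_width(const void *p)
{
    printf("void* is %u bits; %p %s in 32 bits\n",
           pointer_width_bits(), (void *)p,
           pointer_fits_in_32_bits(p) ? "fits" : "does NOT fit");
}
```

Calling report_pointer_width on the address of a local variable in a 32-bit build should always report that the pointer fits; in a 64-bit build it may not, which is the situation the hard-coded assumption cannot handle.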

WillClinger commented 8 years ago

Building the public releases on 32-bit systems continues to work, and I don't want this to delay release of v0.99, so I'm changing the milestone once again.

WillClinger commented 7 years ago

Fixed by changeset 9df3126016f3ab93d3b29994264e7ff4db8168a7

This turned out to be a bug in the C compiler. As of MacOS X 10.8, gcc is an alias for the clang compiler, which uses an LLVM back end. A stack-allocated array whose element type was a union with double as one possibility wasn't always being allocated on a 4-byte boundary.

Adding print statements or other code to the C function occasionally fixed the problem when compiled on one machine while creating the problem when compiled on a different machine. That led me to suspect some sort of alignment problem, which was confirmed by printing the addresses of various variables and pointers.
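The kind of check described above (printing addresses to confirm an alignment problem) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the changeset: the union type ffi_slot and the functions is_aligned and check_stack_array_alignment are hypothetical names, and the union's members are only a guess at what an FFI argument slot might hold.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for an FFI argument slot: a union with
   double as one of its possible members. */
typedef union {
    int32_t i;
    float   f;
    double  d;
    void   *p;
} ffi_slot;

/* Returns 1 if addr is a multiple of boundary (boundary must be nonzero). */
int is_aligned(const void *addr, size_t boundary)
{
    return ((uintptr_t)addr % boundary) == 0;
}

/* Allocates a small array of slots on the stack, prints its address,
   and returns 1 if the array starts on the requested boundary. */
int check_stack_array_alignment(size_t boundary)
{
    ffi_slot slots[4];
    int ok;

    slots[0].d = 0.0;   /* touch the array so it is really allocated */
    ok = is_aligned(slots, boundary);
    printf("slots at %p: %s on a %zu-byte boundary (d = %g)\n",
           (void *)slots, ok ? "aligned" : "NOT aligned",
           boundary, slots[0].d);
    return ok;
}
```

With a correctly behaving compiler, check_stack_array_alignment(4) is expected to return 1. A result of 0 would point to the sort of misalignment described in this comment.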