mgsloan / store

Fast binary serialization in Haskell
MIT License

store-0.2.1.2 on OpenBSD 6.0 #85

Closed: seanwestfall closed this issue 7 years ago

seanwestfall commented 7 years ago

Hey guys, I'm currently trying to cabal install stack on OpenBSD 6.0 on an AMD64 virtual machine. store-0.2.1.2 is one of its dependencies, and store failing to compile is one of the roadblocks I keep running into:

stack-1.2.0 depends on store-0.2.1.2 which failed to install.
store-0.2.1.2 failed during the building phase. The exception was:
ExitFailure (-10)
)
# cabal install store-0.2.1.2
Resolving dependencies...
Configuring store-0.2.1.2...
Building store-0.2.1.2...
Preprocessing library store-0.2.1.2...
[ 1 of 10] Compiling System.IO.ByteBuffer ( src/System/IO/ByteBuffer.hs, dist/build/System/IO/ByteBuffer.o )
[ 2 of 10] Compiling Data.Store.Impl  ( src/Data/Store/Impl.hs, dist/build/Data/Store/Impl.o )
[ 3 of 10] Compiling Data.Store.TH    ( src/Data/Store/TH.hs, dist/build/Data/Store/TH.o )
[ 4 of 10] Compiling Data.Store.TH.Internal ( src/Data/Store/TH/Internal.hs, dist/build/Data/Store/TH/Internal.o )
[ 5 of 10] Compiling Data.Store.Internal ( src/Data/Store/Internal.hs, dist/build/Data/Store/Internal.o )
Failed to install store-0.2.1.2
cabal: user error (Error: some packages failed to install:
store-0.2.1.2 failed during the building phase. The exception was:
ExitFailure (-10)
)

It breaks at step 5 out of 10. Is this possibly related to this note from the README: "Store does not currently work at all on architectures which lack efficient unaligned memory access (for example, older ARM processors). This is not a fundamental limitation, but we do not currently require ARM or PowerPC support. See #37 and #47." I'm trying to compile this on an amd64 virtual machine equipped with AMD Athlon64 processors, i.e. a 64-bit AMD machine.

Any and all help would be really appreciated. Is it possible at all to get more detailed error messages, so I can better debug this? It would be really convenient to see which lines it's failing on. Thanks.

mgsloan commented 7 years ago

I've updated the README to no longer include that text, as #37 is resolved. Not sure what can cause ExitFailure (-10) :/

seanwestfall commented 7 years ago

Is it possible to turn on a more descriptive error message from cabal? Something that will mention which line it's failing on, instead of just a generic failure message?

Blaisorblade commented 7 years ago

That's not easy, because some process is being killed by a signal (maybe SIGBUS; at least signal 10 is SIGBUS on OS X). So one would need to enable core dumps with ulimit -c unlimited, find the generated core file, and get a backtrace with some OpenBSD debugger (does gdb work there?).

@mgsloan but do we know where the SIGBUS could come from? Does store use OS-specific primitives? I know SIGBUS can be due to misalignment, but that shouldn't happen on AMD64...

seanwestfall commented 7 years ago

@Blaisorblade No, it's not on OS X, it's on a virtual machine:

# grep -i cpu /var/run/dmesg.boot
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: QEMU Virtual CPU version 2.1.2, 2500.35 MHz
cpu0: FPU,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,CX16,x2APIC,POPCNT,HV,NXE,LONG,LAHF
cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
cpu0: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: apic clock running at 1000MHz
acpicpu at acpi0 not configured
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: QEMU Virtual CPU version 2.1.2, 2500.28 MHz
cpu0: FPU,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,CX16,x2APIC,POPCNT,HV,NXE,LONG,LAHF
cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
cpu0: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: apic clock running at 1000MHz
acpicpu at acpi0 not configured
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: QEMU Virtual CPU version 2.1.2, 2500.29 MHz
cpu0: FPU,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,CX16,x2APIC,POPCNT,HV,NXE,LONG,LAHF
cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
cpu0: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: apic clock running at 1000MHz
acpicpu at acpi0 not configured
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: QEMU Virtual CPU version 2.1.2, 2500.36 MHz
cpu0: FPU,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,CX16,x2APIC,POPCNT,HV,NXE,LONG,LAHF
cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
cpu0: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: apic clock running at 1000MHz
acpicpu at acpi0 not configured
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: QEMU Virtual CPU version 2.1.2, 2500.32 MHz
cpu0: FPU,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,CX16,x2APIC,POPCNT,HV,NXE,LONG,LAHF
cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
cpu0: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: apic clock running at 1000MHz
acpicpu at acpi0 not configured
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: QEMU Virtual CPU version 2.1.2, 2500.33 MHz
cpu0: FPU,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,CX16,x2APIC,POPCNT,HV,NXE,LONG,LAHF
cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
cpu0: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: smt 0, core 0, package 0
cpu0: apic clock running at 1000MHz
acpicpu0 at acpi0: C1(@1 halt!)
# /sbin/sysctl hw
hw.machine=amd64
hw.model=QEMU Virtual CPU version 2.1.2
hw.ncpu=1
hw.byteorder=1234
hw.pagesize=4096
hw.disknames=cd0:,sd0:d6e5bf6309b1bd2a,fd0:
hw.diskcount=3
hw.sensors.viomb0.raw0=0 (desired)
hw.sensors.viomb0.raw1=0 (current)
hw.cpuspeed=2500
hw.vendor=QEMU
hw.product=Standard PC (i440FX + PIIX, 1996)
hw.version=pc-i440fx-2.1
hw.uuid=5be3b2c7-531d-453a-9313-12a3f256402f
hw.physmem=1056833536
hw.usermem=1056821248
hw.ncpufound=1
hw.allowpowerdown=1
#

Blaisorblade commented 7 years ago

> @Blaisorblade No, it's not on an OS X, it's on a virtual machine:

Sure, I just hadn't yet looked up which signal number SIGBUS is on OpenBSD, and guessed it would be the same as on OS X (not an unreasonable guess). That turns out to be true: http://bxr.su/OpenBSD/sys/sys/signal.h#63

In other words, some process is being killed by the operating system with SIGBUS, so this is unlikely to be debuggable with Haskell-specific tools.
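To make the reasoning explicit, here is a minimal sketch of how the ExitFailure (-10) above maps to a signal. It assumes the usual convention of Haskell's process library on Unix, where a negative exit code is the negated number of the signal that killed the child. The helper name signal_number is made up for illustration.

```shell
# Hypothetical helper: turn the code from cabal's "ExitFailure (-N)"
# into the number of the signal that killed the child process.
signal_number() {
    code="$1"
    case "$code" in
        -*) echo "${code#-}" ;;   # negative code: killed by signal N
        *)  echo "not a signal exit: $code" >&2; return 1 ;;
    esac
}

n=$(signal_number -10)
echo "signal number: $n"
# The name behind a number is OS-specific, so ask the local system
# rather than hard-coding it.
kill -l "$n" || true
```

Run on the OpenBSD machine itself, the last command should report the local name for signal 10; other systems may assign that number to a different signal.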

Most importantly, the affected process is probably cabal or ghc, so this might not be a store bug. (Unless the crash is due to failing store code run through TemplateHaskell: the offending module uses TemplateHaskell, and store uses low-level unsafe code, so that can't be excluded immediately.)

== Is GHC running out of RAM/resources? ==

Does the virtual machine have enough memory/swap for the build process? Is some partition (close to) full? 1G of RAM is pretty low, and I recently debugged a GHC bug in the runtime system producing SIGSEGV on low memory (in https://github.com/commercialhaskell/stack/issues/2575 and https://ghc.haskell.org/trac/ghc/ticket/12690). If you're using GHC 8, I wouldn't be surprised to see the same bug.

To exclude this, you could try checking if RAM/swap/disk space are low during the build, reproducing this after increasing the appropriate resources, or building any other big enough package (aeson should be big enough). A failure there would exonerate store.
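A rough sketch of the "check resources during the build" idea follows. The helper name snapshot_resources is hypothetical, and it only calls swapctl and vmstat when they are actually installed, since tools and flags differ between systems.

```shell
# Hypothetical helper: print a snapshot of disk and memory usage.
# Optional tools are probed first because availability varies by OS.
snapshot_resources() {
    dir="${1:-.}"
    echo "== disk ($dir) =="
    df -k "$dir"
    if command -v swapctl >/dev/null 2>&1; then
        echo "== swap =="
        swapctl -l          # lists swap devices on OpenBSD
    fi
    if command -v vmstat >/dev/null 2>&1; then
        echo "== memory =="
        vmstat
    fi
}

snapshot_resources .
```

One way to use it is from a second terminal while cabal install runs, for example inside a loop such as `while sleep 5; do snapshot_resources /tmp; done`, and then look at the last snapshot printed before the build dies.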

== Some generic GHC on OpenBSD bug? ==

Have you checked GHC's bug database for relevant OpenBSD-specific bugs? Which GHC version are you using BTW?

== Getting cores ==

As mentioned, getting a core is the appropriate general debugging procedure without specific hypotheses. Are you familiar with the steps or can you figure them out?

I usually run file on the core to figure out the offending executable, then open it in gdb. Since gdb seems to be available on OpenBSD (maybe after installation from ports), you can probably use non-OpenBSD-specific instructions, such as:

http://stackoverflow.com/a/8306805/53974

After that, I'd run bt in gdb to get a backtrace, and post the gdb output here.
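Roughly, the steps could look like the sketch below. It assumes OpenBSD's convention of naming core files <program>.core. The helper find_cores is made up for illustration, and the paths in the comments are placeholders rather than real locations.

```shell
# Allow core dumps in this shell; this can fail if the hard limit is lower.
ulimit -c unlimited 2>/dev/null || echo "could not raise core size limit" >&2

# Hypothetical helper: list core files under a directory.
# Assumes the OpenBSD naming convention <program>.core.
find_cores() {
    find "${1:-.}" -type f -name '*.core' 2>/dev/null
}

# After re-running the failing build from this same shell:
#   find_cores .                        # locate the core file
#   file ./ghc.core                     # shows which executable dumped it
#   gdb /path/to/executable ./ghc.core  # placeholder paths
#   (gdb) bt                            # print the backtrace
```

The core file may land in the working directory of whichever process crashed, so it can be worth searching the build directory as well as the directory the command was started from.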

seanwestfall commented 7 years ago

@Blaisorblade

# ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.10.3

That's the only GHC version currently available for OpenBSD 6.0.

Okay, let me try doubling the available RAM, because that is probably it. Thanks!

seanwestfall commented 7 years ago

@Blaisorblade @mgsloan Okay, that was it: not enough memory. Oddly enough, though, my ssh session timed out and lost its connection, and when I sshed in again I had to run cabal install stack a second time. It resumed with installing store, which means the first run had eventually failed on store again, though it completed fine the second time the command was run. So it seems a little unusual that in memory-intensive situations store is what fails.

Thanks guys!

ketzacoatl commented 7 years ago

For future viewers: if doubling RAM does not work, make sure your user is actually allowed to use enough of it via ulimit, and/or see login.conf(5).
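A minimal sketch for inspecting those limits from the shell that will run the build. The helper name show_build_limits is hypothetical, and the exact units reported by ulimit depend on the shell.

```shell
# Hypothetical helper: show the per-process limits most relevant to a
# memory-hungry build, as seen by the current shell.
show_build_limits() {
    echo "data segment size: $(ulimit -d)"
    echo "core file size: $(ulimit -c)"
}

show_build_limits

# To raise the soft data limit for this shell (only up to the hard limit):
#   ulimit -d unlimited
# On OpenBSD the hard limit is, as far as I know, set per login class in
# login.conf(5) (the datasize-cur / datasize-max capabilities); check the
# man page before editing.
```

If the data segment limit is much smaller than the RAM given to the VM, adding more memory alone would not help the build.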

Blaisorblade commented 7 years ago

FWIW, getting a SIGBUS, even in low memory, suggests there's a bug somewhere, but I certainly don't have the time/experience to investigate this properly.