haskell-works / hw-json-simd

BSD 3-Clause "New" or "Revised" License

Creating index causes Segmentation fault #53

Open jcberentsen opened 5 years ago

jcberentsen commented 5 years ago

After installing with: $ cabal new-install hw-json-simd, running the example causes a segmentation fault.

$ echo "{}" | pv -t -e -b -a | hw-json-simd create-index --method standard -i /dev/stdin --output-ib-file test.json.ib.idx --output-bp-file test.json.bp.idx

3.00 B 0:00:00 [53.3KiB/s]
Segmentation fault (core dumped)

The same happens with --method standard (and larger json input than this ;)

This is an attempt at stripping down a problem I had with using fromByteStringViaSimd from hw-json where it also segfaults. The same code using fromByteStringViaBlanking works fine.

The fromByteStringViaSimd problem was tested on two different x86 machines (one without bmi2 support). Looking at the code, fromByteStringViaSimd seems to attempt to safeguard on the CPU capabilities; maybe it is not working as intended?

I also tried to pass diverse flags (via stack.yaml) to enable sse42, bmi2 and avx2 for various packages in the hw- ecosystem, without luck :(

newhoggy commented 5 years ago

Is this a regression or is this the first time you're using this library?

Also, what OS and GHC version?

newhoggy commented 5 years ago

This is my setup:

$ sysctl -a | grep machdep.cpu.features
machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C
$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 8.6.5
$ cabal --version
cabal-install version 2.4.0.0
compiled using version 2.4.0.1 of the Cabal library
$ uname -a
Darwin intlkymac.lan 18.7.0 Darwin Kernel Version 18.7.0: Tue Aug 20 16:57:14 PDT 2019; root:xnu-4903.271.2~2/RELEASE_X86_64 x86_64
$ echo "{}" | pv -t -e -b -a | hw-json-simd create-index --method standard -i /dev/stdin --output-ib-file test.json.ib.idx --output-bp-file test.json.bp.idx
3.00 B 0:00:00 [13.4KiB/s]

newhoggy commented 5 years ago

I'm also interested to know if a recent change in the hw-prim is responsible.

If you could modify the constraint for hw-prim to 0.6.2.32 please and see if that makes a difference.

jcberentsen commented 5 years ago

Is this a regression or is this the first time you're using this library?

Also, what OS and GHC version?

This is the first time trying the library. The OS is Ubuntu 16.04 and 18.04. On both, the GHC version used by stack lts-14.7 is ghc-8.6.5 (I had to pin a few extra-dependencies).

The hw-json-simd command was installed with ghc-8.0.2 on the Ubuntu 18.04 box; cabal version: 2.4.1.0

$ cat /proc/cpuinfo | grep flags
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts md_clear flush_l1d
...

I'll have a go at hw-prim-0.6.2.32

jcberentsen commented 5 years ago

Pinning hw-prim-0.6.2.32 still has the problem (On the 18.04 box)

jcberentsen commented 5 years ago

Actually I think stack lts-14.7 was already on hw-prim-0.6.2.32 https://www.stackage.org/lts-14.7/package/hw-prim-0.6.2.32

jcberentsen commented 5 years ago

Pinning to hw-prim-0.6.2.33 also segfaults

jcberentsen commented 5 years ago

New problem; with hw-prim-0.6.2.33 using fromByteStringViaBlanking also crashes:

Illegal instruction (core dumped)

I'll revert back to 0.6.2.32 and try this again

jcberentsen commented 5 years ago

fromByteStringViaBlanking works fine again with hw-prim-0.6.2.32

jcberentsen commented 5 years ago

On the other machine, which actually has bmi2, the fromByteStringViaBlanking version works fine with hw-prim-0.6.2.33.

Here are the flags I used in the stack.yaml:

flags:
  hw-json:
    bmi2: true
    sse42: true
  hw-json-simd:
    avx2: true
    bmi2: true
    sse42: true
  bits-extra:
    bmi2: true
  hw-rankselect-base:
    bmi2: true
  hw-rankselect:
    bmi2: true
  hw-simd:
    bmi2: true
    avx2: true

The bmi2: true is a lie on the 18.04 machine, so this probably explains the 'Illegal instruction'. Setting bmi2: false on that machine resolves the illegal instruction problem.
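A quick way to avoid enabling a flag the CPU lacks is to check the flags line from /proc/cpuinfo for the exact token first. The following is a minimal sketch of that check in C; the helper name `has_cpu_flag` is hypothetical and is not part of any hw- package.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true if `flag` appears as a whole
 * whitespace-separated token in `flags` (the text after "flags :" in
 * /proc/cpuinfo). A plain strstr() would be wrong here, because "bmi"
 * would also match inside "bmi1" or "bmi2". */
bool has_cpu_flag(const char *flags, const char *flag) {
    size_t want = strlen(flag);
    if (want == 0) {
        return false;
    }
    const char *p = flags;
    while (*p != '\0') {
        /* Skip separators before the next token. */
        p += strspn(p, " \t\n");
        /* Measure the token that starts at p (length 0 at end of string). */
        size_t len = strcspn(p, " \t\n");
        if (len == want && strncmp(p, flag, len) == 0) {
            return true;
        }
        p += len;
    }
    return false;
}
```

Applied to the two flags lines in this thread, the check finds bmi2 on the 16.04 machine and not on the 18.04 machine, which matches the 'Illegal instruction' seen only on the latter.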

The original problem still remains on the 16.04 machine, which has the following cpuinfo:

$ cat /proc/cpuinfo | head -n 28
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 79
model name      : Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
stepping        : 1
microcode       : 0xb00002e
cpu MHz         : 2194.711
cache size      : 56320 KB
physical id     : 0
siblings        : 22
core id         : 0
cpu cores       : 22
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 20
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 invpcid rtm rdseed adx smap xsaveopt arat flush_l1d arch_capabilities
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds
bogomips        : 4389.42
clflush size    : 64
cache_alignment : 64
address sizes   : 43 bits physical, 48 bits virtual
power management:

jcberentsen commented 5 years ago

I am able to reproduce the problem in hw-json-simd HEAD, with this code added as hw-json-simd/test/Spec.hs:

{-# LANGUAGE OverloadedStrings #-}

import qualified HaskellWorks.Data.ByteString.Lazy          as LBS
import           HaskellWorks.Data.Json.Simd.Index.Standard

main :: IO ()
main = do
  -- let res = makeStandardJsonIbBps "{}"
  let res = makeStandardJsonIbBps . LBS.resegmentPadded 512 $ "{}"
  case res of
    Right chunks -> do
      putStrLn $ "Chunks:"
      let triggerBug = True
      if triggerBug then putStrLn $ show (length chunks) else pure ()

    err ->
        putStrLn $ "No chunks: " ++ show err

$ ./project.sh test
Build profile: -w ghc-8.0.2 -O2
In order, the following will be built (use -v for more details):
 - hw-json-simd-0.1.0.2 (test:hw-json-simd-test) (file test/Spec.hs changed)
Preprocessing test suite 'hw-json-simd-test' for hw-json-simd-0.1.0.2..
Building test suite 'hw-json-simd-test' for hw-json-simd-0.1.0.2..
[2 of 2] Compiling Main             ( test/Spec.hs, /home/chrberen/github/hw-json-simd/dist-newstyle/build/x86_64-linux/ghc-8.0.2/hw-json-simd-0.1.0.2/t/hw-json-simd-test/opt/build/hw-json-simd-test/hw-json-simd-test-tmp/Main.o )
Linking /home/chrberen/github/hw-json-simd/dist-newstyle/build/x86_64-linux/ghc-8.0.2/hw-json-simd-0.1.0.2/t/hw-json-simd-test/opt/build/hw-json-simd-test/hw-json-simd-test ...
Running 1 test suites...
Test suite hw-json-simd-test: RUNNING...
Test suite hw-json-simd-test: FAIL
Test suite logged to:
/home/chrberen/github/hw-json-simd/dist-newstyle/build/x86_64-linux/ghc-8.0.2/hw-json-simd-0.1.0.2/t/hw-json-simd-test/opt/test/hw-json-simd-0.1.0.2-hw-json-simd-test.log
0 of 1 test suites (0 of 1 test cases) passed.
cabal: Tests failed for test:hw-json-simd-test from hw-json-simd-0.1.0.2.

Disabling the computation of length chunks doesn't trigger the crash, but I guess this is just because of lazy evaluation?

Skipping the resegmentPadded call also fails.

newhoggy commented 5 years ago

The ergonomics of figuring out why something fails are not so good :(

newhoggy commented 5 years ago

Is there an EC2 instance or similar where this happens?

newhoggy commented 5 years ago

Try this:

$ cd cbits
$ make
$ ./a.out sp simple.json simple.json.ib.idx simple.json.bp.idx
$ ./a.out sm simple.json simple.json.ib.idx simple.json.bp.idx

jcberentsen commented 5 years ago

Both invocations of a.out segfault on the machines in question (they are not available in EC2).

Program received signal SIGSEGV, Segmentation fault.
0x00000000004012f0 in hw_json_simd_sm_make_ib_op_cl_chunks ()
(gdb) bt
#0  0x00000000004012f0 in hw_json_simd_sm_make_ib_op_cl_chunks ()
#1  0x0000000000401692 in hw_simd_json_sm_main ()
#2  0x0000000000400819 in main ()
(gdb)

jcberentsen commented 5 years ago

The previous gdb backtrace was for the sm command

Backtrace for the sp command:

(gdb) r
Starting program: /home/chrberen/github/hw-json-simd/cbits/a.out sp simple.json simple.json.ib.idx simple.json.bp.idx

Program received signal SIGSEGV, Segmentation fault.
0x0000000000400b66 in hw_json_simd_summarise ()
(gdb) bt
#0  0x0000000000400b66 in hw_json_simd_summarise ()
#1  0x0000000000400cdf in hw_json_simd_process_chunk ()
#2  0x0000000000401064 in hw_json_simd_main_spliced ()
#3  0x00000000004007fc in main ()
(gdb)

jcberentsen commented 5 years ago

Assembly, if this is of any use:


0x400b66 <hw_json_simd_summarise+22>    vmovdqa (%rdi),%ymm0
0x400b6a <hw_json_simd_summarise+26>    vpcmpeqb 0xe8e(%rip),%ymm0,%ymm1        # 0x401a00
0x400b72 <hw_json_simd_summarise+34>    vpmovmskb %ymm1,%r12d
0x400b76 <hw_json_simd_summarise+38>    vpcmpeqb 0xea2(%rip),%ymm0,%ymm1        # 0x401a20
0x400b7e <hw_json_simd_summarise+46>    vpmovmskb %ymm1,%r10d
0x400b82 <hw_json_simd_summarise+50>    vpcmpeqb 0xeb6(%rip),%ymm0,%ymm1        # 0x401a40
0x400b8a <hw_json_simd_summarise+58>    or     %r12d,%r10d
0x400b8d <hw_json_simd_summarise+61>    mov    %r10d,(%rsi)
0x400b90 <hw_json_simd_summarise+64>    vpmovmskb %ymm1,%ebx
0x400b94 <hw_json_simd_summarise+68>    vpcmpeqb 0xec4(%rip),%ymm0,%ymm1        # 0x401a60
0x400b9c <hw_json_simd_summarise+76>    vpmovmskb %ymm1,%r11d

jcberentsen commented 5 years ago

I suspect this is a memory alignment problem. According to https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man3/malloc.3.html, malloc returns aligned memory on macOS. I don't think this is necessarily the case on other platforms.

jcberentsen commented 5 years ago

Here is some evidence suggesting alignment of the buffer on the stack may be the problem: I added a print of the buffer address in simd-spliced.c and ran a.out multiple times...

chrberen@rdlearn02:~/github/hw-json-simd/cbits$ ./a.out sp simple.json simple.json.ib.idx simple.json.bp.idx
Buffer is at address 0x7ffe9a0c7040 of size 32768
chrberen@rdlearn02:~/github/hw-json-simd/cbits$ ./a.out sp simple.json simple.json.ib.idx simple.json.bp.idx
Buffer is at address 0x7ffcb9909b00 of size 32768
chrberen@rdlearn02:~/github/hw-json-simd/cbits$ ./a.out sp simple.json simple.json.ib.idx simple.json.bp.idx
Buffer is at address 0x7ffc9589e5a0 of size 32768
chrberen@rdlearn02:~/github/hw-json-simd/cbits$ ./a.out sp simple.json simple.json.ib.idx simple.json.bp.idx
Buffer is at address 0x7ffe28f06b40 of size 32768
chrberen@rdlearn02:~/github/hw-json-simd/cbits$ ./a.out sp simple.json simple.json.ib.idx simple.json.bp.idx
Buffer is at address 0x7ffe340096b0 of size 32768
Segmentation fault (core dumped)
chrberen@rdlearn02:~/github/hw-json-simd/cbits$ ./a.out sp simple.json simple.json.ib.idx simple.json.bp.idx
Buffer is at address 0x7fffb74b0e10 of size 32768
Segmentation fault (core dumped)
chrberen@rdlearn02:~/github/hw-json-simd/cbits$ ./a.out sp simple.json simple.json.ib.idx simple.json.bp.idx
Buffer is at address 0x7fff6a0e48c0 of size 32768
chrberen@rdlearn02:~/github/hw-json-simd/cbits$ ./a.out sp simple.json simple.json.ib.idx simple.json.bp.idx
Buffer is at address 0x7ffde0d89500 of size 32768
chrberen@rdlearn02:~/github/hw-json-simd/cbits$ ./a.out sp simple.json simple.json.ib.idx simple.json.bp.idx
Buffer is at address 0x7fff5e5f8d30 of size 32768
Segmentation fault (core dumped)
chrberen@rdlearn02:~/github/hw-json-simd/cbits$ ./a.out sp simple.json simple.json.ib.idx simple.json.bp.idx
Buffer is at address 0x7ffdf992fc60 of size 32768

There seems to be a pattern between the buffer address alignment and when it segfaults: it seems to segfault when the next-to-last hex digit of the address is odd.
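The pattern fits a 32-byte alignment requirement. The faulting instruction in the disassembly earlier in the thread, `vmovdqa (%rdi),%ymm0`, is an aligned 256-bit load, which faults unless its address is a multiple of 32. A multiple of 32 has its low five bits clear, so its last hex digit is 0 and its next-to-last hex digit is even. The sketch below checks the ten logged addresses; it assumes the pointer passed in `%rdi` is the buffer address plus a multiple of 32, and the names `is_aligned`, `count_misaligned` and `logged_addresses` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if `addr` is a multiple of `alignment` (which must be a power of two). */
bool is_aligned(uint64_t addr, uint64_t alignment) {
    return (addr & (alignment - 1)) == 0;
}

/* Buffer addresses copied from the ten runs logged above, in order. */
static const uint64_t logged_addresses[10] = {
    0x7ffe9a0c7040ULL, 0x7ffcb9909b00ULL, 0x7ffc9589e5a0ULL,
    0x7ffe28f06b40ULL, 0x7ffe340096b0ULL, 0x7fffb74b0e10ULL,
    0x7fff6a0e48c0ULL, 0x7ffde0d89500ULL, 0x7fff5e5f8d30ULL,
    0x7ffdf992fc60ULL
};

/* Counts how many of the `n` addresses are not multiples of `alignment`. */
size_t count_misaligned(const uint64_t *addrs, size_t n, uint64_t alignment) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (!is_aligned(addrs[i], alignment)) {
            count++;
        }
    }
    return count;
}
```

With an alignment of 32, the misaligned addresses are exactly the three runs that ended in a segmentation fault (0x7ffe340096b0, 0x7fffb74b0e10 and 0x7fff5e5f8d30).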

jcberentsen commented 5 years ago

The sm case seems to need phi-buffer on a 32-byte boundary. #56 contains code that also fixes the sm segmentation fault for me.
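For readers unfamiliar with how a buffer can be forced onto a 32-byte boundary in C, here is a minimal sketch of two standard C11 options. It is not the code from #56; the names `SIMD_ALIGNMENT`, `alloc_simd_buffer` and `stack_buffer_is_aligned` are hypothetical and only illustrate the technique.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed requirement: aligned 256-bit AVX2 loads need 32-byte alignment. */
#define SIMD_ALIGNMENT 32u

/* Heap variant: returns a 32-byte-aligned block of at least `size` bytes,
 * or NULL on failure. Release it with free(). */
void *alloc_simd_buffer(size_t size) {
    /* C11 aligned_alloc expects the size to be a multiple of the alignment,
     * so round the request up to the next multiple of 32. */
    size_t rounded = (size + SIMD_ALIGNMENT - 1) / SIMD_ALIGNMENT * SIMD_ALIGNMENT;
    return aligned_alloc(SIMD_ALIGNMENT, rounded);
}

/* Stack variant: _Alignas asks the compiler to place the local array on a
 * 32-byte boundary. Returns 1 if the array really is aligned, else 0. */
int stack_buffer_is_aligned(void) {
    _Alignas(32) uint8_t buffer[32768];
    buffer[0] = 0; /* touch the buffer so it is actually used */
    return ((uintptr_t)buffer % SIMD_ALIGNMENT) == 0;
}
```

The stack variant matches the 32768-byte stack buffer whose address was printed earlier in the thread; the heap variant is an alternative when the buffer is too large for the stack.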

newhoggy commented 5 years ago

Thanks so much for your PR.

There's one remaining thing that worries me, and that is I don't have a means to regression test any future code changes given that this seems to be either compiler or architecture specific.

jcberentsen commented 5 years ago

Just to clarify, the a.out segfaults were fixed by aligning the buffers, but running the Haskell test still fails. I guess there needs to be some way of aligning the buffers passed to the C code?

newhoggy commented 5 years ago

Good point. I'm guessing that it's possible to do that by adding a wrapper around mallocForeignPtrBytes with similar logic and calling that instead.
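The thread does not show the eventual wrapper, so the following is only a sketch, in C rather than Haskell, of the pointer arithmetic such a wrapper could rely on: allocate `alignment - 1` extra bytes, then round the returned address up to the next boundary. The names `align_up` and `aligned_within` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rounds `addr` up to the next multiple of `alignment`.
 * `alignment` must be a non-zero power of two. */
uintptr_t align_up(uintptr_t addr, uintptr_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + alignment - 1) & ~(alignment - 1);
}

/* Given a raw block that was allocated with `alignment - 1` spare bytes,
 * returns the first suitably aligned address inside it. The caller must
 * keep `raw` around, because that is the pointer that has to be freed. */
void *aligned_within(void *raw, size_t alignment) {
    return (void *)align_up((uintptr_t)raw, (uintptr_t)alignment);
}
```

For example, rounding the low bits of one of the faulting addresses, 0x96b0, up to a multiple of 32 gives 0x96c0, so at most 31 bytes of the over-allocated block are skipped.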

newhoggy commented 5 years ago

hw-json-simd-0.1.0.3 has been published.