Open jcberentsen opened 5 years ago
Is this a regression or is this the first time you're using this library?
Also, what OS and GHC version?
This is my setup:
$ sysctl -a | grep machdep.cpu.features
machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C
$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 8.6.5
$ cabal --version
cabal-install version 2.4.0.0
compiled using version 2.4.0.1 of the Cabal library
$ uname -a
Darwin intlkymac.lan 18.7.0 Darwin Kernel Version 18.7.0: Tue Aug 20 16:57:14 PDT 2019; root:xnu-4903.271.2~2/RELEASE_X86_64 x86_64
$ echo "{}" | pv -t -e -b -a | hw-json-simd create-index --method standard -i /dev/stdin --output-ib-file test.json.ib.idx --output-bp-file test.json.bp.idx
3.00 B 0:00:00 [13.4KiB/s]
I'm also interested to know whether a recent change in hw-prim is responsible.
Could you please change the constraint for hw-prim to 0.6.2.32 and see if that makes a difference?
Is this a regression or is this the first time you're using this library?
Also, what OS and GHC version?
This is the first time trying the library. The OS is Ubuntu 16.04 and 18.04. On both the 16.04 and the 18.04 machines, the version used by stack lts-14.7 is ghc-8.6.5 (I had to pin a few extra dependencies).
The hw-json-simd command was installed with ghc-8.0.2 on the Ubuntu 18.04 box.
cabal version: 2.4.1.0
$ cat /proc/cpuinfo | grep flags
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts md_clear flush_l1d
...
I'll have a go at hw-prim-0.6.2.32
Pinning hw-prim-0.6.2.32 still has the problem (on the 18.04 box).
Actually I think stack lts-14.7 was already on hw-prim-0.6.2.32
https://www.stackage.org/lts-14.7/package/hw-prim-0.6.2.32
Pinning to hw-prim-0.6.2.33 also segfaults.
New problem: with hw-prim-0.6.2.33, using fromByteStringViaBlanking also crashes:
Illegal instruction (core dumped)
I'll revert back to 0.6.2.32 and try this again
fromByteStringViaBlanking works fine again with hw-prim-0.6.2.32.
On the other machine, which actually has bmi2, the fromByteStringViaBlanking version works fine with hw-prim-0.6.2.33.
Here are the flags I used in the stack.yaml:
flags:
  hw-json:
    bmi2: true
    sse42: true
  hw-json-simd:
    avx2: true
    bmi2: true
    sse42: true
  bits-extra:
    bmi2: true
  hw-rankselect-base:
    bmi2: true
  hw-rankselect:
    bmi2: true
  hw-simd:
    bmi2: true
    avx2: true
The bmi2: true flag is a lie on the 18.04 machine, so this probably explains the 'Illegal instruction'.
Setting bmi2: false on that machine resolves the illegal instruction problem.
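An 'Illegal instruction' crash is what one would expect when a binary compiled with BMI2 instructions runs on a CPU that lacks them. As a minimal sketch of how a build flag could be checked against the actual hardware, the following C helpers use the GCC/Clang builtin `__builtin_cpu_supports`. The function names `cpu_has_bmi2` and `cpu_has_avx2` are hypothetical and are not part of any hw- package.

```c
#include <stdbool.h>

/* Hypothetical helpers, not part of hw-json-simd.
   They rely on a GCC/Clang builtin that is only used here on x86 targets;
   on any other compiler or architecture they conservatively report false. */
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_CPU_BUILTINS 1
#else
#define HAVE_X86_CPU_BUILTINS 0
#endif

/* True when the CPU running this process reports BMI2 support. */
bool cpu_has_bmi2(void) {
#if HAVE_X86_CPU_BUILTINS
    __builtin_cpu_init();
    return __builtin_cpu_supports("bmi2") != 0;
#else
    return false;
#endif
}

/* True when the CPU running this process reports AVX2 support. */
bool cpu_has_avx2(void) {
#if HAVE_X86_CPU_BUILTINS
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") != 0;
#else
    return false;
#endif
}
```

On Linux, the same information is visible as `bmi2` and `avx2` in the flags line of /proc/cpuinfo, which is how the mismatch on the 18.04 machine shows up above.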
The original problem still remains on the 16.04 machine, which has the following cpuinfo:
$ cat /proc/cpuinfo | head -n 28
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 79
model name : Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
stepping : 1
microcode : 0xb00002e
cpu MHz : 2194.711
cache size : 56320 KB
physical id : 0
siblings : 22
core id : 0
cpu cores : 22
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 20
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 invpcid rtm rdseed adx smap xsaveopt arat flush_l1d arch_capabilities
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds
bogomips : 4389.42
clflush size : 64
cache_alignment : 64
address sizes : 43 bits physical, 48 bits virtual
power management:
I am able to reproduce the problem in hw-json-simd HEAD, with this code added as hw-json-simd/test/Spec.hs:
{-# LANGUAGE OverloadedStrings #-}
import qualified HaskellWorks.Data.ByteString.Lazy as LBS
import HaskellWorks.Data.Json.Simd.Index.Standard

main :: IO ()
main = do
  -- let res = makeStandardJsonIbBps "{}"
  let res = makeStandardJsonIbBps . LBS.resegmentPadded 512 $ "{}"
  case res of
    Right chunks -> do
      putStrLn "Chunks:"
      let triggerBug = True
      -- Forcing the length of the chunk list is what triggers the failure.
      if triggerBug then putStrLn $ show (length chunks) else pure ()
    err ->
      putStrLn $ "No chunks: " ++ show err
$ ./project.sh test
Build profile: -w ghc-8.0.2 -O2
In order, the following will be built (use -v for more details):
- hw-json-simd-0.1.0.2 (test:hw-json-simd-test) (file test/Spec.hs changed)
Preprocessing test suite 'hw-json-simd-test' for hw-json-simd-0.1.0.2..
Building test suite 'hw-json-simd-test' for hw-json-simd-0.1.0.2..
[2 of 2] Compiling Main ( test/Spec.hs, /home/chrberen/github/hw-json-simd/dist-newstyle/build/x86_64-linux/ghc-8.0.2/hw-json-simd-0.1.0.2/t/hw-json-simd-test/opt/build/hw-json-simd-test/hw-json-simd-test-tmp/Main.o )
Linking /home/chrberen/github/hw-json-simd/dist-newstyle/build/x86_64-linux/ghc-8.0.2/hw-json-simd-0.1.0.2/t/hw-json-simd-test/opt/build/hw-json-simd-test/hw-json-simd-test ...
Running 1 test suites...
Test suite hw-json-simd-test: RUNNING...
Test suite hw-json-simd-test: FAIL
Test suite logged to:
/home/chrberen/github/hw-json-simd/dist-newstyle/build/x86_64-linux/ghc-8.0.2/hw-json-simd-0.1.0.2/t/hw-json-simd-test/opt/test/hw-json-simd-0.1.0.2-hw-json-simd-test.log
0 of 1 test suites (0 of 1 test cases) passed.
cabal: Tests failed for test:hw-json-simd-test from hw-json-simd-0.1.0.2.
Disabling the computation of length chunks doesn't trigger the failure, but I guess this is just because of lazy evaluation?
Not doing the resegmentPadded also fails.
The ergonomics of figuring out why something fails is not so good :(
Is there an EC2 instance type or similar where this happens?
Try this:
$ cd cbits
$ make
$ ./a.out sp simple.json simple.json.ib.idx simple.json.bp.idx
$ ./a.out sm simple.json simple.json.ib.idx simple.json.bp.idx
Both invocations of a.out segfault on the machines in question (they are not available in EC2).
Program received signal SIGSEGV, Segmentation fault.
0x00000000004012f0 in hw_json_simd_sm_make_ib_op_cl_chunks ()
(gdb) bt
#0 0x00000000004012f0 in hw_json_simd_sm_make_ib_op_cl_chunks ()
#1 0x0000000000401692 in hw_simd_json_sm_main ()
#2 0x0000000000400819 in main ()
(gdb)
The previous gdb backtrace was for the sm command.
Backtrace for the sp command:
(gdb) r
Starting program: /home/chrberen/github/hw-json-simd/cbits/a.out sp simple.json simple.json.ib.idx simple.json.bp.idx
Program received signal SIGSEGV, Segmentation fault.
0x0000000000400b66 in hw_json_simd_summarise ()
(gdb) bt
#0 0x0000000000400b66 in hw_json_simd_summarise ()
#1 0x0000000000400cdf in hw_json_simd_process_chunk ()
#2 0x0000000000401064 in hw_json_simd_main_spliced ()
#3 0x00000000004007fc in main ()
(gdb)
Assembly, if this is of any use:
│0x400b66 <hw_json_simd_summarise+22> vmovdqa (%rdi),%ymm0 │
│0x400b6a <hw_json_simd_summarise+26> vpcmpeqb 0xe8e(%rip),%ymm0,%ymm1 # 0x401a00 │
│0x400b72 <hw_json_simd_summarise+34> vpmovmskb %ymm1,%r12d │
│0x400b76 <hw_json_simd_summarise+38> vpcmpeqb 0xea2(%rip),%ymm0,%ymm1 # 0x401a20 │
│0x400b7e <hw_json_simd_summarise+46> vpmovmskb %ymm1,%r10d │
│0x400b82 <hw_json_simd_summarise+50> vpcmpeqb 0xeb6(%rip),%ymm0,%ymm1 # 0x401a40 │
│0x400b8a <hw_json_simd_summarise+58> or %r12d,%r10d │
│0x400b8d <hw_json_simd_summarise+61> mov %r10d,(%rsi) │
│0x400b90 <hw_json_simd_summarise+64> vpmovmskb %ymm1,%ebx │
│0x400b94 <hw_json_simd_summarise+68> vpcmpeqb 0xec4(%rip),%ymm0,%ymm1 # 0x401a60 │
│0x400b9c <hw_json_simd_summarise+76> vpmovmskb %ymm1,%r11d
I suspect this is a memory alignment problem. According to https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man3/malloc.3.html, malloc returns aligned memory on macOS. I don't think this is necessarily the case on other platforms.
Here is some evidence suggesting that the alignment of the buffer on the stack may be the problem:
I added a print of the buffer address in simd-spliced.c and ran a.out multiple times...
chrberen@rdlearn02:~/github/hw-json-simd/cbits$ ./a.out sp simple.json simple.json.ib.idx simple.json.bp.idx
Buffer is at address 0x7ffe9a0c7040 of size 32768
chrberen@rdlearn02:~/github/hw-json-simd/cbits$ ./a.out sp simple.json simple.json.ib.idx simple.json.bp.idx
Buffer is at address 0x7ffcb9909b00 of size 32768
chrberen@rdlearn02:~/github/hw-json-simd/cbits$ ./a.out sp simple.json simple.json.ib.idx simple.json.bp.idx
Buffer is at address 0x7ffc9589e5a0 of size 32768
chrberen@rdlearn02:~/github/hw-json-simd/cbits$ ./a.out sp simple.json simple.json.ib.idx simple.json.bp.idx
Buffer is at address 0x7ffe28f06b40 of size 32768
chrberen@rdlearn02:~/github/hw-json-simd/cbits$ ./a.out sp simple.json simple.json.ib.idx simple.json.bp.idx
Buffer is at address 0x7ffe340096b0 of size 32768
Segmentation fault (core dumped)
chrberen@rdlearn02:~/github/hw-json-simd/cbits$ ./a.out sp simple.json simple.json.ib.idx simple.json.bp.idx
Buffer is at address 0x7fffb74b0e10 of size 32768
Segmentation fault (core dumped)
chrberen@rdlearn02:~/github/hw-json-simd/cbits$ ./a.out sp simple.json simple.json.ib.idx simple.json.bp.idx
Buffer is at address 0x7fff6a0e48c0 of size 32768
chrberen@rdlearn02:~/github/hw-json-simd/cbits$ ./a.out sp simple.json simple.json.ib.idx simple.json.bp.idx
Buffer is at address 0x7ffde0d89500 of size 32768
chrberen@rdlearn02:~/github/hw-json-simd/cbits$ ./a.out sp simple.json simple.json.ib.idx simple.json.bp.idx
Buffer is at address 0x7fff5e5f8d30 of size 32768
Segmentation fault (core dumped)
chrberen@rdlearn02:~/github/hw-json-simd/cbits$ ./a.out sp simple.json simple.json.ib.idx simple.json.bp.idx
Buffer is at address 0x7ffdf992fc60 of size 32768
There seems to be a pattern between the buffer address alignment and when it segfaults?!
It seems to segfault when the next-to-last hex digit of the address is odd.
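An odd next-to-last hex digit combined with a final digit of 0 means the address is a multiple of 16 but not of 32. That fits the faulting instruction in the disassembly above: `vmovdqa` with a `%ymm` register requires its memory operand to be 32-byte aligned. As a small sketch, the following C code replays the addresses logged above and checks that "not 32-byte aligned" predicts the crash in every run. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when addr is a multiple of alignment (alignment must be a power of two). */
bool is_aligned(uint64_t addr, uint64_t alignment) {
    return (addr & (alignment - 1)) == 0;
}

/* Buffer addresses copied from the ten runs logged above, in order. */
static const uint64_t logged_addrs[] = {
    0x7ffe9a0c7040ULL, 0x7ffcb9909b00ULL, 0x7ffc9589e5a0ULL, 0x7ffe28f06b40ULL,
    0x7ffe340096b0ULL, 0x7fffb74b0e10ULL, 0x7fff6a0e48c0ULL, 0x7ffde0d89500ULL,
    0x7fff5e5f8d30ULL, 0x7ffdf992fc60ULL,
};

/* Whether the corresponding run ended in "Segmentation fault (core dumped)". */
static const bool logged_crash[] = {
    false, false, false, false, true, true, false, false, true, false,
};

/* True when every logged run crashed exactly when its buffer
   was not on a 32-byte boundary. */
bool alignment_explains_crashes(void) {
    size_t n = sizeof logged_addrs / sizeof logged_addrs[0];
    for (size_t i = 0; i < n; i++) {
        if (is_aligned(logged_addrs[i], 32) == logged_crash[i]) {
            return false;
        }
    }
    return true;
}
```

All ten addresses are 16-byte aligned, which is the usual stack alignment on x86-64 Linux, so whether a given run happens to land on a 32-byte boundary depends on where the stack starts.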
The sm case seems to need phi-buffer on a 32-byte boundary. #56 contains code that also fixes the sm segmentation fault for me.
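The actual change is in #56 and is not reproduced here. As one possible sketch of the idea, assuming a C11 compiler, a stack buffer can be forced onto a 32-byte boundary with `alignas`. The function name and the `BUFFER_SIZE` macro below are hypothetical; the size matches the 32768 bytes printed in the runs above.

```c
#include <stdalign.h>
#include <stdint.h>
#include <string.h>

#define BUFFER_SIZE 32768

/* Declares a stack buffer with an explicit 32-byte alignment requirement,
   touches it, and returns its address modulo 32 (0 when correctly aligned). */
uintptr_t aligned_stack_buffer_offset(void) {
    alignas(32) uint8_t buffer[BUFFER_SIZE];
    memset(buffer, 0, sizeof buffer);
    return (uintptr_t)buffer % 32;
}
```

For heap buffers, C11 `aligned_alloc` or POSIX `posix_memalign` give the same guarantee. An alternative design is to keep the buffers unaligned and use unaligned loads in the SIMD code instead, at a possible cost in speed.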
Thanks so much for your PR.
There's one remaining thing that worries me, and that is I don't have a means to regression test any future code changes given that this seems to be either compiler or architecture specific.
Just to clarify: the a.out segfaults were fixed by aligning the buffers, but running the Haskell test still fails. I guess there needs to be some way of aligning the buffers passed to the C code?
Good point. I'm guessing that it's possible to do that by adding a wrapper around mallocForeignPtrBytes with similar logic and calling that instead.
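What "similar logic" means here is not spelled out in the thread. One common way to get an aligned buffer from an allocator that makes no alignment promise is to over-allocate and round the pointer up. The sketch below shows that technique in C; it is an assumption about the approach, and `align_up` and `alloc_aligned` are hypothetical names. On the Haskell side, base also provides `mallocForeignPtrAlignedBytes` in Foreign.ForeignPtr, which takes the alignment as an argument and may make a hand-written wrapper unnecessary.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Rounds p up to the next multiple of alignment (a power of two).
   Returns p unchanged when it is already aligned. */
uint8_t *align_up(uint8_t *p, uintptr_t alignment) {
    uintptr_t addr = (uintptr_t)p;
    uintptr_t aligned = (addr + alignment - 1) & ~(alignment - 1);
    return p + (aligned - addr);
}

/* Allocates size usable bytes starting on an alignment boundary.
   The raw pointer from malloc is stored in *raw, and that pointer
   (not the returned one) must later be passed to free. */
uint8_t *alloc_aligned(size_t size, uintptr_t alignment, void **raw) {
    *raw = malloc(size + alignment - 1);
    if (*raw == NULL) {
        return NULL;
    }
    return align_up((uint8_t *)*raw, alignment);
}
```

The extra `alignment - 1` bytes guarantee that an aligned address with `size` bytes after it always exists inside the allocation.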
hw-json-simd-0.1.0.3 has been published.
After installing with:
$ cabal new-install hw-json-simd
running the example causes a segmentation fault. The same happens with --method standard (and larger JSON input than this ;)
This is an attempt at stripping down a problem I had with using fromByteStringViaSimd from hw-json, where it also segfaults. The same code using fromByteStringViaBlanking works fine.
The fromByteStringViaSimd problem was tested on two different x86 machines (one without bmi2 support). Looking at the code, fromByteStringViaSimd seems to attempt to safeguard on the CPU capabilities; maybe it is not working as intended?
I also tried to pass diverse flags (via stack.yaml) to enable sse42, bmi2 and avx2 for various packages in the hw- ecosystem, without luck :(