lemire / simdcomp

A simple C library for compressing lists of integers using binary packing
BSD 3-Clause "New" or "Revised" License
488 stars 53 forks source link

SSE2 compatibility should be warranted #10

Closed weltling closed 8 years ago

weltling commented 9 years ago

See the actual discussion here https://github.com/lemire/simdcomp/commit/1a6ea48fb9ba53888470e891b321164ab61ecb39, but in short - while various SSE versions have useful features, SSE2 is the standard set. The suggestion is to retain the compatibility with SSE2 even while some features from the upper SSE versions are used.

This is done by the following steps

wshager commented 8 years ago

This emulation of smmintrin.h (adapted from SSEPlus) suffices to compile. I wanted to see what Emscripten would make of this, but the generated javascript floods the CPU...

smmintrin.h.txt

Good luck with the project.

wshager commented 8 years ago

Correction: the first two tests work, the simple_demo() hangs (compiled with Emscripten).

weltling commented 8 years ago

Hi,

i've started to work on this having half a free day. But it is somehow strange :) I had to find out that i have no single machine not supporting sse4.1. Nevertheless, i made this small patch locally in the makefile:

diff --git a/makefile b/makefile
index 0e1c8de..228e737 100644
--- a/makefile
+++ b/makefile
@@ -2,10 +2,17 @@
 .SUFFIXES:
 #
 .SUFFIXES: .cpp .o .c .h
+
+ifeq ($(SSE),2)
+SSE_ARG=-msse2
+else
+SSE_ARG=-msse4.1
+endif
+
 ifeq ($(DEBUG),1)
-CFLAGS = -fPIC  -std=c89 -ggdb -msse4.1 -march=native -Wall -Wextra -pedantic
+CFLAGS = -fPIC  -std=c89 -ggdb $(SSE_ARG) -march=native -Wall -Wextra -pedantic
 else
-CFLAGS = -fPIC -std=c89 -O3 -msse4.1  -march=native -Wall -Wextra -pedantic
+CFLAGS = -fPIC -std=c89 -O3 $(SSE_ARG) -march=native -Wall -Wextra -pedantic
 endif # debug
 LDFLAGS = -shared
 LIBNAME=libsimdcomp.so.0.0.3

And then i compile like make SSE=2. Surprisingly, everything compiles fine and unit tests pass through. It is strange - firstly, it should not even compile, should it? Or, it could be good, that gcc already emulates something. Or gcc ignores this option as the VM supports SSE4_1. I have no chance to move the binaries and test on a machine without SSE4_1 as i don't have such HW. Can someone explain this?

Thanks.

weltling commented 8 years ago

@wshager thanks for the interest. Could you please be more verbose about how you compile and test it, and what HW was used?

Thanks.

wshager commented 8 years ago

@weltling I used Emscripten to compile, and since it only supports SIMD intrinsics up to SSE2, newer headers simply won't be found in the Emscripten lib. The HW doesn't seem relevant in this case. The current lack of SIMD support in javascript kinda puts this project on hold for me, since I'm looking for a solution in the browser.

lemire commented 8 years ago

@weltling

SSE 4.1 was introduced ~ 2007. I am not sure whether anyone makes x64 CPUs today without SSE 4.1.

Anyone knows?

lemire commented 8 years ago

Checked in a version of the code that should compile fine given the proper compiler settings.

@wshager Hopefully, the code should be compilable by Emscripten if there is SIMD support.

weltling commented 8 years ago

@lemire, yeah, i was linking this ttp://store.steampowered.com/hwsurvey in another thread which tells that sse4.1 is still used by round 83% in the gaming, which should be the indicator.

I was planning long to go for the emulation layer, just it was quite busy times on my side inbetween. But now when i've came to it, the strange thing was that gcc -msse2 was compiling just fine the existing code even before your patch. So maybe gcc already inserts some emulation, or ignores the option. That's why I was wondering :) BTW Visual Studio seems to not to have options to enforce some specific SSE version compatibility, seems the runtime will do the job. But by the plan I had was to implement a compatibility layer explicitly.

Thanks.

lemire commented 8 years ago

The goal of my fix is to allow @wshager to go forward with his plan to compile this library with Emscripten. I'd be very interested in hearing about the results...

yeah, i was linking this http://store.steampowered.com/hwsurvey in another thread which tells that sse4.1 is still used by round 83% in the gaming, which should be the indicator.

It is an interesting reference.

the strange thing was that gcc -msse2 was compiling just fine the existing code even before your patch. So maybe gcc already inserts some emulation, or ignores the option.

I do not think that -msse2 precludes SSE 4.1. If you want to forbid the compiler to use SSE 4.1, you have to use a negative flag like -mno-sse4.1.

You can certainly emulate SSE 4.1 functions, but you will get awful performance. It is probably not worth it. Better not use SIMD instructions in the first place if you have to support obsolete hardware.

weltling commented 8 years ago

@lemire great, -msse2 -mno-sse4.1 worked as expected. Looks like gcc expects the least version with -m, instead of a most one. Was my wrong interpretation. With these options now it properly fails to compile.

I got it exactly the way you told. With your fix it satisfies SSE2. But effectively it excludes several functionality. I was more talking about keeping the APIs, but having necessary parts emulated. Regarding performance - clear, SSE 2 were worse than 4.1. But I were more about compatibility. It's just down to whether to retain APIs compatible vs. moving forward with features.

Another point is also, that SIMD in any form will certainly provide a better performance compared to plain C/C++. More practical applicability vs. speed within SSE versions, as round 1/5 of cases sounds still significant to me. OFC i don't insist on this, just it was what we discussed earlier, why i opened this issue back then. I'm likely to call it a day then, mostly for the time reasons for my part.

Thanks.

lemire commented 8 years ago

SIMD in any form will certainly provide a better performance compared to plain C/C++.

Some non-SSE2 SIMD instructions we use (e.g., pshufb) cannot be emulated using SSE2 instructions. So you need to write the SIMD register to memory, reload it into regular registers, do the computation, then reload a SIMD register. That's going to be atrociously slow in some cases.

Even when the SIMD instruction can be emulated with SSE2 instructions (e.g., pmaxud), it may involve more operations than a non-SIMD algorithm would use. On the whole, you may end up with a significantly worse latency and throughput.

wshager commented 8 years ago

@lemire thanks! The example now compiles to Emscripten. Steps to reproduce:

emcc -O2 src/simdbitpacking.c -o simdbitpacking.bc -msse2 -Iinclude emcc -O2 src/simdcomputil.c -o simdcomputil.bc -msse2 -Iinclude emcc -O2 src/simdintegratedbitpacking.c -o simdintegratedbitpacking.bc -msse2 -Iinclude emcc -O2 example.c -o example.bc -msse2 -Iinclude emcc -O2 example.bc simdbitpacking.bc simdintegratedbitpacking.bc simdcomputil.bc -o project.html -s ALLOW_MEMORY_GROWTH=1

I'll puzzle some more on dynamic calls from JS later, and report my experiences in #13. Moving forward, I expect Emscripten to support SSE4.1 in short term. https://github.com/kripken/emscripten/issues/4030

juj commented 8 years ago

This looks like a very interesting use case of SIMD.js! I'd be very curious to see comparison graphs of native scalar vs native SIMD vs JS scalar vs JS SIMD performance to get a reality check of the suitability of the upcoming SIMD.js specification on your SIMD use case. If you generate such benchmark results at some point, please ping me in for the numbers!

wshager commented 8 years ago

@juj I'd be happy to.

lemire commented 8 years ago

@wshager

Which version of emscripten do you use? The fact that there is no -msse2 flag on my version does not make me hopeful.

$ emcc --version
emcc (Emscripten GCC-like replacement) 1.10.0 ()
Copyright (C) 2013 the Emscripten authors (see AUTHORS.txt)
This is free and open source software under the MIT license.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ emcc -O2 src/simdbitpacking.c -o simdbitpacking.bc -msse2 -Iinclude
clang: warning: argument unused during compilation: '-msse2'
In file included from src/simdbitpacking.c:4:
In file included from include/simdbitpacking.h:14:
In file included from include/simdcomputil.h:11:
In file included from /usr/share/emscripten/system/include/emscripten/emmintrin.h:1:
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:6:1: error: redefinition of '_mm_set_ps'
_mm_set_ps(float z, float y, float x, float w)
^
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:6:1: note: previous definition is here
_mm_set_ps(float z, float y, float x, float w)
^
In file included from src/simdbitpacking.c:4:
In file included from include/simdbitpacking.h:14:
In file included from include/simdcomputil.h:11:
In file included from /usr/share/emscripten/system/include/emscripten/emmintrin.h:1:
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:12:1: error: redefinition of '_mm_set1_ps'
_mm_set1_ps(float w)
^
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:12:1: note: previous definition is here
_mm_set1_ps(float w)
^
In file included from src/simdbitpacking.c:4:
In file included from include/simdbitpacking.h:14:
In file included from include/simdcomputil.h:11:
In file included from /usr/share/emscripten/system/include/emscripten/emmintrin.h:1:
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:18:1: error: redefinition of '_mm_setzero_ps'
_mm_setzero_ps(void)
^
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:18:1: note: previous definition is here
_mm_setzero_ps(void)
^
In file included from src/simdbitpacking.c:4:
In file included from include/simdbitpacking.h:14:
In file included from include/simdcomputil.h:11:
In file included from /usr/share/emscripten/system/include/emscripten/emmintrin.h:1:
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:24:1: error: redefinition of '_mm_store_ps'
_mm_store_ps(float *p, __m128 a)
^
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:24:1: note: previous definition is here
_mm_store_ps(float *p, __m128 a)
^
In file included from src/simdbitpacking.c:4:
In file included from include/simdbitpacking.h:14:
In file included from include/simdcomputil.h:11:
In file included from /usr/share/emscripten/system/include/emscripten/emmintrin.h:1:
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:30:1: error: redefinition of '_mm_movemask_ps'
_mm_movemask_ps(__m128 a)
^
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:30:1: note: previous definition is here
_mm_movemask_ps(__m128 a)
^
In file included from src/simdbitpacking.c:4:
In file included from include/simdbitpacking.h:14:
In file included from include/simdcomputil.h:11:
In file included from /usr/share/emscripten/system/include/emscripten/emmintrin.h:1:
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:36:1: error: redefinition of '_mm_add_ps'
_mm_add_ps(__m128 a, __m128 b)
^
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:36:1: note: previous definition is here
_mm_add_ps(__m128 a, __m128 b)
^
In file included from src/simdbitpacking.c:4:
In file included from include/simdbitpacking.h:14:
In file included from include/simdcomputil.h:11:
In file included from /usr/share/emscripten/system/include/emscripten/emmintrin.h:1:
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:42:1: error: redefinition of '_mm_sub_ps'
_mm_sub_ps(__m128 a, __m128 b)
^
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:42:1: note: previous definition is here
_mm_sub_ps(__m128 a, __m128 b)
^
In file included from src/simdbitpacking.c:4:
In file included from include/simdbitpacking.h:14:
In file included from include/simdcomputil.h:11:
In file included from /usr/share/emscripten/system/include/emscripten/emmintrin.h:1:
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:48:1: error: redefinition of '_mm_mul_ps'
_mm_mul_ps(__m128 a, __m128 b)
^
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:48:1: note: previous definition is here
_mm_mul_ps(__m128 a, __m128 b)
^
In file included from src/simdbitpacking.c:4:
In file included from include/simdbitpacking.h:14:
In file included from include/simdcomputil.h:11:
In file included from /usr/share/emscripten/system/include/emscripten/emmintrin.h:1:
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:54:1: error: redefinition of '_mm_div_ps'
_mm_div_ps(__m128 a, __m128 b)
^
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:54:1: note: previous definition is here
_mm_div_ps(__m128 a, __m128 b)
^
In file included from src/simdbitpacking.c:4:
In file included from include/simdbitpacking.h:14:
In file included from include/simdcomputil.h:11:
In file included from /usr/share/emscripten/system/include/emscripten/emmintrin.h:1:
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:60:1: error: redefinition of '_mm_min_ps'
_mm_min_ps(__m128 a, __m128 b)
^
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:60:1: note: previous definition is here
_mm_min_ps(__m128 a, __m128 b)
^
In file included from src/simdbitpacking.c:4:
In file included from include/simdbitpacking.h:14:
In file included from include/simdcomputil.h:11:
In file included from /usr/share/emscripten/system/include/emscripten/emmintrin.h:1:
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:66:1: error: redefinition of '_mm_max_ps'
_mm_max_ps(__m128 a, __m128 b)
^
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:66:1: note: previous definition is here
_mm_max_ps(__m128 a, __m128 b)
^
In file included from src/simdbitpacking.c:4:
In file included from include/simdbitpacking.h:14:
In file included from include/simdcomputil.h:11:
In file included from /usr/share/emscripten/system/include/emscripten/emmintrin.h:1:
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:72:1: error: redefinition of '_mm_sqrt_ps'
_mm_sqrt_ps(__m128 a)
^
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:72:1: note: previous definition is here
_mm_sqrt_ps(__m128 a)
^
In file included from src/simdbitpacking.c:4:
In file included from include/simdbitpacking.h:14:
In file included from include/simdcomputil.h:11:
In file included from /usr/share/emscripten/system/include/emscripten/emmintrin.h:1:
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:80:1: error: redefinition of '_mm_cmplt_ps'
_mm_cmplt_ps(__m128 a, __m128 b)
^
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:80:1: note: previous definition is here
_mm_cmplt_ps(__m128 a, __m128 b)
^
In file included from src/simdbitpacking.c:4:
In file included from include/simdbitpacking.h:14:
In file included from include/simdcomputil.h:11:
In file included from /usr/share/emscripten/system/include/emscripten/emmintrin.h:1:
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:86:1: error: redefinition of '_mm_cmple_ps'
_mm_cmple_ps(__m128 a, __m128 b)
^
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:86:1: note: previous definition is here
_mm_cmple_ps(__m128 a, __m128 b)
^
In file included from src/simdbitpacking.c:4:
In file included from include/simdbitpacking.h:14:
In file included from include/simdcomputil.h:11:
In file included from /usr/share/emscripten/system/include/emscripten/emmintrin.h:1:
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:92:1: error: redefinition of '_mm_cmpeq_ps'
_mm_cmpeq_ps(__m128 a, __m128 b)
^
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:92:1: note: previous definition is here
_mm_cmpeq_ps(__m128 a, __m128 b)
^
In file included from src/simdbitpacking.c:4:
In file included from include/simdbitpacking.h:14:
In file included from include/simdcomputil.h:11:
In file included from /usr/share/emscripten/system/include/emscripten/emmintrin.h:1:
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:98:1: error: redefinition of '_mm_cmpge_ps'
_mm_cmpge_ps(__m128 a, __m128 b)
^
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:98:1: note: previous definition is here
_mm_cmpge_ps(__m128 a, __m128 b)
^
In file included from src/simdbitpacking.c:4:
In file included from include/simdbitpacking.h:14:
In file included from include/simdcomputil.h:11:
In file included from /usr/share/emscripten/system/include/emscripten/emmintrin.h:1:
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:104:1: error: redefinition of '_mm_cmpgt_ps'
_mm_cmpgt_ps(__m128 a, __m128 b)
^
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:104:1: note: previous definition is here
_mm_cmpgt_ps(__m128 a, __m128 b)
^
In file included from src/simdbitpacking.c:4:
In file included from include/simdbitpacking.h:14:
In file included from include/simdcomputil.h:11:
In file included from /usr/share/emscripten/system/include/emscripten/emmintrin.h:1:
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:110:1: error: redefinition of '_mm_and_ps'
_mm_and_ps(__m128 a, __m128 b)
^
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:110:1: note: previous definition is here
_mm_and_ps(__m128 a, __m128 b)
^
In file included from src/simdbitpacking.c:4:
In file included from include/simdbitpacking.h:14:
In file included from include/simdcomputil.h:11:
In file included from /usr/share/emscripten/system/include/emscripten/emmintrin.h:1:
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:116:1: error: redefinition of '_mm_andnot_ps'
_mm_andnot_ps(__m128 a, __m128 b)
^
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:116:1: note: previous definition is here
_mm_andnot_ps(__m128 a, __m128 b)
^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
ERROR    root: compiler frontend failed to generate LLVM bitcode, halting
wshager commented 8 years ago

@lemire I use 1.35.0. Choose 'emsdk_portable install latest' / 'emsdk_portable activate latest'.

lemire commented 8 years ago

@wshager Makes sense. Thanks.

lemire commented 8 years ago

@wshager Right. It builds.

$ emcc --version
emcc (Emscripten gcc/clang-like replacement) 1.35.0 (commit 42fb486c53d627b203b77af6e5d0c088c0ad03ad)
Copyright (C) 2014 the Emscripten authors (see AUTHORS.txt)
This is free and open source software under the MIT license.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ emcc -O2 src/simdbitpacking.c -o simdbitpacking.bc -msse2 -Iinclude
INFO:root:(Emscripten: Running sanity checks)
$ emcc -O2 src/simdcomputil.c -o simdcomputil.bc -msse2 -Iinclude
$ emcc -O2 src/simdintegratedbitpacking.c -o simdintegratedbitpacking.bc -msse2 -Iinclude
$ emcc -O2 example.c -o example.bc -msse2 -Iinclude
$ emcc -O2 example.bc simdbitpacking.bc simdintegratedbitpacking.bc simdcomputil.bc -o project.html -s ALLOW_MEMORY_GROWTH=1
WARNING:root:generating system library: libc.bc...

WARNING:root:                                     ok
WARNING:root:generating system library: dlmalloc.bc...
WARNING:root:                                         ok
wshager commented 8 years ago

@lemire you can add -s EXPORTED_FUNCTIONS="['_main', '_compress_decompress_demo','_varying_bit_width_demo','_simple_demo']" to export functions separately.

lemire commented 8 years ago

Thanks.

I am just checking things out in case others want to try it out (so that it is documented).

wshager commented 8 years ago

@lemire great. I'm going to try to rewrite some functions using SIMD.js. I'll report that too.