lemire / despacer

C library to remove white space from strings as fast as possible
BSD 3-Clause "New" or "Revised" License

Failed to make on Debian but succeeded on Mac OSX #5

Closed by dendisuhubdy 7 years ago

dendisuhubdy commented 7 years ago

suhubdyd@eos14:~/dev/despacer$ make
cc -fPIC -std=c99 -O3  -march=native -Wall -Wextra -Wshadow -o despacebenchmark ./benchmarks/despacebenchmark.c -Iinclude
In file included from ./benchmarks/despacebenchmark.c:6:0:
include/despacer.h: In function ‘cleanm256’:
include/despacer.h:377:5: warning: implicit declaration of function ‘_mm256_loadu2_m128i’ [-Wimplicit-function-declaration]
     __m256i mask = _mm256_loadu2_m128i((const __m128i *)despace_mask16 + maskhigh, (const __m128i *)despace_mask16 + masklow);
     ^
include/despacer.h:377:20: error: incompatible types when initializing type ‘__m256i’ using type ‘int’
     __m256i mask = _mm256_loadu2_m128i((const __m128i *)despace_mask16 + maskhigh, (const __m128i *)despace_mask16 + masklow);
                    ^
include/despacer.h: In function ‘avx2_despace_branchless’:
include/despacer.h:400:5: warning: implicit declaration of function ‘_mm256_storeu2_m128i’ [-Wimplicit-function-declaration]
     _mm256_storeu2_m128i((__m128i *)(bytes + pos + offset1), (__m128i *)(bytes + pos ),x);
     ^
Makefile:14: recipe for target 'despacebenchmark' failed
make: *** [despacebenchmark] Error 1

grok-machine:despacer dendisuhubdy$ make
cc -fPIC -std=c99 -O3  -march=native -Wall -Wextra -Wshadow -o despacebenchmark ./benchmarks/despacebenchmark.c -Iinclude
grok-machine:despacer dendisuhubdy$ ./despacebenchmark 
pointer alignment = 4096 bytes 
memcpy(tmpbuffer,buffer,N):  0.082031 cycles / ops
countspaces(buffer, N):  2.191406 cycles / ops
countspaces32(buffer, N):  0.730469 cycles / ops
despace(buffer, N):  1.578125 cycles / ops
despace32(buffer, N):  1.074219 cycles / ops
faster_despace(buffer, N):  1.318359 cycles / ops
faster_despace32(buffer, N):  1.609375 cycles / ops
despace64(buffer, N):  1.353516 cycles / ops
despace_to(buffer, N, tmpbuffer):  1.441406 cycles / ops
avx2_countspaces(buffer, N):  0.078125 cycles / ops
avx2_despace(buffer, N):  1.398438 cycles / ops
avx2_despace_branchless(buffer, N):  0.218750 cycles / ops
avx2_despace_branchless_u2(buffer, N):  0.205078 cycles / ops
sse4_despace(buffer, N):  0.486328 cycles / ops
sse4_despace_branchless(buffer, N):  0.320312 cycles / ops
sse4_despace_branchless32(buffer, N):  0.320312 cycles / ops
sse4_despace_branchless_u2(buffer, N):  0.203125 cycles / ops
sse4_despace_branchless_u4(buffer, N):  0.210938 cycles / ops
sse4_despace_branchless_mask8(buffer, N):  0.431641 cycles / ops
sse4_despace_trail(buffer, N):  1.126953 cycles / ops
sse42_despace_branchless(buffer, N):  0.496094 cycles / ops
sse42_despace_branchless_lookup(buffer, N):  0.501953 cycles / ops
sse42_despace_to(buffer, N,tmpbuffer):  1.050781 cycles / ops

lemire commented 7 years ago

Yes, the problem is simply that your hardware does not support AVX/AVX2. Sadly, I do not have a machine old enough to lack AVX while still supporting SSE4.2, so I cannot test my code on that configuration. I'll see about adding a few checks.
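
For readers following along, here is a minimal sketch of what such a check can look like at compile time. It is not taken from despacer: the function names `count_spaces`, `count_spaces_scalar` and `count_spaces_avx2` are hypothetical, and `__builtin_popcount` assumes GCC or Clang. The idea is that the AVX2 path is only compiled when the compiler defines `__AVX2__` (as `-march=native` does on capable hardware), and a scalar path is used otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Scalar fallback: counts ' ' one byte at a time; compiles on any target. */
static size_t count_spaces_scalar(const char *bytes, size_t len) {
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        count += (bytes[i] == ' ');
    }
    return count;
}

#if defined(__AVX2__)
/* AVX2 path: only compiled when the compiler targets AVX2. */
static size_t count_spaces_avx2(const char *bytes, size_t len) {
    const __m256i spaces = _mm256_set1_epi8(' ');
    size_t count = 0;
    size_t i = 0;
    for (; i + 32 <= len; i += 32) {
        __m256i x = _mm256_loadu_si256((const __m256i *)(bytes + i));
        /* One mask bit per byte that equals ' '. */
        uint32_t mask =
            (uint32_t)_mm256_movemask_epi8(_mm256_cmpeq_epi8(x, spaces));
        count += (size_t)__builtin_popcount(mask); /* GCC/Clang builtin */
    }
    /* Handle the remaining tail (fewer than 32 bytes) with the scalar code. */
    return count + count_spaces_scalar(bytes + i, len - i);
}
#endif

/* Hypothetical entry point: picks the widest path the build supports. */
static size_t count_spaces(const char *bytes, size_t len) {
    assert(bytes != NULL || len == 0);
#if defined(__AVX2__)
    return count_spaces_avx2(bytes, len);
#else
    return count_spaces_scalar(bytes, len);
#endif
}
```

With this layout, a build on a machine without AVX2 simply never sees the `_mm256_*` calls, so it cannot fail on them; the trade-off is that the choice is fixed at compile time rather than detected at run time.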

lemire commented 7 years ago

Ok. In my latest commit, I moved things around. It should work now. Can you try building again on your older machine?
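
The commit itself is not shown in this thread. As a side note, the `implicit declaration` warnings in the original log mean that the compiler's headers did not declare `_mm256_loadu2_m128i` at all in that build. A possible workaround in such a case, sketched below under the assumption that AVX2 is available, is to assemble the 256-bit value from two 128-bit loads. The names `loadu2_m128i_compat` and `load_two_halves` are hypothetical and do not come from despacer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__AVX2__)
#include <immintrin.h>

/* Hypothetical stand-in for _mm256_loadu2_m128i(hi, lo):
   the low 128 bits come from *lo, the high 128 bits from *hi. */
static inline __m256i loadu2_m128i_compat(const __m128i *hi,
                                          const __m128i *lo) {
    __m256i low = _mm256_castsi128_si256(_mm_loadu_si128(lo));
    return _mm256_inserti128_si256(low, _mm_loadu_si128(hi), 1);
}
#endif

/* Hypothetical wrapper so the sketch also builds without AVX2:
   out[0..15] is copied from lo, out[16..31] from hi. */
static void load_two_halves(uint8_t out[32], const uint8_t *hi,
                            const uint8_t *lo) {
    assert(out != NULL && hi != NULL && lo != NULL);
#if defined(__AVX2__)
    __m256i v = loadu2_m128i_compat((const __m128i *)hi,
                                    (const __m128i *)lo);
    _mm256_storeu_si256((__m256i *)out, v);
#else
    memcpy(out, lo, 16);
    memcpy(out + 16, hi, 16);
#endif
}
```

Both branches produce the same byte layout, which makes it easy to compare the vector path against the plain `memcpy` version when testing on different machines.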

dendisuhubdy commented 7 years ago

@lemire The build succeeded.

suhubdyd@eos14:~/dev/despacer$ make
cc -fPIC -std=c99 -O3  -march=native -Wall -Wextra -Wshadow -o despacebenchmark ./benchmarks/despacebenchmark.c -Iinclude
suhubdyd@eos14:~/dev/despacer$ ./despacebenchmark 
pointer alignment = 16 bytes 
memcpy(tmpbuffer,buffer,N):  0.105469 cycles / ops
countspaces(buffer, N):  4.820312 cycles / ops
countspaces32(buffer, N):  1.062500 cycles / ops
despace(buffer, N):  4.390625 cycles / ops
despace32(buffer, N):  2.984375 cycles / ops
faster_despace(buffer, N):  2.402344 cycles / ops
faster_despace32(buffer, N):  3.312500 cycles / ops
despace64(buffer, N):  3.949219 cycles / ops
despace_to(buffer, N, tmpbuffer):  4.347656 cycles / ops
sse4_despace(buffer, N):  0.847656 cycles / ops
sse4_despace_branchless(buffer, N):  0.441406 cycles / ops
sse4_despace_branchless32(buffer, N):  0.390625 cycles / ops
sse4_despace_branchless_u2(buffer, N):  0.441406 cycles / ops
sse4_despace_branchless_u4(buffer, N):  0.417969 cycles / ops
sse4_despace_branchless_mask8(buffer, N):  0.550781 cycles / ops
sse4_despace_trail(buffer, N):  2.148438 cycles / ops
sse42_despace_branchless(buffer, N):  0.675781 cycles / ops
sse42_despace_branchless_lookup(buffer, N):  0.789062 cycles / ops
sse42_despace_to(buffer, N,tmpbuffer):  1.824219 cycles / ops

lemire commented 7 years ago

Great. Thanks for your help.