simd-everywhere / simde

Implementations of SIMD instruction sets for systems which don't natively support them.
https://simd-everywhere.github.io/blog/
MIT License
2.37k stars 247 forks source link

Bug on Arm when using simde/x86/sse.h #897

Closed phridley closed 3 years ago

phridley commented 3 years ago

Hi,

in simde/x86/sse.h the replacement for mm_prefetch is defined as

simde_mm_prefetch (char const* p, int i)

This falls back to the compiler's buildin_prefetch which is defined as __builtin_prefetch (const void *addr[, rw[, locality]])

(https://developer.arm.com/documentation/101458/2100/Optimize/Prefetching-with---builtin-prefetch)

So in simde/x86/sse.h, it should really be the following

simde_mm_prefetch (const void *p, int i)

Please can someone take a look.

Thanks

aqrit commented 3 years ago

I don't understand...

simde_mm_prefetch emulates _mm_prefetch (char const* p, int i). const char* and char const* are the same thing. __builtin_prefetch should accept char pointers without warnings.

Perhaps you're looking for a generic prefetcher: https://github.com/nemequ/hedley/issues/31 ?

phridley commented 3 years ago

Please have a look at line 2244 in https://github.com/DLTcollab/sse2neon/blob/master/sse2neon.h

That implementation works fine with GCC on Arm. In this case, the compilers don't accept char const p, they need const void p to match what __builtin_prefetch has.

aqrit commented 3 years ago

The function signature of _mm_prefetch is char const* and should not be changed (this is a bug in sse2neon). One could cast to a void pointer within simde_mm_prefetch (eg: __builtin_prefetch((const void *)p)).

As far as I can tell, such a cast should not be needed (godbolt,l:'5',n:'0',o:'C+source+%231',t:'0')),k:56.113289774827706,l:'4',m:100,n:'0',o:'',s:0,t:'0'),(g:!((g:!((h:compiler,i:(compiler:carm930,filters:(b:'0',binary:'1',commentOnly:'0',demangle:'0',directives:'0',execute:'1',intel:'1',libraryCode:'1',trim:'1'),flagsViewOpen:'1',fontScale:16,fontUsePx:'0',j:3,lang:c,libs:!(),options:'-Wcast-qual+-std%3Dc11+-O3+-Wall+-Wextra+-pedantic+-Wmissing-prototypes+-Wstrict-prototypes+-Werror+-Wcast-align%3Dstrict',selection:(endColumn:1,endLineNumber:1,positionColumn:1,positionLineNumber:1,selectionStartColumn:1,selectionStartLineNumber:1,startColumn:1,startLineNumber:1),source:1),l:'5',n:'0',o:'ARM+gcc+9.3+(linux)+(Editor+%231,+Compiler+%233)+C',t:'0')),k:33.33333333333333,l:'4',m:50.21913091115416,n:'0',o:'',s:0,t:'0'),(g:!((h:compiler,i:(compiler:carm64g930,filters:(b:'0',binary:'1',commentOnly:'0',demangle:'0',directives:'0',execute:'1',intel:'0',libraryCode:'1',trim:'1'),flagsViewOpen:'1',fontScale:16,fontUsePx:'0',j:4,lang:c,libs:!(),options:'-Wcast-qual+-std%3Dc11+-O3+-Wall+-Wextra+-pedantic+-Wmissing-prototypes+-Wstrict-prototypes+-Werror+-Wcast-align%3Dstrict',selection:(endColumn:1,endLineNumber:1,positionColumn:1,positionLineNumber:1,selectionStartColumn:1,selectionStartLineNumber:1,startColumn:1,startLineNumber:1),source:1),l:'5',n:'0',o:'ARM64+gcc+9.3+(Editor+%231,+Compiler+%234)+C',t:'0')),header:(),l:'4',m:49.780869088845854,n:'0',o:'',s:0,t:'0')),k:43.88671022517229,l:'3',n:'0',o:'',t:'0')),l:'2',n:'0',o:'',t:'0')),version:4)).

But don't be discouraged, I'm not a part of this project.

nemequ commented 3 years ago

@aqrit is right, SSE2NEON is using the wrong type. So are GCC and clang, though. MSVC and ICC use the correct type (char const*). Intel's documentation is the authoritative source. I agree that the type of the parameter to _mm_prefetch should have been const void* not const char*, but that's not up to us (and it's not even close to the top of the list of really bad decisions Intel made with their SIMD APIs).

I've gone back and forth a bit on whether to change it. Using char const* is pretty clearly the most correct option, but since GCC and clang are already more permissive using const void* makes for less friction when coming from those compilers, and we would still correctly handle all valid input (we would just also accept some invalid input, the same set that GCC and clang already do).

I think I'll go ahead and change SIMDe to accept const void*, but to be clear all this will really do is hide a bug in your code… if you try to compile your code on MSVC, for example, you're likely to end up with a warning or error.

phridley commented 3 years ago

Thanks @aqrit and @nemequ for the explanation here - understood.

nemequ commented 3 years ago

This should work without the cast now. This also reminded me that the implementation had a lot of room for improvement, so I added a much more complete implementation which should work pretty reliably on most compilers and architectures.

Thanks for reporting this, please let me know if you notice any other issues :)