pygame-community / pygame-ce

🐍🎮 pygame - Community Edition is a FOSS Python library for multimedia applications (like games). Built on top of the excellent SDL library.
https://pyga.me

Enable SSE 4.2 builds (image.tostring) (2227) #1154

Open GalacticEmperor1 opened 1 year ago

GalacticEmperor1 commented 1 year ago

Issue №2227 opened by illume at 2020-10-22 09:48:55

Need to figure out how to build this with runtime detection. Otherwise using SSE2 could be an option.
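
To make "runtime detection" concrete, here is a minimal sketch of a CPU check. The function name `cpu_has_sse42` is made up for illustration and is not pygame code. It assumes GCC or clang on x86 for the `__builtin_cpu_supports` path and MSVC for the `__cpuid` path, and it reports 0 everywhere else.

```c
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#include <intrin.h>
/* MSVC: query CPUID leaf 1; SSE4.2 is reported in ECX bit 20. */
static int cpu_has_sse42(void)
{
    int regs[4];
    __cpuid(regs, 1);
    return (regs[2] >> 20) & 1;
}
#elif (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
/* GCC/clang: the builtin returns a nonzero value when the feature exists. */
static int cpu_has_sse42(void)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse4.2") != 0;
}
#else
/* Not an x86 target (or unknown compiler): assume no SSE4.2. */
static int cpu_has_sse42(void)
{
    return 0;
}
#endif
```

Since pygame already links against SDL, `SDL_HasSSE42()` from `SDL_cpuinfo.h` would probably be the more natural check in practice. The sketch only shows what such a check boils down to.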


Comments

*nthykier commented at 2021-10-17 08:08:40*

Had a look at this and as I understand it, the root issue is that we unconditionally passed -msse4.2 to the compiler - allegedly to appease some ancient version of CentOS per #1905 (https://github.com/pygame/pygame/commit/8894fb266d2ef7b70ce4fdd8cae4b4adb489fcc1). The #ifdefs were not the issue.

I propose that we bump the gcc version requirement for the SSE4.2 build, so that it skips the old CentOS compiler but still happens with more modern versions.

@illume: Do you remember the version of gcc used or the CentOS version?


*nthykier commented at 2021-10-17 08:23:25*

Might need a bit of #pragma massaging before including the intrinsics.

https://stackoverflow.com/questions/46165752/does-clang-have-something-like-pragma-gcc-target
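
As a rough illustration of the per-function route discussed in that link, the sketch below marks a single function with `__attribute__((target("sse4.2")))` instead of passing `-msse4.2` for the whole file, and picks the path at runtime. It assumes GCC 4.9+ or a reasonably recent clang on x86. All function names are made up, and the 64-bit compare is only a stand-in for a real kernel, not pygame's `image.tostring` code.

```c
#include <stddef.h>
#include <stdint.h>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <nmmintrin.h> /* SSE4.2 intrinsics */
#define SKETCH_HAVE_SSE42_PATH 1

/* Only this function is compiled with SSE4.2 enabled. */
__attribute__((target("sse4.2")))
static size_t count_greater_sse42(const int64_t *a, const int64_t *b, size_t n)
{
    size_t i = 0, count = 0;
    for (; i + 2 <= n; i += 2) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* PCMPGTQ: each 64-bit lane becomes all ones where a > b (signed). */
        int mask = _mm_movemask_epi8(_mm_cmpgt_epi64(va, vb));
        count += (size_t)(mask & 1) + (size_t)((mask >> 8) & 1);
    }
    for (; i < n; i++) /* odd element left over */
        count += (size_t)(a[i] > b[i]);
    return count;
}
#endif

/* Portable fallback, built with the default compiler flags. */
static size_t count_greater_scalar(const int64_t *a, const int64_t *b, size_t n)
{
    size_t i, count = 0;
    for (i = 0; i < n; i++)
        count += (size_t)(a[i] > b[i]);
    return count;
}

/* Dispatcher: use the SSE4.2 path only if the running CPU supports it. */
size_t count_greater(const int64_t *a, const int64_t *b, size_t n)
{
#ifdef SKETCH_HAVE_SSE42_PATH
    if (__builtin_cpu_supports("sse4.2"))
        return count_greater_sse42(a, b, n);
#endif
    return count_greater_scalar(a, b, n);
}
```

The dispatcher itself contains no SSE4.2 instructions, so the resulting binary should still run on older CPUs.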


*robertpfeiffer commented at 2021-10-18 15:42:12*

> Might need a bit of #pragma massaging before including the intrinsics.
>
> https://stackoverflow.com/questions/46165752/does-clang-have-something-like-pragma-gcc-target

Last time I checked, this does not work with MSVC. As far as I can tell, there is no simple way to do this that works in GCC, clang, and MSVC, and that works for ARM and X86 without additional ifdefs.


*illume commented at 2022-01-26 00:38:49*

We've dropped support for SDL1 and older manylinux since this issue was started. So now maybe it will work.

A couple of years ago SSE 4.2 support was at 97%; now it is at 98.53% on the Steam hardware survey (https://store.steampowered.com/hwsurvey). Also, more things seem to require SSE 4.2 now than in 2020, so we can probably just enable it by default without much damage. For example, dosbox has required it since 2021.

I think we can just enable SSE 4.2, and be able to use the nice image.tostring function that @nthykier wrote, and possibly allow @MyreMylar to use SSE 4.2 in blitters.

SSE 4.2 could do 4 pixels at once? Similar to https://github.com/pygame/pygame/pull/1715 ? https://en.wikipedia.org/wiki/SSE4#SSE4.2
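
On the "4 pixels at once" question: a 128-bit SSE register holds four 32-bit pixels, and that is already true for SSE2. The sketch below is only an illustration under that assumption, not the `image.tostring` code and not the blitter from the PR above. It swaps the first and third byte of each pixel (for example RGBA to BGRA), four pixels per iteration, using SSE2 intrinsics only, with a scalar loop for the tail and for non-x86 builds. The function names are made up.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h> /* SSE2 intrinsics */
#define SKETCH_HAVE_SSE2 1
#endif

/* Scalar version: swap byte 0 and byte 2 of every 32-bit pixel. */
static void swap_rb_scalar(uint32_t *px, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        uint32_t p = px[i];
        px[i] = (p & 0xFF00FF00u) | ((p >> 16) & 0xFFu) | ((p & 0xFFu) << 16);
    }
}

/* Four pixels per iteration where SSE2 is available, scalar otherwise. */
void swap_rb(uint32_t *px, size_t n)
{
    size_t i = 0;
#ifdef SKETCH_HAVE_SSE2
    const __m128i keep = _mm_set1_epi32((int)0xFF00FF00u); /* bytes 1 and 3 */
    const __m128i low = _mm_set1_epi32(0x000000FF);        /* byte 0 */
    for (; i + 4 <= n; i += 4) {
        __m128i p = _mm_loadu_si128((const __m128i *)(px + i));
        __m128i unchanged = _mm_and_si128(p, keep);
        /* byte 2 moved down to byte 0 */
        __m128i b2_to_b0 = _mm_and_si128(_mm_srli_epi32(p, 16), low);
        /* byte 0 moved up to byte 2 */
        __m128i b0_to_b2 = _mm_slli_epi32(_mm_and_si128(p, low), 16);
        _mm_storeu_si128((__m128i *)(px + i),
                         _mm_or_si128(unchanged,
                                      _mm_or_si128(b2_to_b0, b0_to_b2)));
    }
#endif
    swap_rb_scalar(px + i, n - i); /* remaining 0 to 3 pixels, or all of them */
}
```

Later extensions mostly add more convenient instructions for this kind of work rather than wider registers, so the per-iteration pixel count stays at four until AVX2.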

"several new instructions that perform character searches and comparison on two operands of 16 bytes at a time."

itzpr3d4t0r commented 2 months ago

I think we could remake our SIMD backend to be more similar to SDL's, where they compile/include all symbols in a single header file: SDL_intrin.txt. The bulk of the file consists of blocks like these:

```c
#if defined(__x86_64__) || defined(_M_X64) || defined(__i386__) || defined(_M_IX86)
# if ((defined(_MSC_VER) && !defined(_M_X64)) || defined(__MMX__) || defined(SDL_HAS_TARGET_ATTRIBS)) && !defined(SDL_DISABLE_MMX)
#  define SDL_MMX_INTRINSICS 1
#  include <mmintrin.h>
# endif
# if (defined(_MSC_VER) || defined(__SSE__) || defined(SDL_HAS_TARGET_ATTRIBS)) && !defined(SDL_DISABLE_SSE)
#  define SDL_SSE_INTRINSICS 1
#  include <xmmintrin.h>
# endif
# if (defined(_MSC_VER) || defined(__SSE2__) || defined(SDL_HAS_TARGET_ATTRIBS)) && !defined(SDL_DISABLE_SSE2)
#  define SDL_SSE2_INTRINSICS 1
#  include <emmintrin.h>
# endif
# if (defined(_MSC_VER) || defined(__SSE3__) || defined(SDL_HAS_TARGET_ATTRIBS)) && !defined(SDL_DISABLE_SSE3)
#  define SDL_SSE3_INTRINSICS 1
#  include <pmmintrin.h>
# endif
# if (defined(_MSC_VER) || defined(__SSE4_1__) || defined(SDL_HAS_TARGET_ATTRIBS)) && !defined(SDL_DISABLE_SSE4_1)
#  define SDL_SSE4_1_INTRINSICS 1
#  include <smmintrin.h>
# endif
# if (defined(_MSC_VER) || defined(__SSE4_2__) || defined(SDL_HAS_TARGET_ATTRIBS)) && !defined(SDL_DISABLE_SSE4_2)
#  define SDL_SSE4_2_INTRINSICS 1
#  include <nmmintrin.h>
# endif
# if defined(__clang__) && (defined(_MSC_VER) || defined(__SCE__)) && !defined(__AVX__) && !defined(SDL_DISABLE_AVX)
#  define SDL_DISABLE_AVX       /* see https://reviews.llvm.org/D20291 and https://reviews.llvm.org/D79194 */
# endif
# if (defined(_MSC_VER) || defined(__AVX__) || defined(SDL_HAS_TARGET_ATTRIBS)) && !defined(SDL_DISABLE_AVX)
#  define SDL_AVX_INTRINSICS 1
#  include <immintrin.h>
# endif
# if defined(__clang__) && (defined(_MSC_VER) || defined(__SCE__)) && !defined(__AVX2__) && !defined(SDL_DISABLE_AVX2)
#  define SDL_DISABLE_AVX2      /* see https://reviews.llvm.org/D20291 and https://reviews.llvm.org/D79194 */
# endif
# if (defined(_MSC_VER) || defined(__AVX2__) || defined(SDL_HAS_TARGET_ATTRIBS)) && !defined(SDL_DISABLE_AVX2)
#  define SDL_AVX2_INTRINSICS 1
#  include <immintrin.h>
# endif
# if defined(__clang__) && (defined(_MSC_VER) || defined(__SCE__)) && !defined(__AVX512F__) && !defined(SDL_DISABLE_AVX512F)
#  define SDL_DISABLE_AVX512F   /* see https://reviews.llvm.org/D20291 and https://reviews.llvm.org/D79194 */
# endif
# if (defined(_MSC_VER) || defined(__AVX512F__) || defined(SDL_HAS_TARGET_ATTRIBS)) && !defined(SDL_DISABLE_AVX512F)
#  define SDL_AVX512F_INTRINSICS 1
#  include <immintrin.h>
# endif
#endif /* defined(__x86_64__) || defined(_M_X64) || defined(__i386__) || defined(_M_IX86) */
```
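
A pygame-side header following the same pattern might look roughly like the sketch below. Every `PG_*` name and both functions are hypothetical, and the GCC 4.9 cutoff is an assumption based on when GCC started allowing intrinsics inside functions tagged with a target attribute. On MSVC the targeting macro expands to nothing, because MSVC accepts the intrinsics without extra flags. On non-x86 targets the SSE4.2 block simply drops out.

```c
#include <stdint.h>

/* Hypothetical names throughout (PG_*): not part of pygame or SDL. */
#if defined(__clang__) && defined(__has_attribute)
# if __has_attribute(target)
#  define PG_HAS_TARGET_ATTRIBS 1
# endif
#elif defined(__GNUC__) && \
    (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 9))
# define PG_HAS_TARGET_ATTRIBS 1
#endif

#ifdef PG_HAS_TARGET_ATTRIBS
# define PG_TARGETING(x) __attribute__((target(x)))
#else
# define PG_TARGETING(x) /* MSVC and others: nothing needed or available */
#endif

#if defined(__x86_64__) || defined(_M_X64) || defined(__i386__) || defined(_M_IX86)
# if defined(_MSC_VER) || defined(__SSE4_2__) || defined(PG_HAS_TARGET_ATTRIBS)
#  define PG_SSE4_2_INTRINSICS 1
#  include <nmmintrin.h>
# endif
#endif

/* Reports whether an SSE4.2 code path was compiled into this build. */
int pg_sse42_compiled_in(void)
{
#ifdef PG_SSE4_2_INTRINSICS
    return 1;
#else
    return 0;
#endif
}

#ifdef PG_SSE4_2_INTRINSICS
/* Example kernel: returns 1 if both 64-bit lanes of a exceed those of b.
 * Callers must check CPU support at runtime before calling it. */
PG_TARGETING("sse4.2")
int pg_both_greater_sse42(const int64_t a[2], const int64_t b[2])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    return _mm_movemask_epi8(_mm_cmpgt_epi64(va, vb)) == 0xFFFF;
}
#endif
```

With this layout, ARM and other non-x86 builds never see `nmmintrin.h`, and the only per-compiler `#ifdef`s sit in the header rather than in every source file.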