Closed itzpr3d4t0r closed 1 month ago
I've said this before, I'll say it again, I'm skeptical of bringing more blit routine complexity into pygame-ce. We should probably be going the opposite direction entirely.
Or maybe we could go all the way in and take over blitting and Surface allocation entirely and go ham.
I'm going to encourage you again to try to bring these optimizations to SDL itself, so everyone can benefit and we don't have the continuing maintenance burden of it.
> This PR also adjusts our AVX2 macros to be slightly more efficient and simpler (code size wise).
This is much less controversial, could be in another PR and evaluated on its own.
The way we currently handle this uses `SDL_BlitSurface` internally.

Unfortunately, `SDL_BlitSurface` only implements a "pixel by pixel" approach here, which significantly slows down these types of blits. In comparison, a similar blit from a surface with native per-pixel alpha, filled with a single color, is substantially faster. This difference is puzzling because, with a surface alpha, we can reuse the same alpha value for all pixels, whereas we cannot do this in the per-pixel case.
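To make the point about reusing the alpha value concrete, here is a minimal pure-Python sketch. It assumes the textbook "source over" formula for an opaque destination, which is not necessarily the formula pygame-ce or SDL uses, and the function names `blit_surface_alpha` and `blit_pixel_alpha` are hypothetical, chosen only for illustration.

```python
def blend_channel(src, dst, alpha):
    """Textbook source-over blend of one 8-bit channel onto an opaque destination.

    Assumption: this is a generic formula, not pygame-ce's compatibility formula.
    """
    return (src * alpha + dst * (255 - alpha)) // 255


def blit_surface_alpha(src_pixels, dst_pixels, alpha):
    """Blend RGB pixels using a single alpha for the whole surface."""
    inv = 255 - alpha  # computed once and reused for every pixel
    return [
        tuple((s * alpha + d * inv) // 255 for s, d in zip(sp, dp))
        for sp, dp in zip(src_pixels, dst_pixels)
    ]


def blit_pixel_alpha(src_pixels, dst_pixels):
    """Blend RGBA pixels; the alpha has to be read from each source pixel."""
    out = []
    for (r, g, b, a), dp in zip(src_pixels, dst_pixels):
        out.append(tuple(blend_channel(s, d, a) for s, d in zip((r, g, b), dp)))
    return out
```

With a constant alpha the two functions produce the same pixels, but the first one never has to load or unpack a per-pixel alpha, which is why one would expect the surface alpha case to be at least as fast.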
Moreover, all our alpha blit algorithms are maintained for compatibility reasons. Specifically, these blitters use special alpha blending formulas to conform to older SDL versions, something the set_alpha() blit mode does not address.
Currently, we have an AVX/SSE implementation for set_alpha(), but it only applies when blitting to another surface with alpha. This PR extends the implementation to handle blitting onto an opaque surface.
This PR also adjusts our AVX2 macros to be slightly more efficient and simpler (code size wise).
As expected, the results show a significant improvement:

![image](https://github.com/pygame-community/pygame-ce/assets/103119829/9a83ae01-9c7b-45f3-a569-85b44b5262dc)
A small program to test some sizes and times: