pygame-community / pygame-ce

🐍🎮 pygame - Community Edition is a FOSS Python library for multimedia applications (like games). Built on top of the excellent SDL library.
https://pyga.me

Add SIMD paths to .set_alpha() blit onto opaque #2884

Closed: itzpr3d4t0r closed this 1 month ago

itzpr3d4t0r commented 1 month ago

We currently handle the following code with SDL_BlitSurface internally:

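```python
import pygame

pygame.init()

screen = pygame.display.set_mode((500, 500))

size = 200  # example value; the original snippet left `size` undefined
s = pygame.Surface((size, size))  # opaque surface (no per-pixel alpha)
s.fill((31, 3, 44))
s.set_alpha(123)  # one surface-wide alpha value

screen.blit(s, (0, 0))  # currently routed through SDL_BlitSurface
```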

Unfortunately, SDL_BlitSurface only implements a pixel-by-pixel approach, which significantly slows down these blits. In comparison, a similar blit from a surface with per-pixel alpha, filled with a single color, is substantially faster. This difference is puzzling: with a surface-wide alpha we can reuse the same alpha value for every pixel, whereas with per-pixel alpha each pixel's alpha has to be read separately.
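To make the point above concrete, here is a minimal pure-Python sketch contrasting the two cases. It assumes a simple `(s * a + d * (255 - a)) // 255` blend per channel; SDL's and pygame-ce's actual rounding may differ, and the function names are hypothetical. With a surface-wide alpha, the inverse weight is computed once for the whole blit; with per-pixel alpha, it must be derived again for every pixel.

```python
def blend_constant_alpha(src, dst, alpha):
    """Blend opaque RGB pixels `src` over opaque RGB pixels `dst` using one
    surface-wide alpha (0-255). Returns a new list of (r, g, b) tuples."""
    inv = 255 - alpha  # computed once: the same weight applies to every pixel
    return [
        tuple((s * alpha + d * inv) // 255 for s, d in zip(sp, dp))
        for sp, dp in zip(src, dst)
    ]


def blend_per_pixel_alpha(src_rgba, dst):
    """Blend RGBA pixels over opaque RGB pixels; each pixel carries its own alpha."""
    out = []
    for (r, g, b, a), dp in zip(src_rgba, dst):
        inv = 255 - a  # must be recomputed for every pixel
        out.append(tuple((s * a + d * inv) // 255 for s, d in zip((r, g, b), dp)))
    return out
```

For example, blending the color `(31, 3, 44)` at alpha 123 onto black with `blend_constant_alpha` gives `(14, 1, 21)` for every pixel, and `blend_per_pixel_alpha` gives the same result when every pixel's alpha is 123. The constant-alpha loop has strictly less per-pixel work, which is what makes it a good fit for a vectorized path.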

Moreover, all of our own alpha blit algorithms are maintained for compatibility reasons: these blitters use special alpha blending formulas to match older SDL versions, a concern that the set_alpha() blit mode does not address.

Currently, we have an AVX/SSE implementation for set_alpha() blits, but it only applies when the destination surface also has alpha. This PR extends it to blits onto opaque surfaces.
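The gain from a SIMD path comes from blending several channels per instruction instead of one at a time. The sketch below illustrates that idea with a swapped-in, much simpler technique, SWAR ("SIMD within a register"), in pure Python; it is not the PR's AVX/SSE code. It assumes 32-bit packed pixels (for example 0xAARRGGBB), the same `(s * a + d * (255 - a)) // 255` blend as a reference, and hypothetical function names. Two 8-bit channels sit in separate 16-bit lanes, so one multiply blends both, and an exact divide-by-255 trick is applied to both lanes at once.

```python
MASK = 0x00FF00FF  # selects two 8-bit channels, each in its own 16-bit lane


def div255_packed(x):
    """Exact floor(lane / 255) for two 16-bit lanes, each holding a value <= 255 * 255."""
    return ((x + 0x00010001 + ((x >> 8) & MASK)) >> 8) & MASK


def blend_pixel_swar(src, dst, alpha):
    """Blend two packed 32-bit pixels with one constant alpha, two channels per multiply."""
    inv = 255 - alpha
    # Channels in bytes 0 and 2 (e.g. B and R in a 0xAARRGGBB layout).
    lo = (src & MASK) * alpha + (dst & MASK) * inv
    # Channels in bytes 1 and 3 (e.g. G and A), shifted down into the same lanes.
    hi = ((src >> 8) & MASK) * alpha + ((dst >> 8) & MASK) * inv
    return div255_packed(lo) | (div255_packed(hi) << 8)


def blend_pixel_scalar(src, dst, alpha):
    """Reference version: blend each of the four bytes separately."""
    out = 0
    for shift in (0, 8, 16, 24):
        s = (src >> shift) & 0xFF
        d = (dst >> shift) & 0xFF
        out |= ((s * alpha + d * (255 - alpha)) // 255) << shift
    return out
```

For instance, `blend_pixel_swar(0xFF1F032C, 0xFF000000, 123)` returns `0xFF0E0115`, the same as the scalar reference. A real AVX2 path applies the same lane idea to 256-bit registers, handling many pixels per instruction rather than two channels.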

This PR also adjusts our AVX2 macros to be slightly more efficient and simpler (code size wise).

As expected, the results show a significant improvement:

[Benchmark result images]

A small program to time the blit at a range of surface sizes:

```python
import pygame
from timeit import timeit

pygame.init()

screen = pygame.display.set_mode((500, 500))

data = []

for size in range(10, 500, 10):
    # Opaque source surface with a surface-wide alpha, as in the snippet above
    s = pygame.Surface((size, size))
    s.fill((31, 3, 44))
    s.set_alpha(123)

    # Time 10,000 blits onto the display surface
    tot_time = timeit(lambda: screen.blit(s, (0, 0)), number=10000)
    print(f"Finished size {size} in {tot_time:.8f} seconds")
    data.append(tot_time)

print(f"Total time: {sum(data):.8f} seconds")
```

Starbuck5 commented 1 month ago

I've said this before, and I'll say it again: I'm skeptical of bringing more blit routine complexity into pygame-ce. We should probably be going in the opposite direction entirely.

Or maybe we could go all the way in and take over blitting and Surface allocation entirely and go ham.

I'm going to encourage you again to try to bring these optimizations to SDL itself, so everyone can benefit and we don't have the continuing maintenance burden of it.

Starbuck5 commented 1 month ago

> This PR also adjusts our AVX2 macros to be slightly more efficient and simpler (code size wise).

This is much less controversial; it could go in a separate PR and be evaluated on its own.