pygame-community / pygame-ce

🐍🎮 pygame - Community Edition is a FOSS Python library for multimedia applications (like games). Built on top of the excellent SDL library.
https://pyga.me
767 stars 120 forks source link

Add missing check in SSE2 alpha blitter #2896

Closed Starbuck5 closed 1 month ago

Starbuck5 commented 1 month ago

Fixes this reported segfault: https://github.com/pygame-community/pygame-ce/issues/2694#issuecomment-2140045572

LOOP_UNROLLED4, "Duff's Device", doesn't handle looping 0 times properly, it needs a pre check before entering that there are loops to be had. I forgot to add this while porting the SSE no_surf_alpha_opaque_dst blitter to use the modern stride switching conventions (see https://github.com/pygame-community/pygame-ce/pull/2601)

Without this check, it was running the 4 pixels SIMD block 3 times when it was supposed to be running it 0 times. This led to the dst pointer drifting farther off until it left the memory it was supposed to be in.

Luckily, this was caught in the dev window for 2.5.0, so this regression never made it into a stable release. Whew.

Thank you so much to @MrDixioner for testing the dev release and reporting that! ❤️

The diff looks more complicated than it is, all I did was add the check and indent the stuff that's now one level lower.

Starbuck5 commented 1 month ago

For reviewers:

Final reproducer:

import pygame
pygame.init()

screen=pygame.display.set_mode((200, 20))

pine1_img = pygame.Surface((1376, 20), pygame.SRCALPHA)

# -1372 is fine, -1373, -1374, -1375 segfaults, -1376 is fine
# experiments also revealed a region 197-199 with the same behavior.

screen.blit(pine1_img, (-1373, 0))

print("Done?")
itzpr3d4t0r commented 1 month ago

Can't we directly implement this check inside the macro so that we don't miss it in the future and makes code less indented and easier to read? All other uses would need to be readjusted after this though.

Starbuck5 commented 1 month ago

Can't we directly implement this check inside the macro so that we don't miss it in the future and makes code less indented and easier to read? All other uses would need to be readjusted after this though.

That's not a bad idea. However, a lot of places don't need this check. For example the normal pixel by pixel blitters don't have it because 0 width blits exit before they get there.

Maybe a LOOP_UNROLLED_SAFE macro that is safe for 0 widths as well?

I'd like to keep my current implementation as of now because it's a simple patch that gets it up to the standard used by the other SIMD blitters.

I think a real takeaway from this is that using the large "RUN_BLITTER" macros is not just less code but less opportunity for manual iteration mistakes like this one. Maybe we can get the alpha blitters ported to them in the future.