One method to fix the NaN/INT_MAX segfault in blit

MyreMylar commented 1 month ago

fixes #2694

The logic being that these values are so large that they can only realistically come from bad data, but also we don't want to error out inside blit?

Hopefully bits earlier up the chain - like Rect/Frect can catch these invalid values and error there.

Starbuck5 commented 1 month ago

I'd like to hold on merging this until I understand it, somewhat selfishly.

Um, how did the original poster get nan values in the first place?

damusss commented 1 month ago

I'd like to hold on merging this until I understand it, somewhat selfishly.

Um, how did the original poster get nan values in the first place?

They didn't, and their code did not produce any segfaults for me and for other contributors. The nan was added by MyreMylar and we discovered it segfaults.

MyreMylar commented 1 month ago

That was the second person that posted a segfault in that thread. The first person mentioned nan values from a calculation that they then resolved themselves. The second person posting a similar segfault mentioned just holding down the right key which I couldn't reproduce myself at the time but it may have been because I just didn't hold it down long enough as the fundamental issue seems to be out of range large position values sent into blit sometimes causing a segfault.

I don't know if this is the final solution myself. I think FRect shouldn't accept NaN values either, though we should probably guard against these overflow int values in blit anyway in case they creep in from elsewhere.

I'd like to find out exactly where in blit the segfault originates too and also if there is any other position values that are in range that would cause this segfault here. It is clearly not just the INT_MIN value on its own as you also need the y value here and the size of the blotted surface to cause something else to fail.

On Sat, 1 Jun 2024, 06:56 Damus666, @.***> wrote:

I'd like to hold on merging this until I understand it, somewhat selfishly.

Um, how did the original poster get nan values in the first place?

They didn't, and their code did not produce any segfaults for me and for other contributors. The nan was added by MyreMylar and we discovered it segfaults.

— Reply to this email directly, view it on GitHub https://github.com/pygame-community/pygame-ce/pull/2893#issuecomment-2143311204, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADGDGGTLTZPSEK6FFSQPFRLZFFPB3AVCNFSM6AAAAABIROFBYWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNBTGMYTCMRQGQ . You are receiving this because you authored the thread.Message ID: @.***>

MyreMylar commented 1 month ago

Poking at my segfault test program a bit more it looks like it is related to the width of the blit itself & the INT_MAX / INT_MIN values overflowing. Try this program and mess around with the numbers a bit:

import pygame

int_max = 2147483647

surf_width = 1057
int_max_buffer = surf_width + 32

for x in reversed(range(0, 700)):
    print(x)
    pygame.display.init()
    screen = pygame.display.set_mode((x, 600))
    test_surf = pygame.Surface((surf_width, 400), flags=pygame.SRCALPHA)
    pos_rect = pygame.FRect(int_max - int_max_buffer, 200.0, 100.0, 100.0)
    screen.blit(test_surf, pos_rect)
    pygame.display.quit()

the surface to be blitted needs to be overlapping the target vertically here to get into the 'danger zone' but then you can turn the segfault on or off by having an x position of anything greater than: INT_MAX - surface_width - 32

MyreMylar commented 1 month ago

That suggests to me that is specifically something to do with clipping the source surface to the destination surface when the right hand side of the source surface position is greater than INT_MAX. I mean maybe the other sides would also crash in the clipping but that one gets hit first because it is furthest out in the pygame coordinate system.

Basically we need to keep all four corners of the surface-to-be-clipped rectangle inside INT_MIN to INT_MAX - possibly with a small safety buffer as well.

MyreMylar commented 1 month ago

Further refinement to strip out FRect as that is not required for the segfault:

import pygame

int_max = 2147483647
surf_width = 2048

for x in reversed(range(0, 400)):
    print(x, surf_width)
    int_max_buffer = (
        surf_width 
    )  # subtract values here (e.g. 32, 64) and you will get a segfault pretty quick
    pygame.display.init()
    screen = pygame.display.set_mode((x, 600))
    test_surf = pygame.Surface((surf_width, 400), flags=pygame.SRCALPHA)
    screen.blit(test_surf, (int_max - int_max_buffer, 200.0))
    pygame.display.quit()

I think this probably makes it about as clear as it can get debugging from the python side. If anyone can narrow it down on the C side somehow that would be useful too.

MyreMylar commented 1 month ago

OK, I adjusted the early exits to just always account for the dimensions of the surfaces and I can't make it segfault now. We could check for the dimensions of the source surface minus the dimensions of the destination surface - but that is extra math for something that I don't believe there is any legitimate reason to be doing (surfaces can't actually get very large so these blits would all be clipped anyway if the clipping wasn't segfaulting). Doing it this way we will early exit with stupid source rect parameters whatever the width of the destination surface.

Other things we could do:

Make Rect handle float("nan") and float("inf) so they don't get converted to INT_MIN when passed in via various methods. See:
```
import pygame
```

rect = pygame.Rect(float("nan"), float("inf"), float("nan"), float("inf")) print(rect)

Output:

Rect(-2147483648, -2147483648, -2147483648, -2147483648)


Meanwhile:
```python
int(float("NaN"))

Output:

ValueError: cannot convert float NaN to integer

And:

int(float("inf"))

Output:

OverflowError: cannot convert float infinity to integer

Add a note to the SDL issue on this topic here: https://github.com/libsdl-org/SDL/issues/8879 if we think we have anything useful to add. It sounds like SDL is already aware of INT_MAX/INT_MIN overflows in the code base but hasn't come up with a definitive solution, or decided if there is one.

pygame-community / pygame-ce

One method to fix the NaN/INT_MAX segfault in blit #2893