libsdl-org / SDL

Simple Directmedia Layer
https://libsdl.org
zlib License
9.98k stars 1.84k forks source link

SDL3 Android crash SDL_free / SDL_DelTouch #9810

Closed AntTheAlchemist closed 6 months ago

AntTheAlchemist commented 6 months ago

Scudo ERROR: invalid chunk state when deallocating address 0x<sanitized>

A recent update to SDL3 seems to have sparked a new group of crashes being reported in Google Play Console. Has something been done to the touch screen device code that might cause this? I noticed that the reported devices are mostly Motorola, so it could be specific to certain hardware.

Here's the stack log. I don't have anything else to go by because I can't reproduce this end.

libc.so (abort+168)
libc.so (scudo::die()+8)
libc.so (scudo::ScopedErrorReport::~ScopedErrorReport()+32)
libc.so (scudo::reportInvalidChunkState(scudo::AllocatorAction, void*)+76)
libc.so (scudo::Allocator<scudo::AndroidConfig, &(scudo_malloc_postinit)>::deallocate(void*, scudo::Chunk::Origin, unsigned long, unsigned long)+308)
split_config.arm64_v8a.apk!libSDL3.so (SDL_free+28) (BuildId: 7422e3b8152f80c1782f4a32419d3f28b2a53c32)
split_config.arm64_v8a.apk!libSDL3.so (SDL_DelTouch) (BuildId: 7422e3b8152f80c1782f4a32419d3f28b2a53c32)
split_config.arm64_v8a.apk!libSDL3.so (SDL_QuitTouch) (BuildId: 7422e3b8152f80c1782f4a32419d3f28b2a53c32)
split_config.arm64_v8a.apk!libSDL3.so (SDL_VideoQuit) (BuildId: 7422e3b8152f80c1782f4a32419d3f28b2a53c32)
split_config.arm64_v8a.apk!libSDL3.so (SDL_QuitSubSystem+436) (BuildId: 7422e3b8152f80c1782f4a32419d3f28b2a53c32)
split_config.arm64_v8a.apk!libSDL3.so (SDL_Quit+28) (BuildId: 7422e3b8152f80c1782f4a32419d3f28b2a53c32)
split_config.arm64_v8a.apk!libmain.so (SDL_main+108) (BuildId: 5d0c5b4a16e6ccebcd94dec98a8f95d9ba1f5c92)
split_config.arm64_v8a.apk!libSDL3.so (Java_org_libsdl_app_SDLActivity_nativeRunMain) (BuildId: 7422e3b8152f80c1782f4a32419d3f28b2a53c32)
1bsyl commented 6 months ago

It can also be some memory corruption. It would be nice to be able to run valgrind or ASAN UBSAN sanitizer on Android...

AntTheAlchemist commented 6 months ago

Let me dig deeper. It may take some time, as I'm relying on Google Play crash reports.

Notes mainly for me:

slouken commented 6 months ago

I'm sure it's an issue with SDL, I'm just not sure where.

AntTheAlchemist commented 6 months ago

I'm sure it's an issue with SDL, I'm just not sure where.

Um 🙈 , even if I'm doing weird stuff in Java with mSurface and mLayout? In MainActivity, I create a View above the AdView (AdMob banner), which is focusable, to fool the Android TV's D-Pad into moving focus to and from the advert banner. It's the only thing I could come up with to solve Google Play's policy regarding advert banners needing to be interactive with D-Pad.

Would this code be at all responsible?

            case COMMAND_FOCUSBANNERAD:
                if(adView != null) {
                    if(invisibleButton == null) {
                        invisibleButton = new View(this);
                        invisibleButton.setFocusable(true);
                        invisibleButton.setFocusableInTouchMode(true); // I now know this isn't needed, but it's in there in the last upload
                        RelativeLayout.LayoutParams params = new RelativeLayout.LayoutParams(1, 1);
                        adView.setId(ViewCompat.generateViewId());
                        params.addRule(RelativeLayout.ABOVE, adView.getId());
                        invisibleButton.setLayoutParams(params);
                        invisibleButton.setOnFocusChangeListener(new View.OnFocusChangeListener() {
                            @Override public void onFocusChange(View v, boolean hasFocus) {
                                if(hasFocus) {
                                    mSurface.requestFocus();
                                    onNativeKeyDown(KeyEvent.KEYCODE_DPAD_UP);
                                }
                            }
                        });
                        mLayout.addView(invisibleButton);
                    } else
                        invisibleButton.setVisibility(View.VISIBLE);
                    adView.requestFocus(FOCUS_DOWN);
                }
                break;

I clean it up like this:

   @Override protected void onDestroy() {
        billingClient.endConnection();
        if(adView != null)
            mLayout.removeView(adView);
        if(invisibleButton != null)
            mLayout.removeView(invisibleButton);
        super.onDestroy();
    }
slouken commented 6 months ago

That code shouldn't be responsible, but it might tickle a bug in SDL.

Susko3 commented 6 months ago

Same thing on Windows on SDL commit c6cc719067af0f1be6671eb258b79b708945210d.

Upon closing the application when I've used my touchscreen:

image

image

00000075`04cfe3b0 00007ffa`2deefea5     : 000001b4`02334e30 00007ffa`00000001 000001b4`023f6430 00007ffa`03549cf8 : SDL3!free_dbg_nolock+0x17c
00000075`04cfe4b0 00007ffa`2deb37b8     : 000001b4`02334e30 000001b4`00000001 00000000`00000000 00000000`00000000 : SDL3!_free_dbg+0x55
00000075`04cfe4f0 00007ffa`2dcf6ba3     : 000001b4`02334e30 000001b4`023c5280 00000000`00000000 00007ffa`2dbd6453 : SDL3!free+0x28
00000075`04cfe530 00007ffa`2dcf68fe     : 000001b4`02334e30 00007ffa`ffffffff 00007ffa`00000000 000001b4`023c5280 : SDL3!real_free+0x13
00000075`04cfe560 00007ffa`2dbd6f1a     : 000001b4`02334e30 00007ffa`51bd2f4d 000001b4`3ef637d0 00007ffa`0369d8e8 : SDL3!SDL_free+0x1e
00000075`04cfe590 00007ffa`2dbd6fbf     : 00000000`00020051 00007ffa`0369d8e8 000001b4`43516628 00000000`00000000 : SDL3!SDL_DelTouch+0x7a
00000075`04cfe5d0 00007ffa`2ddcfe78     : 00000075`04cfe6a0 00007ffa`0369d8e8 000001b4`43516628 00000000`00000000 : SDL3!SDL_QuitTouch+0x3f
00000075`04cfe610 00007ffa`2dceb608     : 000001b4`00000020 00000000`00000000 00007ff7`22d40000 00007ffa`03549ce0 : SDL3!SDL_VideoQuit+0x18
00000075`04cfe660 00007ffa`2dceb83d     : 00000001`ffffffff 000001b4`436536c8 00000000`00000000 00000000`00000001 : SDL3!SDL_QuitSubSystem+0x148
00000075`04cfe690 00007ffa`2db8788f     : 000001b4`3ef637d0 00000000`00000000 000001b4`3ed712c0 00000000`00000001 : SDL3!SDL_Quit+0x1d
00000075`04cfe6c0 00007ffa`2db875b3     : 00007ffa`2e06e690 00007ffa`2db875e0 00000000`00000000 00007ffa`03549cf8 : SDL3!SDL_QuitMainCallbacks+0x2f
00000075`04cfe6f0 00007ffa`034721f2     : 00000000`00000000 000001b4`3ef637d0 00007ffa`03549ce0 00007ffa`03549cf8 : SDL3!SDL_EnterAppMainCallbacks+0x143
// ...
AntTheAlchemist commented 6 months ago

Crashing at exactly the same place on Android in debug mode as well. This should be easy to track down with some more digging. I wonder if it's related to the messed-up touch up/down events?

Susko3 commented 6 months ago

This issue lies in the SDL_DelFinger SDL_memmove call, leading to a double free during cleanup.

https://github.com/libsdl-org/SDL/blob/dc13c08375df0cd712471557337f363bda9787bc/src/events/SDL_touch.c#L236-L248

Note that touch->fingers is just an array of pointers. This code helpfully removes the deleted touch by moving touches after it over the deleted touch.

Let's say you started with this configuration. This is a real scenario where you tap and hold two fingers.

touch->num_fingers = 2;
touch->max_fingers = 2;
touch->fingers[0] = 0x1234;
touch->fingers[1] = 0x5678;

Now, we release the first tapped finger, that would be the finger at index 0 in our case. Memmove will move the finger at index 1 to fill in the space of the finger at index 0, note that the memory is not moved, but simply copied, and the original memory is not zeroed:

touch->num_fingers = 1;
touch->max_fingers = 2;
touch->fingers[0] = 0x5678;
touch->fingers[1] = 0x5678;

Now release the second tapped finger:

touch->num_fingers = 0;
touch->max_fingers = 2;
touch->fingers[0] = 0x5678;
touch->fingers[1] = 0x5678;

Observe that you have the same pointer twice in the array. That is fine by itself, but let's look at what SDL_DelTouch does.

https://github.com/libsdl-org/SDL/blob/dc13c08375df0cd712471557337f363bda9787bc/src/events/SDL_touch.c#L487-L489

It iterates over the array and frees all pointers, from 0 to max_fingers. So, in our example, it calls:

SDL_free(0x5678);
SDL_free(0x5678);

Whoops.

This seems like a reasonable diff, but it explodes elsewhere:

diff --git a/src/events/SDL_touch.c b/src/events/SDL_touch.c
index 968225169..5cf1592a0 100644
--- a/src/events/SDL_touch.c
+++ b/src/events/SDL_touch.c
@@ -243,6 +243,7 @@ static int SDL_DelFinger(SDL_Touch *touch, SDL_FingerID fingerid)
     if (index < (touch->num_fingers - 1)) {
         SDL_memmove(&touch->fingers[index], &touch->fingers[index + 1], (touch->num_fingers - index - 1) * sizeof(touch->fingers[index]));
     }
+    touch->fingers[touch->num_fingers - 1] = NULL;
     --touch->num_fingers;
     return 0;
 }
AntTheAlchemist commented 6 months ago

Good find 👍🏼 . I will take a proper dive into it later when I have time, but you seem to be way ahead of me already 🍀

slouken commented 6 months ago

Oh, good catch. The finger memory isn't intended to be freed, it should just be swapped with the higher entry, like is done with SDL_touchDevices in SDL_DelTouch()

slouken commented 6 months ago

Does this fix it?

diff --git a/src/events/SDL_touch.c b/src/events/SDL_touch.c
index 968225169..ed882d4b4 100644
--- a/src/events/SDL_touch.c
+++ b/src/events/SDL_touch.c
@@ -241,7 +241,9 @@ static int SDL_DelFinger(SDL_Touch *touch, SDL_FingerID fingerid)
     }

     if (index < (touch->num_fingers - 1)) {
+        SDL_Finger *tmp = touch->fingers[index];
         SDL_memmove(&touch->fingers[index], &touch->fingers[index + 1], (touch->num_fingers - index - 1) * sizeof(touch->fingers[index]));
+        touch->fingers[touch->num_fingers - 1] = tmp;
     }
     --touch->num_fingers;
     return 0;
AntTheAlchemist commented 6 months ago

Does this fix it?

Yes! It fixes the crash on clean-up and now the events are coming through as expected. Tested on Android, but not able to test on Windows.

@Susko3's PR also works.

AntTheAlchemist commented 6 months ago

@Susko3 didn't you previously submit a temp-fix to attempt to bypass this problem? Does that need to be taken out now? I can't remember if it was merged to main or not.

Susko3 commented 6 months ago

A temp fix was applied in a downstream project, so it's unrelated to SDL.