mullvad / mullvadvpn-app

The Mullvad VPN client app for desktop and mobile
https://mullvad.net/
GNU General Public License v3.0

Random VPN disconnects with MTE enabled #6349

Open gsture opened 5 months ago

gsture commented 5 months ago

Is it a bug?

I have checked if others have reported this already

Current Behavior

The OS randomly says "disconnected from always on VPN" and starts blocking connections while the Mullvad app still thinks it is connected. The Mullvad app notification still says the VPN is connected. It seems like the app is silently crashing and is not aware that the VPN is disconnected. When I press the notification or open the Mullvad VPN app, the VPN seems to reconnect and the notification goes away.

This seems to happen randomly with MTE (Memory Tagging Extension) enabled. If MTE is disabled, this issue does not occur.

[Screenshot: Screenshot_20240731-182840~2]

Expected Behavior

The VPN should stay connected, or reconnect when it gets disconnected. Right now the app does not seem to be aware that the VPN is disconnected.

Steps to Reproduce

Install Mullvad VPN on a Pixel 8 running GrapheneOS with MTE enabled (default) and use the phone for a while. The issue happens with the VPN in both the main profile and the work profile.

Failure Logs

Log 1.txt

Android version

GrapheneOS 14 (AP2A.240705.005)

Device model

2 x Pixel 8 and 1 x Pixel 8a

Mullvad VPN app version

Tested on 2024.2 & 2024.4-beta1

Additional Information

I also reported this to GrapheneOS, but they say it's not an OS bug.

gsture commented 5 months ago

This issue is now happening multiple times a day, which makes the VPN kind of unusable. Am I the only one experiencing this issue? I have added a failure log.

MrChocolatine commented 5 months ago

Same issue as #6292 that you closed. (just for the record)

albin-mullvad commented 5 months ago

Thanks for the report, we'll look into this! We are aware of some strange behavior when running the app on GrapheneOS and/or Work Profiles but haven't been able to pinpoint the exact problem.

gsture commented 5 months ago

For what it's worth: I never encountered this issue on pixel 5 with grapheneos. It could have something to do with memory tagging feature on newer pixels with grapheneos but that's just me guessing. I will disable it for the mullvad app and test if it makes any difference.

gsture commented 5 months ago

After disabling memory tagging (for mullvad vpn app only) I have not experienced this issue so far. I left it enabled on another pixel 8 and there the error did occur. So it might be worth looking into this.

It might be a coincidence so I will report back at the end of this week but I think the app might not work well with memory tagging enabled at the moment.

gsture commented 4 months ago

OK, so after switching back and forth between memory tagging on and off for the last 3 weeks, I can almost confidently say that the problem points to a compatibility issue with MTE. The app functions fine with memory tagging disabled, and when it is enabled I get random disconnects.

Let me know if you need anything else.

gsture commented 3 months ago

2 more logs regarding this issue.

Main profile: Log 2.txt

Work profile: Log 3.txt

I also updated the main post with up to date information.

Pururun commented 2 months ago

@gsture We have hopefully solved this issue in this PR: https://github.com/mullvad/mullvadvpn-app/pull/6727 We expect to include this fix in the next release (2024.5). Keeping this issue open until you are able to confirm it being fixed.

DianaNites commented 2 months ago

What a coincidence, I just came to this issue from a Google search with the exact same issue, and in the 3 hours since I opened the tab I learned there'll be a fix and the privacy leaks will finally stop, and that there's a workaround in disabling MTE for Mullvad for now? Incredible timing!

Now if only it was possible to use secure and private LAN services AND have assurance that connections can't bypass the VPN when it crashes... :/

gsture commented 2 months ago

I tested this in version 2024.5 beta 1 but now the mullvad app almost immediately crashes when opening with MTE enabled.

Log 24050001.txt

FID02 commented 2 months ago

I have been running the 2024.5 beta 1 for a few hours and I'm unable to reproduce any crash when being run with memory tagging and hardened_malloc under GrapheneOS. Perhaps it would be constructive if you also attached the crash log (including the backtrace) that memory tagging produces.


2024.5 beta 1 also seems to have fixed a memory corruption bug that was reproducible when the app was being run with memory tagging and Scudo as the allocator: MTE crashed the app upon every connection to the VPN tunnel. So it appears that Mullvad has fixed a potential security issue that likely affected a lot of their app users. (If you are running GrapheneOS and want to reproduce this crash prior to version 2024.5 beta 1, you can disable hardened_malloc for the app while leaving memory tagging enabled).

I will be running the app with memory tagging for the next few days and report back any issues.

gsture commented 2 months ago

Try to open the Mullvad app a few times in a row, maybe opening another app in between. It will crash sooner or later with MTE enabled. I was trying out the new DAITA and Shadowsocks settings in the normal and work profiles, so I had to switch back and forth between the browser and Mullvad in both profiles a lot, and that was unworkable with MTE enabled.

Pururun commented 2 months ago

Thanks for testing @gsture will check on our Graphene phone and see if I can replicate.

If possible could you test with DAITA off and see if you still get the crashes?

Pururun commented 2 months ago

Also @FID02 since you are not seeing the crash, are you using DAITA?

FID02 commented 2 months ago

DAITA is disabled on my end.

gsture commented 2 months ago

I tried it with Shadowsocks on and off, with and without DAITA, and with a combination of the two. That is not the problem in this case. As soon as I turn on MTE, the app experiences crashes.

To reproduce: switch from Mullvad to the desktop, to another app, and back to Mullvad. Sometimes it takes a few rounds, but eventually the app crashes.

FID02 commented 2 months ago

I'm able to reproduce the issue that gsture is experiencing – or at least, I can reproduce an app shutdown when switching between profiles (the system is not reporting an app crash).

Pixel 8 GrapheneOS 2024091900

Steps to reproduce:

  1. Install Mullvad in the Owner profile, and make sure memory tagging is enabled for it
  2. Sign in to the app and connect, and keep the default settings
  3. Switch to a secondary profile and install the app there as well, and make sure memory tagging is enabled for it
  4. In the secondary profile, sign in to the app and connect, and keep the default settings
  5. Without ending the session, switch from the secondary profile to the Owner profile

Observe that the app shuts down.

  6. Switch back to the secondary profile and observe the same app shutdown

If it cannot be immediately reproduced, switch profiles until it can be.

Here is the app log file produced by the system, taken from the Owner profile right after being reproduced, by going to the app's App info and selecting View logs: Mullvad VPN log 9f4a981faec5.txt

Pururun commented 2 months ago

@gsture Thanks for the confirmation of it not being DAITA.

Is it possible to provide a more detailed explanation of how you achieve this? For example, whether you are connected or not while switching to another app, and whether you need to switch to another app or can just go to the desktop and back? Any information is highly appreciated.

I have tried leaving the app, opening F-Droid and going back, but I have yet to get this crash.

I tested on the beta2 (should be the same as beta1) on a Pixel 8 running GrapheneOS 2024091900.

Here are my app settings as well if you can spot any difference:

[Screenshot: Screenshot_20240925-114117]

FID02 commented 2 months ago

gsture explained that they reproduced it while switching their workflow between a work profile and another profile. "Profile switching" seems to be the key words here, as I was able to reproduce the behaviour that gsture described with the steps I outlined above. As gsture wrote, the only setting you need is memory tagging enabled for Mullvad.

gsture commented 2 months ago

I don't "switch" between profiles. A work profile is a separate tab in the app drawer where apps are isolated from the main profile. Although I have a work profile set up with 2 VPNs running, the crash happens just from switching between apps, even if the work profile is disabled.

gsture commented 2 months ago

I just switch back and forth from the desktop to the VPN app to another app. Sometimes it crashes immediately, sometimes it takes a few switches.

My app settings are the same.

Here is a screen record:

https://github.com/user-attachments/assets/e4048691-2648-43c0-80c1-3069c8b8c937

Pururun commented 2 months ago

@gsture Thank you for the video. I think I managed to get the crash maybe once, after trying a lot of times. Are you sure it is related to MTE though? At least for me, when I get a crash related to MTE the system will tell me, but it could just be the settings on the phone.

gsture commented 2 months ago

I don't get an error message when it happens, but I didn't get one with the previous bug either. If I disable MTE I can keep switching without a crash. I just tried about 100 times and couldn't reproduce it.

Although this is a different bug than the one I filed the report for, I am quite certain it is MTE related.

Pururun commented 1 month ago

I understand. I did some tests and it seems to be the same for me that if you turn off MTE this crash does not happen. We did some tests on the developer preview of Android 15 which includes MTE support but did not manage to replicate the crash. This could indicate that there is something specific to how Graphene handles MTE or that the MTE thing is kind of accidental.

One thing I have noticed is that, for the crash to trigger for me, I need to have multiple profiles running. I was not able to trigger the crash until I followed the steps provided by @FID02, and after I restarted the phone I was not able to trigger it again until I had Mullvad running on multiple profiles. Is that something that either of you noticed?

gsture commented 1 month ago

Do you mean separate user profiles or a work profile set up for 1 user?

I do not use multiple user profiles. I do use a work profile to separate work-related stuff, but this crash also happens if I disable it.

FID02 commented 1 month ago

One thing I have noticed is that, for the crash to trigger for me, I need to have multiple profiles running. I was not able to trigger the crash until I followed the steps provided by @FID02, and after I restarted the phone I was not able to trigger it again until I had Mullvad running on multiple profiles.

That's my experience too, so far. Please note that I do not use work profiles.

thestinger commented 1 month ago

@Pururun You can enable MTE on current stable stock Pixel OS releases too. You need to do more than enabling the developer option which only allows using MTE and doesn't actually enable it for anything. It requires enabling it via ADB. See here:

https://developer.android.com/ndk/guides/arm-mte

GrapheneOS has our own implementation of memory tagging as part of our hardened allocator that's much better than the standard one and covers a lot more allocations along with having stronger security properties. GrapheneOS users can use the same implementation of MTE available via ADB in the stock OS by disabling the hardened allocator with MTE enabled for the app, which will use their implementation of it as part of the standard allocator instead.

Our implementation does find more bugs as part of being more hardened. We also find a lot of memory corruption bugs with our hardened allocator even without MTE but they tend to be ones which would have caused problems for apps at some point while MTE tends to find more subtle things since it catches essentially any heap memory corruption for both reads and writes. Reading out of bounds of a heap allocation would only be caught in certain cases with our hardened allocator via the guard pages around all large allocations, guard slabs around medium sized ones and guard slabs around groups of allocations for small ones. For example, a 16 byte allocation is in a slab with 256x 16 byte allocations with a random slot chosen for it. A read overflow past the end will only be caught without MTE if it's placed in the last slot in the slab since it will hit the guard slab between every slab on GrapheneOS. Allocations from 16k through 128k have a single allocation per slab and above 128k have their own mapping with a randomly sized guard region before and after.
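To put a number on the slab example above, here is a minimal sketch. It assumes a read overflow that runs just past the end of its own slot and a slot chosen uniformly at random; the slot count (256) comes from the comment, while the function names are made up for illustration and this is not hardened_malloc code.

```python
import random

SLOTS_PER_SLAB = 256  # from the comment: 256 slots of 16 bytes per slab


def overflow_hits_guard(slot_index: int, slots: int = SLOTS_PER_SLAB) -> bool:
    """A read just past the end of a slot faults only when the slot is the
    last one in the slab, because the guard slab comes right after it."""
    return slot_index == slots - 1


def detection_probability(slots: int = SLOTS_PER_SLAB) -> float:
    """Chance that a uniformly random slot is the last one in its slab."""
    return 1 / slots


def simulate(trials: int, seed: int = 0) -> float:
    """Estimate the same probability by picking random slots (assumed model)."""
    rng = random.Random(seed)
    caught = sum(
        overflow_hits_guard(rng.randrange(SLOTS_PER_SLAB)) for _ in range(trials)
    )
    return caught / trials
```

Under these assumptions the chance of catching such a read without MTE is 1/256, roughly 0.4%, which is consistent with the point that MTE surfaces bugs that otherwise stay hidden for a long time.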

There are also canaries for small allocations and other features even without MTE, but the canaries only catch write overflows on free, not reads, and not a write where it's never freed. They mainly exist to absorb small overflows and to contain C string overflows via a leading zero byte.
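The limits of canaries described above can be illustrated with a toy model. This is only a sketch: the 8-byte canary size, the `Slot` class and its methods are assumptions for illustration and do not reflect the real allocator's layout.

```python
import secrets

CANARY_SIZE = 8  # assumed size for this sketch


def make_canary() -> bytes:
    # Leading zero byte: an overflowing C string ends at the canary.
    return b"\x00" + secrets.token_bytes(CANARY_SIZE - 1)


class Slot:
    """Toy allocation: `size` usable bytes followed by a canary."""

    def __init__(self, size: int, canary: bytes):
        self.size = size
        self.canary = canary
        self.mem = bytearray(size) + bytearray(canary)

    def write(self, offset: int, data: bytes) -> None:
        # No bounds check, mimicking raw memory access in C.
        self.mem[offset:offset + len(data)] = data

    def read(self, offset: int, length: int) -> bytes:
        # Reads never touch the canary check, so over-reads go unnoticed.
        return bytes(self.mem[offset:offset + length])

    def free(self) -> None:
        # The canary is only verified here, at free time.
        stored = bytes(self.mem[self.size:self.size + len(self.canary)])
        if stored != self.canary:
            raise RuntimeError("canary corrupted: write overflow detected on free")
```

In this model, `Slot(16, make_canary()).read(14, 4)` reads two bytes of the canary and nothing complains, while a `write(14, b"AAAA")` is only reported once `free()` is called, and never if the slot is not freed.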

thestinger commented 1 month ago

It can be frustrating narrowing down the bugs, but in many cases they would have caused problems for users outside GrapheneOS in edge cases or in the future. It's fairly common for the bugs uncovered in regular use to be security vulnerabilities. We've reported several serious security vulnerabilities, including a remotely exploitable Bluetooth issue assigned High severity for the Android Open Source Project, which were found via MTE during regular use, not any special testing or fuzzing.

Pururun commented 1 month ago

Do you mean separate user profiles or a work profile set up for 1 user?

I do not use multiple user profiles. I do use a work profile to separate work-related stuff, but this crash also happens if I disable it.

I tested first with a work profile, but I could not replicate the issue. So for me it has only been happening with multiple user profiles. But good to know that it does not factor in for you.

thestinger commented 1 month ago

There's a Go issue filed in September 2018 for the issue of it reading entire pages containing objects instead of only the objects themselves based on Valgrind finding the bug earlier:

https://github.com/golang/go/issues/27610

It wasn't fixed since they wanted to keep it as an optimization, although it's hard to see how reading entire pages would be faster than at least limiting it to cacheline multiples. Either way, it's wrong for something outside the core platform to assume this is safe. It's clearly undefined behavior and broken with memory tagging via MTE, but also with other forms of memory tagging such as SPARC ADI from many years ago. There are also other cases where it could break. Assuming memory protection is only enforced at page granularity is wrong. libc can depend on something that's undefined behavior because its developers control how it's implemented in the compiler/libc, but that's not the case for Go assuming things about the C implementation and platform this way.
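Why an over-read that stays inside a mapped page is harmless under page-granular protection but faults under memory tagging can be shown with a small model. This is a simplified sketch, not how the hardware is programmed: MTE tags memory in 16-byte granules, while the `TaggedMemory` class, the offsets and the tag values below are invented for illustration.

```python
GRANULE = 16  # MTE assigns one tag per 16-byte granule
PAGE = 4096   # page size assumed for this sketch


class TagFault(Exception):
    pass


class TaggedMemory:
    """Toy model: one tag per granule, checked on every access."""

    def __init__(self, size: int):
        self.tags = [0] * (size // GRANULE)

    def tag_range(self, start: int, length: int, tag: int) -> None:
        first = start // GRANULE
        last = (start + length - 1) // GRANULE
        for granule in range(first, last + 1):
            self.tags[granule] = tag

    def checked_read(self, pointer_tag: int, start: int, length: int) -> None:
        # Every granule touched must carry the same tag as the pointer.
        first = start // GRANULE
        last = (start + length - 1) // GRANULE
        for granule in range(first, last + 1):
            if self.tags[granule] != pointer_tag:
                raise TagFault(f"tag mismatch at offset {granule * GRANULE}")


mem = TaggedMemory(PAGE)
mem.tag_range(0, 32, tag=3)    # object A: 32 bytes, tag 3
mem.tag_range(32, 32, tag=9)   # neighbouring object B: 32 bytes, tag 9
mem.checked_read(3, 0, 32)     # reading exactly object A is fine
```

In this model `mem.checked_read(3, 0, 64)` raises `TagFault` at offset 32 even though all 64 bytes sit in the same page, which is the kind of read the comment describes as unsafe to assume is allowed.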

thestinger commented 1 month ago

There are probably more Go issues like this causing other problems.

bedair81 commented 1 month ago

I can confirm this issue persists with the android 2024.5 beta 2 release, app crashes with MTE enabled in the background every so often, works fine with MTE disabled. I do not use multiple user profiles, not even a work profile.

Pururun commented 1 month ago

Thanks for the report. If I understand you correctly Mullvad is crashing in the background while you are using another app? I assume you are connected when this happened? Do you get any indication from the system that Mullvad has crashed or how do you know that it has crashed?

bedair81 commented 1 month ago

Do you get any indication from the system that Mullvad has crashed

I don't get an MTE notification informing me of the crash. However, I have both the "Always-on VPN" toggle and the "Block connections without VPN" toggle enabled. This means that I get a persistent notification whenever the VPN is disconnected, and my internet is blocked whenever the VPN app (Mullvad in this case) crashes in the background. When MTE is enabled on Mullvad 2024.5 beta 2, the app crashes in the background (seemingly randomly) and I have to reopen the Mullvad app and reconnect in order to restore connectivity. It happens persistently, so I am currently having to use the app with MTE disabled.

Pururun commented 1 month ago

Right I understand, very unfortunate experience. :/

@FID02 @gsture I have had a very hard time replicating this crash after updating GrapheneOS to 2024092900. If you have updated to that version of GrapheneOS, have you seen any difference in the frequency of the crash?

gsture commented 1 month ago

@FID02 @gsture I have had a very hard time replicating this crash after updating to Graphene OS to 2024091900. If you have updated to that version of Graphene have you seen any difference in the frequency of the crash?

You mean 2024092900 right? I just tried. The crash will eventually still happen for me but it does seem to take longer now to reproduce.

Pururun commented 1 month ago

Yes sorry, 2900. Thanks, that is good to know. 👍

Pururun commented 1 month ago

Short update on this: Biggest blocker is currently that we are unable to get any stacktrace or anything from the android logs in regards to this crash. We have a potential solution for this, but until it is possible to reproduce this crash in any consistent manner it is hard to verify anything.

We are currently not prioritizing this, but please update the issue if you have any new findings or updates.

FID02 commented 1 month ago

I have had a very hard time replicating this crash after updating to Graphene OS to 2024092900.

For me, on GrapheneOS 2024092900 with memory tagging and hardened_malloc enabled for the app and the VPN killswitch enabled, the bug that I reported, which is triggered by switching profiles, is just as reproducible as before. However, I noticed that you don't need to have Mullvad running in the secondary profile to trigger the bug: any VPN running in the secondary profile will do, as long as Mullvad is connected in Owner.

Biggest blocker is currently that we are unable to get any stacktrace or anything from the android logs in regards to this crash.

The bug that I reported does not produce a reported crash from MTE. The VPN gets disconnected and the app closes, but the OS does not report a crash.

Perhaps Arm's documentation on debugging with MTE might be useful here?

Pururun commented 1 month ago

Great thanks for the report. Just to make it 100% clear this is the process:

  1. Have Mullvad installed and connected on the "Owner" profile.
  2. Have another VPN running on the second profile.
  3. Switch between the profiles and click on Mullvad

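So for some clarification:

When you talk about "killswitch" do you mean the setting in VPN settings "Block connections without VPN"?

How often does the crash trigger? Every time, every 10 times?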

Biggest blocker is currently that we are unable to get any stacktrace or anything from the android logs in regards to this crash.

The bug that I reported does not produce a reported crash from MTE. The VPN gets disconnected and the app closes, but the OS does not report a crash.

Perhaps Arm's documentation on debugging with MTE might be useful here?

Since we have so far not been able to reproduce it outside of GrapheneOS, our current working theory is that it is related to hardened_malloc and not to MTE directly.

We also do some internal exception logging that prevents the Android system from collecting crash logs, but unfortunately every time I disable the internal exception logging I am unable to reproduce the crash. That could mean the two are related, but it is probably just a side effect of the crash being hard to reproduce.

FID02 commented 1 month ago

Great thanks for the report. Just to make it 100% clear this is the process:

Have Mullvad installed and connected on the "Owner" profile.

Have another VPN running on the second profile.

Switch between the profiles and click on Mullvad

Actually I just observed that you don't need to have a VPN running in a secondary profile to trigger the disconnect when switching to the Owner profile. But you might need to be actively using the secondary profile for a few minutes before switching back to the owner profile (I was using it for ~10 minutes just now before switching back to Owner and getting a disconnect). Sometimes the disconnect won't trigger immediately, but will trigger if you open the app and browse the menus for a bit.

When you talk about "killswitch" do you mean the setting in VPN settings "Block connections without VPN"

Yes. Haven't tested with that setting disabled, though. Can try that.

How often does the crash trigger? Every time, every 10 times?

The disconnect seems to trigger about 50% of the time. Rough estimate.

Since we have so far not been able to reproduce it outside of GrapheneOS, our current working theory is that it is related to hardened_malloc and not to MTE directly.

I will try for a while with hardened_malloc disabled but MTE enabled for the app and report back.

FID02 commented 1 month ago

Since we have so far not been able to reproduce it outside of GrapheneOS, our current working theory is that it is related to hardened_malloc and not to MTE directly.

I have now been running 2024.5-beta2 for 3 days with hardened_malloc disabled and memory tagging enabled for the Mullvad app, and I'm unable to reproduce any random VPN disconnects or other app issues. This includes the disconnects that occur when switching between profiles: I've done that quite a lot over the last few days and haven't had any disconnects when switching profiles.

Daita is disabled, default in-app settings are kept, and "block connections without VPN" is enabled.

I can only assume that GrapheneOS' implementation of memory tagging is detecting more memory safety issues than the Android standard implementation with Scudo.

thestinger commented 1 month ago

@FID02 What about with hardened_malloc enabled but memory tagging disabled?

FID02 commented 1 month ago

What about with hardened_malloc enabled but memory tagging disabled?

I will run Mullvad with that configuration for a few days and report back.

bedair81 commented 1 month ago

What about with hardened_malloc enabled but memory tagging disabled?

This configuration doesn't crash. I tested it myself.

gsture commented 1 month ago

[Screenshot: 374526433-1b91ed31-81c8-4368-9b36-b7dcecfb5c0e]

Although they do happen less often, at least for me these mte issues are still present.

I think there are 2 separate issues:

1: The crashing when you switch between apps, where the VPN does reconnect when the app restarts.

2: The bug seen in the screenshot, which is the one I started this report with, is also still there. Here the VPN is disconnected but the app still thinks it is connected. I took this screenshot just now, 1 hour after enabling MTE.

With MTE disabled and hardened_malloc enabled the app functions fine, none of the above issues happen.

With hardened_malloc disabled and MTE enabled I did not have this error yet but will do some more testing.

Pururun commented 1 month ago

Thanks everyone that is testing, much appreciated ❤️

@thestinger

Thanks for the information. I agree it is quite likely that the crash indicates some kind of memory issue. Unfortunately I am a bit stuck, since the crash is quite random and I am unable to get any stacktraces.

No stacktrace seems to be logged anywhere and it seems like tombstoned is not able to write the stacktrace to file. I tried to disable our own internal exception logger, but I still do not get any logs.

This is what I get from logcat:

10-09 11:41:17.328  5447  5570 I net.mullvad.mullvadvpn: Thread[5,tid=5570,WaitingInMainSignalCatcherLoop,Thread*=0xc00d6a50a999400,peer=0x12c08020,"Signal Catcher"]: reacting to signal 3
10-09 11:41:17.328  5447  5570 I net.mullvad.mullvadvpn: 
10-09 11:41:17.760   842   842 I tombstoned: received crash request for pid 5447
10-09 11:41:17.760   842   842 I tombstoned: found intercept fd 512 for pid 5447 and type kDebuggerdJavaBacktrace
10-09 11:41:17.761  5447  5570 I net.mullvad.mullvadvpn: Wrote stack traces to tombstoned
10-09 11:41:17.762  8219  8267 I dumpstate: libdebuggerd_client: done dumping process 5447
10-09 11:41:17.762   842   842 W tombstoned: skipping tombstone file creation due to intercept

Do you know if Graphene does anything special in regards to tombstoned?
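For anyone sifting through similar captures, here is a small sketch that splits logcat lines in the format shown above into fields and pulls out what `tombstoned` logged. The regular expression and helper names are assumptions written for this excerpt, not part of any Android tooling.

```python
import re
from typing import List, NamedTuple, Optional

# Matches lines like: "10-09 11:41:17.760   842   842 I tombstoned: message"
LINE_RE = re.compile(
    r"^(?P<date>\d\d-\d\d)\s+(?P<time>\d\d:\d\d:\d\d\.\d+)\s+"
    r"(?P<pid>\d+)\s+(?P<tid>\d+)\s+(?P<level>[VDIWEF])\s+"
    r"(?P<tag>[^:]+):\s?(?P<message>.*)$"
)


class LogLine(NamedTuple):
    pid: int
    tid: int
    level: str
    tag: str
    message: str


def parse_line(line: str) -> Optional[LogLine]:
    """Return the parsed fields, or None if the line has another format."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return LogLine(
        pid=int(match["pid"]),
        tid=int(match["tid"]),
        level=match["level"],
        tag=match["tag"].strip(),
        message=match["message"],
    )


def tombstoned_events(lines: List[str]) -> List[str]:
    """Collect the messages logged under the tombstoned tag."""
    parsed = (parse_line(line) for line in lines)
    return [p.message for p in parsed if p is not None and p.tag == "tombstoned"]


SAMPLE = [
    '10-09 11:41:17.328  5447  5570 I net.mullvad.mullvadvpn: Thread[5,tid=5570,WaitingInMainSignalCatcherLoop,Thread*=0xc00d6a50a999400,peer=0x12c08020,"Signal Catcher"]: reacting to signal 3',
    "10-09 11:41:17.760   842   842 I tombstoned: received crash request for pid 5447",
    "10-09 11:41:17.760   842   842 I tombstoned: found intercept fd 512 for pid 5447 and type kDebuggerdJavaBacktrace",
    "10-09 11:41:17.762  8219  8267 I dumpstate: libdebuggerd_client: done dumping process 5447",
    "10-09 11:41:17.762   842   842 W tombstoned: skipping tombstone file creation due to intercept",
]
```

One possible reading of these lines, offered as an inference rather than something confirmed in the thread: signal 3 is SIGQUIT, which the runtime's Signal Catcher thread treats as a request to dump Java stack traces, and the intercept is held by `dumpstate`. If so, the excerpt records a requested stack dump rather than the crash itself.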

thestinger commented 1 month ago

There should be nothing special. You can try disabling exec-based spawning (secure spawning) to restore some of the lost debugging capabilities but it shouldn't impact that.

gsture commented 1 month ago

Small update:

With MTE disabled and hardened_malloc enabled: no issues.

With MTE enabled and hardened_malloc disabled: also no issues.

So far I have only encountered the above issues with both MTE and hardened_malloc enabled together.

Hope this helps.

FID02 commented 1 month ago

What about with hardened_malloc enabled but memory tagging disabled?

I have been unable to reproduce any issues when switching profiles with the above configuration. That aligns with other user reports.