realm / realm-js

Realm is a mobile database: an alternative to SQLite & key-value stores
https://realm.io
Apache License 2.0
5.62k stars 558 forks source link

macOS Electron crash report `EXC_BAD_ACCESS` referencing `realm util logger` #6337

Closed GitMurf closed 1 week ago

GitMurf commented 4 months ago

How frequently does the bug occur?

Always

Description

We have been facing an issue for over a week now and have tried everything under the sun to pinpoint the problem but are stuck. I am a windows guy and the problems we are having are only on Mac so it is a bit out of my expertise area.

I apologize for the limited info (these screenshots are what I have access to right now but could get more info if needed).

Our electron app is working fine until Mac users close the app. Then they get a Mac crash error message saying the app quit unexpectedly and then gives a EXC_BAD_ACCESS SIGBUS exception and KERN_PROTECTION_FAILURE.

It appears something to do with a native addon as there are references to NAPI and then Realm DB logger (see screenshots) realm util logger. We have been using Realm with Electron for 2 years and have never had this problem until recently. Seems to be a combo (potentially) of newer MacOS (Sonoma) and realm 12.4/5. We started facing it when updated to 12.4 but it was reported sporadically… but now on 12.5 it happens every time our electron app closes.

I understand this is not a ton to go off, and can certainly dig to get more info from our Mac testers, but I wanted to see if any of this jumped out to anyone or rang any bells and then if you could provide instructions of what further info you need to help diagnose?

Thank you so much!

IMG_0788

IMG_0789

IMG_0790

Stacktrace & log output

See screenshots

Can you reproduce the bug?

Always

Reproduction Steps

When closing an electron app that uses Realm.

Version

12.5

What services are you using?

Local Database only

Are you using encryption?

No

Platform OS and version(s)

macOS Sonoma

Build environment

Which debugger for React Native: ..

Cocoapods version

No response

kneth commented 4 months ago

@GitMurf Thank you for reporting it.

In v12.0.0 we changed to a new logger, and since it is a singleton shared across threads - and processes? - it can be that you are hitting some kind of race condition when deallocating memory at shutdown.

Do you create your own logger? Or do you use the default logger? Which log level do you specify (if you create your own logger)?

GitMurf commented 4 months ago

@kneth thanks for the support!

So currently we are not setting the logger at all as we only configured it to run when we are debugging. Just to make sure, I pasted in our related code showing debug mode is set to false so we are not doing any logging other than the default? Is a default logger set when opening a realm connection?

const DEBUG_MODE = false as boolean;

interface RealmLog {
  message: string;
  level: string;
}

const logs: RealmLog[] = [];
const masterLogs: RealmLog[] = [];
let loggerSet = false;

// other code

if (DEBUG_MODE && IS_DEV && !loggerSet) {
  loggerSet = true;
  Realm.setLogger((level, message) => {
    logs.push({ level, message });
  });
  Realm.setLogLevel('all');
}

Another thing to note is that we do open up two realm databases (not sure if that matters).

GitMurf commented 4 months ago

Some additional info if it helps at all. Basically some hunches we've had and/or have tried to explore on potential issues:

  1. We are using Electron. I reported to them as well and they seemed to feel fairly confident it was not Electron (but who knows if it is a combination of things)
  2. Our app is signed and notarized through Apple. It has been signed for several months now so while it could be related, we do not think it is the direct cause of the problem as it worked fine for a couple months seemingly.
  3. Due to signing and notarization and allowing for certain macos access, we then have to use a plist. Could there be some sort of mixup with the plist entitlements that cause issue with realm / access etc.? I am not a mac guy so have no clue but wanted to share all the thoughts we've had 🤷‍♂️
  4. It is happening on macos sonoma. Is it something to do with something apple has done with macos lately? we figured probably not as if that was the case you would likely be getting a lot of reports of similar issues by now.
  5. We first discovered this about 3-4 weeks ago.
  6. We have had our app running fine for almost 2 years now and have had zero issues in the past like this.
  7. We have been on v12 of realm since July so I do not think it was totally a v12 thing. It seemed to start showing itself around v12.3 / v12.4.
GitMurf commented 4 months ago

Additionally some other thoughts/hunches ... I noticed a couple of items that stood out to me in the v12.4 realm release that I wonder could be related to the problem.

We use electron forge to build/package our electron app and it uses electron builder under the hood. Could this fix have broke something?

image

I saw reference to an update to the logger in v12.4... could that have caused something?

image

kneth commented 4 months ago

@GitMurf

I saw reference to an update to the logger

I don't believe so - it is related to inspect() so it is easier to dump objects while debugging (in node.js).

Due to signing and notarization and allowing for certain macos access, we then have to use a plist

We have an Electron app using Realm JS, and we are switch to a new notarization mechanism. And plist has also been once of our challenges. The PR hasn't been merged but you might want to take a look at https://github.com/realm/realm-studio/pull/1615.

Is it something to do with something apple has done with macos lately?

Notarizing has definitely changed recently.

bwachter commented 4 months ago
3. Due to signing and notarization and allowing for certain macos access, we then have to use a plist. Could there be some sort of mixup with the plist entitlements that cause issue with realm / access etc.? I am not a mac guy so have no clue but wanted to share all the thoughts we've had 🤷‍♂️

I've observed the same crash problems you've described with an incomplete plist for realm studio - the updated version is in the pull request linked above.

Are you building a universal binary, or individual binaries for different architectures? Right now universal binaries do not work properly as only one architecture of arch dependant realm binaries get included. Arm64 should work, but may run into problems with electron builder. x86_64 is working on both intel macs and arm macs via Rosetta.

4. It is happening on macos sonoma. Is it something to do with something apple has done with macos lately? we figured probably not as if that was the case you would likely be getting a lot of reports of similar issues by now.

There were recent changes related to enforcement of the notarization requirements, which can lead to those crashes if there are some errors in the signed application.

GitMurf commented 4 months ago

Are you building a universal binary, or individual binaries for different architectures? Right now universal binaries do not work properly as only one architecture of arch dependant realm binaries get included. Arm64 should work, but may run into problems with electron builder. x86_64 is working on both intel macs and arm macs via Rosetta.

@bwachter thanks for your insight! We use electron forge to handle our packaging process. I apologize my MacOS knowledge is very limited but does this help answer your question about the universal binary vs individual binaries for each arc? This is from our forge config.

new MakerDMG(
      {
        name: `${productName}`,
        format: 'ULFO',
        additionalDMGOptions: {
          window: {
            size: {
              width: 512,
              height: 344,
            },
          },
        },
      },
      ['darwin']
    ),
    new MakerZIP(
      (arch) => ({
        macUpdateManifestBaseUrl: `${AMAZON_S3_BASE}/darwin/${arch}`,
      }),
      ['darwin']
    ),
bwachter commented 4 months ago

@bwachter thanks for your insight! We use electron forge to handle our packaging process. I apologize my MacOS knowledge is very limited but does this help answer your question about the universal binary vs individual binaries for each arc?

This is just the section configuring how the generated artifacts are packed up. If you don't have any other configuration in there, and don't pass architectures on the call to electron builder you're probably building x86_64 only. When you look at the binaries the architecture should be in there - for example, for realm studio the names would be like "Realm Studio-14.1.2-universal.dmg" for universal builds.

GitMurf commented 4 months ago

@bwachter does this help? Here is what our Mac releases end up named. We use dmg for the initial install and then .zip is used for delivering auto updates through AWS.

I know in talking to the "Mac guy" on our team that our Mac app works on both m1 and intel chips and we have NOT done what is necessary to "fully optimize" for M1 macs. I assume this is related to the difference between the x84/64 vs Arm64 and gives me reasonable confidence that the answer to your question is that yes we are building x86/64 only. Would you agree?

image

Additionally when forge uploads to AWS for us for auto update delivery, this is the path the Mac files get uploaded to... further confirming the x86/64 I believe:

image

GitMurf commented 4 months ago

Assuming you agree with my assumptions from my last comment, @bwachter would you suggest the next step for us is to utilize the Realm Studio plist you have in the PR referenced above and figure out what we need to merge into ours? https://github.com/realm/realm-studio/pull/1615

For clarity the two plist files from the PR are here:

I guess my final question on this would be how these two files work together? Inherit vs the other plist. For conceptual purposes should I just consider them combined as one file? or is there significance in them being separated? We only have a single plist file which I have included below:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
  <dict>
    <key>com.apple.security.cs.allow-dyld-environment-variables</key>
    <true/>
    <key>com.apple.security.cs.disable-library-validation</key>
    <true/>
    <key>com.apple.security.cs.allow-jit</key>
    <true/>
    <key>com.apple.security.cs.allow-unsigned-executable-memory</key>
    <true/>
    <key>com.apple.security.cs.debugger</key>
    <true/>
    <key>com.apple.security.network.client</key>
    <true/>
    <key>com.apple.security.network.server</key>
    <true/>
    <key>com.apple.security.files.user-selected.read-only</key>
    <true/>
    <key>com.apple.security.inherit</key>
    <true/>
    <key>com.apple.security.automation.apple-events</key>
    <true/>
  </dict>
</plist>
bwachter commented 4 months ago

@GitMurf electron-builder documentation about the plists is here https://www.electron.build/configuration/mas.html

In this case you can ignore the inherit plist, though. In our case we can only pass in one plist file to the signer, so entitlements.mac.plist contains everything needed.

GitMurf commented 4 months ago

In doing a comparison, it seems that the realm studio plist has these two items that we are missing:

<key>com.apple.security.app-sandbox</key>
<false />

<key>com.apple.security.cs.disable-executable-page-protection</key>
<true />

We have the following additional items that Realm Studio does not:

<key>com.apple.security.cs.debugger</key>
<true />

<key>com.apple.security.network.client</key>
<true />

<key>com.apple.security.network.server</key>
<true />

<key>com.apple.security.files.user-selected.read-only</key>
<true />

<key>com.apple.security.inherit</key>
<true />

<key>com.apple.security.automation.apple-events</key>
<true />
kneth commented 3 months ago

@GitMurf Did it have any effect to add the two items (com.apple.security.app-sandbox and com.apple.security.cs.disable-executable-page-protection)?

GitMurf commented 3 months ago

@GitMurf Did it have any effect to add the two items (com.apple.security.app-sandbox and com.apple.security.cs.disable-executable-page-protection)?

Hi @kneth thank you for following up. Unfortunately this did not fix it BUT we do have a bit of progress. We have confirmed there is NOT a crash on macOS 12.2.1 Monterey so it seems to be specific to Sonoma 13+.

That being said, I am getting well outside my area of expertise as I am a windows guy so I am bringing my colleague @Calhistorian into the conversation who is the "Mac guy" for us ;)

GitMurf commented 3 months ago

@kneth at this point do you think it would be helpful to see the full logs and errors produced by our crash? If so, @Calhistorian can get that for you.

This article seems to describe our issue... the question is just what is causing it exactly. https://developer.apple.com/documentation/xcode/investigating-memory-access-crashes

Search for "KERN_PROTECTION_FAILURE" and you will see it matches closely to our logs I posted at the beginning of this thread. The following example in that article seem to be what our "memory tag 253" are facing as it shows invalid not writable memory just like the article.

image

image

GitMurf commented 3 months ago

@kneth one hunch that I have had from the beginning is that Electron is closing / quitting before Realm native stuff is finished closing up so then an immediate crash report happens for bad memory access because Electron quit and realm (realm logger was mentioned in the crash report) was still hanging around in the native code and likely tried to talk back to electron/js/node land but electron had already shut down.

We run realm.close() just before quitting the electron app. is there a way to have some sort of callback for the realm close that tells us when all of the realm native code has finished and closed up so that we can then wait for that callback to finish the quit/close of electron? or is there a way to await a "total close" of some sort so that we know all of realm and the native code has been closed / cleaned up?

GitMurf commented 3 months ago

Ok this thread sounds all too similar from a couple years ago! https://github.com/realm/realm-js/issues/4150

It indeed had to do with logging. Now even though we do NOT set the logger or use it, I discovered in the docs that the "info" log level is still on by default. I have pushed a change to disable the logger all together and we will see in the morning with our overnight build if this helps with our issue or not :)

A few other similar looking gh issues from the past for reference:

GitMurf commented 3 months ago

Some other potentially related issues:

kneth commented 3 months ago

a way to await a "total close"

Unfortunately we don't provide such a method. We don't see it in our tests as we have a brutal method to shut down everything (we don't care if the database is corrupted after the test).

I can imagine that the logger might cause an issue as it is shared.

disable the logger all together

I am curious to hear what you observe.

GitMurf commented 3 months ago

disable the logger all together

I am curious to hear what you observe.

Unfortunately it did not solve the problem.

An additional piece of information is that it appears as if on older macOS (pre Sonoma) it actually does produce the same issue but macOS apparently just "gracefully" ignores it as i ran our app via the command line and it gives a seg fault 11 log as soon as we close the app. My hunch is that Sonoma is just choosing to show the crash report instead of letting it be "ignored" 🤷🏻‍♂️ but I am just hypothesizing on that.

Point is that this problem has likely been around for much longer but we just never noticed it because macOS never showed the crash report to the user.

Does that make sense @kneth?

kneth commented 3 months ago

@GitMurf

Point is that this problem has likely been around for much longer but we just never noticed it because macOS never showed the crash report to the user.

It is an interesting hypothesis. What is you open Console on an older MacOS version, can you see any mention of the error?

GitMurf commented 3 months ago

It is an interesting hypothesis. What is you open Console on an older MacOS version, can you see any mention of the error?

@kneth This is all it gives (screenshot below)! Literally just this log of "Segmentation fault: 11" which is why we likely never even noticed it in the past when debugging stuff in cli 🤷‍♂️ But when I googled seg fault 11 too many related results and pointers to bad memory access came up that seems like it has to be related to the crash dump on sonoma... but I have been known to be wrong in the past so who knows ;)

In looking at some of those old gh issues I linked above, it looks like @tomduncalf was pretty involved. Have you had a chance to see if he has any learnings or conclusions from those activities?

Thanks!

image

kneth commented 3 months ago

We might have to see if we need to reintroduce the fix in https://github.com/realm/realm-js/pull/4237/files#diff-0e7960bf82824a008ff89d2ffc7ac5f11c7e223a82cf6bf6c3a4c27a5df0bcc8R244.

GitMurf commented 3 weeks ago

@kneth does this comment here mean this should be resolved as well?!! 🎉 https://github.com/realm/realm-js/issues/6609#issuecomment-2051310719

cc @Calhistorian