planetary-social / nos

nos.social social media for all of us, using nostr
https://nos.social
Mozilla Public License 2.0

[Tech Debt] Enable out-of-memory error tracking in Sentry #539

Closed: mplorentz closed this 2 months ago

mplorentz commented 1 year ago

Sentry has a feature called WatchdogTerminations that tries to guess when your app has been killed by the system (most often this happens because the device ran out of memory). I had to disable it while working on our performance tests because it was logging a watchdog termination after every test run and was messing up our crash free session rate.

We should figure out how to tell Sentry that performance tests finishing are not watchdog terminations and un-ignore this issue type in the Sentry web UI.

mplorentz commented 9 months ago

I re-enabled the tracking but our performance tests are still causing false positives in Sentry and messing up our crash free percentage. Let's fix that.

martindsq commented 3 months ago

> We should figure out how to tell Sentry that performance tests finishing are not watchdog terminations

@mplorentz I think Sentry should not be receiving events at all when testing/debugging, right? So this ticket should 1) enable WatchdogTerminations, and 2) make sure crash reports are not sent from tests.

mplorentz commented 3 months ago

@martindsq the performance tests actually run in Release configuration so they get all the compiler optimizations. And Sentry tracking is turned on during release builds.

I think step 2) will need to include setting a custom environment variable when running the performance tests, which the app detects and uses to turn off Sentry.
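A minimal sketch of what that check could look like, assuming a made-up variable name (`NOS_DISABLE_CRASH_REPORTING` is hypothetical, as is the helper `shouldStartCrashReporting`). The actual Sentry startup call is left as a comment so the snippet compiles without the Sentry package:

```swift
import Foundation

// Hypothetical variable name; nothing in this thread specifies one.
let disableCrashReportingKey = "NOS_DISABLE_CRASH_REPORTING"

/// Returns true when crash reporting should start for this process.
/// The environment is passed in so the logic can be checked without launching the app.
func shouldStartCrashReporting(environment: [String: String]) -> Bool {
    // No variable set: behave like a normal release build.
    guard let rawValue = environment[disableCrashReportingKey] else {
        return true
    }
    // Treat "1", "true" or "yes" (any letter case) as a request to disable.
    let disablingValues: Set<String> = ["1", "true", "yes"]
    return !disablingValues.contains(rawValue.lowercased())
}

// At app launch: only start Sentry when the test run has not opted out.
if shouldStartCrashReporting(environment: ProcessInfo.processInfo.environment) {
    // The existing SentrySDK.start { options in ... } call would go here.
    print("crash reporting enabled")
} else {
    print("crash reporting disabled for this run")
}
```

If the performance tests launch the app through XCUITest, the variable could be set with `XCUIApplication().launchEnvironment` before calling `launch()`. Otherwise it would have to come from the test scheme or the CI invocation.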

mplorentz commented 3 months ago

@martindsq shower thought: maybe now that we have Nos Dev with its own bundle ID the performance tests will report the out of memory crashes under the dev environment in Sentry, making them easy to filter out. If that's the case maybe we can skip the environment variable stuff.
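For illustration, a rough sketch of mapping the bundle ID to a Sentry environment name. The bundle identifier and the names `devBundleID` and `sentryEnvironmentName` are placeholders, not the real Nos values, and the assignment to the Sentry options is shown only as a comment:

```swift
import Foundation

// Placeholder bundle identifier; the real Nos Dev bundle ID is not given in this thread.
let devBundleID = "com.example.nos.dev"

/// Maps a bundle identifier to the environment name that would be reported to Sentry.
func sentryEnvironmentName(bundleID: String?) -> String {
    // Anything that is not the dev bundle is treated as production.
    return bundleID == devBundleID ? "dev" : "production"
}

let environmentName = sentryEnvironmentName(bundleID: Bundle.main.bundleIdentifier)
// In the Sentry setup closure this value would be assigned to options.environment.
print(environmentName)
```

Events tagged this way could then be filtered out by environment in the Sentry web UI, which only helps if the performance tests really do run against the Nos Dev target.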

mplorentz commented 3 months ago

I went to turn this on in Sentry and found that it was already on. This made me sad because we are often getting crash reports that don't show up in Sentry, and I thought turning on their watchdog termination tracking would fix that.

So I read the docs for Sentry Watchdog Terminations to make sure that we have them configured correctly, and we do. However, I also found this interesting tidbit:

> If the app is terminated because it hangs, we don't create a watchdog termination event, but instead an AppHangs event is created.

This seems to mean that Core Data thread deadlocks, like the one I fixed in https://github.com/planetary-social/nos/pull/1266, will not show up as unhandled exceptions/crashes in Sentry but will instead show up as hangs. I spent some time trying to figure out whether I could filter the hangs (which we have tons of) down to those that resulted in termination, but it doesn't seem like Sentry has that data.

I did look through the Sentry SDK release notes and saw some bug fixes related to hangs so I updated our version of the library in #1279.

Unfortunately, I think we are doing all we can with Sentry, and we'll just need to remain vigilant in checking the TestFlight crash reports and investigating word-of-mouth crash reports to make sure we aren't missing crashes that Sentry doesn't track.