kstenerud / KSCrash

The Ultimate iOS Crash Reporter
MIT License
4.25k stars 712 forks source link

Memory termination #476

Closed naftaly closed 5 months ago

naftaly commented 5 months ago

Adding OOM support to KSCrash.

In a gist, this works by writing an OOM report to disk outside of the report area for later modification. We also keep track of memory information through a memory mapped file for crash resistance, we write to it anytime memory limit or pressure changes. On termination, that data is basically a breadcrumb we can use on cold start.

On cold start, we look for that mapped file, if memory pressure is >= critical or memory level is >= critical it means we were terminated due to memory. At this point we'll read in the OOM report we write to disk on the previous app launch, modify it to fit our needs with latest information and move it to the report folder where the rest of the system will pick it up and send it in.

One thing that would make everything better is to have new repositories for each cold start. it can make handling files a lot more robust.

GLinnik21 commented 5 months ago

I'm really impressed by the clever way you've used memory-mapped files! It's a great solution. I just wanted to check in about something. How can we be sure that data cached in memory is going to be written to disc by the kernel in case of a crash?

naftaly commented 5 months ago

I'm really impressed by the clever way you've used memory-mapped files! It's a great solution. I just wanted to check in about something. How can we be sure that data cached in memory is going to be written to disc by the kernel in case of a crash?

The way I understand it is that mmap is handled in kernel space and not user space, which leads to file based mmap to basically be a “kernel cache” that is always dumped to disk based on the mapping sync functions, and a process being terminated is one of those instances where the kernel will explicitly call those sync functions, which leads to crash resilience.

naftaly commented 5 months ago

@GLinnik21 Can you reopen the PR so I can change the base to main and we can continue work on it?

GLinnik21 commented 5 months ago

Oh sorry. It got automatically closed because I merged and deleted release-2.0 to master. I can try to restore release-2.0 or you can try changing the target branch to kstenerud:master. What works better for you?

image
naftaly commented 5 months ago

I wasn't able to change the base for some reason, likely because the PR is closed.

Screenshot 2024-05-17 at 7 40 22 PM
naftaly commented 5 months ago

Oh, there we go :) thank you.

GLinnik21 commented 5 months ago

Could you add some tests to cover the new functionality where possible? It would help maintain the quality and stability of the project. Thanks!

naftaly commented 5 months ago

Could you add some tests to cover the new functionality where possible? It would help maintain the quality and stability of the project. Thanks!

I've added a few tests. If there's anything in particular you'd like to be tested feel free to point it out, I'm happy to add more.

naftaly commented 5 months ago

Any clues what’s going on with this check, I don’t see what’s wrong.

GLinnik21 commented 5 months ago

Any clues what’s going on with this check, I don’t see what’s wrong.

Let's try to rerun for now.

GLinnik21 commented 5 months ago

I wanted to share a thought on organizing our code. When we put multiple class implementations in a single file, especially in both implementation and header files, it can make the code harder to read and maintain. Keeping everything in separate files can really simplify code reading and understanding. Some of our files with multiple implementations have grown to several hundred lines of code, which can be overwhelming. What do you think about splitting them up to make things more manageable?

naftaly commented 5 months ago

I wanted to share a thought on organizing our code. When we put multiple class implementations in a single file, especially in both implementation and header files, it can make the code harder to read and maintain. Keeping everything in separate files can really simplify code reading and understanding. Some of our files with multiple implementations have grown to several hundred lines of code, which can be overwhelming. What do you think about splitting them up to make things more manageable?

Sounds good, which ones are you thinking of?

GLinnik21 commented 5 months ago

Sounds good, which ones are you thinking of?

I had this in mind when I was reviewing KSCrashAppMemory, but if it applies to any other new files you've created, I'd suggest splitting those up as well.

naftaly commented 5 months ago

https://github.com/kstenerud/KSCrash/pull/476/commits/b6f5418c01fef8260d5d8ef013664462d1c63b85 splits app memory and app memory tracker. That should help a bit.

There are a few more items that are unresolved that need your attention, could you take a look when you have a chance?

GLinnik21 commented 5 months ago

I've added a few tests. If there's anything in particular you'd like to be tested feel free to point it out, I'm happy to add more.

Thanks for adding the tests! I'd appreciate it if you could cover all the new code with tests as much as possible. Let me know if you need any specifics!

naftaly commented 5 months ago

I think I'm content with the result. Again, great work! However, I would really love for @bamx23 to review this PR as well.

Thanks! :)

I would also love it if @bamx23 could review.

GLinnik21 commented 5 months ago

Also, do you have any kind of sample to test OOMs?

naftaly commented 5 months ago

Also, do you have any kind of sample to test OOMs?

I have a SwiftUI view modifier that shows a sheet with lots of options in it, one being to create an OOM.

Screenshot 2024-05-21 at 3 30 25 PM

Where do you add sample code? I can address that a bit later today.

GLinnik21 commented 5 months ago

Any clues what’s going on with this check, I don’t see what’s wrong.

Sometimes they get canceled by timeout. I'm trying to address it here #487

GLinnik21 commented 5 months ago

Where do you add sample code? I can address that a bit later today.

Actually, Karl has been using the CrashProbe library, which is really out of date and archived. If you have a good replacement for it in your sample code, it might be a good opportunity to make a change.

naftaly commented 5 months ago

Thanks for the great feedback @bamx23. I've addressed most of it and left a few comments for discussion as well. If I missed anything please ping me.

naftaly commented 5 months ago

Sorry about that last commit message, autocomplete failure :) Should have been "Addressed Feedback".

Screenshot 2024-05-22 at 12 05 16 PM
naftaly commented 5 months ago

Thanks a lot for addressing and responding to all of the comments. Apart from several nits and the OOM-reporting opt-out flag, I'm okay to merge this.

Let me or @GLinnik21 know if you want to follow up in this PR or ready to merge.

I've cleaned up a few last things and addressed the rest of the feedback. You folks are awesome and made this way better than it was when it started, thanks for spending time on it.

Please give it another once over and then I'm ok to merge. I'm happy to follow up on anything if needed as well.

naftaly commented 5 months ago

@bamx23 @GLinnik21 I addressed the last few pieces of feedback. I found a simple way to enable/disable reporting OOMs but keep the data flowing and added to other types of reports.

I also wanted to let both of you know I wrote a small piece about what we did here. https://medium.com/@alexandercohen/reducing-memory-terminations-in-ios-apps-3e76797ca5bd

naftaly commented 5 months ago

Could you folks rerun the actions, they seem to have failed due to a timeout.

GLinnik21 commented 5 months ago

@GLinnik21 do you have anything left to discuss here or are you good to merge?

As always, the more tests, the better. If other code is too low-level to test, I'm fine with that.

GLinnik21 commented 5 months ago

I also wanted to let both of you know I wrote a small piece about what we did here. https://medium.com/@alexandercohen/reducing-memory-terminations-in-ios-apps-3e76797ca5bd

Wow! I read the whole article and found it very insightful. Thank you for the shoutout!