pleriche / FastMM4

A memory manager for Delphi and C++ Builder with powerful debugging facilities

Exception on program termination in RemoveMediumFreeBlock #80

Open rosch100 opened 5 years ago

rosch100 commented 5 years ago

When my program terminates, I get an Access Violation exception in RemoveMediumFreeBlock. The exception occurs because LBinGroupNumber is invalid.

I'm not sure how to debug the issue further.

zunzster commented 5 years ago

It sounds like something is corrupting the heap and you're seeing the symptoms after the fact, as is commonly the case in unsafe languages like Delphi, C++, etc. This is the sort of stuff that makes Rust look promising :-)

When I'm trying to track down when and where the heap is getting corrupted (commonly due to a use-after-free or buffer overflow somewhere), I run in FullDebugMode if I'm not already. If you've not used it before, FullDebugMode adds code paths that check much of the heap state on each allocation and free. Your app will run a bit slower with all the checking, and it doesn't actually release all freed memory until the end, so your app's memory use will build up. That's not usually a problem on modern machines, but I thought I'd mention it in case you're in a constrained memory or CPU situation, which will make running in this mode trickier.

If the failure isn't happening every time, the first job is to find a sequence of actions that reliably triggers the A/V. You can still use the technique below if it's not 100% reliable, but you'll have to repeat the steps more times to get evidence you can rely on to move forward.

Once I'm in FullDebugMode, I can start calling ScanMemoryPoolForCorruption at various places to narrow in on where the corruption occurs. The stack trace where the A/V exception currently occurs usually gives me the point where the corruption is first noticed. Then, working backwards, I add ScanMemoryPoolForCorruption calls to establish an earlier point where the heap state is still OK. This gives me a 'known good' point and a 'known bad' point.
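To make the 'known good' / 'known bad' idea concrete, here is a minimal, self-contained C++ model of a scan checkpoint. It is only a sketch and not how FastMM4 is implemented: `GuardedBlock`, `unchecked_write`, `scan_ok` and the guard value are all hypothetical names invented for illustration, with `scan_ok` standing in for a ScanMemoryPoolForCorruption call.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr std::size_t kPayload = 8;         // usable bytes in the block
constexpr std::size_t kGuard = 4;           // guard bytes placed after the payload
constexpr unsigned char kGuardByte = 0xAB;  // hypothetical guard value

struct GuardedBlock {
    // Payload followed by guard bytes, kept in one array so that the
    // simulated overrun below stays inside a valid object.
    std::array<unsigned char, kPayload + kGuard> raw{};

    GuardedBlock() {
        std::memset(raw.data() + kPayload, kGuardByte, kGuard);
    }

    // Copies n bytes into the payload without checking n against kPayload.
    // That missing check is the bug being modelled.
    void unchecked_write(const unsigned char* src, std::size_t n) {
        assert(n <= raw.size());  // keeps the demo itself memory-safe
        std::memcpy(raw.data(), src, n);
    }

    // The checkpoint: true while every guard byte still holds its value.
    bool scan_ok() const {
        for (std::size_t i = kPayload; i < raw.size(); ++i) {
            if (raw[i] != kGuardByte) return false;
        }
        return true;
    }
};
```

A `scan_ok()` call that passes after an 8-byte write is a 'known good' point; one that fails after a 10-byte write is a 'known bad' point, and the faulty write must sit between them.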

Then I take the span of code between those two points, pick a point that splits it roughly in half, and add another ScanMemoryPoolForCorruption call there. If the scan fails at that point, I know the corruption happens between the 'known good' scan and the new one; otherwise I know it happens between the new scan and the A/V point. I then split the remaining code again and add another scan, narrowing ever tighter.

This is effectively a form of binary search, or bisection, through the code, using the same divide-and-conquer idea as quicksort. After a few iterations I'll have narrowed things down to a code block I can just read through, and I'll usually spot the problematic code: using a freed object, copying a block of memory with the wrong size, or the classic of swapping the size and char parameters to FillChar :-)
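The bisection itself can be sketched in a few lines of C++. This is a model under stated assumptions, not part of FastMM4: `first_corrupting_step` is a hypothetical helper, and `scan_fails_after(k)` stands for "re-run the program with a scan placed after step k and report whether that scan fails". It assumes the failure is reproducible, that the scan passes before any step runs, and that it fails after all n steps.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Returns the 1-based index of the first step after which the scan fails.
// Preconditions: the scan passes after 0 steps and fails after all n steps.
std::size_t first_corrupting_step(
    std::size_t n,
    const std::function<bool(std::size_t)>& scan_fails_after) {
    assert(n > 0);
    std::size_t good = 0;  // known good: scan passes after `good` steps
    std::size_t bad = n;   // known bad: scan fails after `bad` steps
    while (bad - good > 1) {
        const std::size_t mid = good + (bad - good) / 2;
        if (scan_fails_after(mid)) {
            bad = mid;   // corruption happens at or before step mid
        } else {
            good = mid;  // corruption happens after step mid
        }
    }
    return bad;  // the step between the last passing and first failing scan
}
```

Each iteration halves the suspect range, so about log2(n) re-runs are enough: roughly 7 for 100 steps.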

Then I usually marvel at how the code didn't screw up more often. But shutdown code is on the way out anyway, so you can often get away with a few use-after-frees without an A/V, which hides the bug until you refactor some other code such that it now matters. FullDebugMode will helpfully overwrite freed memory with a known pattern to catch those cases sooner.
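As a rough illustration of why a fill pattern exposes use-after-free, here is a small C++ model. It is a sketch only: `DebugBlock`, `release`, `modified_after_free` and the fill byte are hypothetical and chosen for the example, and the storage is never really freed, so the demo stays memory-safe.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr unsigned char kFreedFill = 0x80;  // arbitrary illustrative fill byte

struct DebugBlock {
    std::array<unsigned char, 16> bytes{};
    bool freed = false;

    // "Free" the block: keep the storage, but overwrite it with the pattern.
    void release() {
        assert(!freed);  // a double free is a bug in its own right
        std::memset(bytes.data(), kFreedFill, bytes.size());
        freed = true;
    }

    // A later scan: any byte that no longer matches the pattern means
    // something wrote to the block after it was freed.
    bool modified_after_free() const {
        if (!freed) return false;
        for (unsigned char b : bytes) {
            if (b != kFreedFill) return true;
        }
        return false;
    }
};
```

A stale read now sees the pattern instead of plausible old data, and a stale write breaks the pattern, so either kind of misuse shows up soon after it happens.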

I hope that's all helpful and not too overwhelming.