pleriche / FastMM4

A memory manager for Delphi and C++ Builder with powerful debugging facilities

Odd behavior when FullDebugMode is used #49

Open obones opened 6 years ago

obones commented 6 years ago

Hello,

We've been using FastMM to track down memory leaks with great success, and it has always pointed to issues in our own code. Today we are running into a situation where we get an access violation in FreeMem whose origin we cannot explain. The crash only occurs in "FullDebugMode" and is shown by the IDE, but it appears to be trapped somewhere, as it does not prevent the application from continuing just fine.

We suspect our architecture is the origin of the issue, which is why I'm describing it now. The application is built (x86) using Delphi Seattle. It loads a DLL via LoadLibrary and calls a method to tell the DLL about existing callbacks inside our application. Later on, the application calls a method in the DLL that does its work and calls our callback. If we do any memory allocation in that callback, we get an access violation later on while freeing an unrelated TObjectList. If we make sure no memory allocation occurs during the callback, we do not get the access violation. All these calls are made inside the same thread, which is not the main one. All DLL methods and the callbacks use the register calling convention.

To sum up, here is the code lifetime:

- Application start
- Thread creation
- Thread code loads DLL
- Thread code tells DLL about callbacks
- Thread prepares work, creates a TObjectList
- Thread calls DLL method
- DLL method does its work, calls one of our callbacks
- Callback runs
- Callback does getmem(P, 1)
- Callback exits
- DLL method exits
- Thread continues its execution (may call the same DLL method again)
- Thread finalizes, frees the created TObjectList
- Access violation is raised in the TObject.Destroy / FreeInstance call

If we remove the getmem(P, 1) call, we don't get the AV. If we replace that call with a refcounted memory allocation (assigning a string, setting a TBytes length...), we still get the AV.

The initial code was setting a string to a new value, but we got down to a simple getmem(P, 1) call to trigger the AV.

As the DLL is compiled with FPC, we initially suspected a memory manager replacement, but the DLL lives in its own world; no ShareMem is involved whatsoever. Stepping through the assembly code, we do end up in FastMM code itself, which the DLL does not include in any way. If we run without tracking memory leaks, we don't get the AV either.

Here are the options for our FullDebugMode setup:

{$define FullDebugMode}
{$define EnableMemoryLeakReporting}
{$define NoMessageBoxes}
{$define LoadDebugDLLDynamically}
{$define DoNotInstallIfDLLMissing}
{$undef RequireDebuggerPresenceForLeakReporting}

I'm quite sure this comes from our own code, but for the life of me, I can't figure out what we have done wrong. Any help would be greatly appreciated.

pleriche commented 6 years ago

Hi,

If the A/V is handled then it is most likely the stack tracing code, and is nothing to worry about. The stack tracing code maintains a map of memory address ranges that are most likely backed by code. This speeds up the stack tracing process. Calling VirtualQuery for every address it has to check would have murdered performance, and in a multithreaded environment it would not have been 100% safe anyway. Sometimes the map goes stale without the stack tracer knowing about it - like when a DLL is unloaded. The next time an address in that range is read to see whether it is an actual call site an A/V will be raised. This A/V is handled and the map is invalidated, so the stack tracer knows to call VirtualQuery the next time an address in that range must be checked.
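
To illustrate the mechanism described above, here is a rough sketch of a cached readability map with lazy invalidation. It is not FastMM's actual code: `PageMap`, `QueryReadable`, `IsLikelyReadable` and `TryReadDWord` are hypothetical names, and the 4 KB page granularity and 32-bit address space are assumptions.

```pascal
program StackMapSketch;
{$APPTYPE CONSOLE}
{$IFNDEF CPUX86} {$MESSAGE FATAL 'Sketch assumes a 32-bit address space'} {$ENDIF}

uses
  Winapi.Windows, System.SysUtils;

const
  PageShift = 12;                        // assumed 4 KB pages
  PageCount = 1 shl (32 - PageShift);    // covers the whole 32-bit range

type
  TPageState = (psUnknown, psReadable, psNotReadable);

var
  // Hypothetical cache. Globals are zero-initialised, so every page starts as psUnknown.
  PageMap: array[0..PageCount - 1] of TPageState;

// Slow path: ask the OS whether the page holding AAddress can be read.
function QueryReadable(AAddress: NativeUInt): Boolean;
var
  Info: TMemoryBasicInformation;
begin
  Result := (VirtualQuery(Pointer(AAddress), Info, SizeOf(Info)) <> 0)
    and (Info.State = MEM_COMMIT)
    and ((Info.Protect and (PAGE_NOACCESS or PAGE_GUARD)) = 0);
end;

// Fast path: consult the cache and only call VirtualQuery for unknown pages.
function IsLikelyReadable(AAddress: NativeUInt): Boolean;
var
  Index: NativeUInt;
begin
  Index := AAddress shr PageShift;
  if PageMap[Index] = psUnknown then
  begin
    if QueryReadable(AAddress) then
      PageMap[Index] := psReadable
    else
      PageMap[Index] := psNotReadable;
  end;
  Result := PageMap[Index] = psReadable;
end;

// Reads a DWORD if the cache says the page is readable. If the cache is stale,
// the read raises an A/V, which is handled here by resetting the cache entry.
function TryReadDWord(AAddress: NativeUInt; out AValue: Cardinal): Boolean;
begin
  AValue := 0;
  Result := False;
  if not IsLikelyReadable(AAddress) then
    Exit;
  try
    AValue := PCardinal(AAddress)^;
    Result := True;
  except
    on EAccessViolation do
      PageMap[AAddress shr PageShift] := psUnknown; // force a re-query next time
  end;
end;

var
  Block: Pointer;
  Value: Cardinal;
begin
  Block := VirtualAlloc(nil, 4096, MEM_COMMIT or MEM_RESERVE, PAGE_READWRITE);
  Writeln(TryReadDWord(NativeUInt(Block), Value)); // page gets cached as readable
  VirtualFree(Block, 0, MEM_RELEASE);              // the cache entry is now stale
  Writeln(TryReadDWord(NativeUInt(Block), Value)); // read should fault; A/V is handled
  Writeln(TryReadDWord(NativeUInt(Block), Value)); // page is re-checked with VirtualQuery
end.
```

Run under the debugger, the second call is where the IDE would report a first-chance access violation even though the program carries on, which is the same pattern as a stale entry left behind by an unloaded DLL.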

If you disable "raw" stack traces you should not see this kind of A/V often, but then you typically give up some detail in your stack traces.
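
As a minimal sketch, assuming the options are set in the usual FastMM4Options.inc (or the equivalent project-level defines), raw stack traces are controlled by the `RawStackTraces` define:

```pascal
{$define FullDebugMode}
{$undef RawStackTraces} // fall back to frame-based stack traces
```

With the define removed, the tracer follows stack frames instead of checking every stack entry for a possible return address.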

Hope this clears it up.

Best regards, Pierre

obones commented 6 years ago

Thanks for the explanation. It makes a lot of sense when trying to get the stack trace for a memory block that was allocated with call locations inside a DLL that is no longer there. However, I'm having trouble linking this to my case, because this is what is happening:

- TObjectList is created
- TObject items are added to the list
- DLL is loaded
- DLL call is made, calls callback
- Callback allocates memory
- DLL is unloaded
- TObjectList is freed
- TObjects contained in the list are freed
- AV occurs when performing the raw stack trace walk

To me, it is strange that there would be a call site from the DLL inside the call stack when the objects are freed. What is even stranger is that this call site is present if, and only if, the callback allocates memory. An explanation would be purely for my own education; it does not need to be precise.

And reading the comment you made on the linked issue, I'll experiment with stack frames as it seems it would allow having detailed stack traces while avoiding the AV when run with the debugger.
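
For reference, a minimal sketch of turning stack frames on from source, using the standard Delphi compiler directive (the same setting is available in the project's compiler options); which units need it is an assumption that depends on the project:

```pascal
{$STACKFRAMES ON} // long form of {$W+}; generate a stack frame for every routine
```

Placed at the top of a unit, or set project-wide, this gives a frame-based tracer a frame chain to follow through those routines.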