Closed: z-guohui-qian closed this issue 1 year ago
Hi @z-guohui-qian, thank you for the feedback. We'll get back to you after investigating the issue.
Could you please indicate:

- Which version of the Hazelcast .NET client is used for this benchmark (it seems to be 5.2.2)?
- Which version of the .NET runtime you are building for (it seems to be 6.0)?
- Which version of the .NET runtime is used for running the benchmark (maybe you have 7.0 installed)?

From your screenshot, I assume that the OS is Windows, correct?
We tried Docker images such as `mcr.microsoft.com/dotnet/sdk:6.0-jammy` and `mcr.microsoft.com/dotnet/sdk:7.0-jammy`.
Update: reproduced the situation by running the process for about 15 hours. Managed memory is totally stable, but unmanaged memory grows, very slowly (from 25MB to 32MB in 15 hours), but it grows. Investigating.
Notes: while running over a period of ~24 hours, the number of threads is stable, the number of process handles is stable, and managed memory is rock stable per detailed managed-heap comparisons, yet total memory (private bytes) goes up by about 5MB per 12 hours, which, although small, is not what we expect. Investigating.
Notes: while running over a period of ~48 hours, everything is stable but the total memory (private bytes)... when monitoring with dotMemory. Out of curiosity, I let an un-monitored instance of the process run for 24 hours. The Virtual Memory Private Bytes as well as the Physical Memory Working Set are, guess what, absolutely rock stable. In other words, without instrumentation, there seems to be zero unmanaged memory leak.
This heavily suggests that the leak is caused by instrumentation. Will try to confirm the hypothesis.
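As a side note, the counters tracked in these runs (private bytes, working set, handle and thread counts, managed heap size) can also be sampled without any profiler attached, straight from `System.Diagnostics.Process`. A minimal sketch (the sample count, interval, and output format are arbitrary choices, not what we actually used):

```csharp
using System;
using System.Diagnostics;
using System.Threading;

class MemoryWatch
{
    static void Main()
    {
        var process = Process.GetCurrentProcess();
        for (var i = 0; i < 3; i++) // a real monitoring run would loop for hours
        {
            process.Refresh(); // re-read the OS-level counters for this process
            Console.WriteLine(
                $"{DateTime.UtcNow:O} " +
                $"private={process.PrivateMemorySize64 / 1024}K " +
                $"workingSet={process.WorkingSet64 / 1024}K " +
                $"handles={process.HandleCount} " +
                $"threads={process.Threads.Count} " +
                $"managedHeap={GC.GetTotalMemory(false) / 1024}K");
            Thread.Sleep(100);
        }
    }
}
```

Logging these periodically from inside the benchmark avoids attaching an instrumenting profiler at all, which matters here given the hypothesis above.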
Notes:
Inspecting a memory dump of the process being monitored by dotMemory:
```
>!exts.heap -s -h
*********************************************************************************
                              NT HEAP STATS BELOW
*********************************************************************************
LFH Key                   : 0x5749e3ae1d5d6576
Termination on corruption : ENABLED
          Heap     Flags   Reserv  Commit  Virt   Free  List   UCR  Virt  Lock  Fast
                            (k)     (k)    (k)     (k) length      blocks cont. heap
-------------------------------------------------------------------------------------
000001d54fbf0000 00000002   32944  22960  32552    795   316     6    0      b   LFH
000001d54f9a0000 00008000      64      4     64      2     1     1    0      0
000001d551630000 00001002    1472    112   1080     32     7     2    0      0   LFH
000001d5515d0000 00001002      60      8     60      2     1     1    0      0
000001d551580000 00041002      60      8     60      5     1     1    0      0
-------------------------------------------------------------------------------------

>!heap -l
Searching the memory for potential unreachable busy blocks.
[...]
378102 potential unreachable blocks were detected.
```
Inspecting another dump of that same process, taken some hours later:
```
>!exts.heap -s -h
*********************************************************************************
                              NT HEAP STATS BELOW
*********************************************************************************
LFH Key                   : 0x5749e3ae1d5d6576
Termination on corruption : ENABLED
          Heap     Flags   Reserv  Commit  Virt   Free  List   UCR  Virt  Lock  Fast
                            (k)     (k)    (k)     (k) length      blocks cont. heap
-------------------------------------------------------------------------------------
000001d54fbf0000 00000002   49132  35524  48740    702   321     7    0      c   LFH
000001d54f9a0000 00008000      64      4     64      2     1     1    0      0
000001d551630000 00001002    1472    112   1080     32     7     2    0      0   LFH
000001d5515d0000 00001002      60      8     60      2     1     1    0      0
000001d551580000 00041002      60      8     60      5     1     1    0      0
-------------------------------------------------------------------------------------

>!heap -l
Searching the memory for potential unreachable busy blocks.
[...]
748827 potential unreachable blocks were detected.
```
We see that the first heap has grown (committed size up from ~23MB to ~35MB), and the number of "potential unreachable blocks" has nearly doubled, from 378,102 to 748,827.
Now, if we inspect a dump of another run of the process, not monitored by dotMemory, after 24hrs of execution:
```
>!exts.heap -s -h
*********************************************************************************
                              NT HEAP STATS BELOW
*********************************************************************************
LFH Key                   : 0x3ab6ce2b1b9890c7
Termination on corruption : ENABLED
          Heap     Flags   Reserv  Commit  Virt   Free  List   UCR  Virt  Lock  Fast
                            (k)     (k)    (k)     (k) length      blocks cont. heap
-------------------------------------------------------------------------------------
00000200f78f0000 00000002    8568   6500   8176   1554   173     5    0      f   LFH
    External fragmentation  24 % (173 free blocks)
00000200f76f0000 00008000      64      4     64      2     1     1    0      0
00000200f9350000 00001002    1472    112   1080     32     7     2    0      0   LFH
00000200f94b0000 00001002      60      8     60      2     1     1    0      0
00000200f9630000 00041002      60      8     60      5     1     1    0      0
-------------------------------------------------------------------------------------

>!heap -l
Searching the memory for potential unreachable busy blocks.
Heap 00000200f78f0000
Heap 00000200f76f0000
Heap 00000200f9350000
Heap 00000200f94b0000
Heap 00000200f9630000
Scanning VM ...
Scanning references from 18183 busy blocks (3 MBytes) ......
Entry             User              Heap              Segment               Size  PrevSize  Unused    Flags
-------------------------------------------------------------------------------------------------------------
00000200f78f9ee0  00000200f78f9ef0  00000200f78f0000  00000200f78f0000        20        50         8 busy
[...]
00000200fbe2aaa0  00000200fbe2aab0  00000200f78f0000  00000200f9cc5200        20         -         8 LFH;busy
00000200fbe2ab40  00000200fbe2ab50  00000200f78f0000  00000200f9cc5200        20         -         8 LFH;busy
1553 potential unreachable blocks were detected.
```
The first heap is far smaller, as is the number of "potential unreachable blocks" (1,553 here vs. hundreds of thousands above, i.e. orders of magnitude smaller). Memory usage as reported by a tool such as Sysinternals' Process Explorer shows stable values for VM Private Bytes and PM Working Set.
At that point we start dotMemory to monitor that very process.
Approximately 30 minutes later, Process Explorer shows that VM Private Bytes has risen by about 3MB, and PM Working Set has noticeably increased too. The first heap size hasn't changed yet, but the number of "potential unreachable blocks" has already jumped to 8,230. Another 30 minutes later, that number is 13,861 and the heap is growing.
About 12 hours later, the first heap size has reached 32MB and reports 267,433 blocks.
At that point we detach dotMemory from the process. WinDbg reports that "the CLR has fully detached and unloaded the profiler". A few hours later, the first heap size has not grown, and it reports 267,616 blocks. About 12 hours after that, the first heap size has not changed and it reports 267,714 blocks.
At that point everything seems to indicate that the leak is caused by dotMemory itself. Going to run a few more analyses before closing this issue.
Internal Jira issue: API-2066
Figuring out what exactly in dotMemory is leaking is out of scope for this issue and highly time-consuming. I am going to close this issue as non-repro on the Hazelcast .NET Client side, since it cannot be reproduced when not monitoring with dotMemory.
Please feel free to reopen the issue with further details (context, etc.) if you feel it is warranted.
Thanks for your investigation. There may be some conflict between dotMemory and the Hazelcast .NET Client. We tried monitoring our program with no Hazelcast reads/writes, and dotMemory monitoring worked fine with no leaks. We will try some other profilers.
The issue happens when we perform massive Hazelcast lock/get/set/unlock/publish actions.
As you can see in the dotMemory screenshot, the heap size of the application is stable, but the unmanaged memory size keeps growing.
The sample project for this issue: MassiveTransfer.zip
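For readers without the attachment, the lock/get/set/unlock/publish pattern described above corresponds roughly to the following sketch against the Hazelcast .NET client API. The map/topic names, key count, and iteration count are illustrative assumptions; the actual code is in MassiveTransfer.zip, and running this requires a reachable Hazelcast cluster:

```csharp
using System.Threading.Tasks;
using Hazelcast;

public static class MassiveTransferSketch
{
    public static async Task Main()
    {
        // connects to a cluster using the default options (localhost by default)
        var options = new HazelcastOptionsBuilder().Build();
        await using var client = await HazelcastClientFactory.StartNewClientAsync(options);

        await using var map = await client.GetMapAsync<string, string>("stress-map");     // assumed name
        await using var topic = await client.GetTopicAsync<string>("stress-topic");       // assumed name

        for (var i = 0; i < 1_000_000; i++) // "massive" volume; actual count unknown
        {
            var key = $"key-{i % 100}";
            await map.LockAsync(key);                       // lock
            try
            {
                var value = await map.GetAsync(key);        // get
                await map.SetAsync(key, value ?? "0");      // set
            }
            finally
            {
                await map.UnlockAsync(key);                 // unlock
            }
            await topic.PublishAsync(key);                  // publish
        }
    }
}
```

The point of the sketch is only to show the shape of the workload under which the unmanaged growth was observed while dotMemory was attached.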