microsoft / ApplicationInsights-dotnet-server

Microsoft Application Insights for .NET Web Applications
https://azure.microsoft.com/services/application-insights/

Possible memory leak with ConditionalWeakTable wrapping SQL dependency #1187

Closed: AshasTob closed this issue 5 years ago.

AshasTob commented 5 years ago

Dear AppInsights team, I am not sure whether this is an issue with Application Insights, so please help us investigate. We encountered a memory problem after running a load test for over 20 hours: the solution's memory usage increased slowly but steadily until it was rebooted (see screenshot: memory_leak_under_load).

The load test bombarded our API endpoints with simple GET requests at a rate of 40 requests per second. The only dependency used was Azure SQL Server.

Repro Steps

  1. Clone https://github.com/AshasTob/MemoryLeakRepro - this is a minimal solution that reproduces the issue.
  2. Change the AI (Application Insights) instrumentation key in Startup.cs to a valid one.
  3. Change the DB connection string in the controller and set a SELECT query to be executed.
  4. Use a load-testing tool to send requests to the app at a sustained rate of at least 50 requests per second.
  5. Attach a memory profiler (I used dotMemory).
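Step 4 leaves the choice of load tool open. As one option, here is a minimal Python sketch of a fixed-rate (open-loop) HTTP load generator. It is an assumption for illustration, not the tool used in the original report: it schedules one GET every 1/rps seconds and hands each one to a thread pool, so a slow response does not delay the next submission. The function names, default rate, and worker count are hypothetical.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fire(url, timeout=10.0):
    """Send one GET request; return the HTTP status code, or None on any error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # drain the body so the connection is released
            return resp.status
    except Exception:  # timeouts, connection errors, HTTP 4xx/5xx (HTTPError)
        return None


def run_load(url, rps=50, duration_s=60.0, workers=64):
    """Submit GET requests to `url` at a fixed rate of `rps` for `duration_s` seconds.

    Requests follow a fixed timetable (one every 1/rps seconds) and run on a
    thread pool, so a slow response does not delay the next submission.
    Returns (successful_responses, total_requests).
    """
    interval = 1.0 / rps
    total = int(rps * duration_s)
    start = time.monotonic()
    futures = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for i in range(total):
            # Wait until the i-th slot on the timetable.
            delay = start + i * interval - time.monotonic()
            if delay > 0:
                time.sleep(delay)
            futures.append(pool.submit(fire, url))
    # Leaving the `with` block waits for all in-flight requests to finish.
    statuses = [f.result() for f in futures]
    ok = sum(1 for s in statuses if s == 200)
    return ok, len(statuses)
```

For example, `run_load("http://localhost:5000/api/values", rps=50, duration_s=3600)` would drive a hypothetical local endpoint at 50 requests per second for an hour. If the pool's workers are all busy, submitted requests queue inside the pool, so `workers` should comfortably exceed `rps` multiplied by the typical response time.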

Actual Behavior

See attached screenshot (25-minute test with Application Insights): Gen 2 (GC2) and unmanaged memory grow slowly but constantly.

Expected Behavior

Unmanaged memory should not grow.
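To decide from periodic samples whether memory really "grows constantly" rather than just fluctuating with GC cycles, one option is to fit a least-squares line and look at its slope. The sketch below assumes the samples are (elapsed seconds, memory in MB) pairs read off a profiler timeline or exported metrics. The 10 MB/hour threshold and the function names are arbitrary assumptions, not values from this thread.

```python
def slope_mb_per_hour(samples):
    """Least-squares slope of memory usage over time.

    `samples` is a sequence of (elapsed_seconds, memory_mb) pairs.
    Returns the fitted growth rate in MB per hour.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("samples must span more than one point in time")
    return cov / var * 3600.0  # MB per second -> MB per hour


def looks_like_leak(samples, threshold_mb_per_hour=10.0):
    """Flag sustained growth above an (arbitrary, assumed) threshold."""
    return slope_mb_per_hour(samples) > threshold_mb_per_hour
```

A regression over the whole run is less sensitive to the GC sawtooth than comparing the first and last samples, but it still needs a run long enough for the trend to dominate the noise.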

Version Info

SDK Version: Application Insights AspNetCore 2.6.1
.NET Version: ASP.NET Core 2.2
Hosting Info (IIS/Azure WebApps/etc.): Windows 10 Kestrel / Azure Web App

cijothomas commented 5 years ago

@AshasTob Thanks for reporting the issue and sharing the repro app. We'll investigate this.

cijothomas commented 5 years ago

I have set up a long-running test to get a repro.

AshasTob commented 5 years ago

I was able to reproduce the issue on an Azure S3 Web App with 2 instances. Under a load of 200 requests per second, memory and CPU usage grow steadily until the service throws a couple of OutOfMemoryExceptions and restarts. But now I am not sure whether it is a .NET Core issue or an Application Insights issue. I am kind of lost at this point.

cijothomas commented 5 years ago

@AshasTob I didn't get a repro with a 5-hour run. I will run it for more hours. Can you describe the load again? 200 RPS served by 2 instances means 100 RPS per instance. And you have 10 SQL calls per incoming request?

AshasTob commented 5 years ago

It's 100 RPS per instance, and each request does 3 quick SQL queries (10 was just to make it escalate more quickly locally). Please note that the Application Insights site extension on Azure is ON, with the Profiler ON.
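The load described above translates into a large number of tracked SQL dependency calls. The arithmetic below uses only the reported figures (100 RPS per instance, 3 queries per request, a 2-hour window). The per-call leak size $b$ is a hypothetical placeholder, not a measured value, and is included only to show why even a tiny per-call leak becomes visible within hours.

```latex
\begin{aligned}
\text{SQL calls per second, per instance} &= 100\ \tfrac{\text{requests}}{\text{s}} \times 3\ \tfrac{\text{queries}}{\text{request}} = 300\ \tfrac{\text{calls}}{\text{s}} \\
\text{SQL calls in 2 h, per instance} &= 300\ \tfrac{\text{calls}}{\text{s}} \times 7200\ \text{s} = 2{,}160{,}000 \\
\text{leaked bytes after 2 h (hypothetical)} &\approx 2.16 \times 10^{6} \cdot b
\end{aligned}
```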

In my case, memory usage already grows to 50% within 2 hours.

cijothomas commented 5 years ago

Please share the results of: https://yoursitename.scm.azurewebsites.net/applicationinsights

If possible, disable the extension (Profiler, Snapshot Debugger, etc.) and install just the SDK to isolate the issue.

AshasTob commented 5 years ago

(screenshot: AI_data) I have now launched the test without the AI extension. Let's see how it goes.

Tomorrow I will update the test repository, as it does not fully correspond to the code I have in Azure right now. So if you can't reproduce it, maybe the cause is something different and I am missing the point here completely.

cijothomas commented 5 years ago

@AshasTob can you update the repository with the code that reproduces the issue?

AshasTob commented 5 years ago

I am still investigating this issue. I have found that if I remove the site extension from the web app OR remove Application Insights from the code, memory is stable. Sorry for not uploading the updated code base yet. I will let you know when it is on GitHub; I need to extract it from our production solution, which I can't share. Please don't close this issue.

AshasTob commented 5 years ago

I have updated the GitHub repo. Please add the DB connection string and AI key, and change the table/column name in the controller. Note that I am now reproducing it on an Azure Web App by sending 110 requests per second.

Here's what I see in Azure metrics (screenshot: memory):

Please note that the very same app without Application Insights hovers around 300-400 MB of memory.

Please help me understand this behavior.

pharring commented 5 years ago

This could be https://github.com/microsoft/ApplicationInsights-dotnet/issues/1102. The fix (https://github.com/dotnet/coreclr/pull/23661) has been back-ported to .NET Core 2.2, but it probably hasn't reached Azure App Service yet.

cijothomas commented 5 years ago

@pharring is there any way to confirm whether this is the same issue? The link you shared returns a 404 for me!

pharring commented 5 years ago

[I don't know what happened to that issue. I can't access it either. It's as if it was deleted (by the owner?). It was back-linked from the CLR issue too, and that link is also "broken".]

The way to check is to take a memory dump of the process when memory usage is high, open it in WinDbg, and use it to view the unmanaged heap. The largest heap (probably the process heap) will consist of repeated strings that are actually the ETW manifest for the RichPayloadEventSource. I had more details in the now-missing issue (1102 in the base SDK repo).
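As a rough first pass before opening the dump in WinDbg, one could count which long printable strings occur most often in the raw dump file. The Python sketch below is an assumption for illustration, not the procedure described above: it scans the whole file (not just the process heap), only detects ASCII/UTF-8 text (UTF-16 strings are missed), and its function names and 32-character minimum are hypothetical. If the suspected issue is present, the top entries would be expected to look like copies of an ETW manifest.

```python
import mmap
import re
from collections import Counter


def count_strings(buf, min_len=32):
    """Count identical runs of printable ASCII (plus tab/CR/LF) in a bytes-like buffer."""
    pattern = re.compile(rb"[\t\r\n\x20-\x7e]{%d,}" % min_len)
    return Counter(m.group() for m in pattern.finditer(buf))


def top_repeated_strings(dump_path, min_len=32, top=10):
    """Memory-map a dump file and return the `top` most frequent strings with their counts."""
    with open(dump_path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        return count_strings(mm, min_len).most_common(top)
```

Memory-mapping the file lets the regular expression scan a multi-gigabyte dump without reading it all into a Python bytes object first.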

AshasTob commented 5 years ago

I have investigated this issue in detail. Here is how to reproduce it 100% of the time: upload my app to an Azure Web App, go to the app's telemetry configuration, turn on the site extension, and in it turn on Application Insights profiling. Under a heavy request load, you will get a memory leak. Please try to reproduce it.

I have SOLVED this case for our project by redeploying the Web App without turning ON Application Insights profiling.

cijothomas commented 5 years ago

@AshasTob can you share a screenshot of your Web App's Application Insights tab? I assume this only happens when the Profiler is also turned on.

(screenshot)

AshasTob commented 5 years ago

@cijothomas, I suggest just turning everything on to reproduce it first. But in theory, yes, having just the Profiler on should be enough. Sorry, I can't provide a screenshot, as I have it turned OFF completely on all environments now :) Application Insights still works, as I provide the instrumentation_key via config in the code itself.

cijothomas commented 5 years ago

@AshasTob since you are using the SDK in your project via NuGet and are facing no issues, the issue is related to the Profiler component and not the SDK. If this is the same issue Paul mentioned, then it's being fixed and ported to 2.2 as well. So once App Service gets the new 2.2 version, it should be solved.

No further action is needed on Application Insights SDK itself.

cijothomas commented 5 years ago

To summarize the current findings: this issue occurs when the Profiler is turned on. The workaround is to turn off the Profiler until the fix is available. Other features of Application Insights can still be used.

Thanks @pharring :) https://github.com/microsoft/ApplicationInsights-dotnet/issues/1102#issuecomment-477990559