Open xiaomi7732 opened 7 months ago
Since the process was killed, there's little we can do during that session. What we can do, however, is clean the trace files up in the next session, should the process be started again.
The feature is built into the new beta release, no user action is needed to leverage it:
https://www.nuget.org/packages/Microsoft.ApplicationInsights.Profiler.AspNetCore/2.6.0-beta2
At the same time, it is also possible to customize the behavior:
{
  "ServiceProfiler": {
    "TraceScavenger": {
      "InitialDelay": "00:00:00", // Delay 0 seconds, effectively no delay.
      "Interval": "01:00:00", // Run the trace scavenger every 1 hour.
      "GracePeriod": "00:10:00" // Do not delete a file if it was accessed within the last 10 minutes.
    }
  }
}
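For reference, the semantics of those three settings (an initial delay, a sweep on a fixed interval, and a grace period based on last access time) can be sketched roughly as follows. This is only an illustration of the documented behavior, not the library's actual implementation; the function names and directory layout are hypothetical.

```python
import time
from pathlib import Path


def scavenge(trace_dir: str, grace_period_s: float) -> list[str]:
    """Delete trace files that have not been accessed within the grace period.

    Illustration only: the real TraceScavenger runs inside the profiler,
    not in user code, and its file selection logic may differ.
    """
    removed = []
    now = time.time()
    for path in Path(trace_dir).glob("*"):
        if not path.is_file():
            continue
        # GracePeriod: skip files accessed within the last grace_period_s seconds.
        if now - path.stat().st_atime > grace_period_s:
            path.unlink()
            removed.append(str(path))
    return removed


def run_scavenger(trace_dir: str, initial_delay_s: float,
                  interval_s: float, grace_period_s: float) -> None:
    """InitialDelay first, then one sweep every Interval."""
    time.sleep(initial_delay_s)
    while True:
        scavenge(trace_dir, grace_period_s)
        time.sleep(interval_s)
```

Note that a grace period keyed to last access time is what makes the sweep safe to run while a trace is still being written: an in-progress file keeps getting touched, so it is never eligible for deletion.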
Nice improvement, looking forward to it...
@esbenbach Thanks for the comment. Will you be able to try it with the beta version, or will you have to wait for the stable release? We are running tests, but it would greatly increase our confidence in the feature if we have your confirmation that it is working.
Not sure we are going to test it with the beta. Regardless, we are not going to release before some time in January. I'll look into using the beta package for that release, though.
One thing I'm almost certain about is that this is only half a solution. There is still something causing the profiler to get stuck in the first place, generating huge files.
But one thing at a time
2.6.0 is released. Please check out the stable package: https://www.nuget.org/packages/Microsoft.ApplicationInsights.Profiler.AspNetCore/2.6.0
@xiaomi7732 We finally had the time to run this in production a bit.
The file is now removed, but it is still generated in the first place. So the issue is NOT solved, as a crash needs to occur before the file is cleaned up (and stops growing).
Secondly, all subsequent profiler writes (different processes on the same VM) will fail with "not enough disk space", even AFTER the file is removed and there is enough disk space.
@esbenbach Thanks for confirming. There are several things we need to touch base on:
The file is generated - yes, that's by design: the file acts as a cache holding all the events that happened while the application was running; I am not sure holding all those events in memory would be a good idea. The deal is: after the session, if the app does NOT crash, it should clean up after itself. And if it crashes, the file scavenger will do the cleanup upon the next start. Am I missing something?
A crash needs to occur before the file is cleaned up - that's not expected. The file scavenger should have kept running. What happens if you set it to run more frequently:
{
  "ServiceProfiler": {
    "TraceScavenger": {
      "InitialDelay": "00:00:00", // Delay 0 seconds, effectively no delay.
      "Interval": "00:10:00", // Run the trace scavenger every 10 minutes.
      "GracePeriod": "00:05:00" // Do not delete a file if it was accessed within the last 5 minutes.
    }
  }
}
On the other hand, if there's no crash, files would have been removed automatically.
If there's enough space, it would be weird for it to report back as not enough space. Do you mind posting a call stack from when the issue happens?
There are no other reports of this issue, which leads me to think there's some special setup / environment that maybe I missed. Can you think of any?
Let me re-phrase: a file of several GBs is still generated, meaning something still causes it to create a 70+ GB file, which is insane in the first place.
The crash part may very well be me misinterpreting things - but the PR mentions killing the process (step 2), and I am not 100% certain that's what happens...
System.IO.IOException:
at System.IO.RandomAccess+<>c.<WriteAtOffsetAsync>b__21_0 (System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Threading.AsyncOverSyncWithIoCancellation+<InvokeAsync>d__7`1.MoveNext (System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.ThrowForFailedGetResult (System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1+StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource.GetResult (System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.IO.Strategies.BufferedFileStreamStrategy+<WriteAsyncSlowPath>d__48.MoveNext (System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.ThrowForFailedGetResult (System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1+StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource.GetResult (System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.IO.Stream+<<CopyToAsync>g__Core|27_0>d.MoveNext (System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.ApplicationInsights.Profiler.Core.TraceControls.DiagnosticsClientTraceControl+<StartWriteAsync>d__20.MoveNext (Microsoft.ApplicationInsights.Profiler.Core, Version=2.6.2403.1501, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.ApplicationInsights.Profiler.Core.TraceControls.DiagnosticsClientTraceControl+<DisableAsync>d__16.MoveNext (Microsoft.ApplicationInsights.Profiler.Core, Version=2.6.2403.1501, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.ApplicationInsights.Profiler.Core.ServiceProfilerProvider+<StopServiceProfilerAsync>d__3.MoveNext (Microsoft.ApplicationInsights.Profiler.Core, Version=2.6.2403.1501, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1+ConfiguredTaskAwaiter.GetResult (System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.ApplicationInsights.Profiler.Core.Orchestration.OrchestratorEventPipe+<StopProfilingAsync>d__3.MoveNext (Microsoft.ServiceProfiler.EventPipe.Orchestration, Version=2.6.2403.1501, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.ServiceProfiler.Orchestration.SchedulingPolicy+<StartPolicyAsync>d__1.MoveNext (Microsoft.ServiceProfiler.Orchestration, Version=2.6.2403.1501, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
Yes, as mentioned in the Azure support case that ended up with this PR (and as described here: https://github.com/microsoft/ApplicationInsights-Profiler-AspNetCore/issues/180), we are running this on SF with the SF profiler and the ASP.NET profiler at the same time (since they serve different data and purposes). And as mentioned in that other thread, there is really no recommended way to work with that setup. I suspect this is the ACTUAL cause of the file going crazy in the first place. The ASP.NET profiler is mostly disabled by the settings in Application Insights (only the Sampling trigger is turned on).
Thanks for the response. The call stack is also interesting. Based on those, I have spotted a few things that differ from expectations:
Why is the file worth several GBs - that is way larger than the usual trace file.
From the call stack, I see an IO exception; that is not necessarily an out-of-space exception unless confirmed by the error code, right? Did you do something to confirm it?
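For what it's worth, narrowing a generic I/O exception down to disk-full means checking the underlying error code: in .NET that would typically be inspecting `IOException.HResult` (on Windows, disk-full conditions map to the `ERROR_DISK_FULL` / `ERROR_HANDLE_DISK_FULL` system codes). The same idea, sketched in Python where the POSIX code travels on `OSError.errno`; the function names here are made up for illustration.

```python
import errno


def is_disk_full(exc: OSError) -> bool:
    """True only when the I/O error is specifically 'no space left on device'.

    Python surfaces the POSIX errno directly; the .NET analogue would be
    checking IOException.HResult against the platform's disk-full codes.
    """
    return exc.errno == errno.ENOSPC


def classify_io_error(exc: OSError) -> str:
    # Distinguish a genuine disk-full condition from every other I/O failure,
    # instead of treating any IOException as "not enough disk space".
    return "disk full" if is_disk_full(exc) else "other I/O error"
```

The point of the check is exactly the question above: an `IOException` raised during a buffered write proves a write failed, but only the error code proves the disk was actually full.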
There is nothing "easy" about doing cache path adjustments in an SF cluster with both the "sdk" and the "sf" profiler running at the same time :) But it points to the same place....
There are NO containers in play here btw, only SF services.....(which in this case hosts the kestrel process)
Describe the bug
RT. Basically, when the process is terminated unexpectedly, the temp profiling file will be left behind.
To Reproduce
Terminate the process unexpectedly while a profiling session is in progress.
Expected
The temp file is deleted.
Actual
The temp file is still there.