Closed CodedBeard closed 2 years ago
@CodedBeard, thank you for filing the issue. I am sorry to hear that Profiler's not working for you.
The packages you referenced should work with .NET 6. That said, there are various reasons to not having any profiling trace. I'd like to know more for troubleshooting.
Just for some quick check:
Firstly, during the profiling session, is there traffic to your application? No traffic no profile :-)
Secondly, do you have a writable folder on your container for uploader to work? We assume /tmp
to be writable but there had been exceptions and we need special settings for those read-only environments.
If those don't hit, we will need logs to tell us what is going on.
Could you please turn on logs for category of Microsoft.ApplicationInsights.Profiler
and see what is missing?
Just in case, one quick way to do it is, in appsettings.json
, you add/append:
{
...
"Logging": {
"LogLevel": {
"Default": "...", // You might have some other settings
"Microsoft.ApplicationInsights.Profiler": "Debug" // Turn on debug logs for Profiler
}
}
...
}
Refer to Configurations for more details on configuration. I would suggest you start with Debug
and if necessary, use Trace
.
You should see information like Profiler started, activity captured, and when the session is ready for uploading, how is uploader located and more. See if there's error or warnings.
If you want, post the logs and I can help take a look as well.
Thanks for the tips @xiaomi7732.
Yes, there is definetly traffic hitting the system, as we are in the process of testing a large new release :)
I've got to go to a conference today, but will give the other suggestions a try when im back.
Please let us know.
@xiaomi7732 I've confirmed that /tmp is writtable within the containers.
After enabling the Debug logging level, we've found the following exception being thrown:
{"EventId":0,"LogLevel":"Debug","Category":"Microsoft.ApplicationInsights.Profiler.Core.Orchestration.RemoteProfilerSettingsService","Message":"Unexpected error contacting service profiler service endpoint for settings. Details: System.ArgumentNullException: Value cannot be null. (Parameter \u0027command\u0027)\n at Microsoft.ServiceProfiler.DataContract.Settings.SettingsParser.ParseAgentSettings(String command, AgentSettings baseSettings)\n at Microsoft.ServiceProfiler.DataContract.Settings.SettingsParser.ParseManyAgentSettings(String[] commands, AgentSettings baseSettings)\n at Microsoft.ApplicationInsights.Profiler.Core.Contracts.ProfilerSettings.SetFromSettingsContract(SettingsContract settingsContract)\n at Microsoft.ApplicationInsights.Profiler.Core.Orchestration.RemoteProfilerSettingsService.FetchRemoteSettingsAsync(CancellationToken cancellationToken)","State":{"Message":"Unexpected error contacting service profiler service endpoint for settings. Details: System.ArgumentNullException: Value cannot be null. (Parameter \u0027command\u0027)\n at Microsoft.ServiceProfiler.DataContract.Settings.SettingsParser.ParseAgentSettings(String command, AgentSettings baseSettings)\n at Microsoft.ServiceProfiler.DataContract.Settings.SettingsParser.ParseManyAgentSettings(String[] commands, AgentSettings baseSettings)\n at Microsoft.ApplicationInsights.Profiler.Core.Contracts.ProfilerSettings.SetFromSettingsContract(SettingsContract settingsContract)\n at Microsoft.ApplicationInsights.Profiler.Core.Orchestration.RemoteProfilerSettingsService.FetchRemoteSettingsAsync(CancellationToken cancellationToken)","0":"System.ArgumentNullException: Value cannot be null. (Parameter \u0027command\u0027)\n at Microsoft.ServiceProfiler.DataContract.Settings.SettingsParser.ParseAgentSettings(String command, AgentSettings baseSettings)\n at Microsoft.ServiceProfiler.DataContract.Settings.SettingsParser.ParseManyAgentSettings(String[] commands, AgentSettings baseSettings)\n at Microsoft.ApplicationInsights.Profiler.Core.Contracts.ProfilerSettings.SetFromSettingsContract(SettingsContract settingsContract)\n at Microsoft.ApplicationInsights.Profiler.Core.Orchestration.RemoteProfilerSettingsService.FetchRemoteSettingsAsync(CancellationToken cancellationToken)","{OriginalFormat}":"Unexpected error contacting service profiler service endpoint for settings. Details: {0}"}}
Hey @CodedBeard Thanks for the info.
Based on what provided, I have 2 points that I want to share with you:
This is the code that not checking null well, worth a fix.
// Any line to be null in this code could leading to ArgumentNullException for the parser
string[] settingsToParse = {
settingsContract.CollectionPlan,
settingsContract.CpuTriggerConfiguration,
settingsContract.MemoryTriggerConfiguration,
settingsContract.DefaultConfiguration
};
AgentSettings parsedSettings = SettingsParser.Instance.ParseManyAgentSettings(settingsToParse);
This is why it is optional for successful profiling:
#pragma warning disable CA1031 // Only to allow for getting the value for the next iteration. The exception will be logged.
catch (Exception ex)
#pragma warning restore CA1031 // Only to allow for getting the value for the next iteration. The exception will be logged.
{
// Move on for the next iteration.
_logger.LogDebug("Unexpected error contacting service profiler service endpoint for settings. Details: {0}", ex);
}
Thanks @xiaomi7732.
I've sent you a mail with the Ikey and the log buffer from the pod.
One thing I've noticed, is it appears that error has now stopped, and the RandomSchedulingPolicy
seems to have started firing and uploading results. But attempts to start a profiling session from the portal are still resulting in a timeout error.
I've just tried another Profile Now
run, and this time it worked. A little confused as we've not changed anything further, so it seems like something was going wrong on the service side related to the original exception.
Hey @CodedBeard Thanks for the updating and the logs.
Unfortunately there's no timestamp on the log so I can't tell when. But here's what I observed:
The exception we mentioned in the last conversation was resolved about half way of the logs (line 3397) - combing it with our backend data, I assume you or someone owns your application insights resource at that moment somehow touched the settings of the profiler on the portal, that's when the buggy code path got bypassed.
I was thinking the remote settings shouldn't impact the execution of the profiler, however, I might oversighted the fact that bad settings (by exception) might trick the agent to think that OnDemand policy or other policy got disabled. We will need to investigate more about it.
At the same time, I am leaning toward changing the Profiler settings, thus generate the settings instead of using default, is a work around, just in case anyone encountered the same:
I'd keep this issue open until we reach the bottom of the investigation.
@xiaomi7732 Ah, now you mention it, I did try lowering the memory threshold to below what I knew the pod was using at some point to see if that would cause it to work, but as far as I recall, it was a good hour or so before the first trace fired which is why I completely forgot about it.
That was certainly the first and only time I changed the Trigger
settings (and I'm the only one who has access to do so), so I think the hypothesis that doing so fixed the settings error makes sense.
@CodedBeard Thanks for the quick turn-around. I think we have enough information at this point to find out the root cause. Thank you so much for providing the information.
No problem @xiaomi7732. Just to add some further confirmation, I've now replicated this in another branch of the same project.
Profile now
and receieve a timeout error.Profile now
still seems to be timing out right now, but maybe thats due to the profiler kicking off for the memory threshold.Sorry, one thing that I forgot to check with you:
I understand without the workaround, it appears that Profile Now
is timing out; However - can you see other profiling sessions show up in the portal listed as Recent profiling sessions
? And if there's any, mind telling us what are the values for Triggers
?
That info's gonna to help us narrow down the scope - whether it is globally not working or is it just one or two specific policy.
For the branch I just applied it to, no, there are no entries yet.
The original branch fired twice on the RandomSchedulingPolicy
with memory values of 92% and 89% before the Profile now
first succedded.
Thank you for the confirm!
Hey @CodedBeard, while @JacobBovee is actively working on fixing the bug, we are looking forward to hear back from you about the investigation on the other branch, if you are willing to share. And we will let you know when there's a newer package that you could try on for the bug fix.
Hi @xiaomi7732, sorry for the delay, I've been super busy at work.
Due to being focused on other projects i hadn't checked the other branch in a while, but i've just looked this morning and can see the profiler has since been firing correctly based on cpu/memory usage.
@CodedBeard, No worries. I am glad to hear back from you :-) Actually, we released 2.4.0 with the bug fix and you don't need the workaround any further. Feel free to give it a shoot: https://www.nuget.org/packages/Microsoft.ApplicationInsights.Profiler.AspNetCore/
@xiaomi7732 Great, I'll be deploying the updated package to prod this evening, will let you know if it all works.
@xiaomi7732 Have now deployed 2.4 package to production and everything worked first time without having to change the settings, so looks like this can be closed now thanks.
Thank you @CodedBeard. I am glad to hear that. Thanks for the prompt replies. With your help, Profiler is better now :-D
We've been trying to get the profiler to work for our aspnet core app running in AKS. While forcing a profiling session from the portal appears to correctly start the tracing, the results are never uploaded to the portal, and no errors are recorded anywhere that we can find.
If we exec into the container, we can see there is a
TraceUpload.zip
file being created in the container, but there are no logs to suggest why the upload keeps timing out.Versions Used:
We also tried using the 2.3.1 version of the package, but that had the same issue.