microsoft / ApplicationInsights-Profiler-AspNetCore

Application Insights Profiler sample and documentation
MIT License
66 stars 24 forks source link

File not found when uploading trace in AspNetCore 3.1 service in Service Fabric cluster #83

Closed mkleijn closed 3 years ago

mkleijn commented 4 years ago

I tried to profile my service by including this nuget and use the environment variable to set the hosting startup assembly. It looks like the profiling gets triggered, but it goes wrong when uploading the trace logs:

System.ComponentModel.Win32Exception (2): The system cannot find the file specified. at System.Diagnostics.Process.StartWithCreateProcess(ProcessStartInfo startInfo) at System.Diagnostics.Process.Start() at System.Diagnostics.Process.Start(ProcessStartInfo startInfo) at System.Diagnostics.Process.Start(String fileName, String arguments) at ServiceProfiler.EventPipe.Client.UploaderCallers.OutOfProcCaller.Execute() at ServiceProfiler.EventPipe.Client.TraceUploader.TraceUploaderProxy.<>c__DisplayClass2_0.b__0() at System.Threading.Tasks.Task`1.InnerInvoke() at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state) --- End of stack trace from previous location where exception was thrown --- at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread) --- End of stack trace from previous location where exception was thrown --- at ServiceProfiler.EventPipe.Client.TraceUploader.TraceUploaderProxy.UploadAsync(DateTimeOffset sessionId, String traceFilePath, String metadataFilePath, Guid dataCube) at ServiceProfiler.EventPipe.Client.ServiceProfilerProvider.UploadTraceAsync(PostStopOptions options, Int32 processId) at ServiceProfiler.EventPipe.Client.ServiceProfilerProvider.PostStopProcessAsync(PostStopOptions e)

In the AppInsights i also see messages like "Trace validation failed. Upload aborted and no customer events released." and "AI: Process was called on the TelemetrySink after it was disposed, the telemetry data was dropped." but i'm not sure the latter one is related to this problem.

Is using this nuget in a Service Fabric cluster supported?

xiaomi7732 commented 4 years ago

@mkleijn Thanks for the report. The profiler hasn't been tested within Service Fabric. Have you tried the traditional App Insights Profiler?

That said, we are interested to understand why it is not working for you. Could you please lay more info out for us to understand the environment:

mkleijn commented 4 years ago

@xiaomi7732 Thanks for your reply. I tried the traditional one but never managed to get any results with it back when i used .NET Core 2.x. By now the project i'm working on has been migrated to .NET Core 3.1 and as I understood currently the only way of profiling these apps is by using this nuget, correct?

My environment:

I noticed the folder "ServiceProfiler" with the file TraceUpload.zip is added to my project (i guess this is done by the nuget?) and this file is actually present (and extracted) in the code package on the Service fabric nodes. Still it looks like a utility file that is attempted to be started could not be found. Maybe it has to do with the directory structures of service fabric services being a bit different than 'normal'?

xiaomi7732 commented 4 years ago

@mkleijn Hey, thanks for the detailed info. It is great that you have access to the SF (Service Fabric) node. Here's the code to search for uploader, do you think it will find the uploader properly?

using System.IO;
using System.IO.Compression;
using Microsoft.Extensions.Logging;
using ServiceProfiler.EventPipe.Client.Utilities;

namespace ServiceProfiler.EventPipe.Client.UploaderCallers
{
    public class UploaderLocator : IUploaderLocator
    {
        public UploaderLocator(ILogger<UploaderLocator> logger)
        {
            _logger = logger.ThrowIfNull(nameof(logger));
        }

        public string Locate(string expectedUploaderPath)
        {
            _logger.LogDebug("Locating file path with input: {0}", expectedUploaderPath);
            if (File.Exists(expectedUploaderPath))
            {
                _logger.LogDebug("File Uploader found: {0}", expectedUploaderPath);
                return expectedUploaderPath;
            }
            string zipped = Path.Combine(Path.GetDirectoryName(expectedUploaderPath), TraceUploaderArchiveFileName);
            _logger.LogDebug("File uploader is not found. Try the zip file: {0}", zipped);
            if (File.Exists(zipped))
            {
                // Unzip
                _logger.LogDebug("Zip file exist. Unzipping . . .");
                ZipFile.ExtractToDirectory(zipped, Path.GetDirectoryName(expectedUploaderPath));
            }
            else
            {
                _logger.LogDebug("Zip file does not exist. Fail the locating.");
                return null;
            }

            if (File.Exists(expectedUploaderPath))
            {
                _logger.LogDebug("File uploader is found: {0}", expectedUploaderPath);
                return expectedUploaderPath;
            }
            else
            {
                _logger.LogDebug("File uploader is not found: {0}", expectedUploaderPath);
                return null;
            }
        }

        private const string TraceUploaderArchiveFileName = "TraceUpload.zip";

        private ILogger _logger;
    }
}

Also, if you set log level of ServiceProfiler to debug, there shall be logs to help see what path has been searched.

mkleijn commented 4 years ago

I expect that it can locate the zip file as its content got extracted into the directory it resides in.

But I'll try to change the log level of the profiler so I can test it. Can I do that with a environment variable or something?

xiaomi7732 commented 4 years ago

I expect that it can locate the zip file as its content got extracted into the directory it resides in.

But I'll try to change the log level of the profiler so I can test it. Can I do that with a environment variable or something?

Yes, set:

Logging__LogLevel__ServiceProfiler=Debug

For details, refer to Configurations.

Also, do you know is there process limitations for SF hosted applications? Like is it possible calling an external application to start up a process is prohibited?

mkleijn commented 4 years ago

Hi xiaomi7732, i notice some messages indicating

Exception type ServiceProfiler.EventPipe.Client.TraceValidators.ValidateFailedException Exception message [ActivityListValidator] No sample activity matches the trace.

ServiceProfiler.EventPipe.Client.TraceValidators.ValidateFailedException: [ActivityListValidator] No sample activity matches the trace. at ServiceProfiler.EventPipe.Client.TraceValidators.ActivityListValidator.ValidateImp(IEnumerable1 samples) at ServiceProfiler.EventPipe.Client.TraceValidators.TraceValidatorBase.Validate(IEnumerable1 samples) at ServiceProfiler.EventPipe.Client.TraceValidators.TraceValidatorBase.Validate(IEnumerable1 samples) at ServiceProfiler.EventPipe.Client.ServiceProfilerProvider.ValidateSamples(String traceFilePath, List1 samples)

could that be related to #66 ?

xiaomi7732 commented 4 years ago

@mkleijn Thanks for noticing that. The issue of validation may happen occasionally. That normal as long as it doesn't happen all the time. Also, it won't proceed to looking for uploader if the validation failed. The issue here that it can't find uploader is abnormal. Did you get a chance to run it with debug level? Is it possible for you to share the logs here?

xiaomi7732 commented 4 years ago

Hey @mkleijn it has been a while. Are you still experiencing the issue?

mkleijn commented 4 years ago

Hi @xiaomi7732,

While i haven't been able to spend much time on the issue, i am still experiencing it yes. Below are some traces that I get in AppInsights after setting the debug level environment var:

trace without traffic:

10:42:17 StartProfiler succeeded. 10:42:17 AI: Process was called on the TelemetrySink after it was disposed, the telemetry data was dropped. 10:42:17 AI: Process was called on the TelemetrySink after it was disposed, the telemetry data was dropped. 10:42:17 Service Profiler session started. 10:44:17 AI: Process was called on the TelemetrySink after it was disposed, the telemetry data was dropped. 10:44:17 StopProfiler triggered. 10:44:18 Trace validation result: False 10:44:18 Trace validation failed. Upload aborted and no customer events released. 10:44:18 Trace validation failed. Upload aborted and no customer events released.
10:44:18 StopProfiler succeeded. 10:44:18 Service Profiler session finished.

Trace with traffic: 10:53:56.469: Tracked call to https://agent.azureserviceprofiler.net/api/profileragent/v4/settings?iKey=2a863931-f1e1-4772-b180-f3239d321206&featureVersion=1.0.0&oldTimestamp=2020-04-28T08:47:09.0000000Z, return code 304 10:53:56.530: Trace validation result: True 10.53:56.625: System.ComponentModel.Win32Exception 'The system cannot find the file specified.' in ServiceProfiler.EventPipe.Client.UploaderCallers.OutOfProcCaller.Execute

System.ComponentModel.Win32Exception: at System.Diagnostics.Process.StartWithCreateProcess (System.Diagnostics.Process, Version=4.2.2.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a) at System.Diagnostics.Process.Start (System.Diagnostics.Process, Version=4.2.2.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a) at System.Diagnostics.Process.Start (System.Diagnostics.Process, Version=4.2.2.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a) at System.Diagnostics.Process.Start (System.Diagnostics.Process, Version=4.2.2.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a) at ServiceProfiler.EventPipe.Client.UploaderCallers.OutOfProcCaller.Execute (Microsoft.ApplicationInsights.Profiler.Core, Version=2.6.2001.2402, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a) at ServiceProfiler.EventPipe.Client.TraceUploader.TraceUploaderProxy+<>c__DisplayClass2_0.b0 (Microsoft.ApplicationInsights.Profiler.Core, Version=2.6.2001.2402, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a) at System.Threading.Tasks.Task1.InnerInvoke (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e) at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e) at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e) at System.Threading.Tasks.Task.ExecuteWithThreadLocal (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e) at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e) at System.Runtime.CompilerServices.ConfiguredTaskAwaitable1+ConfiguredTaskAwaiter.GetResult (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e) at ServiceProfiler.EventPipe.Client.TraceUploader.TraceUploaderProxy+d1.MoveNext (Microsoft.ApplicationInsights.Profiler.Core, Version=2.6.2001.2402, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a) at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e) at System.Runtime.CompilerServices.ConfiguredTaskAwaitable1+ConfiguredTaskAwaiter.GetResult (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e) at ServiceProfiler.EventPipe.Client.ServiceProfilerProvider+<UploadTraceAsync>d__5.MoveNext (Microsoft.ApplicationInsights.Profiler.Core, Version=2.6.2001.2402, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a) at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e) at System.Runtime.CompilerServices.ConfiguredTaskAwaitable1+ConfiguredTaskAwaiter.GetResult (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e) at ServiceProfiler.EventPipe.Client.ServiceProfilerProvider+d__3.MoveNext (Microsoft.ApplicationInsights.Profiler.Core, Version=2.6.2001.2402, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)

10:53:56.633: Unexpected error happens on stopping service profiler. (Same call stack as previous event)

So not much more info there i'm afraid. I only can see the logs in AI by now, and especially if I generate traffic it is difficult to make sense of the logs as it is cluttered with traces of all endpoints and dependencies being called (which I suspect is the result of setting the log level to be more verbose). Is it possible to view the logs on the servers in event viewer or in some .log files maybe? I haven't been able to locate them.

thanks for your help

xiaomi7732 commented 4 years ago

@mkleijn Refer to Logging in .NET Core and ASP.NET Core for how to get the logs in console or files. Here's a piece of code I use with Serilog for logging files just FYI:

using System;
using Microsoft.AspNetCore.Hosting;
using Microsoft.Extensions.Hosting;
using Serilog;
using Serilog.Events;

namespace ServiceProfiler.EventPipe.UserApp30
{
    public class Program
    {
        public static void Main(string[] args)
        {
            Log.Logger = new LoggerConfiguration()
                .MinimumLevel.Warning()
                .MinimumLevel.Override("Microsoft.ServiceProfiler", LogEventLevel.Debug)
                .MinimumLevel.Override("ServiceProfiler", LogEventLevel.Debug)
                .Enrich.FromLogContext()
                .WriteTo.Console()
                .WriteTo.File("Profile.log")
                .CreateLogger();

            try
            {
                CreateHostBuilder(args).Build().Run();
            }
            catch (Exception ex)
            {
                Log.Fatal(ex, "Host terminated unexpectedly");
            }
            finally
            {
                Log.CloseAndFlush();
            }
        }

        public static IHostBuilder CreateHostBuilder(string[] args) =>
            Host.CreateDefaultBuilder(args)
                .UseSerilog()
                .ConfigureWebHostDefaults(webBuilder =>
                {
                    webBuilder.UseStartup<Startup>();
                });
    }
}

Let me know if you need more help. Feel free to send logs back and I'd be glad to go over the logs with you.

s-krawczyk commented 4 years ago

Hi, I have same issue on dotnet core 2.2 on ASF. Can you give some info what is going on? Any progress?

xiaomi7732 commented 4 years ago

@s-krawczyk Thanks for the contact. I understand you might have some similar issue. Could you please start another issue on github for troubleshooting? Please attach the logs for the error whenever possible. I have a guess that the issue might be there's not a writable temp folder.

mkleijn commented 4 years ago

Hi @xiaomi7732,

In the end for me it just suddenly started working. I don't know what I changed in the end to make it work, whether it was a nuget update or that some changes in initializing the ApplicationInsights packages for Service Fabric.

I do still occasionally get the same error though:

System.ComponentModel.Win32Exception:
   at System.Diagnostics.Process.StartWithCreateProcess (System.Diagnostics.Process, Version=4.2.2.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
   at System.Diagnostics.Process.Start (System.Diagnostics.Process, Version=4.2.2.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
   at System.Diagnostics.Process.Start (System.Diagnostics.Process, Version=4.2.2.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
   at System.Diagnostics.Process.Start (System.Diagnostics.Process, Version=4.2.2.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
   at ServiceProfiler.EventPipe.Client.UploaderCallers.OutOfProcCaller.Execute (Microsoft.ApplicationInsights.Profiler.Core, Version=2.6.2005.2701, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
   at ServiceProfiler.EventPipe.Client.TraceUploader.TraceUploaderProxy+<>c__DisplayClass2_0.<DefaultCallUploadAsync>b__0 (Microsoft.ApplicationInsights.Profiler.Core, Version=2.6.2005.2701, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
   at System.Threading.Tasks.Task`1.InnerInvoke (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at ServiceProfiler.EventPipe.Client.TraceUploader.TraceUploaderProxy+<UploadAsync>d__1.MoveNext (Microsoft.ApplicationInsights.Profiler.Core, Version=2.6.2005.2701, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1+ConfiguredTaskAwaiter.GetResult (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at ServiceProfiler.EventPipe.Client.ServiceProfilerProvider+<UploadTraceAsync>d__6.MoveNext (Microsoft.ApplicationInsights.Profiler.Core, Version=2.6.2005.2701, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at ServiceProfiler.EventPipe.Client.ServiceProfilerProvider+<PostStopProcessAsync>d__4.MoveNext (Microsoft.ApplicationInsights.Profiler.Core, Version=2.6.2005.2701, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)

From the debug snapshot i find the following values for the parameters supplied to the DefaultCallUploadAsync(string exePath, string args) method:

exePath: "D:\\SvcFab\\_App\\**  My Application Name 
 **Type_App8\\** My Service Name **Pkg.Code.1.0.0.20200902.2\\ServiceProfiler\\Microsoft.ApplicationInsights.Profiler.Uploader.dll"

args: "-t \"D:\\SvcFab\\_App\\**  My Application Name 
 **Type_App8\\temp\\bbe3f83c-7525-4787-8d2a-3e02ad8b19e9.netperf\" -i a0f018f2-54c9-4f00-b65c-de285b38f5e1 -d 260ac13a-ca42-46ca-9170-2ed09c0a8290 --sessionId \"2020-09-03T10:03:24.8160416Z\" -s \"westeurope-kpui2txzlbz5m\" --host https://agent.azureserviceprofiler.net --metadata \"D:\\SvcFab\\_App\\**  My Application Name 
 **Type_App8\\temp\\bbe3f83c-7525-4787-8d2a-3e02ad8b19e9.metadata\""

Maybe you can use this information. I don't feel comfortable sharing the debug snapshot here but i can send it to you privately if it would help you.

Like i said, for me it is working well enough to profile my application with the files that do get uploaded, but it might help others and getting rid of some exceptions in my Application Insights logs wouldn't hurt as well of course :)

xiaomi7732 commented 4 years ago

@mkleijn Thanks for the callback. Now that I see the path, there could be a few reasons:

Either way, putting the uploader to a different folder should resolve the issue.

If you don't mind, try set the profiler cache folder to the path that is shorter: LocalCacheFolder = "D:\\SPCache", for example. Refer to Configurations for details.

mkleijn commented 4 years ago

@xiaomi7732 Am i responsible for making sure that directory exists, or will it be created if it doesn't? And is it safe to have multiple processes use the same temp dir? Or should I use e.g. D:\SPCache\Service1 and D:\SPCache\Service2 etc.

xiaomi7732 commented 4 years ago

@mkleijn Thanks for the questions.

  1. The folders will be created if not exists. One catch is, make sure the apps have permissions to create the folders.
  2. Same folder for multiple processes should be fine.
mkleijn commented 4 years ago

@xiaomi7732 I tried changing the local cache folder, but the problem remains. I still get an error the file cannot be found, in the debug snapshot is see the argument passed to the shell now is:

"D:\AITemp\SP\Microsoft.ApplicationInsights.Profiler.Uploader.dll -t \"D:\AITemp\SP\7f22be73-391d-4dd9-a331-ff243d22e2da.netperf\" -i 2a863931-f1e1-4772-b180-f3239d321206 -d 2c5916f8-36b5-43ff-81b4-a701827f7658 --sessionId \"2020-09-18T06:51:15.4287377Z\" -s \"westeurope-kpui2txzlbz5m\" --host https://agent.azureserviceprofiler.net --metadata \"D:\AITemp\SP\7f22be73-391d-4dd9-a331-ff243d22e2da.metadata\""

This path does exist and there are several of these .metadata and .netperf files in there, next to a bunch of dll files. I also do get profile traces in AI so it seems like it is working sometimes. Maybe a racing condition where it tries to upload files just before they are flushed out to disk or something?

Anyway, it isn't really hurting me at the moment other than cluttering up my AI logs. I am thinking of removing the profiler from my production cluster though as I got warnings of diskspace running low on all my nodes and ended up having to delete quite a lot of these .netperf files to free up about 20gb on every node. Is there something in place to delete these temp files after uploading? Maybe only the ones that failed with the exception above are remaining in the temp dir.

xiaomi7732 commented 4 years ago

@mkleijn Thanks for the feedback. The customize location is working by reading the path you posted. Yes, successfully uploaded traces will be deleted automatically.

Only when you have time, could you please check to see if the file of D:\AITemp\SP\7f22be73-391d-4dd9-a331-ff243d22e2da.netperf in your log exists? That will help determine if that is a racing condition you mentioned.

xiaomi7732 commented 3 years ago

@mkleijn, sorry for the late reply. It looks like the same issue as in #97. Please try to see if that resolves the issue. TL;DR: Add directory of dotnet.exe to %PATH%.

xiaomi7732 commented 3 years ago

Close due to no actionable for now. Feel free to open new issues.