serilog-tracing / serilog-tracing

A minimal tracing system that integrates Serilog with System.Diagnostics.Activity.
Apache License 2.0
123 stars 13 forks source link

Opt out of consuming external activities #18

Closed srogovtsev closed 9 months ago

srogovtsev commented 9 months ago

Hello, @nblumhardt,

I was just recently integrating OpenTelemetry in a project that already uses Serilog and SerilogTimings, so naturally I was very excited when I heard about this one. I gave it a quick browse, and I wonder if I can share some perspective with you.

But first, let me give you a quick view of the context. The application we're working on is more of a platform; meaning there's a lot of code produced by third parties, which runs in our context and uses our facilities. Naturally, as much as I personally love an endorse Serilog, we decided to hide it from third-party code, making them use Microsoft.Extensions.Logging instead. Inside, in our own code, we use Serilog, and M.E.L loggers use Serilog behind the scenes as well. We also use SerilogTimings internally so that we can have nice information of some operations interesting to us (you might even remember me contributing a tiny bit there).

Enter the OpenTelemetry. We add the otel support to our platform, the out-of-the-box instrumentations gather some data, we also add some of our own. We do not expect the third-party code to add their instrumentation, but if they want to, System.Diagnostics.Activity is already a part of the API, so no worries on our side.

In the back-end it's simple: otel goes to the collector, logs go to some Serilog-supported sink. When want everything in otel? We plug in Serilog.Sinks.OpenTelemetry (and we can route it back from the collector to Seq if we want). But in any case we have the correlation ids, so everything works more or less magically.

The last moving part that was missing was SerilogTimings, which - as you also show in this project - is a natural candidate for the inclusion in the trace, in addition to logs. Here comes this project, and with it do come some concerns unfortunately.

The thing is, for our use case I believe I only need the part that allows to simultaneously open an Activity and write to logs - i.e., logger.StartActivity from your example. I don't want all the other instrumentation-generated activities to end up in our logs. Nor do I want to use Serilog as an otel producer (i.e. the part that captures the Activities and outputs them to otel backend), because I'd prefer to use OpenTelemetry SDK for that.

I think that you'll end up with enough configuration to achieve what we need anyways, so we'll be able to use this anyway, but it has occurred to me that might be a more granular approach would be of use here. What I mean is that, from what I see, there are three different concerns in this project that might be better off as a separate components:

  1. Capturing existing Activities to logs, which you mention:

    In addition to generating spans, SerilogTracing also consumes spans generated elsewhere in an application via the System.Diagnostics.Activity APIs, such as those produced by ASP.NET Core or HttpClient. Activity sources can be enabled or disabled using the logger's MinimumLevel.Override settings." It would be very helpful to have an option to dump some already emitted activities (that come from existing instrumentations) into logs with configurable levels - i.e., I would want to write HttpClient activities as Verbose, but incoming HTTP requests as Information. As such, it is not a "minimum level -> logs" kind of setup, but "activity -> level" one. This part ensures that something that is already being written to otel trace, is also added to logs.

  2. The capture above would generate very generic-looking logs, which might not be sufficient for all cases. Here's where I'd use logger.StartActivity, which would emit both an activity and a log entry of our choosing. This part ensures that something that wasn't previously written at all, would be written to both otel trace and logs.
  3. And finally, for the environments where there is no OpenTelemetry SDK, a sink that would gather all the trace-related information from the log pipeline and export it as traces (I believe this is what you're implying in #13). This part ensures that something that was being written to logs, ends up in trace too.

I'm sorry for a very large message, especially at such an early stage of your project, but I believe that some things are better done earlier than later.

nblumhardt commented 9 months ago

Thanks for the thoughtful input @srogovtsev, much appreciated!

I think we're on the same page; so far, things have been in bootstrapping mode without much configuration exposed (a lot of nuts and bolts still being figured out), but I'm expecting this to change fairly soon.

Would it be correct for me to re-title this thread as "Opt out of consuming external activities", and implement it (logically) like:

    .CreateTracingLogger(options => options.ConsumeExternalActivities = false)

which would, behind the scenes, disable the ActivityListners?

In this mode I don't think any instrumentation would be applied, but I'm also expecting that the minimal instrumentation we're implementing will be removable using something equivalent to options.ActivityEnrichers.Clear() in the same scheme.

srogovtsev commented 9 months ago

Would it be correct for me to re-title this thread as "Opt out of consuming external activities",

I don't know if it would be better to opt-out or opt-in, but you can of course rename the item to better reflect the issue; I was struggling to find something appropriate myself.

.CreateTracingLogger(options => options.ConsumeExternalActivities = false)

Frankly, I would expect that if I wanted to only use the first part (logger.StartActivity), I wouldn't have to do anything at the logger configuration at all. Why would that be necessary?

(I should probably spend more time and just browse around your code)

nblumhardt commented 9 months ago

Thanks for the reply!

Consuming external activities is a "pit of success" choice, I think; without them, the traces arriving at sinks are incomplete, and in the case of HTTP client requests, not having a listener present will (indirectly) end up preventing trace id propagation.

I can see how it's undesirable in your scenario. For people giving this a shot straight out of the box without some other infrastructure at play, those activities will be important, though.

srogovtsev commented 9 months ago

I understand the argument for the "pit of success", and yet I believe the things that we want are not mutually exclusive.

Here's a very small PoC based on the latest version of your code:

using System.Diagnostics;
using OpenTelemetry;
using OpenTelemetry.Trace;
using Serilog;
using Serilog.Formatting.Compact;
using SerilogTracing;
using SerilogTracing.Interop;

using var logger = new LoggerConfiguration()
    .Enrich.WithProperty("Application", typeof(Program).Assembly.GetName().Name)
    .WriteTo.Console(new RenderedCompactJsonFormatter())
    .CreateLogger()
    ;

using var source = new ActivitySource("qqq");

using var tracerProvider = Sdk.CreateTracerProviderBuilder()
    .AddSource(source.Name)
    .AddSource(LoggerActivitySource.Instance.Name)
    .AddConsoleExporter()
    .Build();

using var rootActivity = source.StartActivity();

logger.Information("Start");

using (var a1 = logger.StartActivity("Activity 1: {prop}", "abc"))
{
    logger.Information("Inside activity 1");

    using (var a2 = logger.StartActivity("Activity 2: {prop}", "qwe"))
    {
        logger.Information("Inside activity 2");
    }
}

logger.Information("End");

(the required dependencies are:

<PackageReference Include="OpenTelemetry" Version="1.7.0" />
<PackageReference Include="OpenTelemetry.Exporter.Console" Version="1.7.0" />
<PackageReference Include="Serilog.Formatting.Compact" Version="2.0.0" />
<PackageReference Include="Serilog.Sinks.Console" Version="5.0.0" />

)

This simulates my situation: tracing goes to otel, logging goes to Serilog.

And here's sample output:

{
    "@t": "2024-01-10T23:09:29.4239356Z",
    "@m": "Start",
    "@i": "b72c5e85",
    "@tr": "eda3d89ae8a5b35b590bfcb037c25385",
    "@sp": "95e817ef13ebd065",
    "Application": "ConsoleApp1"
}
{
    "@t": "2024-01-10T23:09:29.4441182Z",
    "@m": "Inside activity 1",
    "@i": "3b149571",
    "@tr": "eda3d89ae8a5b35b590bfcb037c25385",
    "@sp": "d6174fcb83060b0b",
    "Application":"ConsoleApp1"
}
{
    "@t": "2024-01-10T23:09:29.4443063Z",
    "@m": "Inside activity 2",
    "@i": "2959f1f4",
    "@tr": "eda3d89ae8a5b35b590bfcb037c25385",
    "@sp": "cfed38d24ef80c76",
    "Application":"ConsoleApp1"
}

Activity.TraceId:            eda3d89ae8a5b35b590bfcb037c25385
Activity.SpanId:             cfed38d24ef80c76                
Activity.TraceFlags:         Recorded                        
Activity.ParentSpanId:       d6174fcb83060b0b                
Activity.ActivitySourceName: Serilog                         
Activity.DisplayName:        Activity 2: {prop}              
Activity.Kind:               Internal                        
Activity.StartTime:          2024-01-10T23:09:29.4442924Z    
Activity.Duration:           00:00:00.0010566
StatusCode: Ok
Resource associated with Activity:
    telemetry.sdk.name: opentelemetry
    telemetry.sdk.language: dotnet
    telemetry.sdk.version: 1.7.0
    service.name: unknown_service:ConsoleApp1

{
    "@t": "2024-01-10T23:09:29.4453490Z",
    "@m": "Activity 2: \"qwe\"",
    "@i": "29ecd511",
    "@tr": "eda3d89ae8a5b35b590bfcb037c25385",
    "@sp": "cfed38d24ef80c76",
    "prop": "qwe",
    "SpanStartTimestamp": "2024-01-10T23: 09: 29.4442924Z",
    "ParentSpanId": "d6174fcb83060b0b",
    "Application": "ConsoleApp1"
}

Activity.TraceId:            eda3d89ae8a5b35b590bfcb037c25385
Activity.SpanId:             d6174fcb83060b0b
Activity.TraceFlags:         Recorded
Activity.ParentSpanId:       95e817ef13ebd065
Activity.ActivitySourceName: Serilog
Activity.DisplayName:        Activity 1: {prop}
Activity.Kind:               Internal
Activity.StartTime:          2024-01-10T23:09:29.4437213Z
Activity.Duration:           00:00:00.0398132
StatusCode: Ok
Resource associated with Activity:
    telemetry.sdk.name: opentelemetry
    telemetry.sdk.language: dotnet
    telemetry.sdk.version: 1.7.0
    service.name: unknown_service:ConsoleApp1

{
    "@t": "2024-01-10T23:09:29.4835345Z",
    "@m": "Activity 1: \"abc\"",
    "@i": "14c015d8",
    "@tr": "eda3d89ae8a5b35b590bfcb037c25385",
    "@sp": "d6174fcb83060b0b",
    "prop": "abc",
    "SpanStartTimestamp": "2024-01-10T23: 09: 29.4437213Z",
    "ParentSpanId": "95e817ef13ebd065",
    "Application": "ConsoleApp1"
}
{
    "@t": "2024-01-10T23:09:29.4849365Z",
    "@m": "End",
    "@i": "1a68c028",
    "@tr": "eda3d89ae8a5b35b590bfcb037c25385",
    "@sp": "95e817ef13ebd065",
    "Application": "ConsoleApp1"
}

Activity.TraceId:            eda3d89ae8a5b35b590bfcb037c25385
Activity.SpanId:             95e817ef13ebd065
Activity.TraceFlags:         Recorded
Activity.ActivitySourceName: qqq
Activity.DisplayName:        <Main>$
Activity.Kind:               Internal
Activity.StartTime:          2024-01-10T23:09:29.4226745Z
Activity.Duration:           00:00:00.0624634
Resource associated with Activity:
    telemetry.sdk.name: opentelemetry
    telemetry.sdk.language: dotnet
    telemetry.sdk.version: 1.7.0
    service.name: unknown_service:ConsoleApp1

I get what I need: StartActivity opens trace and writes log, all the correlation is there, and the only change to your code that I needed to make this work was to make LoggerActivitySource public, so that I could tell otel to watch it.

Seems to be an easy win?

srogovtsev commented 9 months ago

I've also noticed that while activity.AddProperty will add Tag to activity, values passed as part of starting call won't:

var a = logger.StartActivity("Activity 2: {prop}", "qwe")
a.AddProperty("test", "value");
Activity.TraceId:            6f51dd4e1bb5283c759d3e047bf826d6
Activity.SpanId:             7b246e5a60d47af7  
Activity.TraceFlags:         Recorded          
Activity.ParentSpanId:       f3bc5bb23b3b20fe  
Activity.ActivitySourceName: Serilog           
Activity.DisplayName:        Activity 2: {prop}
Activity.Kind:               Internal          
Activity.StartTime:          2024-01-10T23:30:29.3626482Z
Activity.Duration:           00:00:00.0024287            
Activity.Tags:                                           
    test: value                                          
StatusCode: Ok  
nblumhardt commented 9 months ago

Thanks for raising the tags issue, I've opened a new ticket to track it and figure out the best way forward with it :+1:

LoggerActivitySource.Instance.Name should (currently) always just be "Serilog", so at least in the short term, this setup should work without any changes, as far as I can tell.

I'm revisiting how CreateTracingLogger() and the activity listener work, now, I'll keep all of this in mind and see what I can come up with. Thanks again!

nblumhardt commented 9 months ago

Setup is now:

Log.Logger = new LoggerConfiguration()
    ...
    .CreateLogger();

using var _ = new TracingConfiguration().EnableTracing();

The EnableTracing() step is optional, and everything should work as expected when you listen to the "Serilog" activity source directly. Hopefully this covers it.

nblumhardt commented 9 months ago

This was fixed in #21, it'd be great to hear how it goes!

srogovtsev commented 9 months ago

I'll do my best to check it this week.

srogovtsev commented 9 months ago

I've pulled the last version (fd43e88ff324bf342f3232ba8cb4fb5bec6df36d), and so far it lines up with my checklist just nicely:

Thanks a lot for this great work, I'm already looking forward to lean on it in one of our features.

The only thing that slightly bothers me here is that I have to use magic string to reference activities from SerilogTracting:

using var tracerProvider = Sdk.CreateTracerProviderBuilder()
    .AddSource(source.Name)
    .AddSource("Serilog") //<---here
    .AddConsoleExporter()
    .Build();

Is there any conceptual reason why we cannot expose this as a constant?

srogovtsev commented 9 months ago

As a separate observation, when one comes from SerilogTimings, and looks at this as a logging implement, there's a couple of things "missing", which would make this kind of usage easier. The idea is that as we configure our logging and tracing separately, we want them to be cross-correlated, but independent; i.e. in our particular case logging is enabled more often than tracing, so we were relying heavily on SerilogTimings to gather some insight into the system performance.

Which means that if we want to replace SerilogTiming with SerilogTracing, it would be nice to be able to gather the same things from the logs, most importantly the span duration (which used to be ElapsedMs, if I'm not mistaken), and the status ( completed/error/aborted, or something along those lines). I'm not sure if we can achieve the same level of human-readable messages as SerilogTimings has, without compromising the concept of activities, but at least having the data would be nice.

I can move this out to a separate issue if you think this is useful, but I wanted to start here to keep the context.

nblumhardt commented 9 months ago

RE the "Serilog" constant, the original intention was for the source name to be derived from the logger's context/type, though I've cooled on that idea for now. I'll PR a Constants.ActivitySourceName - thanks!

nblumhardt commented 9 months ago

RE:

it would be nice to be able to gather the same things from the logs, most importantly the span duration (which used to be ElapsedMs, if I'm not mistaken), and the status ( completed/error/aborted, or something along those lines).

The Formatters.CreateConsoleTextFormatter() method shows some of how this could be achieved, but although the "completed" and "failed" states are carried through in the log level (and could be included in the message this way using #if), there's no equivalent to "abandoned".

Let me know if you need a hand getting the text format you're after. It'd be interesting to explore whether or not there's still a case for tracking some kind of "abandoned" status, too.

srogovtsev commented 9 months ago

Let me know if you need a hand getting the text format you're after.

Thanks for the offering of help, but we're not using text formatting - we're either dumping the data into JSON with CompactJsonFormatter to be later parsed by tools or simply ingested into Seq, or send it via our own sink to a proprietary system.

So I probably wasn't articulate enough - I'm looking for a regular log event property which would carry the duration of the specific activity. I understand that we can calculate it ourselves based on SpanStartTimestamp, but it would be much easier to just have the ElapsedMs as SerilogTimings did.

srogovtsev commented 9 months ago

the original intention was for the source name to be derived from the logger's context/type, though I've cooled on that idea for now

It's great that you've changed your mind on this, because integrating multiple dynamic sources into OpenTelemetry would be... a challenge.

nblumhardt commented 9 months ago

Ah I see what you mean, RE the elapsed property. Might be one to discuss in a new ticket? I can see either an optional enricher, or a tweaked CompactJsonFormatter both doing a reasonable job of this but need to nail down the constraints :-)