Feature/log trace context

sgryphon commented 4 years ago

Adds W3C Trace Context information to logs, either directly (as additional parameter) or via injected provider, and send them up to the server endpoint. This is the client side of the change; there is also a corresponding change in the .NET server side.

Background

There is now a W3C standard for correlating distributed tracing. The standard is supported by .NET Core, by simply setting the traceparent header in a web api call, the trace-id and local span-id to be used as a parent will be sent to the server.

These can then be seen in the server as TraceId, ParentId (the client's span), and a new SpanId property (for the server operation) set in the scope, and automatically logged by providers such as Seq.

This passing of the trace-id during API calls is not part of JSNLog, but just how the ID is passed between the client and server, so both can generate logs.

This Change

This change allows the client-side trace-id, for the current operation (e.g. button press), current span-id, and any parent span-id to be sent up with each client log message. It is sent on the log message (rather than header), as multiple messages might be batched. Also, we aren't doing an API transfer call to generate a new span ID (as using the header would do), but passing up the raw values from the client for the client side log.

These are the injected in the scope on the server for logging the message.

This means any log statements from the client that happen due to the button press can be correlated with any events on the server, as they all have the same trace-id for that event. (Also the parent-id of any server traces would match the span-id of any client traces).

mperdeck commented 4 years ago

Thanks for your pull request. You obviously spent quite a bit of time on this, and even figured out my build system.

I have now read up on the Trace Context W3C Recommendation: https://www.w3.org/TR/trace-context/#considerations-for-trace-id-field-generation and .Net Core support for the Trace Context Recommendation: https://devblogs.microsoft.com/aspnet/improvements-in-net-core-3-0-for-troubleshooting-and-monitoring-distributed-apps/

To me it looks like this sits in the same space as the requestId feature already supported by JSNLog: http://jsnlog.com/Documentation/JSNLogJs/JL/SetOptions (find "requestId") http://jsnlog.com/Documentation/HowTo/RequestIds

I appreciate the Trace Context W3C Recommendation is better in that 1) it is a standard; and 2) it captures trace information generated on multiple systems.

I am currently going through your pull request, understanding your solution.

I will get back to you as soon as I can with my thoughts.

mperdeck commented 4 years ago

Again thank you for the interesting pull request. I have gone through it, and have the following issues / questions. Would be interested in your feedback on these.

1) TraceId, SpanId, ParentId

I'm not sure about the meaning of SpanId.

I read the Trace Context W3C Recommendation https://www.w3.org/TR/trace-context/#considerations-for-trace-id-field-generation

In section "3.2.2 traceparent Header Field Values" this describes:

trace-id: essentially identifies the complete distributed transaction.
parent-id: generated by the caller of an API end point.
span-id: section "3.2.2.4 parent-id" describes this as the same as parent-id.

So I'm not sure about the meaning of span-id in your pull request.

2) Flow, and creation of standard traceparent and tracestate

In your pull request, a trace context always starts life on the client:

When a log methods is called with a traceContext parameter. For example: trace(logObject: any, traceContext?: JSNLogTraceContext): JSNLogLogger;
Or by setting the traceContextProvider method property in JSNlogOptions.

However, I would want it to start life on the server:

1) GET request from client to server. 2) Server generates a trace-id and parent-id, using this when logging server side, and when sending API calls to other servers (as part of handling the GET request). 3) Server sends HTML response to client. The HTML contains the traceparent and tracestate. This cannot be sent in response headers (the JavaScript would not be able to read it), but would live in a bit of JavaScript code in the HMTL. 4) Callers to log methods on the client would only specify the parent-id. The log method would update the tracestate with that parent-id. It could use vendor "jsnlog". 5) The log request from jsnlog.js to the server would have the traceparent and tracestate.

Currently, JSNLog supports request ids. These are generated on the server and sent to the client inside a bit of JavaScript. For details, see my previous message. Essentially request ids have the same role as the trace-id (identifying a transaction across systems, in this case server and client).

Reasons I'm keen on this:

More compliant with the Trace Context W3C Recommendation.
Opens the door to creating an extension library, which extends say jQuery's ajax method to 1) send traceparent and tracestate request headers; 2) log timing and error information.

3) Sending of traceparent and tracestate

I appreciate you're sending the traceContext inside each log item, because multiple log items can be batched into one HTTP request.

On the other hand, the Trace Context W3C Recommendation says this information travels in traceparent and tracestate request headers. Staying compliant with the standard may make things easier in the future:

Using .Net Core 3 out of the box support for traceparent and tracestate.
jsnlog.js talking to logging end points other than the server side JSNLog end point.
The JSNLog server side end point being used by other systems to generate log entries on the server.

So my peferred solution is:

Send the parent-id in each log item (because it is specific to a log item).
Send traceparent and tracestate request headers, based on the last item in the batch.

4) ILogger.BeginScope is not available before .Net Core 3.1

In LoggingAdapter.cs line 33, you call the BeginScope method on ILogger to log a string with the TraceId, etc. Which makes sense (maybe logging as a dictionary would have been better).

However, the BeginScope method is only available in .Net Platform Extensions 3.1, and for example not in .Net Core 3.0. Problem is that JSNLog at the moment only has a dependency on .Net Standard 2.0. This way, code bases that haven't been upgraded yet to .Net Core 3 (which is not trivial) can still use JSNLog.

To solve this issue would require multi targeting - or not using BeginScope.

sgryphon commented 4 years ago

1) TraceId, SpanId, ParentId

Distributed tracing is based around a single logical operation, usually initiated when a user presses a button. This could be the start of a new page request, or could be just an operation within a single page.

A span is one segment of that logical operation; each service involved will have a span, although some may have multiple spans.

Parent ID would be better described as parent span ID, the span ID of the parent span, and is used to link a new span to the previous span.

When making a distributed service call, each span passes it's span ID in the 'parent-id' field of the traceparent, for the next service to record as the parent. The traceparent header is for passing the distributed context between services.

Log messages from each service should include the traceid, their spanid, and the parentid (spanid of their parent) that was passed in. The traceparent header is not used in the logging part (only the service call part).

The W3C Trace Context standard is mostly about the service calls, not the logging system.

Example sequence diagram with 5 spans within one trace, showing the parent relationships of each span:

DistributedTraceSpans

The following log messages would be recorded. The traceid can be used to correlate all messages resulting from the button press, even if some of them are missing e.g. if logv4 was missing.

The spanid/parentid details can be used to build the dependency of the different calls. If there are missing logs you can't reconstruct the full tree, but you can still correlate on traceid.

log 'start', traceid = d157, span = 59a1, parent = 0000
log '2', traceid = d157, span = 59a2, parent = 59a1
log '3', traceid = d157, span = 6a01, parent = 59a2
log '4', traceid = d157, span = 7b01, parent = 59a1
log '5', traceid = d157, span = ff01, parent = 7b01
log '6', traceid = d157, span = ff01, parent = 7b01
log 'end', traceid = d157, span = 59a1, parent = 0000

sgryphon commented 4 years ago

2) Flow, and creation of standard traceparent and tracestate

A distributed trace logical operation (a single traceid) can start anywhere. Usually this corresponds to a single user action, although it could be for each item in a batch process, or could involve multiple user interactions (e.g. I might use a single operation for a delete button and confirmation dialog and the processing).

In my case I was looking at a single page app, so all operations were client side button presses.

In a multiple page app you might end up with operations (separate trace-id's) like the following, which has operations started on the server from page requests (user clicked a link), as well as a client-side started operation from a button click.

MultiPageAppOps

I didn't build handling for this, as I was looking at single page. You may need to look at Activity.Current.TraceId (it will have been auto-created by asp.net) and inject that as a header into outgoing pages, then read the header client side.

sgryphon commented 4 years ago

3) Sending of traceparent and tracestate

The Trace Context W3C standard for traceparent is how to pass the context information to other services as part of service calls. It has nothing to do with how you transmit log information from clients to logging end points.

The JSNLog components are not application services, but a pipeline/proxy for transmitting logging messages from the browser to the logging endpoint.

To augment the first diagram to remove all the direct calls from the browser to the logger, we add the JSNLog forwarder/proxy on the web server. The traceparent header is not relevant for these calls, they are just transmitting the logging information.

WebProxy

A component diagram shows the JSNLog components are just a way to route logs from the client to the logging endpoint.

ForwardedLogs

I guess for internal logging / self-logging purposes you could send a traceparent header, if you wanted to connect JSNLog self logging on the client and server, but that should be a separate operation (traceid) as it isn't really connected to any of the, potentially multiple, application operations.

It also has nothing to do with the log messages that are sent across (it would only apply to internal JSNLog self-logging).

sgryphon commented 4 years ago

4) ILogger.BeginScope is not available before .Net Core 3.1

Easy fix as you said to log as a dictionary. Behind the scenes the BeginScope() extension method with a format string creates an internal dictionary, and logger providers use the signature IEnumerable<KeyValuePair<string,object>> to access the values.

mperdeck commented 3 years ago

Sorry for the delay. I've read your response, which made a lot of sense.

I'll send you my reply in a few days.

mperdeck commented 3 years ago

I found that ILogger.BeginScope is actually supported in .Netstandard 2.0, so that is no longer an issue.

I added one comment to your changes for jsnlog.js, regarding traceContextProvider

Once that is solved, I would be happy to accept your pull request.

mperdeck / jsnlog.js

Feature/log trace context #78