Open travisby opened 8 years ago
If you can post an example, that would be nice! :-) Eventually you could start your app with the ConsoleRecorder, which logs trace data to stdout, and paste the output here!
I think I am also experiencing this issue. Below are the details I generated from my local implementation which may help.
I have two APIs setup with zipkin and console recorder. One API http://localhost:3000/users/getDependencyUsers makes a GET call to the other API using Superagent @ http://localhost:3001/users/get and returns the result in JSON.
One call to getDependencyUsers returns the following console outputs:
Record at (spanId=e862250d0c949029, parentId=e862250d0c949029, traceId=e862250d0c949029): ServiceName("zipkinTestingParent")
Record at (spanId=e862250d0c949029, parentId=e862250d0c949029, traceId=e862250d0c949029): Rpc("GET")
Record at (spanId=e862250d0c949029, parentId=e862250d0c949029, traceId=e862250d0c949029): BinaryAnnotation(http.url="http://localhost:3000/users/getDependencyUsers")
Record at (spanId=e862250d0c949029, parentId=e862250d0c949029, traceId=e862250d0c949029): ServerRecv()
Record at (spanId=e862250d0c949029, parentId=e862250d0c949029, traceId=e862250d0c949029): LocalAddr(host="InetAddress(X.X.X.X)", port=0)
Parent calls for users from child service
Record at (spanId=e862250d0c949029, parentId=e862250d0c949029, traceId=e862250d0c949029): BinaryAnnotation(http.status_code="304")
Record at (spanId=e862250d0c949029, parentId=e862250d0c949029, traceId=e862250d0c949029): ServerSend()
GET /users/getDependencyUsers 304 3.240 ms - -
Child Service Returns the following:
Record at (spanId=dee209460623163c, parentId=dee209460623163c, traceId=dee209460623163c): ServiceName("zipkinChildDependency")
Record at (spanId=dee209460623163c, parentId=dee209460623163c, traceId=dee209460623163c): Rpc("GET")
Record at (spanId=dee209460623163c, parentId=dee209460623163c, traceId=dee209460623163c): BinaryAnnotation(http.url="http://localhost:3001/users/get")
Record at (spanId=dee209460623163c, parentId=dee209460623163c, traceId=dee209460623163c): ServerRecv()
Record at (spanId=dee209460623163c, parentId=dee209460623163c, traceId=dee209460623163c): LocalAddr(host="InetAddress(X.X.X.X)", port=0)
returning user list to parent
Record at (spanId=dee209460623163c, parentId=dee209460623163c, traceId=dee209460623163c): BinaryAnnotation(http.status_code="200")
Record at (spanId=dee209460623163c, parentId=dee209460623163c, traceId=dee209460623163c): ServerSend()
GET /users/get 200 0.417 ms - 58
Adding App.js contents for additional context
var express = require('express');
var path = require('path');
var favicon = require('serve-favicon');
var logger = require('morgan');
var cookieParser = require('cookie-parser');
var bodyParser = require('body-parser');
var routes = require('./routes/index');
var users = require('./routes/users');
//New for Zipkin
const {Tracer, ExplicitContext, BatchRecorder, ConsoleRecorder} = require('zipkin');
//const {Tracer, ExplicitContext, BatchRecorder} = require('zipkin');
var zipkinMiddleware = require('zipkin-instrumentation-express').expressMiddleware;
const {ScribeLogger} = require('zipkin-transport-scribe');
const CLSContext = require('zipkin-context-cls');
// var scribeRecorder = new BatchRecorder({
// logger: new ScribeLogger({
// scribeHost: '127.0.0.1',
// scribePort: 9410,
// scribeInterval: 1
// })
// });
var ctxImpl = new ExplicitContext();
var recorder = new ConsoleRecorder();
// const tracer = new Tracer({
// recorder: scribeRecorder,
// ctxImpl: new CLSContext('zipkin')
// });
const tracer = new Tracer({
recorder: recorder,
ctxImpl: new CLSContext('zipkin')
});
//
var app = express();
//Add the Zipkin Middleware
app.use(zipkinMiddleware({
tracer,
serviceName: 'zipkinTestingParent'
}));
// view engine setup
app.set('views', path.join(__dirname, 'views'));
app.set('view engine', 'jade');
// uncomment after placing your favicon in /public
//app.use(favicon(path.join(__dirname, 'public', 'favicon.ico')));
app.use(logger('dev'));
app.use(bodyParser.json());
app.use(bodyParser.urlencoded({ extended: false }));
app.use(cookieParser());
app.use(express.static(path.join(__dirname, 'public')));
app.use('/', routes);
app.use('/users', users);
// catch 404 and forward to error handler
app.use(function(req, res, next) {
var err = new Error('Not Found');
err.status = 404;
next(err);
});
// error handlers
// development error handler
// will print stacktrace
if (app.get('env') === 'development') {
app.use(function(err, req, res, next) {
res.status(err.status || 500);
res.render('error', {
message: err.message,
error: err
});
});
}
// production error handler
// no stacktraces leaked to user
app.use(function(err, req, res, next) {
res.status(err.status || 500);
res.render('error', {
message: err.message,
error: {}
});
});
module.exports = app;```
call to the other API using Superagent
The problem might be that Superagent isn't instrumented (yet). In order for Zipkin to pick up the trace state from one service to the next, you need the client to add the correct B3 headers (look at zipkin-instrumentation-fetch). It should be fairly simple to make a zipkin-instrumentation-superagent which does the same as the fetch instrumentation. That should work.
Do you have a sample app which has instrumented fetch properly that can be shared, or can the readme be expanded to explain how all of the different packages under zipkin JS interact with each other. I rewrote my sample app using cujojs-rest, and I am still experiencing issues. I can provide a small express app as a starter if that would be helpful.
It seems like additional documentation around where instrumentation happens could also help. Based on what I have learned thus far, it seems like you have to instrument every third party call manually, and that you have to add code across both the standard app.js and all routes files. Is this accurate?
@cwoodworth not sure if you have seen this, yet https://github.com/openzipkin/zipkin-js/tree/master/packages/zipkin-instrumentation-fetch
@adriancole, I use zipkin middleware and zipkin request wrapper and express http proxy wrapper. I am experience the same thing as @cwoodworth. I notice context is not continuing between middleware and wrappers. I tried using zipkins context cls and it should connect the context but it ends up having same issue. In the end, I have to pass context using the request/response object and setContext with the passing context before calling the wrapper. After that the traceId is set correctly between middleware and wrappers. I may not be doing it correctly. Please advise.
@clementchi do you still have this issue?
This appears to be the same issue as https://github.com/openzipkin/zipkin-js/issues/218 (although reported later) seems to have identified an issue with registering the zipkin middleware before bodyParser while using CLSContext and is now noted in the README https://github.com/openzipkin/zipkin-js/tree/master/packages/zipkin-context-cls#troubleshooting
There is a note about using using async_hooks may resolve these issue and I noticed there is a fork of node-continuation-local-storage that uses async_hooks and I wonder if the issue might be resolved by switching from async-listener to https://www.npmjs.com/package/cls-hooked
Hi @kurtharriger, thanks for reaching out. There is a PR introducing the usage of cls-hooked
in https://github.com/openzipkin/zipkin-js/pull/476 but I did not merge because I wanted a real user to try it out (zipkin-context-cls@0.19.2-alpha.7
) and then confirm that this fixes the problem. Are you up to that?
I only stumbled on this issue as a co-worker mentioned having run into an issue moving the middleware previously. I spent a little bit of time trying reproduce the original issue but I wasn't able to reproduce as of yet. So I'm not sure I can say the above fixes it without reproducing it... I didn't think to try it with older node version and we are running node 12.16.1 now so maybe its no longer an issue or perhaps I just haven't been able to force the necessary race condition with our app. :shrug:
If I get a bit more time this week I'll try to write a small test app and/or unit test and try again with some code I can publish here
I ran into this issue and was able to debug the issue to the root cause.
The issue shows itself when there is a body and you set the Content-Type
header properly. The root cause of the bug turns out to be not in the zipkin libraries per se. The reason the middleware stack was unraveling itself early is because of the body-parse library calls next()
from hooks so the invocation of the body-parser middleware completes and the middleware chain is picked up from when the hooked event is emitted. Both the json parser and text parser middlewares call into this method. The body of code is a function that gets eventually passed into this function (which is in a different node module called raw-body). Here the function attaches listeners to the stream and then exits. When it exists, it never calls next()
which was passed into the callback function; the callback function will get invoked on stream end. Thus when the body-parser middleware will unravel the call stack never having invoked any middleware below itself. The problem is that the zipkin middleware is above body-parse so it gets unraveled which means the tracer.scoped ends and the tracer id is unset. When the stream onEnd
event is fired, we invoke the next()
call that was in the callback, but at this point the tracer no longer has the id set which is why the controller ends up with no id.
I've fixed it by writing a simple middleware after body-parser that sets the id from req._trace_id
in a scope and calling next()
. This was preferable to moving the order of the tracing because we want to trace as much as possible.
@eugene-yao-zocdoc thanks a lot for such a detailed explanation. Is there anything you think we should do in our middlewares more than documenting this behavior? Could you share the middleware you wrote to include in such documentation?
@jcchavezs apologies for the lengthy delay in response.
Here is the middleware I wrote to deal with this:
app.use(function(req, res, next) {
if (
req.hasOwnProperty('zipkinTracer') &&
req.hasOwnProperty('_trace_id')
) {
req.zipkinTracer.letId(req._trace_id, () => {
next();
});
} else {
next();
}
});
zipkinTracer is set by our custom middlewares to store the tracer object.
As far as fixes go on your side, I don't see an obvious solution other than documentation more prominently. Perhaps another avenue is to raise an issue with body-parse library to allow you to pass in a function to override how the next middleware gets called? So devs can invoke something like
app.use(
bodyParser.text({
limit: options.bodySizeLimit,
type: 'text/plain',
next: () => {
tracer.letId(req._trace_id, () => {
next();
});
}
})
);
I'll post up a quick sample as soon as I have a second, but I wanted to record it here so I don't forget.
I was using both body-parser and zipkinMiddleware, with body-parser being the last item in the middleware chain.
Using getContext() + letContext() would give me a new spanid/traceid every time, rather than using the ID generated in the middleware.
(Once I can show you as an example) - is this intended behavior that we want documented, or something we can enhance?