sambaran69 closed this issue 1 year ago.
Here is my configuration for Datadog tracing, but I couldn't send traces to Datadog APM:
{
  "enabled": true,
  "stackTrace": true,
  "exporter": {
    "type": "Datadog",
    "options": {
      "agentUrl": "http://10.104.0.7:8126",
      "env": null,
      "samplingPriority": "AUTO_KEEP",
      "defaultTags": [
        "recruit"
      ],
      "tracerOptions": {
        "analytics": true
      }
    }
  }
}
I installed Datadog using the Helm chart with the values below:
datadog:
  apiKey: KEY
  logLevel: INFO
  kubeStateMetricsEnabled: true
  clusterChecks:
    enabled: false
  dogstatsd:
    port: 8125
    useHostPort: true
    nonLocalTraffic: true
  logs:
    enabled: true
    containerCollectAll: true
  apmEnabled: true
  apm:
    enabled: true
    port: 8126
...
I'm seeing this issue too. I filed https://github.com/DataDog/dd-trace-js/issues/970, but after more investigation I believe the core problem is moleculer improperly setting IDs on the dd-trace spans. dd-trace's MessagePack encoder expects the traceId and spanId to be 8 bytes, because that's what the id generator defaults to. However, moleculer overrides the dd-trace span's generated IDs with the 16-byte GUIDs used by moleculer's internal tracer. The extra bytes aren't handled in the MessagePack encoding, and it ends up corrupting the trace payload, resulting in decoding errors on the agent.
In addition to corrupting the trace payload, it seems to also corrupt some internal state in dd-trace because future trace payloads will also be corrupt, even if their ids are the expected 8 bytes. I haven't yet gone far enough to figure out why this happens.
(Related: #690. Perhaps these issues can be solved at the same time.)
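The encoding mismatch can be sketched in a few lines. This is an illustration only, not dd-trace's actual encoder: MessagePack encodes a uint64 as the 0xcf marker followed by exactly 8 bytes, so a 16-byte Moleculer GUID leaves 8 surplus bytes that the agent's decoder then misreads as the start of the next field (the 16-byte hex value here is taken from the buffer dump later in this thread, where `cf` is immediately followed by 16 bytes):

```javascript
// Illustration only: why a 16-byte ID breaks a MessagePack uint64 field.
// A uint64 is encoded as the 0xcf marker followed by exactly 8 bytes.
function encodeUint64Field(idBuffer) {
  return Buffer.concat([Buffer.from([0xcf]), idBuffer]);
}

// 8-byte ID, as dd-trace's generator produces:
const ddId = Buffer.from('1c1f719b2c0a3898', 'hex');
// 16-byte GUID, as Moleculer writes into the span (from the dump in this thread):
const moleculerId = Buffer.from('1c1f719b2c0a389888ea60275930299a', 'hex');

console.log(encodeUint64Field(ddId).length);        // 9 = marker + 8 bytes (valid)
console.log(encodeUint64Field(moleculerId).length); // 17: the 8 surplus bytes shift
                                                    // every later field in the payload
```

This matches the agent error above: once the decoder's position is off by 8 bytes, it tries to decode the wrong bytes as the next key and fails with a type mismatch.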
Yeah, it can be a problem. By the way, these 16-byte IDs worked in earlier versions. The main problem is that dd-trace has no exported method to create a span_id & trace_id from an existing UUID, and the private ID-generation method in dd-trace keeps changing between versions. I can't keep rewriting this part in the exporter... I mean, I can fix it for the current latest version of dd-trace, but that doesn't mean it will keep working in the future. The final solution would be for dd-trace to have a method for this.
@ngraef could you try it with master, please? It shows tracing for me.
The final solution would be if dd-trace has method for this function.
I don't think this covers everything. Even if we were to expose everything that is needed by Moleculer, how will spans created by Moleculer interact with spans created by Datadog? For example, we usually automatically set the parent of a span based on the span that is currently active in the current execution context. This can only be done if spans are created while they are active, and not after the fact from external tracing metadata.
As a more concrete example, say you have service A and service B, and service A calls service B using HTTP. What I would expect to see is a trace with a moleculer.request span as its root, which would have an http.request client span as its child, which itself would have a moleculer.request child from service B. Right now, the 2 moleculer.request spans would be created by Moleculer, and the http.request client span would be created by Datadog. How can these be linked together properly during execution?
In general we solve this by adding our own integration which replaces the built-in tracing functionality of the framework. I wonder how we could potentially instead achieve interoperability.
(please correct any of the above if incorrect since I'm not very familiar with Moleculer)
@rochdev Yeah, this is currently a problem. The Moleculer spans & Datadog spans are not connected. The problem is that Moleculer is a full, complex framework with a built-in tracing solution that monitors all services (actions, events, custom spans, etc.). So you can't create a dd-trace integration for it the way you can for express or redis, where wrapping a few methods is sufficient for tracing to work.
In previous versions (maybe dd-trace 0.16) I tried to make the connection between Moleculer & Datadog spans, and it worked. I went through the full dd-trace source code and found a way to get the active span and to overwrite it from Moleculer (it was a dirty way because it relied on private variables in dd-trace). But in later dd-trace versions these internals changed and the code stopped working. Here is a full trace from that time: you can see that the first span comes from dd-trace (the http instrumentation) and there are Redis dd-trace spans between the Moleculer spans.
In brief, I could solve it if dd-trace had two public methods: one method to get the current Datadog span (trace_id & span_id & parent_span_id) based on the current executionAsyncId, and another method to set the current span (span_id & parent_span_id).
Is there any chance that you could create them inside dd-trace?
I have the same problem. If making a perfect fix for this isn't easy, it would at least be better not to produce an unhandled exception.
Encoding trace: [{"trace_id":"1c1f719b2c0a3898","span_id":"1c1f719b2c0a3898","parent_id":"0000000000000000","name":"redis.command","resource":"brpoplpush","error":0,"meta":{"service":"moleculer","version":"0.14.7","span.kind":"client","db.type":"redis","out.host":"localhost","redis.raw_command":"BRPOPLPUSH bull:users.retirement:wait bull:users.retirement:active 5"},"metrics":{"_sample_rate":1,"db.name":1,"out.port":6379,"_dd.agent_psr":1,"_sampling_priority_v1":1},"start":1591221956761919000,"duration":5559285889,"service":"moleculer-redis","type":"redis"}]
Added encoded trace to buffer: 91 8c a8 74 72 61 63 65 5f 69 64 cf 1c 1f 71 9b 2c 0a 38 98 88 ea 60 27 59 30 29 9a cf 1c 1f 71 9b 2c 0a 38 98 88 ea 60 27 59 30 29 16 15 27 f0 63 98 9e 00 a8 64 75 72 61 74 69 6f 6e d3 00 00 00 01 4b 5b f8 81 a5 65 72 72 6f 72 00 a9 70 61 72 65 6e 74 5f 69 64 cf 00 00 00 00 00 00 00 00 a4 6e 61 6d 65 ad 72 65 64 69 73 2e 63 6f 6d 6d 61 6e 64 a8 72 65 73 6f 75 72 63 65 aa 62 72 70 6f 70 6c 70 75 73 68 a7 73 65 72 76 69 63 65 af 6d 6f 6c 65 63 75 6c 65 72 2d 72 65 64 69 73 a4 74 79 70 65 a5 72 65 64 69 73 a4 6d 65 74 61 86 a7 73 65 72 76 69 63 65 a9 6d 6f 6c 65 63 75 6c 65 72 a7 76 65 72 73 69 6f 6e a6 30 2e 31 34 2e 37 a9 73 70 61 6e 2e 6b 69 6e 64 a6 63 6c 69 65 6e 74 a7 64 62 2e 74 79 70 65 a5 72 65 64 69 73 a8 6f 75 74 2e 68 6f 73 74 a9 6c 6f 63 61 6c 68 6f 73 74 b1 72 65 64 69 73 2e 72 61 77 5f 63 6f 6d 6d 61 6e 64 d9 44 42 52 50 4f 50 4c 50 55 53 48 20 62 75 6c 6c 3a 75 73 65 72 73 2e 72 65 74 69 72 65 6d 65 6e 74 3a 77 61 69 74 20 62 75 6c 6c 3a 75 73 65 72 73 2e 72 65 74 69 72 65 6d 65 6e 74 3a 61 63 74 69 76 65 20 35 a7 6d 65 74 72 69 63 73 85 ac 5f 73 61 6d 70 6c 65 5f 72 61 74 65 cb 3f f0 00 00 00 00 00 00 a7 64 62 2e 6e 61 6d 65 cb 3f f0 00 00 00 00 00 00 a8 6f 75 74 2e 70 6f 72 74 cb 40 b8 eb 00 00 00 00 00 ad 5f 64 64 2e 61 67 65 6e 74 5f 70 73 72 cb 3f f0 00 00 00 00 00 00 b5 5f 73 61 6d 70 6c 69 6e 67 5f 70 72 69 6f 72 69 74 79 5f 76 31 cb 3f f0 00 00 00 00 00 00
Request to the agent: {"path":"/v0.4/traces","method":"PUT","headers":{"Content-Type":"application/msgpack","Datadog-Meta-Tracer-Version":"0.21.0","X-Datadog-Trace-Count":"2","Datadog-Meta-Lang":"nodejs","Datadog-Meta-Lang-Version":"v12.16.1","Datadog-Meta-Lang-Interpreter":"v8"},"protocol":"http:","hostname":"localhost","port":"8126"}
Inject into carrier: {"x-datadog-trace-id":"5204814146905357208","x-datadog-parent-id":"5204814146905357208","x-datadog-sampling-priority":"-1"}.
Error: Error from the agent: 400 Bad Request
at IncomingMessage.<anonymous> (/www/poomgo_backend/node_modules/dd-trace/packages/dd-trace/src/platform/node/request.js:35:23)
at /www/poomgo_backend/node_modules/dd-trace/packages/dd-trace/src/scope/base.js:54:19
at Scope._activate (/www/poomgo_backend/node_modules/dd-trace/packages/dd-trace/src/scope/async_hooks.js:51:14)
at Scope.activate (/www/poomgo_backend/node_modules/dd-trace/packages/dd-trace/src/scope/base.js:12:19)
at IncomingMessage.bound (/www/poomgo_backend/node_modules/dd-trace/packages/dd-trace/src/scope/base.js:53:20)
at IncomingMessage.emit (events.js:323:22)
at endReadableNT (_stream_readable.js:1204:12)
at processTicksAndRejections (internal/process/task_queues.js:84:21) {
status: 400
}
$ tail -f /var/log/datadog/trace-agent.log
2020-06-04 07:25:23 KST | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "map" with method for "str"
@icebob Any way you can simply create spans using the existing API? Unless I'm misunderstanding something, I think our public API already offers what you need, as long as it's possible to let dd-trace pick the IDs.
@rochdev could you show an example of how I can do that with dd-trace?
@icebob You mentioned several operations that are traced above. If you can link me to the code of one of them and also to how it's currently traced I can try to do it using only our public interface if at all possible. If not, it will at least give me a better idea of how tracing works in Moleculer and what could be done to improve the interoperability.
@rochdev Great!
In Moleculer there are actions and events. Both of them are traced. Both have a Context instance which contains the data needed for tracing as well (id, parentID, requestID).
The built-in tracing feature is implemented as a middleware which wraps the internal methods and can add additional functions. Here is the source of the tracing middleware.
It wraps the service action & event handlers and generates a tracing span based on the Context. It calls the ctx.startSpan and ctx.finishSpan methods, which are shorthands for the Tracer's startSpan and span.finish.
The Tracer has exporters (there can be multiple). One of them is the Datadog exporter, which uses dd-trace to convert the Moleculer traces & spans to Datadog traces & spans.
Developers can create custom spans as well. For this they call the ctx.startSpan()/ctx.finishSpan() methods, or, if a Context is not available, broker.tracer.startSpan()/span.finish().
When a request goes from NodeA to NodeB via the transporter, the trace info is carried with the Context to the remote node. All nodes collect the spans locally and send them to the exporters.
Here is the simple generated demo where you can play with Moleculer: https://github.com/moleculerjs/project-demo
If you need any more info, just ask me.
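The custom-span pattern described above might look like the sketch below. This is not code from the Moleculer repo; the service name, tags, and the fetchPosts helper are made up for illustration, and inside a real broker the Context would be supplied by Moleculer rather than constructed by hand:

```javascript
// Sketch of a custom span inside a Moleculer action handler.
// Service name, span name, tags and fetchPosts() are illustrative.
const postsService = {
  name: "posts",
  actions: {
    async find(ctx) {
      // Open a custom child span under the action's own span
      const span = ctx.startSpan("fetch posts from DB", {
        tags: { requestID: ctx.requestID }
      });
      try {
        return await fetchPosts(); // placeholder for real DB work
      } finally {
        ctx.finishSpan(span);      // always finish, even on error
      }
    }
  }
};

// Stand-in for a database call, so the sketch is self-contained.
async function fetchPosts() {
  return [{ id: 1, title: "hello" }];
}
```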
Thanks for that great overview, @icebob!
@rochdev Datadog is a crucial part of my team's operations. Any insight you can provide to help us improve the moleculer integration is greatly appreciated.
Datadog now has 2 methods for this:
to activate the current span: tracer.scope().activate(span, fn)
to get the current active span: tracer.scope().active()
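A sketch of how those two calls fit together in a middleware. The stub tracer below stands in for require('dd-trace') so the snippet is self-contained; the real implementation tracks the active span per async execution context, not in a plain field:

```javascript
// Stub with the same shape as dd-trace's scope API, for illustration only.
// Real dd-trace tracks the active span per async context (async_hooks).
const tracer = {
  _active: null,
  scope() {
    const self = this;
    return {
      // Make `span` active for the duration of fn()
      activate(span, fn) {
        const prev = self._active;
        self._active = span;
        try { return fn(); } finally { self._active = prev; }
      },
      // Return whatever span is currently active
      active() {
        return self._active;
      }
    };
  }
};

// Middleware pattern: activate the Datadog span around the action handler,
// so spans created inside (http, redis, ...) get parented to it.
function wrapHandler(ddSpan, handler, ctx) {
  return tracer.scope().activate(ddSpan, () => {
    // tracer.scope().active() === ddSpan in here
    return handler(ctx);
  });
}
```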
@icebob
Update: I have tried this by calling activate in the tracing middleware just before invoking the actual handler, and the scope propagates fine. I am not sure yet how to do it for all of the exporters.
datadog.js (exporter):

span.meta.datadog = {
  span: ddSpan,
  ddTracer: this.ddTracer,
  asyncId,
  oldSpan
};
tracer.js (middleware):

span.meta.datadog.ddTracer.scope().activate(span.meta.datadog.span, () => {
  // Call the handler
  return handler(ctx).then(res => {
    const tags = {
      fromCache: ctx.cachedResult
    };
    if (_.isFunction(opts.tags)) {
      const r = opts.tags.call(ctx.service, ctx, res);
      if (r)
        Object.assign(tags, r);
    } else if (_.isPlainObject(opts.tags)) {
      if (opts.tags.response === true)
        tags.response = res != null ? Object.assign({}, res) : res;
      else if (Array.isArray(opts.tags.response))
        tags.response = _.pick(res, opts.tags.response);
    }

    span.addTags(tags);
    ctx.finishSpan(span);
    //ctx.duration = span.duration;

    return res;
  }).catch(err => {
    span.setError(err);
    ctx.finishSpan(span);
    throw err;
  });
});
CC: @icebob
Also, there are some open standards in the works: https://github.com/open-telemetry/opentelemetry-js
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Current Behavior
I do not get any trace information. My Moleculer service log reports a 400 Bad Request error. Correspondingly, the Datadog agent log contains error messages stating the trace payload could not be decoded.
Expected Behavior
I should be able to get traces from my Moleculer services in Datadog APM. I am using the latest version of Moleculer.
Failure Information
The agent version is v6.18.0 and the tracer version is 0.19.1.
I got the following in the moleculer service log:
I am getting the following in the Datadog agent log:
Steps to Reproduce
Please provide detailed steps for reproducing the issue.
I have created two env variables for the container running my Moleculer service: DD_AGENT_HOST and DD_AGENT_URL. The service is able to connect to the Datadog agent and send tracing data.
I am initialising the service as:
Reproduce code snippet
Context
Please provide any relevant information about your setup. This is important in case the issue is not reproducible except for under certain conditions.
Failure Logs