googleapis / cloud-trace-nodejs

Node.js agent for Cloud Trace: automatically gather latency data about your application
https://cloud.google.com/trace/
Apache License 2.0
279 stars 98 forks source link

Spans when using scatter gather are not collected #884

Closed jacktuck closed 6 years ago

jacktuck commented 6 years ago

When using scatter gather in the client it looks like spans are not collected for subsequent calls. Not only are A and B not associated, B is not collected at all. I thought async hooks were used to associated these types of patterns?

If the client requests /foo and /bar in the same promise context i'd expect them to be collected but only /foo is. Whereas If I have a client request to just /foo and /foo calls /bar in server, i do see both.

In my test i'm using node v9.4, the server is using restify, and the client is using axios for requests. I do see traces for subcalls just not when the client is making the subcall rather than the server. Also the projectId is i was testing is seventh-odyssey-219220' and last traces are about 24hrs old.

Is this a known issue?

client.js

require('@google-cloud/trace-agent').start()
const axios = require('axios')

axios.get('http://localhost/foo') // A
  .then(() => axios.get('http://localhost/bar')) // B

server.js

require('@google-cloud/trace-agent').start()

const restify = require('restify')
const server = restify.createServer()
const axios = require('axios')

function foo (req, res, next) {
  console.log('foo')
  res.send({random: Math.random()})
  next()
}

function bar (req, res, next) {
  console.log('bar')
  res.send({random: Math.random()})
  next()
}

server.get('/foo', foo)
server.get('/bar', bar)

server.listen(80, function() {
  console.log('%s listening at %s', server.name, server.url)
})
jacktuck commented 6 years ago

Oh wait would #872 fix this?

kjin commented 6 years ago

Hi @jacktuck -- we have a samplingRate option that specifies the number of allowable traces per second. We implement this by only allowing at most one trace per 1 / samplingRate seconds. Since the default value of samplingRate is 10, it seems like the /bar request is getting unsampled because it happens too soon after the /foo request. You can repro a success case by either introducing a 100+ ms delay between the two client calls, or adding the config option to start().

Let us know if that doesn't work!

jacktuck commented 6 years ago

@kjin Thanks for the reply. That does now work in the sense foo and bar get traced. But bar is not a child of foo like i thought it should be.

kjin commented 6 years ago

From the server's point of view, the two different requests are completely independent (it has no awareness that the client logically treats one as being a continuation of the other), so the traces from the server reflect this. The client doesn't send any trace data because we don't trace top-level outgoing requests.

To form a true parent-child relationship, the /foo request handler on the server would need to be making an outgoing request to /bar, as ancestry is defined on the graph of requests across network boundaries. But if you are just looking to have /foo and /bar to appear in the same trace in the Stackdriver Trace web UI, you can wrap the client's outgoing requests in a request handler or a custom root span, so that the client would indeed send trace data. In this case /foo and /bar are still "siblings" rather than actual parent/child, but the visual output yields: image This is generated from this alternative client.js, with server.js's sampling rate set to "always sample" (0).

jacktuck commented 6 years ago

@kjin That makes sense. Thanks for the detailed explanation.