googleapis / nodejs-logging-bunyan

Node.js client integration between Stackdriver Logging and Bunyan.
https://cloud.google.com/logging/
Apache License 2.0
63 stars 34 forks source link

Trace logs not populating in Gen 2 Cloud Functions #697

Closed jeffw-bicycle closed 11 months ago

jeffw-bicycle commented 1 year ago

Hello, we recently upgraded a GCP Cloud Function from Gen 1 to Gen 2 and our bunyan logging no longer contains trace logs when viewing in Log Explorer. This was working out of the box for us in Gen 1. Gen 1 logs would contain a trace ID like trace: "projects/ourproject/traces/12345abcde, but the trace property doesnt show up in Gen 2 logs.

We have tried implementing @google-cloud/trace-agent as described in the documentation here to no effect.

Environment details

Steps to reproduce

  1. Create a gen 2 cloud function with bunyan and trace-agent enabled
  2. Check logs for trace ID

NOTE: It looks like nodejs-logging-winston had a similar issue in the past https://github.com/googleapis/nodejs-logging-winston/issues/287

andie1706 commented 1 year ago

Testing as the documentation mentions, the trace is the same for all the executions, is there a way to get traces in a dynamically way?

jeffw-bicycle commented 1 year ago

That is part of what we're unable to determine. Gen 1 CFs took care of this for us automatically.

We think it is possible to grab the value in X-Cloud-Trace-Context from the request header, but this has a few issues. This restricts traces to just HTTP Cloud functions, so event driven functions would not be covered. It would also require passing that value manually throughout each function.

kramarz commented 1 year ago

Can you share the code that worked in gen1? Was it also in Node18? I was testing this and the only solution that works for me also works in gen2. It is based on https://github.com/googleapis/cloud-trace-nodejs/issues/1076:

const tracer = require('@google-cloud/trace-agent').start({
  // Settings recommended for Cloud Functions.
  samplingRate: 0,
  bufferSize: 1
});
const bunyan = require('bunyan');
const {LoggingBunyan} = require('@google-cloud/logging-bunyan');
const loggingBunyan = new LoggingBunyan();
const logger = bunyan.createLogger({
  name: 'my-service',
  streams: [
    loggingBunyan.stream('info'),
  ],
});

exports.helloWorld = (req, res) => {
  tracer.runInRootSpan({ name: 'init',  traceContext: tracer.propagation.extract(key => req.headers[key])}, rootSpan => {
    let message = req.query.message || req.body.message || 'Hello World!';
    logger.info(message);
    res.status(200).send(message);
    rootSpan.endSpan();
  });
};

It is required to call tracer.runInRootSpan in each function. Note that this works for event functions as well - you can access trace context through context.traceparent(https://github.com/GoogleCloudPlatform/functions-framework-nodejs/pull/336) instead of X-Cloud-Trace-Context header.

jeffw-bicycle commented 1 year ago

Hi, thanks for getting back to me on this. In our gen1 cloud functions, we are running in Node18 as well but do not have any special configuration. It worked for us out of the box with just the basic bunyan logger setup. No need to even require trace-agent in our gen1 functions.

I tried out your code and got positive results as well, so thank you very much! We will continue testing with your recommendations and see how it goes.

I do have a question about the trace-agent options. When you say // Settings recommended for Cloud Functions. where are you getting these recommendations? I havent been able to find much documentation on these options, such as the impact of greatly increasing the bufferSize or something.

kramarz commented 1 year ago

Oh do you mean a setup like this:

const bunyan = require('bunyan');
const logger = bunyan.createLogger({
  name: 'my-service',
});

?

I see that this in fact adds trace to the logs in gen1, but not in gen2. The reason for that is simple: the logger here does not care about traces on its own and does not use @google-cloud/logging-bunyan or @google-cloud/trace-agent. It just writes logs to stdout. Since gen1 only allows for a single request execution at the time it sends all the logs from stdout to cloud logging with the trace id it got from the request that was handled when the log was written.

Since gen2 allows concurrent requests it is not possible to deduce which request the log relates to so it sends them to cloud logging without it, therefore it is necessary to set it up in the code.

I wrote the previous snippet to show how to manually work with @google-cloud/logging-bunyan, but after second thought this approach will also have the same issue as in https://github.com/googleapis/nodejs-logging-winston/issues/287 that you mentioned - it wont work with concurrent requests that are allowed in gen2. One request may be still running when the second one would try to create a new root span which will be ignored.

Instead maybe just set the trace manually using a child logger, again in a similar way as in that winston issue:

const {GoogleAuth} = require('google-auth-library');
const bunyan = require('bunyan');
const baseLogger = bunyan.createLogger({
  name: 'my-service',
});
const propagation = require('@opencensus/propagation-tracecontext');
const traceContext = new propagation.TraceContextFormat();

exports.helloWorld = async (req, res) => {
    const auth = new GoogleAuth();
    const projectId = await auth.getProjectId();
    const {traceId} = traceContext.extract({getHeader: key => req.headers[key]});
    const logger = baseLogger.child({'logging.googleapis.com/trace': `projects/${projectId}/traces/${traceId}`})
    let message = req.query.message || req.body.message || 'Hello World!';
    logger.info(message);
    res.status(200).send(message);
};

Just to answer your question the recommendation was from the linked GitHub issue in the gist. The goal here was to avoid buffering as cloud functions can limit the CPU when there are no requests, which would delay or even break some API calls.

jeffw-bicycle commented 1 year ago

Hello, Yes, that is what our basic logging looks like for gen1.

I tested the gen2 traces with your original suggestion and did not run into any problems tracing concurrent requests. I made a function with a timeout and then hit the function several times while the first request was still processing. All the requests were started concurrently and assigned unique traceIds that I was able to use in the GCP logging. I did not try the child logger tracing because of that, but do you think that is still necessary?

I am still testing the trace-agent usage for event driven functions and hope to get back to you about that soon! Thanks!

jeffw-bicycle commented 1 year ago

Oh one thing I forgot to mention. When we have identical or very similar requests come in, it seems like theyre assigned the same trace Id. Ex. I get two requests with the same payload for a webhook. Theyre both assigned a traceId 8947a59cf65a1a2799e0f7b344902464. The first request has the following headers: traceparent: "00-8947a59cf65a1a2799e0f7b344902464-94ff57e382d28f80-01" x-cloud-trace-context: "8947a59cf65a1a2799e0f7b344902464/10736396671338581888;o=1" and the second has traceparent: "00-8947a59cf65a1a2799e0f7b344902464-ee640be8dadce188-01" x-cloud-trace-context: "8947a59cf65a1a2799e0f7b344902464/17177867973430141320;o=1" It does not seem to be parsing out the entire context here? Any idea why this would be or why the requests are being assigned the same root trace to start?

kramarz commented 1 year ago

Ok, I agree (and also tested) that the solution with using trace-agent works as it automatically keeps track of all the concurrent executions and identifies which context each one belongs to (which was my concern that it would fail to). However you may want to consider the second solution because

"The same trace" issue is a separate issue and is not related to this library at all so I would suggest to keep that discussion in the GCP support case. In short Cloud Functions will only assign the trace id if it is not already given in the header. Please check if whatever client is calling the function is not sending any of those headers. For instance if the client has @google-cloud/trace-agent running then it will automatically set those headers when calling any API .

cindy-peng commented 11 months ago

Hi @jeffw-bicycle , regarding the same trace Id issue, when forcing requests to be traced, X-Cloud-Trace-Context header is expected to have the following specification:

         `"X-Cloud-Trace-Context: TRACE_ID/SPAN_ID;o=TRACE_TRUE"`

Here TRACE_ID is expected to be unique between different requests. Refer How to force a request to be traced for more details. So I would suggest to specify different TRACE_ID in X-Cloud-Trace-Context for different requests.

Also, I am wondering with the trace-agent runInRootSpan solution, are you still running into trace missing issues for Gen2 functions?

cindy-peng commented 11 months ago

Closing the issue as Logger doesn't provide special handling for traceId between gen1 and gen2. Trace logs can still be enabled in gen2 by setting up trace context in code. It is also suggested to set up different TRACE_ID in X-Cloud-Trace-Context for different requests. @jeffw-bicycle Please feel free to re-open the issue if you are still running into this. Thanks!