googleapis / nodejs-logging-bunyan

Node.js client integration between Stackdriver Logging and Bunyan.
https://cloud.google.com/logging/
Apache License 2.0
63 stars 34 forks source link

fix: Use defaultCallback in LoggingBunyan class #601

Closed losalex closed 2 years ago

losalex commented 2 years ago

Error handling with a default global callback has an issue - bunyan library always uses default callback, so LoggingBunyan need to override it with default callback when provided.

Fixes #<595> 🦕

Taaanos commented 2 years ago

Even after upgrading to v3.2.2 the defaultCallback is not respected and the app crashes, interestingly sometimes it works. I observed that we get those errors at the same timestamp across our services. image

losalex commented 2 years ago

Even after upgrading to v3.2.2 the defaultCallback is not respected and the app crashes, interestingly sometimes it works. I observed that we get those errors at the same timestamp across our services. image

Thanks @Taaanos for trying this out! I wonder if there is any chance you can figure out what are the cases causing the crash? Based on what you said, seems sometimes the defaultCallback helps - having some details on what causing crash would help to find a root cause.

Taaanos commented 2 years ago

@losalex It's not clear what's causing the crash, "code 4" is all I get. This is the bigger picture. image

The following screenshot is from a different service, notice the timestamp across the screenshots I posted ("2022-03-14 16:28:34.263 CET"). We get the error at the same time (we log quite often). image if it matters we get these on GKE and CloudRun across multiple services.

losalex commented 2 years ago

Thank you @Taaanos! Do you have at least some clue what is a rate of crashes vs. handled errors?

Taaanos commented 2 years ago

@losalex actually it's only on 1/4 services that the error is handled with a rate of 75% handled. The other 3 are consistently crashing. An important point is that at those 3 services, we have our logger built in a separate module that is being used as an external dep. I can see in those 3 services, in yarn.lock, that v3.2.2 is used.

Taaanos commented 2 years ago

@losalex The issue regarding the inconsistency of the handling was due to two loggers being used and only one had a defaultCallback. I will write you soon again with results.

Taaanos commented 2 years ago

There are two very important observations regarding v3.2.2.

We released v3.2.2 on March 14th around 16:30 CET.

image Screenshot 2022-03-16 at 16 03 40

We have now reverted back to v3.2.1. I will report back.

losalex commented 2 years ago

Thank you @Taaanos for update and really sorry for an issue you have with 3.2.2 version. While I am looking into the issue, you can handle timeout errors with LoggingBunyan error handling mechanism by adding logger.on event handler as shown in example below and which can be used with 3.2.1:

// Imports the Google Cloud client library for Bunyan
const {LoggingBunyan} = require('@google-cloud/logging-bunyan');
// Creates a client
const loggingBunyan = new LoggingBunyan({
  projectId: 'your-project-id',
  keyFilename: '/path/to/key.json',
});

// Add event handler for errors
logger.on("error", (err:any) => console.log("Some error " + err + " !!!"));
Taaanos commented 2 years ago

@losalex thank you. It seems that the defaultCallback in v3.2.1 works for us. We log with the rate that we used too, and no memory leaks were observed again.

losalex commented 2 years ago

@Taaanos , thanks for update! I am sorry for all the inconvenience, but apparently I have a doubt that defaultCallback in v3.2.1 working properly - thats also a reason why I provided a sample which leverages LoggingBunyan error handling before. Please let me know if you see that timeout errors are handled correctly in v3.2.1 when using defaultCallback and if not, consider adopting the LoggingBunyan error handling. I am working now to resolve an issue with defaultCallback.

losalex commented 2 years ago

@Taaanos , I believe that 4.2.0 release is going to fix the issue with defaultCallback. Sorry again for any inconvenience.

Taaanos commented 2 years ago

@losalex, thanks for your comments. It's been 3 days on v3.2.1 and our services look good, errors are handled (still with "code 4"), no memory leak is observed. v4.2.0 already?

losalex commented 2 years ago

@Taaanos , sorry for some confusion - the version which has a latest fix is 3.3.0 and it is ready. Please let me know if you encounter any issues - really appreciate your feedback!

gnardini commented 1 year ago

Leaving this in case anyone in the future runs into this error.

I see occasional errors "GoogleError: Total timeout of API google.logging.v2.LoggingServiceV2 exceeded 60000 milliseconds before any response was received." which have become a bit more common lately (maybe I'm just logging more, not sure) which was crashing my app.

I added the defaultCallback but it didn't help, it just caused the error to be logged again before crashing.

I added the event handler which worked :) Now I just see the log but no crash.

This is my final setup:

import { LoggingBunyan } from "@google-cloud/logging-bunyan";
import * as Logger from "bunyan";

const streams: Logger.Stream[] = [];

// ... code omitted for brevity

const loggingBunyan = new LoggingBunyan({
  logName: "backend",
  projectId: "aram-zone",
  keyFilename: "serviceAccountKey.json",
});
const { level, type, stream } = loggingBunyan.stream("info");
streams.push({
  level,
  type,
  stream: stream.on("error", (err) => console.error("Logger error", err)),
})

export const logger = Logger.createLogger({
  name: "aram-zone",
  streams,
  serializers: Logger.stdSerializers,
});

This is what error logs look like:

caught error GoogleError: Total timeout of API google.logging.v2.LoggingServiceV2 exceeded 60000 milliseconds before any response was received.
    at repeat (/home/bitnami/aram-zone/node_modules/google-gax/build/src/normalCalls/retries.js:66:31)
    at Timeout._onTimeout (/home/bitnami/aram-zone/node_modules/google-gax/build/src/normalCalls/retries.js:101:25)
    at listOnTimeout (node:internal/timers:559:17)
    at processTimers (node:internal/timers:502:7) {
  code: 4
}