googleapis / nodejs-logging-winston

Node.js client integration between Stackdriver Logging and Winston.
https://cloud.google.com/logging/
Apache License 2.0

Requests are held when a certain call rate (requests/second) is exceeded #707

Open jRichardeau opened 2 years ago

jRichardeau commented 2 years ago

Hello,

I'm using @google-cloud/logging-winston and its express middleware for logging purposes.

When using it in production in a service that receives more than 100 requests/second, I see a huge increase in the service's memory usage and latency.

Moreover, when the transport is configured with the redirectToStdout option, the problem does not occur.

I think I've identified what is holding the requests, but I don't know why. It happens in the readable-stream module: if you add a console.log in /node_modules/readable-stream/lib/_stream_writable.js, in the writeOrBuffer function, like this: console.log('bufferedRequestCount', state.bufferedRequestCount);, you can see that state.bufferedRequestCount grows continuously, so all the requests are held in memory (see the sketch below).
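For reference, the debug instrumentation described above amounts to something like the following inside readable-stream's writeOrBuffer (heavily paraphrased; the exact code differs between readable-stream versions, and only the console.log line is added):

// node_modules/readable-stream/lib/_stream_writable.js (paraphrased)
function writeOrBuffer(stream, state, chunk, encoding, cb) {
  const len = state.objectMode ? 1 : chunk.length;
  state.length += len;

  if (state.writing || state.corked) {
    // The chunk is queued instead of being written immediately;
    // under sustained load this counter just keeps growing.
    state.bufferedRequestCount += 1;
    console.log('bufferedRequestCount', state.bufferedRequestCount); // added debug line
  } else {
    // Otherwise the chunk is handed straight to the underlying write.
    doWrite(stream, state, false, len, chunk, encoding, cb);
  }

  return state.length < state.highWaterMark;
}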

I think the pain threshold is about 50 requests/s; below that it's OK.

You can reproduce the issue with the example below.

Environment details

Steps to reproduce

  1. Create an express application.
  2. Use @google-cloud/logging-winston middleware for express
  3. Call an endpoint with a request rate > 50/s

Example code for the express application:

const express = require('express');
const winston = require('winston');
const lw = require('@google-cloud/logging-winston');
const port = 3000;

const logger = winston.createLogger();

async function init() {
  // Attach the logging-winston express middleware so that req.log is available.
  const mw = await lw.express.makeMiddleware(logger);

  const app = express();
  app.use(mw);

  app.get('/hello', (req, res) => {
    req.log.info('Hello');
    res.send('Hello World!');
  });

  app.listen(port, () => {
    console.log(`App listening on port ${port}`);
  });
}

init();

I'm using Artillery to call the endpoint at a given request rate; this is the YAML configuration file:

config:
  target: "http://localhost:3000"
  phases:
    - duration: 60
      arrivalRate: 50
      name: Warm up

scenarios:
  - name: "Calling express"
    flow:
      - get:
          url: "/hello"

When the calls stop, the "buffer" empties, but in production the calls never stop 😄 so memory usage and latency keep increasing.

Thank you for your help

jRichardeau commented 2 years ago

Hi, any news on this? Thanks

losalex commented 2 years ago

Thanks @jRichardeau for opening this issue, and really sorry for the late response. I wonder if this issue is still relevant - can you please let me know if you are still facing issues after upgrading to the latest @google-cloud/logging-winston?

jRichardeau commented 2 years ago

Thank you for your reply @losalex, but after testing again I'm still facing the same issues, even after upgrading @google-cloud/logging-winston and winston to the latest versions.

NB: You may have to change the arrivalRate to 100 in the example above to reproduce.

losalex commented 2 years ago

Thanks @jRichardeau for the response - correct me if I'm wrong, but it seems there was a slight improvement in overall throughput, though still not enough. One more question - it seems you performed the testing on your local Mac, correct? If so, I wonder how powerful it is, and whether you have had a chance to test in other environments like Google Cloud.

jRichardeau commented 2 years ago

There may be a slight improvement, but do you really think it's a hardware-power issue? The problem is not that the buffer fills faster than it empties; when there are too many requests it never empties at all, it just grows.

But ok I will try it on GCP.

Here is my local config:

[screenshot of local machine specs]

losalex commented 2 years ago

@jRichardeau, thanks for the answer - yes, I wanted to eliminate any factors related to hardware or connectivity, since those usually have a major impact on performance. Also, correct me if I'm wrong, but the readable-stream problem you are raising is not related to log entry upload, but rather to the middleware interaction, correct?

jRichardeau commented 2 years ago

I can't tell you much more about where the issue comes from; I've already spent some time trying to figure out whether the problem is in my code, and it doesn't seem to be.

The only thing I can say is that "when the transport is configured with the redirectToStdout option, the problem does not occur", so it seems to come from uploading the log entries, but I can't be sure.

Does it happen only when using the middleware? Probably yes, but that would have to be tested.

Did you manage to reproduce the problem with the example above?

jRichardeau commented 1 year ago

Hello,

I finally tested it on GCP and encountered the same performance issue. So we forced the redirectToStdout option on all our services, together with the new useMessageField: false option, roughly as in the sketch below.
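Roughly, the transport configuration looks like this (a minimal sketch; the logger wiring is illustrative, and redirectToStdout / useMessageField are the options discussed above):

const winston = require('winston');
const {LoggingWinston} = require('@google-cloud/logging-winston');

// Write structured log entries to stdout instead of calling the Logging API
// from the process; the logging agent/router picks them up from stdout.
const transport = new LoggingWinston({
  redirectToStdout: true,
  // Keep the log payload at the top level instead of nesting it under "message".
  useMessageField: false,
});

const logger = winston.createLogger({transports: [transport]});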

mochsner commented 1 year ago

We seem to be running into a similar issue with our platform. On GCP, we have found that when we use GCP as a sink with extremely large objects (and/or circular objects), the process eventually crashes with a heap overflow. This occurred more frequently on GCP (App Engine) than locally, but we were eventually able to reproduce it locally by pointing to a GCP App Engine logging destination manually.

const {LoggingWinston} = require('@google-cloud/logging-winston');

const transport = new LoggingWinston({
  labels: {
    name: "local",
  },
  projectId: process.env.PROJECT_ID,
  keyFilename: process.env.GCLOUD_APPLICATION_DEFAULT_CREDENTIALS,
  // Log any transport-level errors instead of letting them go unnoticed.
  defaultCallback: (err) => {
    if (err) {
      console.error(`GcpLogger Error occurred: ${err}`, err);
    }
  },
});

We've gone through and reduced our overall object sizes for logs, but we are not comfortable having Winston send to GCP directly at the moment, so we need to use the console as a sink (roughly as sketched below), which has an impact on performance, since my understanding is that console.log is fully synchronous (even on App Engine).
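A minimal sketch of that interim setup, assuming a plain winston Console transport stands in for the GCP transport (formatting options omitted):

const winston = require('winston');

// Interim sink: log to the console instead of the LoggingWinston transport.
// Console output is synchronous, which is the performance concern mentioned above.
const logger = winston.createLogger({
  transports: [new winston.transports.Console()],
});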

Notably:

  1. We did not have any issues or memory leaks when JUST using Winston sending to Serilog's Seq data sink.
  2. We did have cloudTrace.start() running for a while (import * as cloudTrace from '@google-cloud/trace-agent';) on our servers, but I believe we were running into the issue even after this was disabled.
cindy-peng commented 3 months ago

> (quoting @mochsner's comment above)

Hi @mochsner, are you also using the express middleware? Is this issue still happening for you?

cindy-peng commented 3 months ago

Hi @mochsner, it looks like the issue you are describing here is a little different (wondering if you are using the middleware and have high QPS). Could you open a new issue with a latest repro environment if it is still happening for you?