googleapis / nodejs-logging

Node.js client for Stackdriver Logging: Store, search, analyze, monitor, and alert on log data and events from Google Cloud Platform and Amazon Web Services (AWS).
https://cloud.google.com/logging/
Apache License 2.0
173 stars 62 forks source link

Logging is holding up the request thread? #24

Closed stephenplusplus closed 6 years ago

stephenplusplus commented 7 years ago

From @bmenasha on June 27, 2017 20:41

var winston = require('winston');
require('@google-cloud/logging-winston');
winston.add(transport, {projectId: 'redacted', level: 'debug', name: 'mylogger',  logName: 'nodjs.log'});

when I examine the requests in cloud trace it's clear that the request is being held up by calls to the logging API, worse, there are MULTIPLE calls to the cloud logging api in a single request. This results in some pretty poor performance. Consider:

![selection_999 136](https://user-images.githubusercontent.com/6843728/27608499-c7ec1e3e-5b55-11e7-8946-e2402e38c3fb.png)

Here is a stackdtrace from one of these logging calls:

clientMethodTrace [as writeLogEntries] (node_modules/@google-cloud/trace-agent/src/plugins/plugin-grpc.js:67:22)
<anonymous function> (node_modules/@google-cloud/logging/src/v2/logging_service_v2_client.js:137:51)
timeoutFunc [as _apiCall] (node_modules/google-gax/lib/api_callable.js:177:12)
Task.run [as run] (node_modules/google-gax/lib/bundling.js:202:29)
BundleExecutor._runNow [as _runNow] (node_modules/google-gax/lib/bundling.js:451:8)
<anonymous function> (node_modules/google-gax/lib/bundling.js:390:12)
_onTimeout (node_modules/async-listener/glue.js:188:31)
ontimeout (timers.js:488:11)
tryOnTimeout (timers.js:323:5)
listOnTimeout (timers.js:283:5)

Perhaps this is due to the VERY aggressive 10ms bundling in the gax config file:

        "WriteLogEntries": {
          "timeout_millis": 30000,
          "retry_codes_name": "non_idempotent",
          "retry_params_name": "default",
          "bundling": {
            "element_count_threshold": 100,
            "request_byte_threshold": 1024,
            "delay_threshold_millis": 10
          }
        },

I think we should have better defaults (batch/bundle more) and also find a way to send these log (and trace) requests asynchronously so they don't block the current request from returning. Thanks

Copied from original issue: GoogleCloudPlatform/google-cloud-node#2416

stephenplusplus commented 7 years ago

From @lukesneeringer on July 5, 2017 15:14

@bmenasha Thank you for reporting this (and sorry it took so long to get back to you on it).

I am guessing delay_threshold_millis should have been 10,000 instead of 10, but I am not the most familiar with this module. Would that make sense in your mind?

I also agree that this should not block.

stephenplusplus commented 7 years ago

From @bmenasha on July 6, 2017 3:48

Would that make sense in your mind?

Sure.

stephenplusplus commented 7 years ago

@lukesneeringer should we go ahead and update this value, or is there someone we need to ping first?

stephenplusplus commented 7 years ago

From @landrito on August 9, 2017 22:28

Updating this gax value would involve changing it at the generator config level for logging. I'd like to invite @pongad to chime in on changing this to 10,000 from 10.

stephenplusplus commented 7 years ago

From @pongad on August 10, 2017 5:6

I'm not familiar with how bundling is implemented in node. I'm a little confused that it blocks. Node in single threaded, so if you block waiting for either 100 elements in the batch or 10ms to pass before sending the request, you'll always wait 10ms since nothing can add more elements into the batch right? Did I misunderstand?

Or maybe we're waiting to send one batch because there's another batch pending or something...?

stephenplusplus commented 7 years ago

From @ofrobots on August 14, 2017 18:37

A few points here:

@pongad Node is concurrent through asynchronous APIs. Network calls do not block user code, but you can write code that waits for the network call to finish before proceeding.

stephenplusplus commented 7 years ago

From @ofrobots on August 14, 2017 18:43

@bmenasha I also noticed that your logging calls seem very expensive – that doesn't look normal, unless you are logging large amounts of data. Can you verify that is the case? Also, are you using the logging library directly or are you using the higher level libraries (google-cloud/logging-winston, google-cloud/logging-bunyan)?

stephenplusplus commented 7 years ago

@bmenasha @ofrobots just for clarity-- is there a bug here?

stephenplusplus commented 7 years ago

From @ofrobots on September 1, 2017 22:37

I don't believe there is a bug here, but It would be good to get input from @bmenasha to make sure I am not making any incorrect assumptions.

stephenplusplus commented 6 years ago

I'm going to close, given the age of the issue. If there is still work left to be done, please let me know, and we will re-open.