googleapis / nodejs-logging-bunyan

Node.js client integration between Stackdriver Logging and Bunyan.
https://cloud.google.com/logging/
Apache License 2.0
63 stars 34 forks source link

Duplicate http request entries when running in Cloud Run #626

Closed fpbouchard closed 2 years ago

fpbouchard commented 2 years ago

Hi,

My application is deployed as a container on Cloud Run, and I was observing duplicate request entries in the logs explorer:

Screen Shot 2022-05-09 at 07 40 20

Digging in the code, I found out that the library omits the duplicate logs when running in app engine and cloud functions, so I forked the library and added the case for cloud run, and it fixed my issue:

https://github.com/valian-ca/nodejs-logging-bunyan/commit/0a175adc55c0efb774424f6e671fec91243e0b05

Would this be the correct fix for this?

Thanks!

minherz commented 2 years ago

hi @fpbouchard, thank you for taking time and investigating the issue. the provided commit is very helpful in understanding the problem. i changed the issue type to be the question for now, but we can revert it back if we decide on moving forward with the change. following the provided information the duplication you describe is between the infrastructure and application log entries. it means that while one entry is ingested by the application, another is written by Cloud Run. such duplication can be easily avoided by configuring the log level in the middleware options to be above "info" (e.g. "warning" or "error"). do you use other features of the middleware or you are looking for a specific behavior out-of-the-box?

@losalex can further comment about the options of using the middleware module.

fpbouchard commented 2 years ago

Thanks @minherz,

If I understand correctly, I should raise my log level to avoid the issue, but if I do that, this will also remove the log emitted by the infrastructure since it is also an info log.

The commit I provided just adds Cloud Run as another infrastructure where the log is emitted by the infrastructure itself.

Apart from that, the middleware does what it needs to do, thanks!

minherz commented 2 years ago

Thank you. Can you elaborate what other logs you think will be missed if you increase the middleware log threshold? Your commit extends the existing patching of the behavior for AppEngine and Cloud Functions. However, these patches aren't necessary. Middleware serves as a local (application layer) proxy to send requests. It should not produce any other logs beside information about intercepted http requests. I believe @losalex will be able to elaborate more about it.

fpbouchard commented 2 years ago

Oh ok I understand what you mean now. The middleware does not produce any other logs, but its logger is attached to the express request, and by using this logger in my app I get trace correlation (logs grouped in the logs explorer). This is the reason why I use the middleware.

minherz commented 2 years ago

I was glad to help. I will close this issue for now. Feel free to open new issues if you find a problem or reopen this one if you need a follow up.

fpbouchard commented 2 years ago

Not sure why the issue has been closed. My proposed change fixes the issue that I have. I can use the logger attached to the Express Request and no duplicate log is sent. It is exactly the same case described in the comment above the code where I added the CLOUD_RUN case:

  // Unless we are running on Google App Engine or Cloud Functions, generate a
  // parent request log entry that all the request-specific logs ("app logs")
  // will nest under. GAE and GCF generate the parent request logs
  // automatically.

The modified comment could read like this:

  // Unless we are running on Google App Engine, Cloud Functions or Cloud Run, generate a
  // parent request log entry that all the request-specific logs ("app logs")
  // will nest under. GAE, GCF and GCR generate the parent request logs
  // automatically.
minherz commented 2 years ago

My apologies. I've got an impression that the proposed solution was satisfying. I agree that your proposed change fixes your issue. However, it introduces a change in the behavior of the library that is used by many other developers and might be a breaking change to some of them. The requested behavior can be achieved by configuring the log severity threshold above info level. This change disables emitting logs from the middleware module. In this use case the configuration setup is a favorable solution for the described problem.

losalex commented 2 years ago

@fpbouchard, feel free to reactivate this issue. I believe that your fix could be checked in, but it requires a configuration value controlling if your fix is enabled or not. By default such flag should be off. As mentioned earlier, we don't want other users to be impacted by your change by default

fpbouchard commented 2 years ago

I see. It does introduce a new behavior, so I get why you would want a configuration value. Is the only issue backward compatibility?

I might be missing something here, I'm trying to see what is different with the App Engine & Cloud Functions cases that do not apply to Cloud Run?

If the answer is "yes it is somewhat a bug but we need a configuration value to fix it because it would change the behavior" then I get it, you can ignore what follows, which is me explaining the issue even more.


My reasoning here is that if you have to configure the log severity to disable logs being emitted by the middleware module, why would you use the middleware module in the first place? And while I do understand that creating the logger with a level of warn or above would prevent this, as I understand the source code, it also prevents us from actually using the info level, which is an undesirable side-effect. It's an "all or nothing" situation.

This middleware ultimately does this:

  1. Create a pre-configured logger to attach the correct trace/span/sample keys to all its entries and child loggers entries
  2. (through makeMiddleware) Attach this logger to the express request so the app code can actually use it to log events correctly correlated to the trace/span/sample keys <-- this is the feature we need/use
  3. (optionally, on non "GCP-serverless-environments" that already produce a parent request) Produce a parent log entry for the request

My point here is that 3) is flawed and is missing a serverless environment in its condition. By digging in the code, I can see that this bit of code was actually written before Cloud Run was released, so it is understandable that it is missing.

Thanks for you responses!

losalex commented 2 years ago

Thanks @fpbouchard for your responses and understanding! Definitely the backward compatibility is an issue here, since the feature is already released and might be used as is today by other users which could be impacted if we take your fix as is. You raised very good points in your explanation and apparently indeed there were some misalignment between the Cloud Run release and a fix which was never done... Thats a reason why configuration value controlling parent record generation only for Cloud Run would be needed here since unfortunately we cannot very efficiently reach out to all library users and ask them if they would be OK with a change... Thats said, we could also consider to make this change without configuration parameter during next major release (which assumes that developers would need to change their code to integrate latest library), but as of now we do not have specific date in mind. So, if you need this change now, feel free to add a config value to [MiddlewareOptions] (https://github.com/googleapis/nodejs-logging-bunyan/blob/babd2a1fee916b37cff93ce30af7b358ff866bf7/src/middleware/express.ts#L37) class (the parameter name could be addParentEntryForCloudRun?) which could be used for Cloud Run only and make it by default to be false. If you add this change, upon next major release we can remove the config. As of now I am going to reactivate this work item, but make it priority 3 - if we will have more users asking for same fix, we definitely will raise the priority. Please let me know if you have more questions or comments - really appreciate your inputs!

wvanderdeijl commented 2 years ago

I've submitted a PR with the requested changes since we suffer from this same issue. As a (dirty) workaround we added the following to our codebase:

// workaround for https://github.com/googleapis/nodejs-logging-bunyan/issues/626 to let lib know we are running in GCP
if (process.env.K_SERVICE && !process.env.GAE_SERVICE) {
    // trick library into thinking we run on AppEngine
    process.env.GAE_SERVICE = process.env.K_SERVICE;
}

But I am not sure if that will lead to other issues. This library uses the google-auth-library to do environment detection. I am not sure if google-auth-library or other google libraries will do weird stuff if we trick them into thinking we run on AppEngine instead of Cloud Run