googleapis / google-cloud-ruby

Google Cloud Client Library for Ruby
https://googleapis.github.io/google-cloud-ruby/
Apache License 2.0

Stackdriver and Google Cloud Logging not using new Kubernetes Engine Monitoring resource types #5111

Open ppdeassis opened 4 years ago

ppdeassis commented 4 years ago

The problem: after deploying a Rails app to GKE, I can't filter Google Cloud Logging by resource (Kubernetes container) to find the app's log entries, although I can find them with an advanced query that filters on logName alone.

Environment details

Steps to reproduce

  1. Deploy a Rails app to GKE (1.14.10-gke.17) - using Kubernetes Engine Monitoring - with the stackdriver gem to enable Google Cloud Logging (and other services)
  2. Access Google Cloud Logging > Logs Viewer tool and search for the rails app log entries by filtering by resource (Kubernetes container)

You'll find that the monitored resource metadata is incomplete and does not use the new resource type values. As a result, if you filter the logs by resource (Kubernetes container), the search returns no entries.

I'll try to list the resource values from log entries below to show you what I think is the problem:

Expected values (my guess, after reading the docs):

resource: {
  type: "k8s_container",
  labels: { 
    project_id: "your-project-id-here",
    cluster_name: "your-cluster-name-here",
    container_name: "your-container-name-here",
    namespace_id: "your-namespace-here"
  }
}

Actual values (found on Logs Viewer):

resource: {
  type: "container"
  labels: {
    pod_id: ""
    zone: ""
    project_id: "your-project-id-here"
    cluster_name: "your-cluster-name-here"
    container_name: ""
    namespace_id: "your-namespace-here"
    instance_id: ""
  }
}
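
To make the mismatch concrete, the two structures above can be compared programmatically. This is only an illustrative sketch in plain Ruby: the hashes copy the values from this report, and `resource_diff` is a hypothetical helper, not part of google-cloud-ruby.

```ruby
# Expected resource (the reporter's guess after reading the docs).
expected = {
  type: "k8s_container",
  labels: {
    project_id: "your-project-id-here",
    cluster_name: "your-cluster-name-here",
    container_name: "your-container-name-here",
    namespace_id: "your-namespace-here"
  }
}

# Actual resource as shown in Logs Viewer.
actual = {
  type: "container",
  labels: {
    pod_id: "",
    zone: "",
    project_id: "your-project-id-here",
    cluster_name: "your-cluster-name-here",
    container_name: "",
    namespace_id: "your-namespace-here",
    instance_id: ""
  }
}

# Hypothetical helper: summarizes how `actual` deviates from `expected`.
def resource_diff(expected, actual)
  types_match = expected[:type] == actual[:type]
  {
    # nil when the types agree, otherwise [expected, actual]
    type_mismatch: types_match ? nil : [expected[:type], actual[:type]],
    # labels that are present but carry an empty value
    empty_labels: actual[:labels].select { |_, value| value.to_s.empty? }.keys,
    # labels that the expected structure does not contain at all
    unexpected_labels: actual[:labels].keys - expected[:labels].keys
  }
end

diff = resource_diff(expected, actual)
if diff[:type_mismatch]
  puts "type: expected #{diff[:type_mismatch][0]}, got #{diff[:type_mismatch][1]}"
end
puts "empty labels: #{diff[:empty_labels].join(', ')}"
puts "labels not in the expected set: #{diff[:unexpected_labels].join(', ')}"
```

The sketch reports three kinds of deviation: the resource type itself, labels that are present but empty, and labels that only exist in the legacy structure.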

As you can see:

Code example

Full backtrace

N/A


P.S. I wasn't sure whether to open this as a feature request or a bug report. Let me know if I should change anything in the description to make it more meaningful.

quartzmo commented 4 years ago

@ppdeassis Thank you for opening this issue. I agree, it's hard to know how to categorize this right now (we will investigate), so I'll start by labeling it as a question.

quartzmo commented 4 years ago

@ppdeassis I'm pretty certain that the resource structure in Middleware.default_monitored_resource has simply fallen out of sync with the service definition, and that it needs to be updated. It may be a week or two before we have time to work on it. Do you have an interest in attempting a pull request to fix this?

ppdeassis commented 4 years ago

Well, I started looking into it, but I think Google::Cloud.env needs more data/metadata to detect whether the app is running in a Kubernetes cluster that uses Kubernetes Engine Monitoring. The current behavior has to stay as it is so that it keeps working on clusters using "Legacy Logging and Monitoring".

If you can point me in the right direction on how to get this data/metadata, I can give it a try in the meantime.
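
The constraint described here (keep the legacy resource by default, emit the new one only on clusters that use Kubernetes Engine Monitoring) could be sketched as follows. Everything in it is an assumption for illustration: automatic detection is exactly the open question in this thread, so the sketch falls back to an explicit opt-in through a hypothetical environment variable `K8S_ENGINE_MONITORING`, and `monitored_resource_for` is a hypothetical helper, not an API of google-cloud-ruby. The `k8s_container` label set mirrors the reporter's guess above and should be checked against the Cloud Logging monitored resource documentation.

```ruby
# Hypothetical helper: picks the resource structure from an explicit opt-in flag.
# `env` is any Hash-like object (ENV works); `labels` is a Hash of label values.
def monitored_resource_for(env, labels)
  if env["K8S_ENGINE_MONITORING"] == "true" # hypothetical flag, set by the operator
    {
      type: "k8s_container",
      # Label names follow the reporter's guess, not a verified service definition.
      labels: labels.slice(:project_id, :cluster_name, :container_name, :namespace_id)
    }
  else
    # Legacy Logging and Monitoring: keep today's structure untouched.
    { type: "container", labels: labels }
  end
end

labels = {
  project_id: "your-project-id-here",
  cluster_name: "your-cluster-name-here",
  container_name: "your-container-name-here",
  namespace_id: "your-namespace-here",
  pod_id: "",
  zone: "",
  instance_id: ""
}

puts monitored_resource_for({ "K8S_ENGINE_MONITORING" => "true" }, labels)[:type]
puts monitored_resource_for({}, labels)[:type]
```

An explicit flag sidesteps the detection problem at the cost of extra deployment configuration, so it is at best a stopgap until the environment can be detected reliably.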

quartzmo commented 4 years ago

I read through Migrating to Kubernetes Engine Monitoring but didn't see anything directly helpful. If anyone can suggest how to detect if Kubernetes Engine Monitoring is being used, please let us know.

simonz130 commented 3 years ago

@dazuma, could you please take a look?

matteo-rossi-wise commented 3 years ago

@dazuma I'm experiencing the same issue, still no fix after 9 months?

matteo-rossi-wise commented 3 years ago

TL;DR: remove "stackdriver" gem if you're on GKE

@ppdeassis lesson learned: if you're running your app on GKE, you simply need to print to $stdout (text or JSON format), and the underlying custom fluentd agent will take care of the rest (such as exporting STDOUT logs to Logs Viewer under the correct namespace).

Note: This will not solve the issue described in the main thread, but it can save you a headache if you need logging to work in your cluster.
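
As a rough illustration of the workaround above, a logger that writes one JSON object per line to $stdout can be built with Ruby's standard library alone. This is a minimal sketch, not code from this thread: `build_json_logger` and `cloud_severity` are hypothetical helpers, and it assumes the GKE logging agent treats the `severity` and `message` fields of a JSON line specially, as described in Cloud Logging's structured logging documentation.

```ruby
require "json"
require "logger"
require "time"

# Translate Ruby Logger severity labels to Cloud Logging severity names.
def cloud_severity(ruby_severity)
  case ruby_severity
  when "WARN" then "WARNING"
  when "FATAL" then "CRITICAL"
  when "ANY", "UNKNOWN" then "DEFAULT"
  else ruby_severity # DEBUG, INFO and ERROR already match
  end
end

# Hypothetical helper: returns a Logger that emits one JSON object per line.
def build_json_logger(io = $stdout)
  logger = Logger.new(io)
  logger.formatter = proc do |severity, time, progname, msg|
    entry = {
      severity: cloud_severity(severity),
      message: msg.is_a?(String) ? msg : msg.inspect,
      time: time.getutc.iso8601(3)
    }
    entry[:progname] = progname if progname
    JSON.generate(entry) + "\n"
  end
  logger
end

logger = build_json_logger
logger.info("Order created")
logger.warn("Payment retry scheduled")
```

In a Rails app, a logger like this would typically be assigned to `config.logger` in the environment configuration, with the stackdriver gem removed as suggested above.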

ppdeassis commented 3 years ago

@matteo-rossi-wise yeah! we ended up doing that too (removing stackdriver gem and adapting our loggers to STDOUT).

nordringrayhide commented 2 years ago

Any updates? After 2 years the issue is still there.