open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

googlecloudpubsubreceiver : gRPC unavailable error on response #11073

Closed: hex1848 closed this issue 2 years ago

hex1848 commented 2 years ago

Describe the bug

We are running the open-telemetry-collector Helm chart in our GKE Kubernetes environment. I added and configured the googlecloudpubsubreceiver component to pull from a GCP Pub/Sub subscription. Authentication and the request appear to go through fine, but the response always comes back with gRPC status Unavailable, and the GCP Metrics Explorer shows that events remain unacked.

Steps to reproduce

What did you expect to see?

What did you see instead?

2022-06-16T14:33:11.901Z    warn    internal/handler.go:135 End of recovery loop, restarting.   {"kind": "receiver", "name": "googlecloudpubsub"}
2022-06-16T14:33:11.901Z    warn    internal/handler.go:191 Request Stream loop ended.  {"kind": "receiver", "name": "googlecloudpubsub"}
2022-06-16T14:33:11.901Z    warn    internal/handler.go:180 requestStream <-ctx.Done()  {"kind": "receiver", "name": "googlecloudpubsub"}
2022-06-16T14:33:11.901Z    warn    internal/handler.go:241 Response Stream loop ended. {"kind": "receiver", "name": "googlecloudpubsub"}
2022-06-16T14:33:11.901Z    info    internal/handler.go:221 response stream breaking on gRPC s 'Unavailable'    {"kind": "receiver", "name": "googlecloudpubsub"}
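
A minimal sanity check, assuming the gcloud CLI is logged in with the same credentials the collector uses and the subscription path matches the config below, is to pull from the subscription directly; a path or permission problem surfaces here immediately:

# Confirm the subscription is visible to the current credentials
gcloud pubsub subscriptions describe \
  projects/{PROJECT_ID}/subscriptions/otlp-metric-events-subscription

# Attempt a single synchronous pull with the same identity
gcloud pubsub subscriptions pull \
  projects/{PROJECT_ID}/subscriptions/otlp-metric-events-subscription \
  --limit=1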

What version did you use? v0.53.0

What config did you use?

config:
  receivers:
    logging:
      loglevel: debug
    googlecloudpubsub:
      subscription: projects/{PROJECT_ID}/subscriptions/otlp-metric-events-subscription
  exporters:
    logging:
      loglevel: debug
    signalfx:
      access_token: {{ "ref+vault://... }}
      access_token_passthrough: false
      realm: us1
      log_dimension_updates: true
      timeout: 10s
      max_connections: 100
  processors:
    memory_limiter:
      limit_mib: 3200
      spike_limit_mib: 1024
      check_interval: 5s
    batch/metric:
      send_batch_max_size: 200
      send_batch_size: 200
  extensions:
    memory_ballast:
      size_in_percentage: 40
  service:
    telemetry:
      logs:
        level: debug
    extensions: [health_check, memory_ballast]
    pipelines:
      metrics:
        receivers: [googlecloudpubsub]
        processors: [memory_limiter, batch/metric]
        exporters: [signalfx]

Environment
OS: GKE
Helm chart: https://github.com/open-telemetry/opentelemetry-helm-charts

Additional context

hex1848 commented 2 years ago

@alexvanboxel Do you have any thoughts on what might cause this?

alexvanboxel commented 2 years ago

This feels like an authentication issue. We're using our own charts, and we're sure that our workload identity is set up correctly. I'm not sure that the standard helm chart provides the correct settings for the workload identity.
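
For anyone checking the Workload Identity side on GKE, the usual wiring is roughly as follows (a sketch; GSA_NAME, KSA_NAME, NAMESPACE, and PROJECT_ID are placeholders, not values from this issue):

# Let the Kubernetes service account impersonate the Google service account
gcloud iam service-accounts add-iam-policy-binding GSA_NAME@PROJECT_ID.iam.gserviceaccount.com \
  --role="roles/iam.workloadIdentityUser" \
  --member="serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME]"

# Annotate the Kubernetes service account used by the collector pods
kubectl annotate serviceaccount KSA_NAME --namespace NAMESPACE \
  iam.gke.io/gcp-service-account=GSA_NAME@PROJECT_ID.iam.gserviceaccount.com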

hex1848 commented 2 years ago

I am also seeing this when running the 0.53.0 Docker image locally, which seems to rule out GKE-specific issues. I hadn't considered an authentication issue, but I get the same error with both the personal account and the service account.

docker run -v "${PWD}/otel-local-config.yaml":/otel-local-config.yaml -v ~/.config/:/etc/config -p 55681:55681 -e GOOGLE_APPLICATION_CREDENTIALS=/etc/config/gcloud/application_default_credentials.json otel/opentelemetry-collector-contrib:0.53.0 --config otel-local-config.yaml;

otel-local-config.yaml:

receivers:
  otlp:
    protocols:
      grpc:
  googlecloudpubsub:
    subscription: projects/{GCP_PROJECT}/subscriptions/otlp-metric-events-subscription

exporters:
  logging:

processors:
  batch:

extensions:
  health_check:
  zpages:
    endpoint: :55679

service:
  extensions: [zpages, health_check]
  pipelines:
    metrics:
      receivers: [googlecloudpubsub]
      processors: [batch]
      exporters: [logging]

Aside from setting the subscription to pull delivery, are there any other settings that should be adjusted on the GCP side?
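
One way to double-check the subscription settings on the GCP side (a sketch, using the subscription path from the config above) is to confirm it really is a pull subscription, i.e. pushConfig is empty, and to look at the ack deadline:

# An empty pushConfig means pull delivery; ackDeadlineSeconds is the ack window
gcloud pubsub subscriptions describe \
  projects/{GCP_PROJECT}/subscriptions/otlp-metric-events-subscription \
  --format="yaml(ackDeadlineSeconds, pushConfig)"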

alexvanboxel commented 2 years ago

I tried it with the Docker build but can't seem to get the same error. I even tried switching to another local gcloud application-default credential (with my gmail account), and then I get an explicit permission-denied error. I haven't tried removing the fine-grained access on the topic (as I can't remove myself).

Maybe try:

I verified on a Mac with Docker Desktop
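
For anyone repeating this experiment, switching the local application-default credentials and confirming they resolve before mounting them into the container looks roughly like this (a sketch):

# Re-issue application-default credentials as a different identity
gcloud auth application-default login

# Confirm ADC yields a usable token before mounting ~/.config into the container
gcloud auth application-default print-access-token > /dev/null && echo "ADC OK"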

dmitryax commented 2 years ago

cc @alexvanboxel as code owner

alexvanboxel commented 2 years ago

@hex1848 can we close this? I can't reproduce your issue.

luk-ada commented 1 month ago

Hi @hex1848, were you able to fix this issue? I have the same error on 0.106.0. I'm trying to stream logs from Pub/Sub to a log-processing engine hosted on Azure. The error occurs both on a VM and on a pod running in an AKS cluster. To authenticate, I'm using a secret.json file with the service account details.

2024-07-31T07:09:27.712Z    info    internal/handler.go:210    response stream breaking on gRPC s 'Unavailable'    {"kind": "receiver", "name": "googlecloudpubsub/14", "data_type": "logs"}
2024-07-31T07:09:27.713Z    warn    internal/handler.go:230    Response Stream loop ended.    {"kind": "receiver", "name": "googlecloudpubsub/14", "data_type": "logs"}
2024-07-31T07:09:27.713Z    warn    internal/handler.go:169    requestStream <-ctx.Done()    {"kind": "receiver", "name": "googlecloudpubsub/14", "data_type": "logs"}
2024-07-31T07:09:27.716Z    warn    internal/handler.go:180    Request Stream loop ended.    {"kind": "receiver", "name": "googlecloudpubsub/14", "data_type": "logs"}
2024-07-31T07:09:27.716Z    warn    internal/handler.go:124    End of recovery loop, restarting.    {"kind": "receiver", "name": "googlecloudpubsub/14", "data_type": "logs"}
2024-07-31T07:09:27.967Z    info    internal/handler.go:106    Starting Streaming Pull    {"kind": "receiver", "name": "googlecloudpubsub/14", "data_type": "logs"}
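
A sketch for verifying the key file itself from the Azure VM or AKS pod, independent of the collector; the key path and subscription name below are placeholders:

# Activate the service-account key and try a direct pull with the same identity
gcloud auth activate-service-account --key-file=/path/to/secret.json
gcloud pubsub subscriptions pull projects/GCP_PROJECT/subscriptions/SUBSCRIPTION_NAME --limit=1

# The collector reads the same key via GOOGLE_APPLICATION_CREDENTIALS
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/secret.json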