open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

[receiver/vcenter] Nested Virtual Apps Causes Collection Errors #33189

Open StefanKurek opened 2 months ago

StefanKurek commented 2 months ago

Component(s)

receiver/vcenter

What happened?

Description

While collecting against a newly set up environment, I noticed collection errors for a VM that sat under a nested Virtual App (a Virtual App contained within another Virtual App). This was the first time I had personally seen Virtual Apps nested this way. The errors looked like this:

2024-05-22T13:01:48.234-0400    error   scraperhelper/scrapercontroller.go:197  Error scraping metrics  {"kind": "receiver", "name": "vcenter", "data_type": "metrics", "error": "no inventory path found for VM [alma-3]'s collected vApp: standalone-child-vapp-0", "scraper": "vcenter"}
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport
    /opt/homebrew/pkg/mod/go.opentelemetry.io/collector/receiver@v0.101.0/scraperhelper/scrapercontroller.go:197
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1
    /opt/homebrew/pkg/mod/go.opentelemetry.io/collector/receiver@v0.101.0/scraperhelper/scrapercontroller.go:173

I traced this to the Virtual App data never being scraped for the nested Virtual Apps in my environment. Looking more closely at the client, I am fairly sure the underlying issue is in the govmomi library.
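To illustrate the failure mode, here is a minimal sketch in Go. It assumes the receiver resolves each VM's parent vApp against the set of vApps it managed to collect; the `vmInfo` type, the `vAppInventoryPath` helper, and the `parent-vapp` path are hypothetical and are not the receiver's actual code. Only the VM name, the vApp name, and the error text come from the log above.

```go
package main

import "fmt"

// vmInfo is a hypothetical, simplified view of a scraped VM.
type vmInfo struct {
	Name     string
	VAppName string // name of the vApp the VM reports as its parent
}

// vAppInventoryPath looks up the inventory path of a VM's parent vApp among
// the vApps that were collected. It fails when that vApp was never scraped.
func vAppInventoryPath(vm vmInfo, collected map[string]string) (string, error) {
	path, ok := collected[vm.VAppName]
	if !ok {
		return "", fmt.Errorf("no inventory path found for VM [%s]'s collected vApp: %s", vm.Name, vm.VAppName)
	}
	return path, nil
}

func main() {
	// Only the top-level vApp was collected; the nested one is missing.
	collected := map[string]string{"parent-vapp": "/dc/vm/parent-vapp"}
	vm := vmInfo{Name: "alma-3", VAppName: "standalone-child-vapp-0"}

	if _, err := vAppInventoryPath(vm, collected); err != nil {
		fmt.Println("error:", err)
	}
}
```

If the nested vApp is absent from the collected set, the lookup cannot succeed no matter how the VM itself was scraped.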

The ResourcePoolList function passes options for Parents and Nested to the internal find function, and Virtual Machines under nested Resource Pools don't hit any of these issues in this receiver.

The VirtualAppList function, however, sets none of these options. My guess is that this is an oversight that should be corrected in the govmomi repo.
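The difference between the two listing behaviours can be sketched with a small in-memory tree. This is not govmomi code: the `vApp` type, `listTopLevel`, `listNested`, and the `/dc/vm` root are hypothetical stand-ins used only to show why a listing that does not descend into nested containers would miss `standalone-child-vapp-0`.

```go
package main

import "fmt"

// vApp is a hypothetical stand-in for a Virtual App in an inventory tree.
type vApp struct {
	Name     string
	Children []*vApp // nested Virtual Apps
}

// listTopLevel returns inventory paths only for the vApps directly under
// root, like a listing that never descends into nested containers.
func listTopLevel(root string, apps []*vApp) []string {
	paths := make([]string, 0, len(apps))
	for _, a := range apps {
		paths = append(paths, root+"/"+a.Name)
	}
	return paths
}

// listNested returns inventory paths for every vApp, recursing into children.
func listNested(root string, apps []*vApp) []string {
	var paths []string
	for _, a := range apps {
		p := root + "/" + a.Name
		paths = append(paths, p)
		paths = append(paths, listNested(p, a.Children)...)
	}
	return paths
}

func main() {
	apps := []*vApp{{
		Name:     "parent-vapp",
		Children: []*vApp{{Name: "standalone-child-vapp-0"}},
	}}

	fmt.Println(listTopLevel("/dc/vm", apps)) // nested vApp is absent
	fmt.Println(listNested("/dc/vm", apps))   // nested vApp is included
}
```

Under these assumptions, only the recursive listing yields a path for the nested vApp, which is the behaviour the nested Resource Pool case already appears to get.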

Steps to Reproduce

Collect against any vSphere environment with nested Virtual Apps.

Expected Result

No collection errors.

Actual Result

Collection errors seen for Virtual Machines under any nested Virtual Apps.

Collector version

v0.101.0

Environment information

No response

OpenTelemetry Collector configuration

extensions:
  basicauth/prom:
    client_auth:
      username: [PROMUSER]
      password: [PROMPASS]

exporters:
  prometheusremotewrite:
    endpoint: [PROMENDPOINT]
    auth:
      authenticator: basicauth/prom
    resource_to_telemetry_conversion:
      enabled: true # Convert resource attributes to metric labels

processors:
  batch:
    # https://github.com/open-telemetry/opentelemetry-collector/tree/main/processor/batchprocessor

receivers:
  vcenter:
    endpoint: https://[VCENTERHOST]
    username: [VCENTERUSER]
    password: [VCENTERPASS]
    tls:
      insecure: true
    collection_interval: 1m
    initial_delay: 1s

service:
  extensions: [basicauth/prom]
  pipelines:
    metrics:
      receivers: [vcenter]
      processors: [batch]
      exporters: [prometheusremotewrite]

Log output

No response

Additional context

No response

github-actions[bot] commented 1 month ago

Pinging code owners for receiver/vcenter: @djaglowski @schmikei @StefanKurek. See Adding Labels via Comments if you do not have permissions to add labels yourself.

crobert-1 commented 1 month ago

Removing needs triage, as the issue was filed by a code owner and generally makes sense to me.