open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

Additional system attributes #31627

Open bhupenbisht opened 7 months ago

bhupenbisht commented 7 months ago

Component(s)

Describe the issue you're reporting

Looking for a system uptime metric. This metric would provide useful context on the machine that is generating telemetry and would be useful for infrastructure monitoring.

  1. Server uptime (since last boot)

github-actions[bot] commented 7 months ago

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

bhupenbisht commented 6 months ago

Server uptime is a crucial metric from an infrastructure observability point of view. Could anyone look into this request? All other tools provide the same details.

github-actions[bot] commented 6 months ago

Pinging code owners for receiver/hostmetrics: @dmitryax @braydonk. See Adding Labels via Comments if you do not have permissions to add labels yourself.

andrzej-stencel commented 6 months ago

This seems to fit into Host Metrics receiver more, so I have updated the label. It has been requested before (the issue was closed as inactive):

I continue to believe this is a reasonable addition. Would you be up for implementing it @bhupenbisht?

bhupenbisht commented 6 months ago

@astencel-sumo Sure, I would like to contribute. Please let me know how I can.

andrzej-stencel commented 6 months ago

@bhupenbisht You can prepare a pull request implementing the change. See the original issue for tips on how to implement this (a new system scraper in the Host Metrics receiver).

kevinnoel-be commented 5 months ago

We do have an internal receiver/scraper to gather uptime. If there is still interest, we could contribute it as part of the hostmetrics receiver.

andrzej-stencel commented 4 months ago

@kevinnoel-be this sounds great. Is this code open source - can you point to it to take a look?

kevinnoel-be commented 4 months ago

@andrzej-stencel It is in an internal/private OTel build, so you won't be able to see it. We created a separate receiver with scraper as we cannot extend the hostmetrics receiver and we didn't want to fork it only for this metric.

I could take some time to port this, but I am wondering what an appropriate name for this metric would be, as I don't see much movement on https://github.com/open-telemetry/semantic-conventions/issues/648. Our internal definition for it is:

metrics:
  system.uptime:
    enabled: true
    description: The time since the system started
    unit: s
    sum:
      value_type: int
      monotonic: true
      aggregation_temporality: cumulative

andrzej-stencel commented 4 months ago

Understood, thanks for your response @kevinnoel-be.

I believe system.uptime is a good name and I think we can implement this without waiting on a semantic convention for this.

Also see https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/14130 for previous discussion and considerations regarding implementing it.

andrzej-stencel commented 4 months ago

@kevinnoel-be do you want to prepare a PR adding the system.uptime metric to the Host Metrics receiver? If yes, I'll assign this issue to you.

kernelpanic77 commented 4 months ago

@andrzej-stencel I'd like to contribute as well. If it's okay with @kevinnoel-be, can I take a shot at this?

kevinnoel-be commented 4 months ago

@kernelpanic77 Sure. If you cannot find the time, ping me back and I'll pick it up

andrzej-stencel commented 4 months ago

Sure, thanks for offering your help @kernelpanic77! Assigning this issue to you.

mx-psi commented 4 months ago

I don't think we should add this metric to the hostmetrics receiver without adding it to semantic conventions. It's fine to add it under a feature flag as a PoC for the semantic conventions, but we must not risk deviating from semantic conventions here.

kernelpanic77 commented 4 months ago

Hi @kevinnoel-be,

I understand that the repository is internal. Could you guide me on how you implemented the uptime metric, or is there a way I can take a look at the implementation in your fork?

@andrzej-stencel @mx-psi we can create a draft PR for this until the semantic convention is approved.

kevinnoel-be commented 4 months ago

@kernelpanic77 Created a new uptime scraper in the hostmetrics receiver using shirou/gopsutil host.UptimeWithContext() method behind the scenes

mx-psi commented 4 months ago

we can create a draft PR for this until the semantic convention is approved.

To be clear: to my knowledge, nobody is actively working on this on the semantic conventions side. I am happy to guide you through the process if you want to contribute it yourself to semantic conventions

krantishetty commented 4 months ago

@kevinnoel-be Is there any update on the uptime metric? We are looking forward to it.

kernelpanic77 commented 4 months ago

@krantishetty Give me some time, I'm working on a draft PR.

kernelpanic77 commented 4 months ago

we can create a draft PR for this until the semantic convention is approved.

To be clear: to my knowledge, nobody is actively working on this on the semantic conventions side. I am happy to guide you through the process if you want to contribute it yourself to semantic conventions

Sure, let's create a PR for semantic-conventions as well. Could you please guide me? Sorry for my delayed response.

kernelpanic77 commented 4 months ago

@kernelpanic77 Created a new uptime scraper in the hostmetrics receiver using shirou/gopsutil host.UptimeWithContext() method behind the scenes

This is helpful, @kevinnoel-be. Thanks.

mx-psi commented 4 months ago

Sure, let's create a PR for semantic-conventions as well. Could you please guide me? Sorry for my delayed response.

@kernelpanic77 No worries! Take a look at this recent PR that adds another metric to system metrics: open-telemetry/semantic-conventions/pull/1078. You would have to file a PR with roughly the same structure, noting that the Markdown files are autogenerated (see here how this works and how to set up your development environment).

andrzej-stencel commented 4 months ago

Previous work:

krantishetty commented 4 months ago

This covers only process uptime; however, we are looking for server uptime, which can be read from /proc/uptime. PR 2824 does not cover server uptime.

kernelpanic77 commented 4 months ago

@kevinnoel-be I believe that the existing code for the processes scraper is already calling gopsutil/host, so I think we can use the existing scraper to scrape host.UptimeWithContext().

kernelpanic77 commented 4 months ago

@krantishetty Are you referring to application uptime? Could you give an example of your use case for more clarity?

bhupenbisht commented 4 months ago

@kernelpanic77 I believe krantishetty is talking about OTel process uptime. Here we are looking for server uptime.

krantishetty commented 3 months ago

@krantishetty Are you referring to application uptime? Could you give an example of your use case for more clarity?

I'm talking about server uptime, that is, the time since the server's last reboot. For example, running the uptime command shows how long the server has been up since the last reboot. This can be read from /proc/uptime on a Linux machine.

bhupenbisht commented 2 months ago

@kernelpanic77 Any luck with system uptime?

krantishetty commented 1 month ago

Hi, any luck with system uptime?