open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

New Component: `ethtool receiver` #9593

Closed MovieStoreGuy closed 11 months ago

MovieStoreGuy commented 2 years ago

The purpose and use-cases of the new component

When monitoring network-bound compute nodes, it becomes important to understand per-network-interface statistics to check for saturation.

Example configuration for the component

I am looking to make transitioning from the Telegraf agent to the OpenTelemetry Collector simple to understand.

I would like the configuration to look something like this:

ethtool:
  collection_interval: 10s
  interfaces:
    include:
      - pattern1
      - pattern2
    exclude:
      - lo0

If an interface matches both lists, the exclude patterns override any include patterns.

The default configuration will monitor all interfaces excluding the loopback interface.
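
To make those defaults concrete, here is a sketch of how the proposed receiver might sit in a full collector configuration, with the implicit defaults spelled out. The `ethtool` receiver, its `interfaces` block, and the `"*"` pattern are hypothetical at this point; only the `debug` exporter is an existing component.

receivers:
  ethtool:                   # hypothetical receiver from this proposal
    collection_interval: 10s
    interfaces:
      include:
        - "*"                # assumed default: all interfaces
      exclude:
        - lo0                # default: skip loopback; exclude wins over include

exporters:
  debug: {}

service:
  pipelines:
    metrics:
      receivers: [ethtool]
      exporters: [debug]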

Telemetry data types supported

Metrics Only

Sponsor (Optional)

Looking for a sponsor 🙏🏽

Open to any suggestions.

codeboten commented 2 years ago

@MovieStoreGuy thanks for proposing the component. Could this receiver become a scraper in the hostmetrics receiver, or do you think it's specific enough that it should be a separate receiver?

MovieStoreGuy commented 2 years ago

That sounds like a reasonable way forward; I don't intend for it to do much more than what is described here:

https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-network-performance.html

However, I want it to be adoptable by any infrastructure vendor.

jamesmoessis commented 2 years ago

Having it as a scraper in the hostmetrics receiver could make sense. It would be Linux-only, but everything that ethtool queries is, I believe, kernel-level, so it seems general enough.
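
If this landed as a hostmetrics scraper instead, the configuration might look roughly like the sketch below. The `ethtool` scraper shown here is hypothetical; `cpu` and `network` are existing hostmetrics scrapers, included only for context.

hostmetrics:
  collection_interval: 10s
  scrapers:
    cpu:
    network:
    ethtool:                 # hypothetical scraper, not implemented
      interfaces:
        include:
          - "*"
        exclude:
          - lo0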

github-actions[bot] commented 1 year ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

atoulme commented 1 year ago

What metrics would you capture?

bmcalary-atlassian commented 1 year ago

[ec2-user ~]$ ethtool -S eth0
bw_in_allowance_exceeded: 0
bw_out_allowance_exceeded: 0
pps_allowance_exceeded: 0
conntrack_allowance_exceeded: 0
linklocal_allowance_exceeded: 0

From https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-network-performance-ena.html

The goal is to know when an EC2 instance in AWS has hit otherwise hidden, AWS-imposed network allowances/limits.

For posterity, copied from that document:

- `bw_in_allowance_exceeded`: The number of packets queued or dropped because the inbound aggregate bandwidth exceeded the maximum for the instance.
- `bw_out_allowance_exceeded`: The number of packets queued or dropped because the outbound aggregate bandwidth exceeded the maximum for the instance.
- `conntrack_allowance_exceeded`: The number of packets dropped because connection tracking exceeded the maximum for the instance and new connections could not be established. This can result in packet loss for traffic to or from the instance.
- `linklocal_allowance_exceeded`: The number of packets dropped because the PPS of the traffic to local proxy services exceeded the maximum for the network interface. This impacts traffic to the DNS service, the Instance Metadata Service, and the Amazon Time Sync Service.
- `pps_allowance_exceeded`: The number of packets queued or dropped because the bidirectional PPS exceeded the maximum for the instance.
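
For illustration only, here is a minimal Go sketch of how a scraper could obtain these counters, assuming it simply shells out to `ethtool -S` and parses the `name: value` lines. A real implementation would more likely use the ethtool ioctl/netlink interface directly, and `ethtoolStats` is a made-up name.

package main

import (
	"bufio"
	"bytes"
	"fmt"
	"os/exec"
	"strconv"
	"strings"
)

// ethtoolStats runs `ethtool -S <iface>` and parses lines of the form
// "    stat_name: value" into a map. Sketch only; a production scraper
// would likely query the kernel directly instead of exec'ing the CLI.
func ethtoolStats(iface string) (map[string]int64, error) {
	out, err := exec.Command("ethtool", "-S", iface).Output()
	if err != nil {
		return nil, err
	}
	stats := make(map[string]int64)
	sc := bufio.NewScanner(bytes.NewReader(out))
	for sc.Scan() {
		name, value, ok := strings.Cut(strings.TrimSpace(sc.Text()), ":")
		if !ok {
			continue // skip blank lines
		}
		v, err := strconv.ParseInt(strings.TrimSpace(value), 10, 64)
		if err != nil {
			continue // skip the "NIC statistics:" header and non-numeric values
		}
		stats[strings.TrimSpace(name)] = v
	}
	return stats, sc.Err()
}

func main() {
	stats, err := ethtoolStats("eth0")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Report only the AWS ENA allowance counters discussed above, if present.
	for _, k := range []string{
		"bw_in_allowance_exceeded",
		"bw_out_allowance_exceeded",
		"pps_allowance_exceeded",
		"conntrack_allowance_exceeded",
		"linklocal_allowance_exceeded",
	} {
		if v, ok := stats[k]; ok {
			fmt.Printf("%s: %d\n", k, v)
		}
	}
}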

github-actions[bot] commented 1 year ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

github-actions[bot] commented 11 months ago

This issue has been closed as inactive because it has been stale for 120 days with no activity.

diranged commented 2 months ago

I think this should get re-opened. These are critical metrics that I was surprised to find are not yet supported in the hostmetrics receiver, though I do think that's where they should go.

diranged commented 2 months ago

@atoulme Is there a process for asking for this to be re-opened and evaluated?