Enhance `kubeletstatsreceiver` to scrape non-standard endpoints

asweet-confluent commented 1 year ago

Component(s)

receiver/kubeletstats

Is your feature request related to a problem? Please describe.

kubeletstatsreceiver scrapes kubelet's /metrics Prometheus endpoint, but kubelet also exports other metrics at non-standard endpoints:

/metrics/cadvisor
/metrics/resource
/metrics/probes

Describe the solution you'd like

kubeletstatsreceiver should be enhanced to scrape those other endpoints. This is what's done by Datadog's kubelet integration - see the config here.

I've compiled a list of metrics from the source code as well as direct queries to the endpoints. Note that this may not be an exhaustive list:

Metric List

```sh
# Log metrics
kubelet_container_log_filesystem_used_bytes

# Resource metrics
node_cpu_usage_seconds_total
node_memory_working_set_bytes
container_cpu_usage_seconds_total
container_memory_working_set_bytes
pod_cpu_usage_seconds_total
pod_memory_working_set_bytes
scrape_error
container_start_time_seconds

# Volume metrics
volume_stats_capacity_bytes
volume_stats_available_bytes
volume_stats_used_bytes
volume_stats_inodes
volume_stats_inodes_free
volume_stats_inodes_used
volume_stats_health_status_abnormal

node_startup_pre_kubelet_duration_seconds
node_startup_pre_registration_duration_seconds
node_startup_registration_duration_seconds
node_startup_post_registration_duration_seconds
node_startup_duration_seconds
pod_worker_duration_seconds
pod_start_duration_seconds
pod_start_sli_duration_seconds
cgroup_manager_duration_seconds
pod_worker_start_duration_seconds
pod_status_sync_duration_seconds
pleg_relist_duration_seconds
pleg_discard_events
pleg_relist_interval_seconds
pleg_last_seen_seconds
evented_pleg_connection_error_count
evented_pleg_connection_success_count
evented_pleg_connection_latency_seconds
evictions
eviction_stats_age_seconds
preemptions
running_pods
running_containers
desired_pods
active_pods
mirror_pods
working_pods
orphaned_runtime_pods_total
restarted_pods_total

# Metrics keys of remote runtime operations
runtime_operations_total
runtime_operations_duration_seconds
runtime_operations_errors_total

# Metrics keys of device plugin operations
device_plugin_registration_total
device_plugin_alloc_duration_seconds

# Metrics keys of pod resources operations
pod_resources_endpoint_requests_total
pod_resources_endpoint_requests_list
pod_resources_endpoint_requests_get_allocatable
pod_resources_endpoint_errors_list
pod_resources_endpoint_errors_get_allocatable
pod_resources_endpoint_requests_get
pod_resources_endpoint_errors_get

# Metrics keys for RuntimeClass
run_podsandbox_duration_seconds
run_podsandbox_errors_total

# Metrics to keep track of total number of Pods and Containers started
started_pods_total
started_pods_errors_total
started_containers_total
started_containers_errors_total

# Metrics to track HostProcess container usage by this kubelet
started_host_process_containers_total
started_host_process_containers_errors_total

# Metrics to track ephemeral container usage by this kubelet
managed_ephemeral_containers

# Metrics to track the CPU manager behavior
cpu_manager_pinning_requests_total
cpu_manager_pinning_errors_total

# Metrics to track the Topology manager behavior
topology_manager_admission_requests_total
topology_manager_admission_errors_total
topology_manager_admission_duration_ms

# Metrics to track orphan pod cleanup
orphan_pod_cleaned_volumes
orphan_pod_cleaned_volumes_errors

# Metric list directly from /metrics/cadvisor
cadvisor_version_info
container_cpu_cfs_periods_total
container_cpu_cfs_throttled_periods_total
container_cpu_cfs_throttled_seconds_total
container_cpu_load_average_10s
container_cpu_system_seconds_total
container_cpu_usage_seconds_total
container_cpu_user_seconds_total
container_file_descriptors
container_fs_inodes_free
container_fs_inodes_total
container_fs_io_current
container_fs_io_time_seconds_total
container_fs_io_time_weighted_seconds_total
container_fs_limit_bytes
container_fs_read_seconds_total
container_fs_reads_merged_total
container_fs_reads_total
container_fs_sector_reads_total
container_fs_sector_writes_total
container_fs_usage_bytes
container_fs_write_seconds_total
container_fs_writes_merged_total
container_fs_writes_total
container_last_seen
container_memory_cache
container_memory_failcnt
container_memory_failures_total
container_memory_mapped_file
container_memory_max_usage_bytes
container_memory_rss
container_memory_swap
container_memory_usage_bytes
container_memory_working_set_bytes
container_network_receive_bytes_total
container_network_receive_errors_total
container_network_receive_packets_dropped_total
container_network_receive_packets_total
container_network_transmit_bytes_total
container_network_transmit_errors_total
container_network_transmit_packets_dropped_total
container_network_transmit_packets_total
container_oom_events_total
container_processes
container_sockets
container_spec_cpu_period
container_spec_cpu_quota
container_spec_cpu_shares
container_spec_memory_limit_bytes
container_spec_memory_reservation_limit_bytes
container_spec_memory_swap_limit_bytes
container_start_time_seconds
container_tasks_state
container_threads
container_threads_max
container_ulimits_soft
machine_cpu_cores
machine_cpu_physical_cores
machine_cpu_sockets
machine_memory_bytes
machine_nvm_avg_power_budget_watts
machine_nvm_capacity

# Metric list directly from /metrics/probes
prober_probe_duration_seconds_bucket
prober_probe_duration_seconds_count
prober_probe_duration_seconds_sum
prober_probe_total
```
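For anyone who wants to regenerate a list like this from a saved scrape, here is a minimal sketch in Python. It assumes the response body is in the Prometheus text exposition format; the function name `metric_names` and the sample payload are illustrative, not taken from the kubelet or the collector. Histogram series such as `_bucket`, `_sum`, and `_count` are reported as separate names, the same way they appear in the probes section of the list.

```python
import re

# Matches the metric name at the start of a sample line in the
# Prometheus text exposition format, e.g. `name{label="v"} 1.0`.
_SAMPLE_NAME = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def metric_names(exposition_text: str) -> list[str]:
    """Return the sorted, de-duplicated metric names found in a scrape body."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and `# HELP` / `# TYPE` comment lines.
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_NAME.match(line)
        if match:
            names.add(match.group(1))
    return sorted(names)


# Hypothetical sample, shaped like a response from /metrics/resource.
SAMPLE = """\
# HELP node_cpu_usage_seconds_total Cumulative cpu time consumed by the node
# TYPE node_cpu_usage_seconds_total counter
node_cpu_usage_seconds_total 1234.5 1700000000000
# TYPE pod_memory_working_set_bytes gauge
pod_memory_working_set_bytes{namespace="default",pod="web-0"} 1048576 1700000000000
pod_memory_working_set_bytes{namespace="default",pod="web-1"} 2097152 1700000000000
"""

print(metric_names(SAMPLE))
```

Fetching the body itself is left out on purpose, since the kubelet normally requires authentication and TLS settings that vary per cluster.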

Describe alternatives you've considered

As a workaround, you can configure Prometheus scrape jobs to hit those endpoints. This is not ideal because kubeletstatsreceiver renames the default metric attributes, e.g. namespace becomes k8s.namespace.name. Mixing kubeletstatsreceiver and Prometheus scrape jobs would create disjointed label sets unless you add a separate processing step that renames them.
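The separate renaming step amounts to a key mapping applied to every sample's labels. The following is a minimal sketch in Python, not collector configuration. Only the `namespace` to `k8s.namespace.name` pair comes from this issue; the `pod` and `container` entries and the function name `rename_labels` are assumptions added for illustration.

```python
# Assumed mapping from Prometheus label names to OpenTelemetry-style
# attribute names. Only `namespace` -> `k8s.namespace.name` is stated in
# this issue; the other entries are illustrative assumptions.
LABEL_TO_ATTRIBUTE = {
    "namespace": "k8s.namespace.name",
    "pod": "k8s.pod.name",
    "container": "k8s.container.name",
}


def rename_labels(labels: dict[str, str]) -> dict[str, str]:
    """Return a copy of `labels` with known keys renamed; others pass through."""
    return {LABEL_TO_ATTRIBUTE.get(key, key): value for key, value in labels.items()}


# Hypothetical label set from a scraped sample.
scraped = {"namespace": "default", "pod": "web-0", "image": "nginx:1.25"}
print(rename_labels(scraped))
```

Labels without a mapping entry (here `image`) are passed through unchanged, which is why the two label sets stay disjoint until every relevant label has been mapped.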

Additional context

No response

github-actions[bot] commented 1 year ago

Pinging code owners:

receiver/kubeletstats: @dmitryax @TylerHelmuth

See Adding Labels via Comments if you do not have permissions to add labels yourself.

TylerHelmuth commented 1 year ago

@asweet-confluent sounds like a reasonable idea to me. Can you provide in this issue the metrics we'd be collecting? Are there any important differences between those endpoints and the stats/summary data we collect today?

asweet-confluent commented 1 year ago

> Can you provide in this issue the metrics we'd be collecting?

I updated the issue description with the raw metric names. Presumably kubeletstatsreceiver will rename them to be in line with the `k8s.` metric naming scheme.

> Are there any important differences between those endpoints and the stats/summary data we collect today?

As noted in the K8S docs:

> Those metrics do not have the same lifecycle.

I think the cadvisor metrics come directly from cadvisor itself, so that makes sense.

github-actions[bot] commented 10 months ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

receiver/kubeletstats: @dmitryax @TylerHelmuth

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] commented 8 months ago

This issue has been closed as inactive because it has been stale for 120 days with no activity.

cmergenthaler commented 8 months ago

@asweet-confluent Can you please reopen the issue? I think the missing metrics are really necessary in order to make the receiver complete.

diranged commented 6 months ago

Agreed - can this get re-opened @asweet-confluent?

ChrsMark commented 4 months ago

Aren't some metrics listed in the Metric List provided in the issue description already provided by the receiver?

If I remember correctly, the kubelet's /stats/summary endpoint provides metrics that partially come from cAdvisor. There is also the option to collect directly from the CRI, but it's still behind a feature flag: https://kubernetes.io/docs/reference/instrumentation/cri-pod-container-metrics/

We can consider getting additional metrics from other endpoints, but I believe we should be selective and only add metrics that are actually important. Once we have this specific list of metrics, we could gradually start discussing them as part of https://github.com/open-telemetry/semantic-conventions/issues/1032 as well.

On a slightly different note, there were several discussions around these endpoints over the past years, so we would need to verify we are aligned with the most recent update. Some refs:

/cc @dashpole

dashpole commented 4 months ago

I don't think we should support scraping prometheus endpoints in the kubelet stats receiver.

You can see the proposal behind the CRI-direct feature here: https://github.com/kubernetes/enhancements/tree/6f648005d3b10d9c24984d139f96077f720726f7/keps/sig-node/2371-cri-pod-container-stats

That would be a good option to consider after it graduates to beta.

diranged commented 4 months ago

> I don't think we should support scraping prometheus endpoints in the kubelet stats receiver.

Can you elaborate on why? I like the CRI active approach for sure, but I see that as unrelated to fully supporting kubelet stats. The kubelet stats approach is generic and easier to implement on the operator side (fewer permissions and less volume mount configuration).

dashpole commented 4 months ago

The prometheus receiver already supports the endpoints in question. Given how large the Prometheus ecosystem is, it doesn't seem sustainable to have specific receivers to translate from prometheus conventions to OTel conventions for each source of Prometheus metrics.

diranged commented 4 months ago

> The prometheus receiver already supports the endpoints in question. Given how large the Prometheus ecosystem is, it doesn't seem sustainable to have specific receivers to translate from prometheus conventions to OTel conventions for each source of Prometheus metrics.

Given that - I might argue that the kubelet receiver then should be deprecated. I think it's worse to have half a solution than no dedicated solution at all.

I do like the idea of using the kubelet receiver, though, because it's simpler to configure and standardizes the exported metric names into something OTel-specific.

alexgenon commented 3 months ago

Hi everyone. While it's indeed possible to scrape those metrics using the Prometheus receiver (and that's how we are currently scraping them), we're not fully satisfied with this approach. We end up with time series in Prometheus/OpenMetrics format converted to OpenTelemetry (which is not straightforward), and we miss the opportunity to have native OpenTelemetry metrics where we can better structure the Resource Attributes (instead of only relying on the target_info metric) and enforce the semantic conventions.

Receivers such as kubeletstats or k8scluster are a perfect fit for a robust collection of k8s metrics.

I agree with @diranged that the kubeletstats receiver (and also the k8scluster receiver) is limited in its current state. Deprecating them might be too radical, as they provide an easy way to get started with k8s o11y. But we should at least document their limitations and recommend going with Prometheus scraping if more metrics are required.

ChrsMark commented 3 months ago

My question from https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/26719#issuecomment-2178250983 is still valid here:

I think we still lack a well-defined proposal that lists the specific metrics not provided by the kubeletstats receiver (which scrapes the /stats/summary endpoint).

In addition, I think I'd agree with what @dashpole mentioned at https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/26719#issuecomment-2186931634. The kubeletstats receiver scrapes a specific endpoint offering a selective set of metrics today. I'm not sure if expanding to scraping one or more additional endpoints is a good choice here. I'm not sure if it's done in other receivers, but this can be problematic when it comes to maintenance, deprecation handling, etc. So if we really need to collect these metrics, maybe we need to find a way to separate their collection, either in a standalone receiver component or by splitting into multiple scrapers like it's done in the hostmetricsreceiver.
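To make the multiple-scrapers option concrete, here is a rough, language-neutral sketch written in Python. It is not collector code (the collector is written in Go) and does not reflect how the hostmetricsreceiver is actually implemented; the class `MultiScraperReceiver`, the scraper names, and the stand-in scraper functions are all hypothetical. The point is only that each endpoint gets its own scraper that can be enabled, maintained, or deprecated independently.

```python
from typing import Callable, Dict, List

# A scraper is modelled as a function returning a list of metric names.
Scraper = Callable[[], List[str]]


class MultiScraperReceiver:
    """Hypothetical receiver that runs only the scrapers enabled in its config."""

    def __init__(self, scrapers: Dict[str, Scraper], enabled: List[str]) -> None:
        unknown = [name for name in enabled if name not in scrapers]
        if unknown:
            raise ValueError(f"unknown scrapers: {unknown}")
        self._active = {name: scrapers[name] for name in enabled}

    def scrape(self) -> Dict[str, List[str]]:
        """Run each enabled scraper and return its results keyed by scraper name."""
        return {name: scraper() for name, scraper in self._active.items()}


# Stand-in scrapers; real ones would query the corresponding kubelet endpoint.
available = {
    "summary": lambda: ["k8s.pod.cpu.time"],
    "cadvisor": lambda: ["container_cpu_cfs_throttled_periods_total"],
    "probes": lambda: ["prober_probe_total"],
}

receiver = MultiScraperReceiver(available, enabled=["summary", "probes"])
print(receiver.scrape())
```

With this shape, dropping or deprecating one endpoint means removing one scraper entry rather than changing the receiver as a whole.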

Last but not least, standardizing a prometheus based input on top of the prometheus receiver sounds like a good example for https://github.com/open-telemetry/opentelemetry-collector/issues/8372.

alexgenon commented 2 months ago

Sorry for my late reply. Thanks @ChrsMark for the detailed answer. I'll try to give a naïve user's point of view.

I agree with you that we should start by listing which metrics we miss with the kubeletstats and k8scluster receivers and make sure they are part of the semantic conventions. We'll do this exercise within my team and maybe post them as a comment on https://github.com/open-telemetry/semantic-conventions/issues/1032. What do you think?

As for the way it should be implemented, multiple scrapers on the same receiver would make sense. We're also using the hostmetricsreceiver and we're happy with the way it works.

Our objective is to use OpenTelemetry as much as possible within our observability pipeline to avoid any conversion issues. Scraping the Prometheus endpoints and having the collector do the conversion to OTLP can be cumbersome. In the current situation, some metrics come via the kubeletstats receiver and others via scraping the Prometheus endpoints on kubelet and cadvisor. This hybrid solution requires extra effort when setting up a monitoring solution.

ChrsMark commented 2 months ago

> We'll do this exercise within my team and maybe post them as a comment on https://github.com/open-telemetry/semantic-conventions/issues/1032. What do you think?

That would make sense. You could also create a standalone issue to propose this new batch of metrics and link back to https://github.com/open-telemetry/semantic-conventions/issues/1032 (we can use that issue as a meta issue).

TBH though, regarding the implementation, we would need to think through the details thoroughly. As I mentioned already, maybe the work on supporting templates in https://github.com/open-telemetry/opentelemetry-collector/issues/8372 can help here.

github-actions[bot] commented 4 weeks ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

receiver/kubeletstats: @dmitryax @TylerHelmuth @ChrsMark

See Adding Labels via Comments if you do not have permissions to add labels yourself.
