Closed: @jvoravong closed this issue 1 year ago.
Pinging code owners for receiver/k8scluster: @dmitryax. See Adding Labels via Comments if you do not have permissions to add labels yourself.
This also happens on Collector version v0.73.0, and it is not only for the HPA... it is also related to v1beta1.CronJob.
See the example from my log file: splunk-otel-collector-agent-96r7z-splunk-otel-collector-agent.log
@AchimGrolimund can you please provide more details about your Kubernetes environment?
I didn't see this issue in my Kops created Kubernetes 1.25 cluster. We have support for batchv1.CronJob so I'm wondering how this is happening.
Hello @jvoravong, we are using ROSA 4.12:
https://docs.openshift.com/container-platform/4.12/release_notes/ocp-4-12-release-notes.html
Next week, I can provide more information.
We are using the splunk-otel-collector v0.72.0
I can help with supporting HorizontalPodAutoscaler v2.
@jvoravong Sorry for my late reply.
We are currently using the following version: https://github.com/signalfx/splunk-otel-collector/releases/tag/v0.76.0
$ oc version
Client Version: 4.12.0-202303081116.p0.g846602e.assembly.stream-846602e
Kustomize Version: v4.5.7
Server Version: 4.12.11
Kubernetes Version: v1.25.7+eab9cc9
and here are the logs:
...
2023-05-03T10:45:44.563Z info service/service.go:129 Starting otelcol... {"Version": "v0.76.0", "NumCPU": 16}
....
W0503 10:45:48.056292 1 reflector.go:424] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: failed to list *v2beta1.HorizontalPodAutoscaler: the server could not find the requested resource
E0503 10:45:48.056337 1 reflector.go:140] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: Failed to watch *v2beta1.HorizontalPodAutoscaler: failed to list *v2beta1.HorizontalPodAutoscaler: the server could not find the requested resource
W0503 10:45:49.019103 1 reflector.go:424] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: failed to list *v1beta1.CronJob: the server could not find the requested resource
E0503 10:45:49.019186 1 reflector.go:140] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: Failed to watch *v1beta1.CronJob: failed to list *v1beta1.CronJob: the server could not find the requested resource
W0503 10:45:53.008856 1 reflector.go:424] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: failed to list *v2beta1.HorizontalPodAutoscaler: the server could not find the requested resource
E0503 10:45:53.008902 1 reflector.go:140] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: Failed to watch *v2beta1.HorizontalPodAutoscaler: failed to list *v2beta1.HorizontalPodAutoscaler: the server could not find the requested resource
W0503 10:45:53.133807 1 reflector.go:424] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: failed to list *v1beta1.CronJob: the server could not find the requested resource
E0503 10:45:53.133863 1 reflector.go:140] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: Failed to watch *v1beta1.CronJob: failed to list *v1beta1.CronJob: the server could not find the requested resource
W0503 10:45:59.810228 1 reflector.go:424] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: failed to list *v1beta1.CronJob: the server could not find the requested resource
E0503 10:45:59.810287 1 reflector.go:140] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: Failed to watch *v1beta1.CronJob: failed to list *v1beta1.CronJob: the server could not find the requested resource
W0503 10:45:59.818576 1 reflector.go:424] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: failed to list *v2beta1.HorizontalPodAutoscaler: the server could not find the requested resource
E0503 10:45:59.818624 1 reflector.go:140] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: Failed to watch *v2beta1.HorizontalPodAutoscaler: failed to list *v2beta1.HorizontalPodAutoscaler: the server could not find the requested resource
W0503 10:46:16.106509 1 reflector.go:424] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: failed to list *v1beta1.CronJob: the server could not find the requested resource
E0503 10:46:16.106555 1 reflector.go:140] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: Failed to watch *v1beta1.CronJob: failed to list *v1beta1.CronJob: the server could not find the requested resource
Can we expect a solution soon?
batchv1.CronJob is supported, but the question is whether v1beta1.CronJob and v2beta1.HorizontalPodAutoscaler are taken care of in the code.
Please provide an ETA.
Here is some additional information:
$ oc get apirequestcounts -o jsonpath='{range .items[?(@.status.removedInRelease!="")]}{.status.removedInRelease}{"\t"}{.metadata.name}{"\n"}{end}' | sort
1.25 cronjobs.v1beta1.batch
1.25 horizontalpodautoscalers.v2beta1.autoscaling
1.26 horizontalpodautoscalers.v2beta2.autoscaling
Looking into this, will get back here soon.
Thanks @jvoravong, I am the support engineer on case 3182925; I appreciate your help on this.
I did miss adding a watcher for the HPA v2 code; I've got a fix started for it. I verified that k8s.hpa.* and k8s.job.* metrics are exported in Kubernetes 1.25 and 1.26. I couldn't get the HPA warnings to stop on 1.25 even with this last fix, though; I think it's due to how we watch both versions of HPA.
> I couldn't get the HPA warnings to stop on 1.25 even with this last fix, though; I think it's due to how we watch both versions of HPA.
That's fine. We have the same for jobs when both versions are supported by the k8s API.
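For anyone following along, here is a minimal sketch of what watching both HPA group-versions with plain client-go informers looks like; it is only an illustration of the behavior described above, not the receiver's actual code. On a cluster that serves only one of the two group-versions, the reflector for the other keeps logging the "server could not find the requested resource" messages shown earlier in this thread.

```go
package main

import (
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// In-cluster credentials, as a collector agent pod would use.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	factory := informers.NewSharedInformerFactory(client, 10*time.Minute)

	// Informer for the current autoscaling/v2 HPA API (served since 1.23).
	hpaV2 := factory.Autoscaling().V2().HorizontalPodAutoscalers().Informer()
	// Informer for the deprecated autoscaling/v2beta2 HPA API (removed in 1.26).
	// On clusters that no longer serve it, this reflector logs
	// "the server could not find the requested resource" on every retry.
	hpaV2beta2 := factory.Autoscaling().V2beta2().HorizontalPodAutoscalers().Informer()

	stop := make(chan struct{})
	go hpaV2.Run(stop)
	go hpaV2beta2.Run(stop)

	select {} // keep running so the reflector warnings show up in the logs
}
```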
Closing as resolved by https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/21497
@AchimGrolimund, looking at the log output splunk-otel-collector-agent-96r7z-splunk-otel-collector-agent.log, it seems like the errors are coming from smartagent/openshift-cluster, not from the k8scluster receiver. Do you have the k8scluster receiver enabled in the collector pipelines?
Hey @dmitryax, here is our ConfigMap:
---
# Source: splunk-otel-collector/templates/configmap-agent.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: splunk-otel-collector-agent-configmap
  namespace: xxxxxxxx-splunk-otel-collector
  labels:
    app: splunk-otel-collector-agent
data:
  relay: |
    exporters:
      sapm:
        access_token: ${SPLUNK_OBSERVABILITY_ACCESS_TOKEN}
        endpoint: https://xxxxxx:443/ingest/v2/trace
      signalfx:
        access_token: ${SPLUNK_OBSERVABILITY_ACCESS_TOKEN}
        api_url: https://xxxxxxx:443/api/
        correlation: null
        ingest_url: https://xxxxxxx:443/ingest/
        sync_host_metadata: true
    extensions:
      health_check: null
      k8s_observer:
        auth_type: serviceAccount
        node: ${K8S_NODE_NAME}
      memory_ballast:
        size_mib: ${SPLUNK_BALLAST_SIZE_MIB}
      zpages: null
    processors:
      batch: null
      filter/logs:
        logs:
          exclude:
            match_type: strict
            resource_attributes:
              - key: splunk.com/exclude
                value: "true"
      groupbyattrs/logs:
        keys:
          - com.splunk.source
          - com.splunk.sourcetype
          - container.id
          - fluent.tag
          - istio_service_name
          - k8s.container.name
          - k8s.namespace.name
          - k8s.pod.name
          - k8s.pod.uid
      k8sattributes:
        extract:
          annotations:
            - from: pod
              key: splunk.com/sourcetype
            - from: namespace
              key: splunk.com/exclude
              tag_name: splunk.com/exclude
            - from: pod
              key: splunk.com/exclude
              tag_name: splunk.com/exclude
            - from: namespace
              key: splunk.com/index
              tag_name: com.splunk.index
            - from: pod
              key: splunk.com/index
              tag_name: com.splunk.index
          labels:
            - key: app
          metadata:
            - k8s.namespace.name
            - k8s.node.name
            - k8s.pod.name
            - k8s.pod.uid
            - container.id
            - container.image.name
            - container.image.tag
        filter:
          node_from_env_var: K8S_NODE_NAME
        pod_association:
          - sources:
              - from: resource_attribute
                name: k8s.pod.uid
          - sources:
              - from: resource_attribute
                name: k8s.pod.ip
          - sources:
              - from: resource_attribute
                name: ip
          - sources:
              - from: connection
          - sources:
              - from: resource_attribute
                name: host.name
      memory_limiter:
        check_interval: 2s
        limit_mib: ${SPLUNK_MEMORY_LIMIT_MIB}
      resource:
        attributes:
          - action: insert
            key: k8s.node.name
            value: ${K8S_NODE_NAME}
          - action: upsert
            key: k8s.cluster.name
            value: HCP-ROSA-PROD1
      resource/add_agent_k8s:
        attributes:
          - action: insert
            key: k8s.pod.name
            value: ${K8S_POD_NAME}
          - action: insert
            key: k8s.pod.uid
            value: ${K8S_POD_UID}
          - action: insert
            key: k8s.namespace.name
            value: ${K8S_NAMESPACE}
      resource/logs:
        attributes:
          - action: upsert
            from_attribute: k8s.pod.annotations.splunk.com/sourcetype
            key: com.splunk.sourcetype
          - action: delete
            key: k8s.pod.annotations.splunk.com/sourcetype
          - action: delete
            key: splunk.com/exclude
      resourcedetection:
        detectors:
          - env
          - ec2
          - system
        override: true
        timeout: 10s
    receivers:
      smartagent/openshift-cluster:
        type: openshift-cluster
        alwaysClusterReporter: true
        kubernetesAPI:
          authType: serviceAccount
        datapointsToExclude:
          - dimensions:
            metricNames:
              - '*appliedclusterquota*'
              - '*clusterquota*'
        extraMetrics:
          - kubernetes.container_cpu_request
          - kubernetes.container_memory_request
          - kubernetes.job.completions
          - kubernetes.job.active
          - kubernetes.job.succeeded
          - kubernetes.job.failed
      hostmetrics:
        collection_interval: 10s
        scrapers:
          cpu: null
          disk: null
          filesystem: null
          load: null
          memory: null
          network: null
          paging: null
          processes: null
      jaeger:
        protocols:
          grpc:
            endpoint: 0.0.0.0:14250
          thrift_http:
            endpoint: 0.0.0.0:14268
      kubeletstats:
        auth_type: serviceAccount
        collection_interval: 10s
        endpoint: ${K8S_NODE_IP}:10250
        extra_metadata_labels:
          - container.id
        metric_groups:
          - container
          - pod
          - node
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
      prometheus/agent:
        config:
          scrape_configs:
            - job_name: otel-agent
              scrape_interval: 10s
              static_configs:
                - targets:
                    - 127.0.0.1:8889
      receiver_creator:
        receivers:
          smartagent/coredns:
            config:
              extraDimensions:
                metric_source: k8s-coredns
              port: 9154
              skipVerify: true
              type: coredns
              useHTTPS: true
              useServiceAccount: true
            rule: type == "pod" && namespace == "openshift-dns" && name contains "dns"
          smartagent/kube-controller-manager:
            config:
              extraDimensions:
                metric_source: kubernetes-controller-manager
              port: 10257
              skipVerify: true
              type: kube-controller-manager
              useHTTPS: true
              useServiceAccount: true
            rule: type == "pod" && labels["app"] == "kube-controller-manager" && labels["kube-controller-manager"]
              == "true"
          smartagent/kubernetes-apiserver:
            config:
              extraDimensions:
                metric_source: kubernetes-apiserver
              skipVerify: true
              type: kubernetes-apiserver
              useHTTPS: true
              useServiceAccount: true
            rule: type == "port" && port == 6443 && pod.labels["app"] == "openshift-kube-apiserver"
              && pod.labels["apiserver"] == "true"
          smartagent/kubernetes-proxy:
            config:
              extraDimensions:
                metric_source: kubernetes-proxy
              #port: 29101
              port: 9101
              useHTTPS: true
              skipVerify: true
              useServiceAccount: true
              type: kubernetes-proxy
            rule: type == "pod" && labels["app"] == "sdn"
          smartagent/kubernetes-scheduler:
            config:
              extraDimensions:
                metric_source: kubernetes-scheduler
              # port: 10251
              port: 10259
              type: kubernetes-scheduler
              useHTTPS: true
              skipVerify: true
              useServiceAccount: true
            rule: type == "pod" && labels["app"] == "openshift-kube-scheduler" && labels["scheduler"]
              == "true"
        watch_observers:
          - k8s_observer
      signalfx:
        endpoint: 0.0.0.0:9943
      smartagent/signalfx-forwarder:
        listenAddress: 0.0.0.0:9080
        type: signalfx-forwarder
      zipkin:
        endpoint: 0.0.0.0:9411
    service:
      extensions:
        - health_check
        - k8s_observer
        - memory_ballast
        - zpages
      pipelines:
        metrics:
          exporters:
            - signalfx
          processors:
            - memory_limiter
            - batch
            - resourcedetection
            - resource
          receivers:
            - hostmetrics
            - kubeletstats
            - otlp
            - receiver_creator
            - signalfx
            - smartagent/openshift-cluster
        metrics/agent:
          exporters:
            - signalfx
          processors:
            - memory_limiter
            - batch
            - resource/add_agent_k8s
            - resourcedetection
            - resource
          receivers:
            - prometheus/agent
        traces:
          exporters:
            - sapm
            - signalfx
          processors:
            - memory_limiter
            - k8sattributes
            - batch
            - resourcedetection
            - resource
          receivers:
            - otlp
            - jaeger
            - smartagent/signalfx-forwarder
            - zipkin
      telemetry:
        metrics:
          address: 127.0.0.1:8889
Best Regards Achim
@AchimGrolimund, thank you. This is coming from smartagent/openshift-cluster, so it's unrelated to this issue and has to be solved separately. @jvoravong, can you please follow up on this? I'm not sure if we have an OTel-native receiver to replace it with.
Looks like the k8scluster receiver supports scraping additional OpenShift metrics (https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/k8sclusterreceiver#openshift), but it should be run separately as a 1-replica deployment. @AchimGrolimund, did you try it by chance?
Just to add: in the case of Azure, you will not be able to upgrade from 1.25 to 1.26 as long as the agent is still querying the v2beta2 autoscaler API. Because Azure prevents upgrading while deprecated APIs are still being used, the upgrade fails. You either have to force the upgrade, or remove the SignalFx agent, wait for 12 hours, and then try again.
It would be nice if the agent checked the Kubernetes version and, if it is higher than 1.25, did not monitor the /apis/autoscaling/v2beta2/horizontalpodautoscalers API endpoint.
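One way such a check could be implemented is to ask the API server which HorizontalPodAutoscaler group-versions it actually serves before starting any watches, instead of keying off the Kubernetes version string. The sketch below uses client-go's discovery client; it is only an illustration of the idea, not what the receiver currently does.

```go
package main

import (
	"fmt"

	"k8s.io/client-go/discovery"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	dc, err := discovery.NewDiscoveryClientForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Prefer autoscaling/v2 (GA since Kubernetes 1.23) and only fall back to
	// the deprecated v2beta2 group-version when v2 is not served at all.
	if _, err := dc.ServerResourcesForGroupVersion("autoscaling/v2"); err == nil {
		fmt.Println("autoscaling/v2 is served: watch v2 HorizontalPodAutoscaler only")
	} else {
		// Not served (or discovery failed): an older cluster, so the
		// deprecated v2beta2 endpoint would be the only option left.
		fmt.Println("autoscaling/v2 not available, falling back to v2beta2:", err)
	}
}
```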
The customer xxx updated the Splunk OTC agent to version 0.77.0 and still gets the same error messages.
W0522 06:11:24.226426 1 reflector.go:533] k8s.io/client-go@v0.27.1/tools/cache/reflector.go:231: failed to list *v2beta1.HorizontalPodAutoscaler: the server could not find the requested resource
E0522 06:11:24.226454 1 reflector.go:148] k8s.io/client-go@v0.27.1/tools/cache/reflector.go:231: Failed to watch *v2beta1.HorizontalPodAutoscaler: failed to list *v2beta1.HorizontalPodAutoscaler: the server could not find the requested resource
Update on Deprecated Endpoint Removal:
Additional Context:
Component(s)
receiver/k8scluster
What happened?
Description
Right now we only support v2beta2 HPA. To support Kubernetes v1.26, we need to add support for v2 HPA. Kubernetes v1.26 was released in December 2022. This version is still new, and distributions like AKS, EKS, OpenShift, and GKE will start using it soon (if not already).
Related Startup Log Warning Message: autoscaling/v2beta2 HorizontalPodAutoscaler is deprecated in v1.23+, unavailable in v1.26+; use autoscaling/v2 HorizontalPodAutoscaler
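For reference, here is a minimal client-go sketch (not the receiver's actual code) of consuming the replacement autoscaling/v2 API; on the client side the change is essentially switching from the AutoscalingV2beta2 typed client and types to AutoscalingV2.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// List HPAs through the autoscaling/v2 API instead of the removed
	// v2beta2 one; this is what works on Kubernetes 1.26 clusters.
	hpas, err := client.AutoscalingV2().HorizontalPodAutoscalers(metav1.NamespaceAll).List(
		context.Background(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, hpa := range hpas.Items {
		fmt.Printf("%s/%s: current=%d desired=%d\n",
			hpa.Namespace, hpa.Name, hpa.Status.CurrentReplicas, hpa.Status.DesiredReplicas)
	}
}
```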
Steps to Reproduce
Spin up a Kubernetes 1.25 cluster. Deploy the k8scluster receiver to your cluster. Follow the startup logs of the collector and you will notice the error log mentioned above.
Expected Result
The k8scluster receiver can monitor v2 HorizontalPodAutoscaler objects.
Actual Result
In Kubernetes 1.25, you get a warning in the collector logs. In Kubernetes 1.26, you will get an error in the logs, and users might notice that HPA metrics they were expecting are missing.
Collector version
v0.72.0
Environment information
Environment
This will affect all Kubernetes 1.26 clusters. I tested and found the related log warnings in ROSA 4.12 (OpenShift 4.12, Kubernetes 1.25).
OpenTelemetry Collector configuration
Log output
Additional context
Related to: https://github.com/signalfx/splunk-otel-collector/issues/2457