open-telemetry / opentelemetry-operator

Kubernetes Operator for OpenTelemetry Collector
Apache License 2.0

VPA does not work when using DaemonSet mode #2605

Closed mcanevet closed 8 months ago

mcanevet commented 8 months ago

Component(s)

No response

What happened?

I created an OpenTelemetryCollector in DaemonSet mode to collect my logs:

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: logs
spec:
  config: ...
  env:
    - name: KUBE_NODE_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: spec.nodeName
  mode: daemonset
  priorityClassName: system-cluster-critical
  resources:
    limits:
      memory: 128Mi
    requests:
      cpu: 10m
      memory: 128Mi
  tolerations:
    - operator: Exists
  volumeMounts:
    - mountPath: /var/log
      name: varlog
      readOnly: true
  volumes:
    - hostPath:
        path: /var/log
      name: varlog

And a VerticalPodAutoscaler in order to automatically adjust the resources based on the actual usage:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: logs-collector
spec:
  resourcePolicy:
    containerPolicies:
      - containerName: '*'
        controlledResources:
          - cpu
          - memory
  targetRef:
    apiVersion: opentelemetry.io/v1alpha1
    kind: OpenTelemetryCollector
    name: logs
  updatePolicy:
    updateMode: Auto

But I get this error message in the status of my VerticalPodAutoscaler:

Cannot read targetRef. Reason: Unhandled targetRef
opentelemetry.io/v1alpha1 / OpenTelemetryCollector / logs, last error
Resource monitoring/logs has an empty selector for scale sub-resource

I have another OpenTelemetryCollector in Deployment mode and a third in StatefulSet mode, and both work fine with VerticalPodAutoscaler. The issue appears to be limited to DaemonSet mode.
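For readers unfamiliar with the error above: when the targetRef is a custom resource, the VPA goes through the resource's scale sub-resource and needs a non-empty label selector from it to find the pods. The snippet below is only a simplified illustration of that kind of check, not the VPA's actual code; the function name `selector_from_scale` and the dict layout are assumptions made for the example.

```python
def selector_from_scale(namespace, name, scale):
    """Return the label selector from a scale-like dict, or raise if it is empty.

    `scale` is a plain dict standing in for a scale sub-resource response
    (hypothetical shape: {"status": {"selector": "k=v,..."}}).
    """
    status = scale.get("status") or {}
    selector = status.get("selector") or ""
    if not selector:
        # Same wording as the message reported in the VPA status above.
        raise ValueError(
            f"Resource {namespace}/{name} has an empty selector for scale sub-resource"
        )
    return selector


# A DaemonSet-mode collector whose status carries no selector fails the check.
try:
    selector_from_scale("monitoring", "logs", {"status": {}})
except ValueError as err:
    print(err)
```

With a populated selector string the function simply returns it, which matches the behaviour seen for the Deployment and StatefulSet collectors.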

Kubernetes Version

1.29.0

Operator version

0.92.0

Collector version

0.92.0

Environment information

No response

Log output

No response

Additional context

No response

yuriolisa commented 8 months ago

@mcanevet, thank you for raising this issue. Did you check whether the VPA is deployed in the same namespace as the OpenTelemetryCollector? I'm asking because of the check that the VPA does.

mcanevet commented 8 months ago

@yuriolisa yes, I actually have 3 OpenTelemetryCollectors (one for logs with DaemonSet, one for metrics with StatefulSet and one for traces with Deployment), each with its own VPA in the same Namespace, and the only one that does not work is the one using DaemonSet.

jaronoff97 commented 8 months ago

This may be because we aren't setting the selector for DaemonSet mode, which we can probably just go ahead and do.

rivToadd commented 8 months ago

I got it

mcanevet commented 8 months ago

Probably related. For Deployment and StatefulSet I have the field status.scale.selector properly set, but this field does not exist for DaemonSet.
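To make the missing field concrete, here is a rough sketch of how a selector string like the one in status.scale.selector can be built from a set of pod labels: sorted `key=value` pairs joined by commas. The label keys and values below are hypothetical and chosen only for illustration; they are not taken from the operator's output.

```python
def format_selector(labels):
    """Serialize a label map into a comma-separated equality selector string."""
    pairs = sorted(f"{key}={value}" for key, value in labels.items())
    return ",".join(pairs)


# Hypothetical labels for the "logs" collector pods.
example_labels = {
    "app.kubernetes.io/name": "logs-collector",
    "app.kubernetes.io/component": "opentelemetry-collector",
}
print(format_selector(example_labels))
```

An empty label map yields an empty string, which is the case the VPA rejects.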

yuriolisa commented 8 months ago

Just a heads-up #1779

rivToadd commented 8 months ago

Will finish this up tomorrow. Sorry for the delay.

rivToadd commented 8 months ago

https://github.com/open-telemetry/opentelemetry-operator/pull/2659

I've opened this PR for the issue above.