open-telemetry / opentelemetry-operator

Kubernetes Operator for OpenTelemetry Collector
Apache License 2.0

Windows nodes support #642

Open Ne4to opened 2 years ago

Ne4to commented 2 years ago

Is there any plan to add support for Kubernetes Windows nodes? We have an application running on Windows nodes and would like to export traces from it. Currently, traces from the application are pushed to a Linux node using an address hardcoded in the configuration. It would be great to have a DaemonSet with the corresponding tolerations so that traces can be sent to a collector on the same node.

jpkrohling commented 2 years ago

To be honest, I have no idea what it takes to get something running on a Windows node. Is it opt-in? Is it failing with a specific error? Should it use a different container? If you can scratch a PR or a set of DaemonSet manifests, we can certainly give it a try.

Ne4to commented 2 years ago

> Should it use a different container?

Yes, it requires either a different container image or multi-platform support for the existing one.

> docker pull otel/opentelemetry-collector
Using default tag: latest
latest: Pulling from otel/opentelemetry-collector
image operating system "linux" cannot be used on this platform

It looks like I should start from https://github.com/open-telemetry/opentelemetry-collector/ and, once a container image is available, I will try to open a PR against this repository.

pavolloffay commented 2 years ago

@Ne4to is this resolved?

The collector image is now published for multiple architectures: https://github.com/open-telemetry/opentelemetry-collector-releases/pkgs/container/opentelemetry-collector-releases%2Fopentelemetry-collector

However, I doubt that the operator fully supports multi-arch; for example, the target allocator and instrumentation images are only published for Linux.

Ne4to commented 2 years ago

No, it isn't resolved. I tried to set it up in a local environment and ran into an issue: a DaemonSet with hostPort does not work on Windows nodes. According to the documentation, this feature has to be enabled in the configuration file. It also requires CNI plugins v0.8.6, but we only have v0.2.0. We are using Google Kubernetes Engine (GKE) and an on-premises cluster provisioned by Rancher. Both need updating.
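
For context, the point of hostPort here is to let an application pod reach the collector agent on its own node instead of a hardcoded address. A minimal sketch of the application side might look like the following. It assumes hostPort 4317 is actually reachable on the Windows node (which is exactly what does not work yet), and the pod name and image are hypothetical, not taken from this thread. `status.hostIP` is the standard Kubernetes Downward API field and `OTEL_EXPORTER_OTLP_ENDPOINT` is the standard OpenTelemetry SDK environment variable.

```yaml
# Sketch only: an application pod that sends OTLP to the agent on its own node.
apiVersion: v1
kind: Pod
metadata:
  name: sample-app                                  # hypothetical name
  namespace: otel
spec:
  nodeSelector:
    kubernetes.io/os: windows
  containers:
  - name: app
    image: private-registry/sample-app:ltsc2019     # hypothetical image
    env:
    # Downward API: IP address of the node this pod is scheduled on.
    - name: NODE_IP
      valueFrom:
        fieldRef:
          fieldPath: status.hostIP
    # Kubernetes expands $(NODE_IP) because NODE_IP is defined earlier in the list.
    - name: OTEL_EXPORTER_OTLP_ENDPOINT
      value: "http://$(NODE_IP):4317"
```

With this in place the application no longer needs a node address in its own configuration; it only depends on the agent's hostPort being bound on every Windows node.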

Dockerfile

# escape=`

FROM mcr.microsoft.com/windows/servercore:ltsc2019
SHELL ["powershell", "-Command", "$ErrorActionPreference = 'Stop'; $ProgressPreference = 'SilentlyContinue';"]

WORKDIR 'C:\\otel'
RUN Invoke-WebRequest -Uri 'https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.41.0/otelcol_0.41.0_windows_amd64.tar.gz' -OutFile 'C:\otel\otelcol_0.41.0_windows_amd64.tar.gz'; `
    tar -xvzf  otelcol_0.41.0_windows_amd64.tar.gz; `
    Remove-Item 'otelcol_0.41.0_windows_amd64.tar.gz'

ENV NO_WINDOWS_SERVICE=1
EXPOSE 4317 4318 13133

COPY otel-collector-config.yaml .

ENTRYPOINT ["C:\\otel\\otelcol.exe", "--config=otel-collector-config.yaml"]

Deployment (DaemonSet and ConfigMap)

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-agent
  namespace: otel
  labels:
    app: opentelemetry
    component: otel-agent
spec:
  selector:
    matchLabels:
      app: opentelemetry
      component: otel-agent
  template:
    metadata:
      labels:
        app: opentelemetry
        component: otel-agent
    spec:
      nodeSelector:
        kubernetes.io/os: windows
      containers:
      - command:
          - "C:\\otel\\otelcol.exe"
          - "--config=C:\\otel\\config\\otel-collector-config.yaml"
        image: private-registry/otel/windows-collector:ltsc2019
        name: otel-agent
        resources:
          limits:
            cpu: 500m
            memory: 512Mi
          requests:
            cpu: 100m
            memory: 256Mi
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: 13133
            scheme: HTTP
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 3
        ports:
        - containerPort: 55679 # ZPages endpoint.
        - containerPort: 8888  # Metrics.
        - containerPort: 13133 # health check
        - containerPort: 4317  # Default OpenTelemetry grpc receiver port.
          hostPort: 4317
        - containerPort: 4318  # Default OpenTelemetry http receiver port.
          hostPort: 4318
        - containerPort: 6831 # thrift_compact
          hostPort: 6831
          protocol: UDP
        volumeMounts:
        - name: otel-agent-config-vol
          mountPath: "C:\\otel\\config"
      volumes:
        - configMap:
            name: otel-agent-conf
            items:
              - key: otel-agent-config
                path: otel-collector-config.yaml
          name: otel-agent-config-vol
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-agent-conf
  namespace: otel
  labels:
    app: opentelemetry
    component: otel-agent-conf
data:
  otel-agent-config: |
    # How to get data into the Collector; these can be push or pull based
    receivers:
      # Data sources: traces, metrics, logs
      otlp:
        protocols:
          grpc: # port 4317
          http: # port 4318

      # Data sources: traces
      jaeger:
        protocols:
          # grpc:
          # thrift_binary:
          thrift_compact:
          # thrift_http:

    # What to do with received data
    processors:
      batch:
      memory_limiter:
        # 80% of maximum memory up to 2G
        limit_mib: 400
        # 25% of limit up to 2G
        spike_limit_mib: 100
        check_interval: 5s
      probabilistic_sampler:
        hash_seed: 22
        sampling_percentage: 15

    extensions:
      health_check:
        endpoint: "0.0.0.0:13133"
      zpages: {}
      memory_ballast:
        # Memory Ballast size should be max 1/3 to 1/2 of memory.
        size_mib: 64

    # Where to send received data; these can be push or pull based
    exporters:
      # Data sources: traces
      jaeger:
        endpoint: "jaeger-operator-jaeger-collector-headless.jaeger.svc.cluster.local:14250"
        tls:
          insecure: true
      # Data sources: metrics
      prometheus:
        endpoint: "prometheus:8889"
        namespace: "default"

    service:
      extensions: [health_check, zpages, memory_ballast]
      pipelines:
        traces/1:
          receivers: [jaeger, otlp]
          processors: [memory_limiter, batch]
          exporters: [jaeger]

templarfelix commented 2 years ago

Hi guys, a feature that allows setting a nodeSelector would be a good idea.

sri-shetty commented 1 year ago

Is K8s Windows node support still a work in progress, or is it supported now? I don't see a Docker image for Windows: https://hub.docker.com/r/otel/opentelemetry-collector/tags?page=1

frzifus commented 6 months ago

> Hi guys, a feature that allows setting a nodeSelector would be a good idea.

That is available: https://github.com/open-telemetry/opentelemetry-operator/blob/dab898f6bb45d654fb138eb6c4860e15ee5eb59b/apis/v1alpha1/opentelemetrycollector_types.go#L105-L108
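
As a rough, untested sketch of how that field might be used for this issue, a v1alpha1 OpenTelemetryCollector resource could pin a daemonset-mode collector to Windows nodes as shown below. The image is the private Windows build from earlier in this thread, not a published one. The toleration is an assumption: check the actual taints on your Windows nodes. The operator may also generate Linux-style config paths and arguments, so this alone is not expected to make Windows nodes work.

```yaml
# Sketch only: operator-managed collector scheduled onto Windows nodes.
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel-agent
  namespace: otel
spec:
  mode: daemonset
  # Private Windows image from the Dockerfile above; no official image is assumed.
  image: private-registry/otel/windows-collector:ltsc2019
  nodeSelector:
    kubernetes.io/os: windows
  tolerations:
  # Assumed taint; verify with `kubectl describe node <windows-node>`.
  - key: node.kubernetes.io/os
    operator: Equal
    value: windows
    effect: NoSchedule
  # In v1alpha1 the collector configuration is passed as a string.
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:
    processors:
      batch:
    exporters:
      jaeger:
        endpoint: "jaeger-operator-jaeger-collector-headless.jaeger.svc.cluster.local:14250"
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [jaeger]
```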

jaronoff97 commented 6 months ago

We discussed this at the SIG meeting. At the moment we do not have the capacity or the people to investigate this seriously. If anyone in the community strongly wants this feature, we would love a contribution that meets the following requirements:

This issue will remain open.