jaegertracing / jaeger-operator

Jaeger Operator for Kubernetes simplifies deploying and running Jaeger on Kubernetes.
https://www.jaegertracing.io/docs/latest/operator/
Apache License 2.0
1.03k stars 345 forks source link

CR failures should be reported via status object #467

Open jkandasa opened 5 years ago

jkandasa commented 5 years ago

This issue somehow related to #466, however this may need a different fix

Invalid CR files error should be reported to the user console. And should not keep on trying to deploy the CR file.

I changed the boolean value to string (eg: true >> "true") and integer to string (ex: 10 >> "10") deployed the CR file. on the console, it looks all ok. But on the operator-log keep on throwing an exception.

Expected behavior:

No exception on the console.

$ oc create -f jaeger-services-final.yaml 
jaeger.jaegertracing.io/jaegerqe created
  storage:
    type: elasticsearch
    esIndexCleaner:
      enabled: true
      schedule: "*/10 * * * *"
      numberOfDays: "2"

operator log file: jaeger-operator-545c69455d-4clq5.log

CR file:

apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: "jaegerqe"
spec:
  ingress:
    security: none
  strategy: production
  collector:
    replicas: 1
    image: jaegertracing/jaeger-collector:1.12
    resources:
      requests:
        memory: "512Mi"
        cpu: "1"
      limits:
        memory: "512Mi"
        cpu: "1"
    options:
      log-level: info
      metrics-backend: prometheus
      collector:
        num-workers: 1
        queue-size: 20000
      es:
        bulk:
          size: 524288
          workers: 1
          flush-interval: 200ms
        tags-as-fields:
          all: false
  query:
    replicas: 1
    image: jaegertracing/jaeger-query:1.12
    resources:
      requests:
        cpu: "500m"
        memory: "512Mi"
      limits:
        cpu: "500m"
        memory: "512Mi"
    options:
      log-level: "info"
      metrics-backend: prometheus
      query:
        port: 16686
      es:
        timeout: 10s
  agent:
    strategy: sidecar
    image: jaegertracing/jaeger-agent:1.12
    resources:
      requests:
        cpu: "200m"
        memory: "128Mi"
      limits:
        cpu: "200m"
        memory: "128Mi"
    options:
      log-level: info
      metrics-backend: prometheus
      processor:
        jaeger-compact:
          server-queue-size: 10000
          workers: 10
  storage:
    type: elasticsearch
    esIndexCleaner:
      enabled: true
      schedule: "*/10 * * * *"
      numberOfDays: "2"      
    dependencies:
      enabled: false
    elasticsearch:
      image: quay.io/openshift/origin-logging-elasticsearch5:latest
      nodeCount: 3
      resources:
jpkrohling commented 5 years ago

But on the operator-log keep on throwing an exception.

What's the exception?

jkandasa commented 5 years ago

@jpkrohling Operator log is attached on the initial comment. E0607 06:27:47.976474 1 reflector.go:125] pkg/mod/k8s.io/client-go@v0.0.0-20190507014756-65905f29c17c/tools/cache/reflector.go:93: Failed to list *v1.Jaeger: v1.JaegerList.Items: []v1.Jaeger: v1.Jaeger.Spec: v1.JaegerSpec.Storage: v1.JaegerStorageSpec.EsIndexCleaner: v1.JaegerEsIndexCleanerSpec.NumberOfDays: readUint64: unexpected character: �, error found in #10 byte of ...|rOfDays":"2","schedu|..., bigger context ...|,"esIndexCleaner":{"enabled":true,"numberOfDays":"2","schedule":"*/10 * * * *"},"type":"elasticsearc|...

jpkrohling commented 5 years ago

Sorry, I missed that. Whenever possible, let's try to keep the error messages in the issue itself, so that people searching for the error message can find this bug report.

jpkrohling commented 5 years ago

I changed the scope of this issue a bit, but the idea is the same: users creating a CR should be able to see the error messages that happened during the process. Storing the error in a status object allows it to be viewed via kubectl describe jaeger

esnible commented 2 years ago

I recommend a Validating Webhook rather than admitting the jaegertracing.io/v1 Jaeger and then setting its Status.

The only time it is reasonable to set the Status is if the status can't be checked at admission time. (For example, a Jaeger can have a priorityClassName: collector-high-priority but at admission time the scheduling.k8s.io/v1 PriorityClass might not exist yet.). The only way to implement this is to watch for changes on the CR and every kind of object the CR might refer to, which is why this kind of checking is rarely implemented. (In Istio that kind of checking is opt-in because of performance problems.)