jaegertracing / jaeger-operator

Jaeger Operator for Kubernetes simplifies deploying and running Jaeger on Kubernetes.
https://www.jaegertracing.io/docs/latest/operator/
Apache License 2.0

[Bug]: Cassandra traceTTL is ignored in Jaeger CRD #2589

Open stephen-tatari opened 1 month ago

stephen-tatari commented 1 month ago

### What happened?

As a Jaeger administrator, I want to be able to set a custom TTL on my Cassandra datastore to something greater than the default (2d).

### Steps to reproduce

  1. Define a Jaeger resource like the following:
    apiVersion: jaegertracing.io/v1
    kind: Jaeger
    metadata:
      name: jaeger
    spec:
      strategy: production  # creates separate pods for query and collector
      collector:
        maxReplicas: 10  # collector pod cannot scale out past this number
        affinity:
          podAntiAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
              - podAffinityTerm:
                  labelSelector:
                    matchLabels:
                      app.kubernetes.io/part-of: jaeger
                      app.kubernetes.io/component: collector
                  topologyKey: 'kubernetes.io/hostname'
                weight: 100
        resources:
          requests:
            cpu: "250m"
            memory: "64Mi"
        options:
          collector:
            queue-size: 2000
      ingress:
        enabled: false  # don't use the default ingress, manage it separately
      storage:
        type: cassandra
        options:
          cassandra:
            servers: jaeger-cassandra-service.jaeger-system.svc
            keyspace: jaeger_v1
        secretName: cassandra-creds
        cassandraCreateSchema:
          datacenter: "cassandra"  # must match CassandraDatacenter name
          mode: "prod"  # ensures minimum replicas and replication method
          timeout: "15m"  # initial wait period for Cassandra to be ready prior to creating schema
          traceTTL: "7d"  # length of time traces will be stored before being dropped
  2. Create the resource.
  3. Observe the logs from the schema job when Jaeger first starts up. Note that the value of trace_ttl is still the default of 172800 seconds (2 days):

    Checking if Cassandra is up at jaeger-cassandra-service.jaeger-system.svc:9042.
    Warning: Using a password on the command line interface can be insecure. Recommendation: use the credentials file to securely provide the password.
    system system_distributed system_traces system_virtual_schema system_auth system_schema system_views
    Cassandra connection established.
    Warning: Using a password on the command line interface can be insecure. Recommendation: use the credentials file to securely provide the password.
    Cassandra version detected: 4
    Generating the schema for the keyspace jaeger_v1 and datacenter cassandra.
    Using template file /cassandra-schema/v004.cql.tmpl with parameters:
        mode = prod
        datacenter = cassandra
        keyspace = jaeger_v1
        replication = {'class': 'NetworkTopologyStrategy', 'cassandra': '2' }
        trace_ttl = 172800
        dependencies_ttl = 0
        compaction_window_size = 96
        compaction_window_unit = MINUTES
    Warning: Using a password on the command line interface can be insecure. Recommendation: use the credentials file to securely provide the password.
    Schema generated.
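
To check the schema job output without reading it by eye, the `trace_ttl` value can be pulled out of the log text and compared with the intended TTL. This is a minimal sketch: the function name `extract_trace_ttl` and the `sample_log` excerpt are illustrative and not part of the operator or the schema job.

```python
import re
from typing import Optional


def extract_trace_ttl(log_text: str) -> Optional[int]:
    """Return the trace_ttl value (seconds) printed by the schema job, or None if absent."""
    match = re.search(r"\btrace_ttl\s*=\s*(\d+)", log_text)
    return int(match.group(1)) if match else None


# Excerpt of the schema job output quoted in this report.
sample_log = (
    "Using template file /cassandra-schema/v004.cql.tmpl with parameters: "
    "mode = prod datacenter = cassandra keyspace = jaeger_v1 "
    "trace_ttl = 172800 dependencies_ttl = 0"
)

expected_ttl = 7 * 24 * 60 * 60  # 7 days expressed in seconds
actual_ttl = extract_trace_ttl(sample_log)
print(f"expected={expected_ttl} actual={actual_ttl} match={actual_ttl == expected_ttl}")
```

For the log in this report, the check shows the job ran with 172800 rather than 604800.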



### Expected behavior

I expected the schema job to run with the environment variable `TRACE_TTL` set to `604800` seconds (7 days).
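
The arithmetic behind the expected value can be sketched as follows. The helper `ttl_to_seconds` is hypothetical and only illustrates the unit conversion; it is not the operator's parser, and the set of accepted suffixes (`s`, `m`, `h`, `d`) is an assumption made for this example.

```python
import re

# Seconds per supported unit suffix (assumed set, for illustration only).
SECONDS_PER_UNIT = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def ttl_to_seconds(ttl: str) -> int:
    """Convert a TTL such as '7d' or '168h' into a whole number of seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", ttl.strip())
    if match is None:
        raise ValueError(f"unsupported TTL format: {ttl!r}")
    value, unit = match.groups()
    return int(value) * SECONDS_PER_UNIT[unit]


print(ttl_to_seconds("7d"))    # 604800, the value expected in TRACE_TTL
print(ttl_to_seconds("168h"))  # 604800, the same TTL written in hours
print(ttl_to_seconds("2d"))    # 172800, the default seen in the job log
```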

### Jaeger backend version

1.54.0

### SDK

OpenTelemetry Python SDK 0.48 via OpenTelemetry Collector 0.98.0

### Pipeline

Python OTEL SDK -> OTEL Collector -> Jaeger Collector -> Cassandra

### Storage backend

Cassandra 4.1.4

### Operating system

Linux

### Deployment model

Kubernetes 1.27 running on EKS

iblancasa commented 1 month ago

Hi @stephen-tatari thanks for reporting. Would you like to send a PR?

stephen-tatari commented 1 month ago

It looks like this is caused by Argo stripping the quotes from the YAML string, so that values like `7d` are interpreted as hex values. Specifying something like `168h` works.
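
If manifests are generated or templated, one way to apply this workaround consistently is to rewrite day-based TTLs into hours before they reach the Jaeger resource. The sketch below assumes TTLs are plain strings of the form `<number>d`; `days_to_hours` is a hypothetical helper, not something the operator or Argo provides.

```python
import re


def days_to_hours(ttl: str) -> str:
    """Rewrite '<N>d' as '<N*24>h'; any other value is returned unchanged."""
    match = re.fullmatch(r"(\d+)d", ttl)
    if match is None:
        return ttl
    return f"{int(match.group(1)) * 24}h"


print(days_to_hours("7d"))    # 168h
print(days_to_hours("168h"))  # 168h (already in hours, left as-is)
```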

@iblancasa I could, but I'm not familiar enough with the Go codebase to contribute something quickly.