kubewharf / kelemetry

Global control plane tracing for Kubernetes
Apache License 2.0

frontend: Why does the frontend Deployment use two seemingly identical storage services, and can this be optimized? #129

Closed jackwillsmith closed 1 year ago

jackwillsmith commented 1 year ago

Description

Frontend deployment manifest

...
      - command:
        - /usr/local/bin/kelemetry
        - --log-level=info
        - --pprof-enable=true
        - --jaeger-backend=jaeger-storage
        - --jaeger-cluster-names=cluster1
        - --jaeger-redirect-server-enable=true
        - --jaeger-storage-plugin-address=:17271   # localhost:17271
        - --jaeger-storage-plugin-enable=true
        - --jaeger-storage.grpc-storage.server=kelemetry-1689762474-storage.kelemetry.svc:17271   # storage-svc
        - --jaeger-storage.span-storage.type=grpc-plugin
        - --jaeger-trace-cache=etcd
        - --jaeger-trace-cache-etcd-endpoints=kelemetry-1689762474-etcd.kelemetry.svc:2379
        - --jaeger-trace-cache-etcd-prefix=/trace/
        - --trace-server-enable=true
        image: ghcr.io/kubewharf/kelemetry:0.1.0
...

User story

An explanation would help users become more familiar with the project's source code.

SOF3 commented 1 year ago

See USAGE.txt:

      --jaeger-storage-plugin-address string                                 storage plugin grpc server bind address (default ":17271")
      --jaeger-storage.grpc-storage.server string                            The remote storage gRPC server address as host:port

and the diagram in DEPLOY.md:

(image: deployment diagram from DEPLOY.md)

jaeger-storage-plugin-address is the address that the storage plugin listens on to serve requests from "Jaeger Query UI". With the Helm chart, "Jaeger Query UI" and "Kelemetry storage plugin" are deployed as sidecar containers in the same pod, so this is always :17271. (I think we could make this localhost:17271, since sidecar containers share the same network stack, but it is not necessary to change this for now.)
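To make the difference between :17271 and localhost:17271 concrete, here is a minimal Python sketch (not Kelemetry code, which is written in Go). It assumes IPv4 and binds to port 0 so the OS picks a free port instead of 17271; the helper name bound_host is made up for illustration. An empty host stands in for the ":17271" form and binds to all interfaces, while 127.0.0.1 stands in for the "localhost:17271" form and is reachable only from inside the pod's network namespace.

```python
import socket


def bound_host(host: str) -> str:
    """Bind a TCP socket to (host, ephemeral port) and return the IP the OS bound."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0: let the OS choose a free port
        return sock.getsockname()[0]


# Empty host is the wildcard address, analogous to a bind address of ":17271".
wildcard = bound_host("")
# Loopback only, analogous to a bind address of "localhost:17271".
loopback = bound_host("127.0.0.1")

print("wildcard bind:", wildcard)
print("loopback bind:", loopback)
```

Either form works for a sidecar in the same pod; the loopback form just avoids exposing the port on the pod IP.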

The options starting with --jaeger-storage.{SPAN_STORAGE_TYPE}.* determine how "Kelemetry storage plugin" connects to "Jaeger storage". With the Helm chart and Badger DB, frontend pods are stateless but Badger is a single-instance database, so we need to run the database in a single-pod StatefulSet so that multiple frontend instances access the Badger volume through the same process (see https://www.jaegertracing.io/docs/1.47/deployment/#remote-storage-component for an explanation):

graph LR
  subgraph frontend-pod-0
    jaeger-query-0 --> storage-plugin-0
  end
  subgraph frontend-pod-1
    jaeger-query-1 --> storage-plugin-1
  end
  subgraph frontend-pod-2
    jaeger-query-2 --> storage-plugin-2
  end
  storage-plugin-0 --> remote-badger
  storage-plugin-1 --> remote-badger
  storage-plugin-2 --> remote-badger
  subgraph badger [badger node]
    remote-badger --> badger-volume
  end

We cannot let jaeger-query-* access remote-badger directly, because remote-badger is a native Jaeger image that does not know how to perform trace transformation. Nor can we let storage-plugin-* access badger-volume directly, because that would cause concurrent access to the same Badger DB from multiple processes.
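The single-owner constraint can be illustrated with a toy model. The sketch below is not Badger's actual locking code; ToyEmbeddedDB and try_open are hypothetical names, and the lock is an exclusively created LOCK file used purely to show why a second opener of the same volume has to be refused and why all frontends go through one owning process instead.

```python
import os
import tempfile


class ToyEmbeddedDB:
    """Toy stand-in for an embedded database that allows one opener per directory."""

    def __init__(self, directory: str):
        self.lock_path = os.path.join(directory, "LOCK")
        # O_EXCL makes creation fail if another opener already holds the lock file.
        self._fd = os.open(self.lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)

    def close(self) -> None:
        os.close(self._fd)
        os.remove(self.lock_path)


def try_open(directory: str) -> bool:
    """Return True if the directory could be opened, False if it is already locked."""
    try:
        ToyEmbeddedDB(directory).close()
        return True
    except FileExistsError:
        return False


with tempfile.TemporaryDirectory() as volume:
    owner = ToyEmbeddedDB(volume)        # plays the role of remote-badger
    second_opener_ok = try_open(volume)  # a storage-plugin opening the volume directly
    owner.close()
    reopened_ok = try_open(volume)       # succeeds once the owner has released the volume

print("second opener while owned:", second_opener_ok)
print("open after owner closed:", reopened_ok)
```

In the chart's topology, remote-badger is the only opener of badger-volume, and every storage-plugin-* reaches the data over gRPC rather than opening the volume itself.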

If you use a distributed database instead of Badger, the Helm chart no longer generates the kelemetry-storage StatefulSet; the frontend pods connect to the database directly instead.