kubewharf / kelemetry

Global control plane tracing for Kubernetes
Apache License 2.0

ERROR: Failed to init storage factory #128

Closed: jackwillsmith closed this issue 10 months ago

jackwillsmith commented 1 year ago

Steps to reproduce

  1. git clone the repository
  2. edit values.yaml, set replicas=1
  3. change the templates' storageClass to the local Kubernetes default storageClass (see the sketch below)
  4. helm install
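
For reference, a minimal sketch of the commands behind these steps; the chart path, release name, and value keys here are assumptions for illustration, not taken from this thread:

git clone https://github.com/kubewharf/kelemetry.git
cd kelemetry

# Hypothetical value keys; adjust to the chart's actual values.yaml schema.
# "local-path" stands in for the cluster's default StorageClass.
helm install kelemetry-1689762474 ./charts/kelemetry \
  --set replicas=1 \
  --set storageClass=local-path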

Expected behavior

all pods are running

Actual behavior

All pods are running except the frontend:

kelemetry-1689762474-collector-79f5c59df4-52q4z   1/1     Running            2 (16h ago)   16h
kelemetry-1689762474-consumer-b88789bb4-5kzbf     1/1     Running            0             16h
kelemetry-1689762474-etcd-0                       1/1     Running            0             10m
kelemetry-1689762474-frontend-755b8f47ff-rl4jl    0/2     CrashLoopBackOff   8 (8s ago)    2m14s
kelemetry-1689762474-informers-76fb5d4458-s5ftw   1/1     Running            0             16h
kelemetry-1689762474-storage-0                    1/1     Running            0             16h

error.log

{"level":"warn","ts":1689822124.0339894,"caller":"channelz/funcs.go:342","msg":"[core][Channel #1 SubChannel #2] grpc: addrConn.createTransport failed to connect to {\n  \"Addr\": \"localhost:17271\",\n  \"ServerName\": \"localhost:17271\",\n  \"Attributes\": null,\n  \"BalancerAttributes\": null,\n  \"Type\": 0,\n  \"Metadata\": null\n}. Err: connection error: desc = \"transport: Error while dialing dial tcp [::1]:17271: connect: connection refused\"","system":"grpc","grpc_log":true}
{"level":"info","ts":1689822124.034001,"caller":"channelz/funcs.go:340","msg":"[core][Channel #1 SubChannel #2] Subchannel Connectivity change to TRANSIENT_FAILURE","system":"grpc","grpc_log":true}
{"level":"info","ts":1689822124.0340135,"caller":"grpclog/component.go:71","msg":"[core]pickfirstBalancer: UpdateSubConnState: 0xc000196540, {TRANSIENT_FAILURE connection error: desc = \"transport: Error while dialing dial tcp [::1]:17271: connect: connection refused\"}","system":"grpc","grpc_log":true}
{"level":"info","ts":1689822124.0340188,"caller":"channelz/funcs.go:340","msg":"[core][Channel #1] Channel Connectivity change to TRANSIENT_FAILURE","system":"grpc","grpc_log":true}
{"level":"info","ts":1689822126.1586947,"caller":"channelz/funcs.go:340","msg":"[core][Channel #1] Channel Connectivity change to SHUTDOWN","system":"grpc","grpc_log":true}
{"level":"info","ts":1689822126.1587367,"caller":"channelz/funcs.go:340","msg":"[core][Channel #1 SubChannel #2] Subchannel Connectivity change to SHUTDOWN","system":"grpc","grpc_log":true}
{"level":"info","ts":1689822126.1587467,"caller":"channelz/funcs.go:340","msg":"[core][Channel #1 SubChannel #2] Subchannel deleted","system":"grpc","grpc_log":true}
{"level":"info","ts":1689822126.1587508,"caller":"channelz/funcs.go:340","msg":"[core][Channel #1] Channel deleted","system":"grpc","grpc_log":true}
{"level":"fatal","ts":1689822126.1587627,"caller":"./main.go:107","msg":"Failed to init storage factory","error":"grpc-plugin builder failed to create a store: error connecting to remote storage: context deadline exceeded","stacktrace":"main.main.func1\n\t./main.go:107\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/cobra@v1.6.1/command.go:916\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/cobra@v1.6.1/command.go:1044\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/cobra@v1.6.1/command.go:968\nmain.main\n\t./main.go:170\nruntime.main\n\truntime/proc.go:250"}

Kelemetry version

c42c0ff010c570a984663c8911568f3fa05e5ee7

Environment

Kubernetes version:

Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.3", GitCommit:"9e644106593f3f4aa98f8a84b23db5fa378900bd", GitTreeState:"clean", BuildDate:"2023-03-15T13:40:17Z", GoVersion:"go1.19.7", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.0", GitCommit:"b46a3f887ca979b1a5d14fd39cb1af43e7e5d12d", GitTreeState:"clean", BuildDate:"2022-12-08T19:51:45Z", GoVersion:"go1.19.4", Compiler:"gc", Platform:"linux/amd64"}

cloud provider: local VM

Jaeger version: jaegertracing/jaeger-collector:1.42, as used by the collector deployment of Kelemetry

storage: custom NFS storage

SOF3 commented 1 year ago

Can you show the logs of the storage plugin container?

kubectl logs kelemetry-1689762474-frontend-755b8f47ff-rl4jl -c storage-plugin

By the way, if you already have an ElasticSearch cluster set up, you are encouraged to configure Kelemetry to use it instead of the single-instance Badger DB.
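
For context, Jaeger components normally select their span storage backend through the standard SPAN_STORAGE_TYPE and ES_SERVER_URLS environment variables; how the Kelemetry chart surfaces these in values.yaml is not shown in this thread, so the following is only a hedged sketch:

# Standard Jaeger storage env vars; the ElasticSearch service URL is assumed.
export SPAN_STORAGE_TYPE=elasticsearch
export ES_SERVER_URLS=http://elasticsearch.default.svc:9200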

jackwillsmith commented 1 year ago
k -n kelemetry logs -f kelemetry-1689762474-frontend-755b8f47ff-rl4jl  -c storage-plugin
time="2023-07-20T05:00:57Z" level=error msg="unknown flag: --trace-server-enable"

And there is no ElasticSearch cluster set up in my k8s cluster.

jackwillsmith commented 1 year ago

The pod runs once I remove --trace-server-enable. Why is this incorrect parameter present in the official chart templates?
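
(One way to see which arguments the chart actually rendered for the storage-plugin container; the deployment name is inferred from the pod listing above:)

kubectl -n kelemetry get deploy kelemetry-1689762474-frontend \
  -o jsonpath='{.spec.template.spec.containers[?(@.name=="storage-plugin")].args}'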

SOF3 commented 1 year ago

What image version are you using for the frontend pod? The latest version has the --trace-server-enable flag.
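
For example, one way to check (deployment name inferred from the pod listing above):

kubectl -n kelemetry get deploy kelemetry-1689762474-frontend \
  -o jsonpath='{.spec.template.spec.containers[*].image}'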

jackwillsmith commented 1 year ago

Kelemetry image version: ghcr.io/kubewharf/kelemetry:0.1.0. And when I remove the --trace-server-enable flag, Jaeger does not trace the k8s deployment.

SOF3 commented 1 year ago

Try using 0.2.2 instead. 0.1.0 is an old version.

jackwillsmith commented 1 year ago

This works after changing the image version, but there is another issue:

{"level":"info","ts":1689835956.4600453,"caller":"channelz/funcs.go:340","msg":"[core][Server #4 ListenSocket #7] ListenSocket created","system":"grpc","grpc_log":true}
{"level":"info","ts":1689835956.4600942,"caller":"app/server.go:282","msg":"Starting HTTP server","port":16686,"addr":":16686"}
{"level":"info","ts":1689835957.464222,"caller":"channelz/funcs.go:340","msg":"[core][Channel #5 SubChannel #6] Subchannel Connectivity change to IDLE","system":"grpc","grpc_log":true}
{"level":"info","ts":1689835957.4643304,"caller":"grpclog/component.go:71","msg":"[core]pickfirstBalancer: UpdateSubConnState: 0xc00059c078, {IDLE connection error: desc = \"transport: Error while dialing dial tcp :16685: connect: connection refused\"}","system":"grpc","grpc_log":true}
{"level":"info","ts":1689835957.4643457,"caller":"channelz/funcs.go:340","msg":"[core][Channel #5] Channel Connectivity change to IDLE","system":"grpc","grpc_log":true}

Jaeger UI cannot show the trace for the deployment change.
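
(Port 16685 is the jaeger-query gRPC endpoint, and the empty host in "dial tcp :16685" hints at a missing query address. As a quick hedged sanity check that the UI itself is up, one could port-forward the HTTP port from the log:)

kubectl -n kelemetry port-forward deploy/kelemetry-1689762474-frontend 16686:16686
# then open http://localhost:16686 in a browser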

SOF3 commented 1 year ago

Are you also using v0.2.2 of the helm chart?

jackwillsmith commented 1 year ago

No specific version. When using helm install, only the chart at commit c42c0ff010c570a984663c8911568f3fa05e5ee7 is used.

SOF3 commented 1 year ago

Try using the latest version (v0.2.2).

jackwillsmith commented 1 year ago

How do I specify the version (v0.2.2) when using helm install?

SOF3 commented 1 year ago

Set kelemetryImage.tag in values.yaml to 0.2.2 in the helm chart, or just install oci://ghcr.io/kubewharf/kelemetry-chart:0.2.2 directly, which already uses the 0.2.2 image by default.
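
For example (the local chart path here is an assumption; the value key and OCI reference are from the comment above):

# Option 1: pin the image tag on a checked-out copy of the chart
helm upgrade --install kelemetry-1689762474 ./charts/kelemetry \
  --set kelemetryImage.tag=0.2.2

# Option 2: install the published chart directly; it defaults to the 0.2.2 image
helm install kelemetry oci://ghcr.io/kubewharf/kelemetry-chart --version 0.2.2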

jackwillsmith commented 1 year ago

I set kelemetryImage.tag to v0.2.2 and added --trace-server-enable. All pods are running, but I get this error log:

{"level":"info","ts":1689835956.4600453,"caller":"channelz/funcs.go:340","msg":"[core][Server #4 ListenSocket #7] ListenSocket created","system":"grpc","grpc_log":true}
{"level":"info","ts":1689835956.4600942,"caller":"app/server.go:282","msg":"Starting HTTP server","port":16686,"addr":":16686"}
{"level":"info","ts":1689835957.464222,"caller":"channelz/funcs.go:340","msg":"[core][Channel #5 SubChannel #6] Subchannel Connectivity change to IDLE","system":"grpc","grpc_log":true}
{"level":"info","ts":1689835957.4643304,"caller":"grpclog/component.go:71","msg":"[core]pickfirstBalancer: UpdateSubConnState: 0xc00059c078, {IDLE connection error: desc = \"transport: Error while dialing dial tcp :16685: connect: connection refused\"}","system":"grpc","grpc_log":true}
{"level":"info","ts":1689835957.4643457,"caller":"channelz/funcs.go:340","msg":"[core][Channel #5] Channel Connectivity change to IDLE","system":"grpc","grpc_log":true}

Jaeger UI cannot show the trace for the deployment change.

SOF3 commented 1 year ago

Can you check the frontend pod logs? This might be a duplicate of #127.

jackwillsmith commented 1 year ago

Frontend logs:

{"level":"info","ts":1689835955.4305463,"caller":"grpclog/component.go:71","msg":"[core]Creating new client transport to \"{\\n  \\\"Addr\\\": \\\"localhost:17271\\\",\\n  \\\"ServerName\\\": \\\"localhost:17271\\\",\\n  \\\"Attributes\\\": null,\\n  \\\"BalancerAttributes\\\": null,\\n  \\\"Type\\\": 0,\\n  \\\"Metadata\\\": null\\n}\": connection error: desc = \"transport: Error while dialing dial tcp [::1]:17271: connect: connection refused\"","system":"grpc","grpc_log":true}
{"level":"warn","ts":1689835955.4305613,"caller":"channelz/funcs.go:342","msg":"[core][Channel #1 SubChannel #2] grpc: addrConn.createTransport failed to connect to {\n  \"Addr\": \"localhost:17271\",\n  \"ServerName\": \"localhost:17271\",\n  \"Attributes\": null,\n  \"BalancerAttributes\": null,\n  \"Type\": 0,\n  \"Metadata\": null\n}. Err: connection error: desc = \"transport: Error while dialing dial tcp [::1]:17271: connect: connection refused\"","system":"grpc","grpc_log":true}

But the storage-plugin container is running without errors.
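
(Since containers in a pod share the network namespace, one hedged follow-up is to check whether the plugin actually listens on 17271; whether ss or netstat exists in the image is an assumption:)

kubectl -n kelemetry exec kelemetry-1689762474-frontend-755b8f47ff-rl4jl \
  -c storage-plugin -- sh -c 'ss -tln || netstat -tln'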

jackwillsmith commented 1 year ago

Re: "Can you check the frontend pod logs? Might be a duplicate of #127"

No, it does not look similar to that issue.

SOF3 commented 1 year ago

Could you check the logs of the storage-plugin container?

k logs deploy/kelemetry-frontend -c storage-plugin

SOF3 commented 10 months ago

Closed as stale due to lack of response.