thingsboard / thingsboard-pe-k8s

ThingsBoard Professional Edition Kubernetes scripts and docs
Apache License 2.0
21 stars 35 forks

tb-kafka-0 and tb-node-1 in CrashLoopBackOff status #48

Open. hsanderr opened this issue 2 years ago

hsanderr commented 2 years ago

I installed TB as microservices on Azure Kubernetes Service (AKS), following this guide. It worked for a few days, but then I suddenly could no longer send HTTP requests to the platform. I haven't changed any .yml file. When I run "kubectl get pods", I get:

~$ kubectl get pods
NAME                              READY   STATUS             RESTARTS         AGE
tb-http-transport-0               1/1     Running            0                2d2h
tb-http-transport-1               1/1     Running            0                2d2h
tb-js-executor-776cc56fc5-4wlns   1/1     Running            5 (2d2h ago)     2d2h
tb-js-executor-776cc56fc5-4zlt4   1/1     Running            5 (2d2h ago)     2d2h
tb-js-executor-776cc56fc5-8zds5   1/1     Running            5 (2d2h ago)     2d2h
tb-js-executor-776cc56fc5-hddnr   1/1     Running            5 (2d2h ago)     2d2h
tb-js-executor-776cc56fc5-msl4c   1/1     Running            5 (2d2h ago)     2d2h
tb-kafka-0                        0/1     CrashLoopBackOff   229 (27s ago)    2d2h
tb-node-0                         1/1     Running            3 (2d2h ago)     2d2h
tb-node-1                         0/1     CrashLoopBackOff   531 (113s ago)   2d2h
tb-web-report-5b98458947-qr5cc    1/1     Running            0                2d2h
tb-web-ui-5464b848f9-866x8        1/1     Running            0                2d2h
tb-web-ui-5464b848f9-p8x7r        1/1     Running            0                2d2h
zookeeper-0                       1/1     Running            0                2d2h
zookeeper-1                       1/1     Running            0                2d2h
zookeeper-2                       1/1     Running            0                2d2h
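The listing above shows restart counts but not why the containers keep exiting. One way to triage is to read the JSON from `kubectl get pods -o json` and print, for every non-ready container, its current waiting reason and its last termination reason and exit code (for example, an `OOMKilled` there points somewhere different than a plain `Error`). The logs of the crashed instance can then be read with `kubectl logs <pod> --previous`. A minimal standard-library sketch; the script name `pod_triage.py` is arbitrary.

```python
import json
import sys


def unhealthy_containers(pod_list: dict):
    """Yield (pod, container, waiting_reason, restarts, last_exit) for every
    container that is not ready, from `kubectl get pods -o json` output."""
    for pod in pod_list.get("items", []):
        pod_name = pod["metadata"]["name"]
        for cs in pod.get("status", {}).get("containerStatuses", []):
            if cs.get("ready"):
                continue  # healthy container, nothing to report
            waiting = cs.get("state", {}).get("waiting", {})
            terminated = cs.get("lastState", {}).get("terminated", {})
            last_exit = None
            if terminated:
                # Reason and exit code of the previous (crashed) run
                last_exit = f"{terminated.get('reason', '?')} (exit {terminated.get('exitCode', '?')})"
            yield (pod_name, cs["name"], waiting.get("reason"),
                   cs.get("restartCount", 0), last_exit)


if __name__ == "__main__":
    raw = sys.stdin.read()
    if raw.strip():
        for row in unhealthy_containers(json.loads(raw)):
            print("pod=%s container=%s waiting=%s restarts=%d last=%s" % row)
    else:
        print("usage: kubectl get pods -o json | python3 pod_triage.py", file=sys.stderr)
```

Usage: `kubectl get pods -o json | python3 pod_triage.py`. Reading the structured JSON avoids having to split the `RESTARTS` column, which contains spaces such as "5 (2d2h ago)".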

tb-kafka-0 logs: logs-tb-kafka-0.txt

tb-node-1 logs: logs-tb-node-1.txt

Can anyone help me with this?

polarfoxDev commented 1 year ago

We had problems with Kafka as well. In our case, the "logs" volume ran out of storage. We fixed it by increasing its size from 200Mi to several GiB (the "logs" volumeClaimTemplate in thirdparty.yml). It looks stable now, but we still check on it from time to time in case it becomes a problem again.
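The sizes in a volumeClaimTemplate are Kubernetes quantities, where `Mi` and `Gi` are binary units (2^20 and 2^30 bytes), so 200Mi is 209,715,200 bytes, about 0.2 GiB. A simplified sketch for converting such quantities to bytes when comparing an old and a new volume size; `5Gi` is a hypothetical stand-in for "several GiB", and the parser deliberately skips the decimal fractions and exponent forms that Kubernetes also accepts.

```python
import re

# Simplified: integer quantities with an optional binary (Ki..Ei) or
# decimal (k..E) suffix. Fractions and exponents are not handled.
_SUFFIXES = {
    "": 1,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18,
}


def quantity_to_bytes(quantity: str) -> int:
    """Convert a quantity string such as '200Mi' into a number of bytes."""
    match = re.fullmatch(r"(\d+)([A-Za-z]*)", quantity.strip())
    if not match or match.group(2) not in _SUFFIXES:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    number, suffix = match.groups()
    return int(number) * _SUFFIXES[suffix]


if __name__ == "__main__":
    old = quantity_to_bytes("200Mi")  # original "logs" volume size from the comment
    new = quantity_to_bytes("5Gi")    # hypothetical "several GiB" value
    print(f"200Mi = {old} bytes, 5Gi = {new} bytes ({new / old:.1f}x larger)")
```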

lks-hrsch commented 1 year ago

Has there been any further investigation of this problem? We are facing the same issue in our AKS deployment of thingsboard-pe.

This seems very strange to me, especially given the following config:

value: "js_eval.requests:100:1:delete --config=retention.ms=60000 --config=segment.bytes=26214400 --config=retention.bytes=104857600,tb_transport.api.requests:30:1:delete --config=retention.ms=60000 --config=segment.bytes=26214400 --config=retention.bytes=104857600,tb_rule_engine:30:1:delete --config=retention.ms=60000 --config=segment.bytes=26214400 --config=retention.bytes=104857600"

to the line
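One detail that may explain the surprise: in Kafka, `retention.bytes` limits the size of each partition, not of the whole topic, and only closed segments are ever deleted, so each partition can also hold an active segment of up to `segment.bytes`. A rough sketch of the resulting worst case for the three topics above. Assumptions: the `name:partitions:replicas:cleanup.policy --config=key=value ...` layout is inferred from the string itself, the "logs" volume is where Kafka keeps partition data, all partitions live on the single broker, and index files, the gap between retention checks (`log.retention.check.interval.ms`) and any other topics on the broker are ignored.

```python
SPEC = (
    "js_eval.requests:100:1:delete --config=retention.ms=60000 --config=segment.bytes=26214400 --config=retention.bytes=104857600,"
    "tb_transport.api.requests:30:1:delete --config=retention.ms=60000 --config=segment.bytes=26214400 --config=retention.bytes=104857600,"
    "tb_rule_engine:30:1:delete --config=retention.ms=60000 --config=segment.bytes=26214400 --config=retention.bytes=104857600"
)


def parse_topics(spec: str) -> dict:
    """Parse comma-separated 'name:partitions:replicas:policy --config=k=v ...' entries."""
    topics = {}
    for entry in spec.split(","):
        head, *options = entry.split()
        name, partitions, replicas, policy = head.split(":")
        configs = {}
        for opt in options:
            key, value = opt.removeprefix("--config=").split("=", 1)
            configs[key] = value
        topics[name] = {"partitions": int(partitions), "replicas": int(replicas),
                        "policy": policy, **configs}
    return topics


def worst_case_bytes(topic: dict) -> int:
    """Approximate upper bound on disk use for one topic across all replicas.

    Size-based retention deletes whole closed segments once a partition exceeds
    retention.bytes, and the active segment is never deleted, so a partition can
    reach roughly retention.bytes + segment.bytes.
    """
    per_partition = int(topic["retention.bytes"]) + int(topic["segment.bytes"])
    return topic["partitions"] * topic["replicas"] * per_partition


if __name__ == "__main__":
    total = 0
    for name, topic in parse_topics(SPEC).items():
        size = worst_case_bytes(topic)
        total += size
        print(f"{name:28s} {size / 2**30:6.2f} GiB")
    print(f"{'total':28s} {total / 2**30:6.2f} GiB")
```

Under this estimate the three topics alone can grow to about 19.5 GiB, with js_eval.requests accounting for roughly 12.2 GiB because its limits apply to each of its 100 partitions; that is far more than a 200Mi volume. Note also that `retention.ms=60000` does not keep the volume nearly empty: a segment only becomes deletable after it rolls, which happens when it reaches `segment.bytes` or `segment.ms` (7 days by default), so a quiet partition can keep its active segment for days.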

For more information, here is our storage usage (percentage used per volume): [Screenshot 2023-02-21 at 08 54 33]

As you can see, we had to increase both the logs and the app-logs volumes.

amarkevich commented 1 year ago

PR https://github.com/thingsboard/thingsboard-pe-k8s/pull/62