thingsboard / thingsboard-edge

Apache License 2.0

Tb edge No Root Rule Chain available! #105

Closed aistisdev closed 3 months ago

aistisdev commented 3 months ago

Describe the bug Tb edge node rule chain fails to process telemetry with exception "No Root Rule Chain available!"

Your Server Environment

To Reproduce The tb edge node statefulset crashed because it failed to validate a valid license against the ThingsBoard license server (502 Bad Gateway error). After k3s restarted the pod 13 times, license validation started to succeed again, but the rule chain does not process any messages. I tried restarting the pod and Redis, but that did not help.

Expected behavior The rule chain should work

Screenshots image

Additional context The tb-edge pod had been working normally for over a month since it was first set up, but then suddenly crashed due to a license check error. This has not happened before, and we had no other problems with it previously.

Downlinks from the cloud instance are working: I changed one device's attributes in the cloud, and these changes were synchronized with the edge.

Here is the configuration file:

k3s-tb-edge.txt

I also have a full log of the tb edge pod, but I don't want to post the whole content here, as it contains keys, device IDs, etc. However, here is the very first exception after starting the pod:

2024-04-19 10:56:28,449 [grpc-default-executor-0] INFO org.owasp.validator.html.Policy - Attempting to load AntiSamy policy from an input stream.
2024-04-19 10:56:28,721 [grpc-default-executor-0] WARN org.owasp.validator.html.Policy - The directive "noopenerAndNoreferrerAnchors" is enabled by default, but disabled in this policy. It is recommended to leave it enabled to prevent reverse tabnabbing attacks.
2024-04-19 10:56:28,888 [sql-log-1-thread-1] INFO o.t.s.dao.sql.TbSqlBlockingQueue - Queue-1 [Attributes] queueSize [0] totalAdded [1] totalSaved [1] totalFailed [0]
2024-04-19 10:56:28,900 [grpc-default-executor-0] INFO o.t.s.d.s.i.s.SqlPartitioningRepository - Saving partition 1713484800000-1713571200000 for table cloud_event
2024-04-19 10:56:30,455 [tb-rule-engine-consumer-50-thread-7 | QK(Main,TB_RULE_ENGINE,system)-9] INFO o.t.s.s.q.r.TbRuleEngineQueueConsumerManager - [QK(Main,TB_RULE_ENGINE,system)] Failed to process [1] messages
2024-04-19 10:56:30,458 [tb-rule-engine-consumer-50-thread-7 | QK(Main,TB_RULE_ENGINE,system)-9] INFO o.t.s.s.q.r.TbRuleEngineQueueConsumerManager - [150dcbc0-d536-11ee-bd83-ebc2e463161f] Failed to process message: TbMsg(queueName=Main, id=b6f5de0f-2320-4e00-af36-9c3178722bd2, ts=1713524190376, type=ATTRIBUTES_UPDATED, internalType=ATTRIBUTES_UPDATED, originator=f77786e0-d536-11ee-bd83-ebc2e463161f, customerId=null, metaData=TbMsgMetaData(data={edgeName=testing, edgeType=default, scope=SERVER_SCOPE, source=cloud}), dataType=JSON, data={"edgeVersion":"V_3_6_2","queueStartSeqId":1927,"queueStartTs":1713521002481}, ruleChainId=null, ruleNodeId=null, ctx=org.thingsboard.server.common.msg.TbMsgProcessingCtx@42417297, callback=org.thingsboard.server.common.msg.queue.TbMsgCallback$1@c832443), Last Rule Node: null
2024-04-19 10:56:30,444 [tenant-dispatcher-1-1] INFO o.t.server.actors.tenant.TenantActor - [150dcbc0-d536-11ee-bd83-ebc2e463161f] No Root Chain: QueueToRuleEngineMsg(super=TbRuleEngineActorMsg(msg=TbMsg(queueName=Main, id=b6f5de0f-2320-4e00-af36-9c3178722bd2, ts=1713524190376, type=ATTRIBUTES_UPDATED, internalType=ATTRIBUTES_UPDATED, originator=f77786e0-d536-11ee-bd83-ebc2e463161f, customerId=null, metaData=TbMsgMetaData(data={edgeName=testing, edgeType=default, scope=SERVER_SCOPE, source=cloud}), dataType=JSON, data={"edgeVersion":"V_3_6_2","queueStartSeqId":1927,"queueStartTs":1713521002481}, ruleChainId=null, ruleNodeId=null, ctx=org.thingsboard.server.common.msg.TbMsgProcessingCtx@1e7a4582, callback=org.thingsboard.server.service.queue.TbMsgPackCallback@66d71c58)), tenantId=150dcbc0-d536-11ee-bd83-ebc2e463161f, relationTypes=[], failureMessage=)
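For anyone who wants to check a longer pod log for the same condition, here is a minimal sketch that pulls out the "No Root Chain" entries. It assumes the TenantActor line format shown in the excerpt above. The function name, the regex, and the abridged `SAMPLE` line are illustrative only and are not part of ThingsBoard.

```python
import re

# Assumption: the tenant ID appears in brackets right before "No Root Chain:",
# followed by the TbMsg fields id=... and type=..., as in the excerpt above.
# re.DOTALL lets the pattern span an entry that was wrapped across two lines.
NO_ROOT_CHAIN = re.compile(
    r"TenantActor - \[(?P<tenant>[0-9a-f-]{36})\] No Root Chain: .*?"
    r"id=(?P<msg_id>[0-9a-f-]{36}), .*?type=(?P<type>[A-Z_]+),",
    re.DOTALL,
)

def find_no_root_chain(log_text):
    """Return (tenant_id, message_id, message_type) for each 'No Root Chain' entry."""
    return [
        (m.group("tenant"), m.group("msg_id"), m.group("type"))
        for m in NO_ROOT_CHAIN.finditer(log_text)
    ]

# Abridged copy of the TenantActor line from the excerpt (trailing fields cut).
SAMPLE = (
    "2024-04-19 10:56:30,444 [tenant-dispatcher-1-1] INFO "
    "o.t.server.actors.tenant.TenantActor - "
    "[150dcbc0-d536-11ee-bd83-ebc2e463161f] No Root Chain: "
    "QueueToRuleEngineMsg(super=TbRuleEngineActorMsg(msg=TbMsg(queueName=Main, "
    "id=b6f5de0f-2320-4e00-af36-9c3178722bd2, ts=1713524190376, "
    "type=ATTRIBUTES_UPDATED, internalType=ATTRIBUTES_UPDATED, ...\n"
)

if __name__ == "__main__":
    for tenant, msg_id, msg_type in find_no_root_chain(SAMPLE):
        print(f"tenant={tenant} msg={msg_id} type={msg_type}")
```

Running it over a full log (for example, text saved from `kubectl logs`) would list every message that was dropped because the tenant had no root rule chain.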

volodymyr-babak commented 3 months ago

@aistisdev

Hello, as a quick fix, could you please try to repair this from the cloud by setting the root rule chain from Edge management? To do this, create a dummy rule chain in Rule chain templates and assign it to the edge. Once the dummy rule chain is assigned, make it the root, and then make your original root rule chain the root again: 2024-04-19_15-26

Please let me know whether that fixes your issue.

aistisdev commented 3 months ago

@volodymyr-babak thanks, it seems to be working after your suggestion: image

Any idea what could have caused this?

volodymyr-babak commented 3 months ago

@aistisdev

This is a bug that was introduced in 3.6.2 and fixed in 3.6.3: https://github.com/thingsboard/thingsboard/pull/10151/files The bug is a concurrency issue and occurs under varying conditions, so it's hard to pinpoint the exact root cause in your case. Please upgrade to the latest version if possible to avoid this issue in the future.

aistisdev commented 3 months ago

OK, I will upgrade as soon as I have time. Thanks for the help!