thingsboard / thingsboard-edge

Apache License 2.0
98 stars 74 forks source link

[Bug] GRPC error when syncing edge #46

Closed AndreMaz closed 1 year ago

AndreMaz commented 1 year ago

Describe the bug

When I press "Sync Edge" I can see in logs of TB edge the following error:

tb-edge 2023-03-09 20:57:49,080 [grpc-default-executor-0] WARN o.t.edge.rpc.EdgeGrpcClient - [024b44dd-ece4-c8ca-5302-b61fdca7cffb] Stream was terminated due to error: tb-edge io.grpc.StatusRuntimeException: INTERNAL: RST_STREAM closed stream. HTTP/2 error code: INTERNAL_ERROR tb-edge at io.grpc.Status.asRuntimeException(Status.java:535) tb-edge at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:479) tb-edge at io.grpc.internal.DelayedClientCall$DelayedListener$3.run(DelayedClientCall.java:463) tb-edge at io.grpc.internal.DelayedClientCall$DelayedListener.delayOrExecute(DelayedClientCall.java:427) tb-edge at io.grpc.internal.DelayedClientCall$DelayedListener.onClose(DelayedClientCall.java:460) tb-edge at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:562) tb-edge at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:70) tb-edge at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:743) tb-edge at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:722) tb-edge at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) tb-edge at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133) tb-edge at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) tb-edge at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) tb-edge at java.base/java.lang.Thread.run(Thread.java:829) tb-edge 2023-03-09 20:57:52,083 [cloud-manager-reconnect-57-thread-1] INFO o.t.s.s.cloud.CloudManagerService - Trying to reconnect due to the error: io.grpc.StatusRuntimeException: INTERNAL: RST_STREAM closed stream. HTTP/2 error code: INTERNAL_ERROR!

Note 1 It manages to sync the Widgets, Users and device profiles but fails to sync the actual devices.

Note 2 TB cloud does not show any errors in the logs

Environment

Additional context It's a brand new deployment of TB edge 3.4.3

volodymyr-babak commented 1 year ago

@AndreMaz

could you please attach logs from your docker containers of thingsboard and tb-edge?

AndreMaz commented 1 year ago

Hi @volodymyr-babak

Checked the logs again and it fails even without pressing "Sync Edge" button. Here's are the logs that I've found

TB-Cloud

2023-03-10 17:54:23,280 [grpc-default-executor-1031] ERROR o.t.s.s.edge.rpc.EdgeGrpcSession - [3bbfa343-f43d-4d91-8a2f-e096a1be32ab] Stream was terminated due to error: io.grpc.StatusRuntimeException: CANCELLED: client cancelled at io.grpc.Status.asRuntimeException(Status.java:526) at io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onCancel(ServerCalls.java:291) at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.closedInternal(ServerCallImpl.java:362) at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.closed(ServerCallImpl.java:350) at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1Closed.runInContext(ServerImpl.java:913) at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:829) 2023-03-10 17:54:23,281 [grpc-default-executor-1031] INFO o.t.s.s.edge.rpc.EdgeGrpcService - [4dbc6130-18b2-11ed-9b8c-ed22da672dc4] edge disconnected!

TB-Edge

tb-edge 2023-03-10 17:50:15,663 [sql-log-1-thread-1] INFO o.t.s.dao.sql.TbSqlBlockingQueue - Queue-0 [Cloud Events] queueSize [0] totalAdded [2] totalSaved [2] totalFailed [0] tb-edge 2023-03-10 17:50:24,874 [sql-log-1-thread-1] INFO o.t.s.dao.sql.TbSqlBlockingQueue - Queue-1 [TS] queueSize [0] totalAdded [6] totalSaved [6] totalFailed [0] tb-edge 2023-03-10 17:50:24,929 [sql-log-1-thread-1] INFO o.t.s.dao.sql.TbSqlBlockingQueue - Queue-1 [TS Latest] queueSize [0] totalAdded [6] totalSaved [6] totalFailed [0] tb-edge 2023-03-10 17:50:34,874 [sql-log-1-thread-1] INFO o.t.s.dao.sql.TbSqlBlockingQueue - Queue-1 [TS] queueSize [0] totalAdded [4] totalSaved [4] totalFailed [0] tb-edge 2023-03-10 17:50:34,929 [sql-log-1-thread-1] INFO o.t.s.dao.sql.TbSqlBlockingQueue - Queue-1 [TS Latest] queueSize [0] totalAdded [4] totalSaved [4] totalFailed [0] tb-edge 2023-03-10 17:51:09,721 [grpc-default-executor-0] WARN o.t.edge.rpc.EdgeGrpcClient - [xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx] Stream was terminated due to error: tb-edge io.grpc.StatusRuntimeException: INTERNAL: RST_STREAM closed stream. HTTP/2 error code: INTERNAL_ERROR tb-edge at io.grpc.Status.asRuntimeException(Status.java:535) tb-edge at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:479) tb-edge at io.grpc.internal.DelayedClientCall$DelayedListener$3.run(DelayedClientCall.java:463) tb-edge at io.grpc.internal.DelayedClientCall$DelayedListener.delayOrExecute(DelayedClientCall.java:427) tb-edge at io.grpc.internal.DelayedClientCall$DelayedListener.onClose(DelayedClientCall.java:460) tb-edge at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:562) tb-edge at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:70) tb-edge at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:743) tb-edge at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:722) tb-edge at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) tb-edge at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133) tb-edge at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) tb-edge at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) tb-edge at java.base/java.lang.Thread.run(Thread.java:829) tb-edge 2023-03-10 17:51:12,723 [cloud-manager-reconnect-57-thread-1] INFO o.t.s.s.cloud.CloudManagerService - Trying to reconnect due to the error: io.grpc.StatusRuntimeException: INTERNAL: RST_STREAM closed stream. HTTP/2 error code: INTERNAL_ERROR! tb-edge 2023-03-10 17:51:12,723 [cloud-manager-reconnect-57-thread-1] INFO o.t.server.common.data.ResourceUtils - Reading resource data from file /ssl/cert.com.pem tb-edge 2023-03-10 17:51:12,729 [cloud-manager-reconnect-57-thread-1] INFO o.t.edge.rpc.EdgeGrpcClient - [xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx] Sending a connect request to the TB! tb-edge 2023-03-10 17:51:13,010 [grpc-default-executor-0] INFO o.t.edge.rpc.EdgeGrpcClient - [xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx] Configuration received: tb-edge edgeIdMSB: xxxxxxxxxxxxxx tb-edge edgeIdLSB: -xxxxxxxxxxxx tb-edge tenantIdMSB: xxxxxxxxxxxx tb-edge tenantIdLSB: -xxxxxxxxxxxx tb-edge customerIdMSB: -xxxxxxxxxxxx tb-edge customerIdLSB: -xxxxxxxxxxxx tb-edge name: "TB Edge" tb-edge type: "default" tb-edge routingKey: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" tb-edge secret: "njb7vomfngswne9yepa3" tb-edge additionalInfo: "{\"description\":\"\"}" tb-edge cloudType: "CE" tb-edge
tb-edge 2023-03-10 17:51:15,081 [sql-log-1-thread-1] INFO o.t.s.dao.sql.TbSqlBlockingQueue - Queue-1 [Attributes] queueSize [0] totalAdded [1] totalSaved [1] totalFailed [0] tb-edge 2023-03-10 17:51:15,081 [sql-log-1-thread-1] INFO o.t.s.dao.sql.TbSqlBlockingQueue - Queue-2 [Attributes] queueSize [0] totalAdded [6] totalSaved [6] totalFailed [0] tb-edge 2023-03-10 17:51:15,324 [TB-Scheduling-8] INFO o.t.s.c.t.s.DefaultTransportService - Transport Stats: openConnections [0]

volodymyr-babak commented 1 year ago

@AndreMaz

Could you please send me full tb-edge.log and thingsboard.log files? And additionally please send your tb-edge.conf and thingsboard.conf files (If you are using docker please send me your docker compose files)? I suppose something is wrong in the configuration of connectivity. Looks like you enabled SSL on edge side and do not enabled it on the cloud or something similar. Please send me these files to vbabak@thingsboard.io.

Thanks.

AndreMaz commented 1 year ago

Hi @volodymyr-babak

Looks like you enabled SSL on edge side and do not enabled it on the cloud or something similar.

Yes, the SSL is enabled on TB-Edge but not enabled on TB-Cloud because it's located behind Kong (reverse proxy) and SSL is enabled in Kong. However, I don't think that the source of the problem is the SSL as some data successfully arrives to TB-Edge.

Here's a print of "Downlinks" tab @ TB-Cloud image

and here's a print of "Cloud Events" @ TB-Edge image

and a print of "Status" of TB-Edge image

Here's another print of "Customer Users" @ TB-Edge, which are successfully synced. image

In addition to users, TB-Edge instance manages to sync the Widgets and Device Profiles. However, it fails to sync the Dashboards and the Devices

Re the docker files. Unfortunately, I can't provide them for the TB-cloud as we're using Pulumi to manage our infrastructure.

volodymyr-babak commented 1 year ago

@AndreMaz

thanks for the provided information. Indeed issue is not in SSL because you are receiving and sending events. Could you please check size of your dashboards that are assigned to the Edge? We have opened issue in case dashboard size is bigger 4 MB: https://github.com/thingsboard/thingsboard-edge/issues/41

Maybe that could be the root cause for your case as well.

Additionally, please provide me full logs file of the edge service at least - maybe I can see root cause from the edge log file.

AndreMaz commented 1 year ago

Hi @volodymyr-babak

Could you please check size of your dashboards that are assigned to the Edge? We have opened issue in case dashboard size is bigger 4 MB:

I don't think that it's source of the problem as total size of my 7 dashboards is 1.4MB

Additionally, please provide me full logs file of the edge service at least - maybe I can see root cause from the edge log file.

Here's the tb-edge log file [removed]

volodymyr-babak commented 1 year ago

@AndreMaz

thanks for the logs, but root cause is not clear yet. Looks like really something is in the message size - probably edge is trying to send something that is bigger 4 MB or something like that. Without logs from the server side it's hard to provide you exact answer what is wrong. Could you please request server side logs from your provider?

AndreMaz commented 1 year ago

Hi @volodymyr-babak

That you for the quick response. I'll try to get the info that you need tomorrow.

Looks like really something is in the message size - probably edge is trying to send something that is bigger 4 MB or something like that.

I think that gRPC only has a limit of 4MBs for the incoming messages so it must be the TB-Cloud that's pushing something big (Tb-edge instance is empty as I've wiped all volumes)

One question, TB-Cloud syncs all DB tables with the TB-Edge, right? If so then I might be able to take a look at the DB and check if there's anything larger than 4MBs.

volodymyr-babak commented 1 year ago

hi @AndreMaz

not exactly. Cloud syncs to the edge only entities, that are assigned to these edge. Additionally, cloud syncs all device profiles, queues and some other generic stuff from the tenant level to the edge of this tenant. But cloud never sync all DB directly to the edge.

AndreMaz commented 1 year ago

Hi @volodymyr-babak

Sorry for the delay. I've checked the TB-Cloud DB and it's fairly small, total size is around 65MB. Here's are the tables with and approximate size image

I also didn't find any row > 4MB

Here's are the logs from TB-cloud starting from the moment the TB-Edge connected to the moment it got disconnected tb-cloud-logs.txt

volodymyr-babak commented 1 year ago

Closing ticket - root cause of the issue was in the miss configuration of connection string of gateway device.