thingsboard / thingsboard-edge

Apache License 2.0

[Bug] Edge sync fails if a dashboard larger than 4 MB is assigned to the edge #41

Closed volodymyr-babak closed 1 year ago

volodymyr-babak commented 1 year ago

Describe the bug Edge sync fails if a dashboard larger than 4 MB is assigned to the edge.

To Reproduce Steps to reproduce the behavior:

  1. Create a dashboard and use a 'heavy' background image
  2. Assign this dashboard to the edge

Expected behavior The dashboard is provisioned to the edge successfully. Or, at the very least, a warning message is displayed and the dashboard cannot be assigned to the edge because of its size.



AndreMaz commented 1 year ago

@volodymyr-babak I'm on tb-edge PE v3.5.1.1 (clean install) and when I click on Sync Edge I see the following errors:

TB-Cloud

```
2023-08-04 16:01:30,097 [grpc-default-executor-821] ERROR o.t.s.s.edge.rpc.EdgeGrpcSession - [] Stream was terminated due to error:
io.grpc.StatusRuntimeException: CANCELLED: client cancelled
    at io.grpc.Status.asRuntimeException(Status.java:526)
    at io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onCancel(ServerCalls.java:291)
    at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.closedInternal(ServerCallImpl.java:362)
    at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.closed(ServerCallImpl.java:350)
    at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1Closed.runInContext(ServerImpl.java:913)
    at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
    at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
2023-08-04 16:01:30,097 [grpc-default-executor-821] INFO o.t.s.s.edge.rpc.EdgeGrpcService - [] edge disconnected!
```

TB-Edge

```
2023-08-04 16:24:23,292 [grpc-default-executor-3] WARN o.t.edge.rpc.EdgeGrpcClient - [] Stream was terminated due to error:
io.grpc.StatusRuntimeException: INTERNAL: RST_STREAM closed stream. HTTP/2 error code: PROTOCOL_ERROR
    at io.grpc.Status.asRuntimeException(Status.java:535)
    at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:479)
    at io.grpc.internal.DelayedClientCall$DelayedListener$3.run(DelayedClientCall.java:463)
    at io.grpc.internal.DelayedClientCall$DelayedListener.delayOrExecute(DelayedClientCall.java:427)
    at io.grpc.internal.DelayedClientCall$DelayedListener.onClose(DelayedClientCall.java:460)
    at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:562)
    at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:70)
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:743)
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:722)
    at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
    at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
```

Shouldn't this PR prevent this kind of issue, or at least print a warning? https://github.com/thingsboard/thingsboard/blob/f618599d15f7f9af96a7ea4c95a672383a86dc1c/application/src/main/java/org/thingsboard/server/service/edge/rpc/EdgeGrpcSession.java#L425-L427
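For context, the linked check (judging by the log output that appears later in this thread) compares a downlink message's serialized size against the client's max inbound message size. A minimal sketch of that idea follows; the class and method names are illustrative, not the actual `EdgeGrpcSession` code:

```java
// Hypothetical sketch, not the actual ThingsBoard code: before pushing a
// downlink message over gRPC, compare its serialized size against the
// client's max inbound message size and skip it with a warning instead of
// letting the stream terminate.
public class DownlinkSizeGuard {

    private final int clientMaxInboundMessageSize; // bytes; the gRPC default is 4 MB (4194304)

    public DownlinkSizeGuard(int clientMaxInboundMessageSize) {
        this.clientMaxInboundMessageSize = clientMaxInboundMessageSize;
    }

    /** Returns true when the message is too big for the client and must be skipped. */
    public boolean shouldSkip(int serializedSizeBytes) {
        if (clientMaxInboundMessageSize > 0 && serializedSizeBytes > clientMaxInboundMessageSize) {
            System.err.printf(
                "Downlink msg size [%d] exceeds client max inbound message size [%d]. Skipping this message.%n",
                serializedSizeBytes, clientMaxInboundMessageSize);
            return true;
        }
        return false;
    }
}
```

With a guard like this in place, an oversized dashboard would be logged and dropped rather than killing the whole gRPC stream, which is the behavior the expected-behavior section above asks for.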

volodymyr-babak commented 1 year ago

Hello @AndreMaz,

Yes, the behavior should be as you've described - it should log a warning and skip the particular message. Could you please provide the version of your cloud (server) - is it also 3.5.1? I'm going to investigate this issue further once you confirm that it's 3.5.1 as well on the cloud.

AndreMaz commented 1 year ago

Hi @volodymyr-babak. Yep, TB-Cloud is running on v3.5.1 PE

AndriiLandiak commented 1 year ago

Hello, @AndreMaz.

We wanted to give you an update on our investigation into the dashboard issue you reported. We tried to reproduce this case with the default TB variables and, unfortunately, could not reproduce what you observed. In our case, we receive an error message that the downlink message size is larger than expected (see the log and screenshot below). My flow:

  1. Create an entity group and assign it to the edge
  2. Create a dashboard and use a 'heavy' background image (at this step I receive the expected log error and the dashboard does not appear on the edge)
  3. Trigger a sync of the edge (same result as in the second step).

To better assist you, could you please give more details (CLOUD_RPC_MAX_INBOUND_MESSAGE_SIZE, dashboard size, screenshots, etc.) and your steps to reproduce?

We look forward to your prompt response so we can help you as soon as possible.

The error message, as expected:

```
2023-08-07 16:43:48,187 [edge-send-scheduler-39-thread-1] ERROR o.t.s.s.edge.rpc.EdgeGrpcSession - [8e190310-3525-11ee-b2f7-8dcf17cbc7bf][d1981090-3525-11ee-8838-33da74ac9909][ad6763ca-3e86-430f-882d-d85018ae2d4a] Downlink msg size [6994923] exceeds client max inbound message size [4194304]. Skipping this message. Please increase the value of CLOUD_RPC_MAX_INBOUND_MESSAGE_SIZE env variable on the edge and restart it.
```
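As the log message suggests, raising the limit is an environment-variable change on the edge followed by a restart. A minimal sketch, assuming a docker-compose based edge install (the service layout shown in the comment is illustrative, not from this thread):

```shell
# Sketch only: 20971520 bytes = 20 * 1024 * 1024, i.e. 20 MB.
export CLOUD_RPC_MAX_INBOUND_MESSAGE_SIZE=20971520

# Equivalent docker-compose.yml fragment (service name is up to your install):
#   environment:
#     CLOUD_RPC_MAX_INBOUND_MESSAGE_SIZE: "20971520"
echo "$CLOUD_RPC_MAX_INBOUND_MESSAGE_SIZE"
```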

[screenshot]

AndreMaz commented 1 year ago

Hi @AndriiLandiak, thank you for the quick response.

Here is some more info. License status: [screenshot]

Edge status @ TB-Cloud: [screenshot]

In the Downlink tab I see that everything is marked as Deployed: [screenshot]

Side note: it would be good to be able to sort the entries by Event action, Status, and Type, and not only by Event time as currently happens.

TB-Cloud DB with the estimated table sizes: [screenshot]

I've set both CLOUD_RPC_MAX_INBOUND_MESSAGE_SIZE and EDGES_RPC_MAX_INBOUND_MESSAGE_SIZE to 20971520 (i.e., 20 MB), but the problem persists.

My question is: does the logic of this PR apply only to the dashboard sync process? What about the remaining elements that flow through the gRPC link?

Also: is there anything I can do to get more useful error logs? The error that TB throws is not really helpful. For example, it would be good to see the type of the element (and its size) that TB tried to sync when the error occurred.

volodymyr-babak commented 1 year ago

@AndreMaz

From your description, it seems the issue might not be related to the size of the object but to another factor. Could you please help us diagnose the problem by enabling TRACE logging, i.e. adding the following loggers to the logback.xml configuration:

```xml
<logger name="org.thingsboard.server.service.cloud" level="TRACE" />
<logger name="org.thingsboard.server.service.edge" level="TRACE" />
```

After completing these steps, kindly attach the logs from both the edge and cloud containers. Thank you.


AndreMaz commented 1 year ago

Hi @volodymyr-babak, you can close the issue, as it was caused by our infrastructure, Kong to be precise. It was closing the gRPC connection after a minute or so.
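For anyone hitting the same symptom: Kong's per-service upstream timeouts (`connect_timeout`, `write_timeout`, `read_timeout`) default to 60000 ms, which lines up with a gRPC stream being cut "after a minute or so". A hedged sketch of raising them in Kong's declarative config; the service name and upstream URL are illustrative, not the actual setup from this thread:

```yaml
# Illustrative declarative config fragment only.
services:
  - name: tb-edge-grpc
    url: grpc://tb-cloud:7070
    connect_timeout: 60000
    # Defaults are 60000 ms; raise read/write timeouts so long-lived
    # gRPC streams are not closed after a minute of inactivity.
    read_timeout: 3600000
    write_timeout: 3600000
```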

Thank you for the logback.xml trick; it helped me conclude that everything was fine on the TB side and led me to simplify our infrastructure, which allowed me to find the source of the problem.

Thank you again for the quick response. Really appreciate it

volodymyr-babak commented 1 year ago

@AndreMaz

Thank you for your feedback. Please feel free to reopen this ticket if necessary, and don't hesitate to create a new one if you encounter any other issues.