thingsboard / thingsboard-edge

Apache License 2.0

Losing connection cloud <-> edge after a few hours --- RPC Error: NO_ACTIVE_CONNECTION #8

Open AdrienAdB opened 2 years ago

AdrienAdB commented 2 years ago

Describe the bug

The self-hosted TB cloud seems to lose its connection with the Edge after some time (hours). The issue applies only to RPC calls from the cloud server. Error from the TB Cloud audit log: "RPC Error: NO_ACTIVE_CONNECTION". Unassigning and reassigning the Edge device seems to restart the connection; RPC calls work after that.

Your Server Environment

To Reproduce

Steps to reproduce the behavior:

  1. Make an RPC call from the cloud. It works.
  2. Wait a few hours. I see the problem the next morning after leaving it overnight.
  3. Try the same RPC call from the TB cloud dashboard (audit log: RPC Error: NO_ACTIVE_CONNECTION). I can confirm the same RPC call works fine from the TB Edge dashboard.
  4. Unassign the Edge device.
  5. Assign the Edge device.
  6. The RPC call succeeds.
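
For anyone who wants to script steps 1 and 3 instead of clicking through the dashboard, here is a minimal sketch that builds (but does not send) a two-way RPC request against the cloud's REST API. The `/api/plugins/rpc/twoway/{deviceId}` path and the `X-Authorization: Bearer` header are assumptions based on the ThingsBoard REST API and should be checked against your server version; `build_rpc_request`, the host, the device ID, and the RPC method name are hypothetical placeholders.

```python
import json
import urllib.request


def build_rpc_request(base_url, device_id, jwt_token, method, params, timeout_ms=5000):
    """Build (without sending) a two-way RPC request for a ThingsBoard server.

    Assumption: the server exposes POST /api/plugins/rpc/twoway/{deviceId}
    and accepts a JWT in the X-Authorization header. Verify both for your version.
    """
    url = f"{base_url.rstrip('/')}/api/plugins/rpc/twoway/{device_id}"
    body = json.dumps(
        {"method": method, "params": params, "timeout": timeout_ms}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Authorization": f"Bearer {jwt_token}",
        },
    )


if __name__ == "__main__":
    # Hypothetical values; replace with your own host, device ID, and token.
    req = build_rpc_request("http://cloud.example:8080", "DEVICE_ID", "JWT", "getValue", {})
    print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would actually send the call. Running it once right after assignment and again the next morning should show whether the failure is tied to elapsed time.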

Screenshots

Screenshot 2022-05-18 at 10 56 17

volodymyr-babak commented 2 years ago

Hi @AdrienAdB, this is most probably related to the active status of the device on the cloud. It's a known bug that is targeted to be fixed in the next release. Because the device is connected to the edge, its status on the cloud is not properly updated and the RPC call fails. I'm going to review it in detail later and give you feedback on the exact cause.

AdrienAdB commented 2 years ago

Thanks @volodymyr-babak.

volodymyr-babak commented 2 years ago

Hello @AdrienAdB, I'm trying to reproduce this problem and fix it. Could you please tell me your device protocol? Are you using MQTT? Do you send any data overnight from the edge to the cloud?

AdrienAdB commented 2 years ago

Hello,

Overall the setup works really well; TB Edge makes remote access very fast. The only issue is this overnight disconnection.

I will try to send extra data overnight, something every 10 minutes, and see if the problem persists. That could be an easy workaround until the issue is resolved.

# /etc/tb-edge/conf/tb-edge.conf 
#
# Copyright © 2016-2022 The Thingsboard Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

export JAVA_OPTS="$JAVA_OPTS -Dplatform=deb -Dinstall.data_dir=/usr/share/tb-edge/data"
export JAVA_OPTS="$JAVA_OPTS -Xlog:gc*,heap*,age*,safepoint=debug:file=/var/log/tb-edge/gc.log:time,uptime,level,tags:filecount=10,filesize=10M"
export JAVA_OPTS="$JAVA_OPTS -XX:+IgnoreUnrecognizedVMOptions -XX:+HeapDumpOnOutOfMemoryError"
export JAVA_OPTS="$JAVA_OPTS -XX:-UseBiasedLocking -XX:+UseTLAB -XX:+ResizeTLAB -XX:+PerfDisableSharedMem -XX:+UseCondCardMark"
export JAVA_OPTS="$JAVA_OPTS -XX:+UseG1GC -XX:MaxGCPauseMillis=500 -XX:+UseStringDeduplication -XX:+ParallelRefProcEnabled -XX:MaxTenuringThreshold=10"
export LOG_FILENAME=tb-edge.out
export LOADER_PATH=/usr/share/tb-edge/conf,/usr/share/tb-edge/extensions
export SQL_DATA_FOLDER=/usr/share/tb-edge/data/sql

# UNCOMMENT NEXT LINES AND PUT YOUR CLOUD CONNECTION SETTINGS:
export CLOUD_ROUTING_KEY=xxxxxx-xxxxxxxx
export CLOUD_ROUTING_SECRET=xxxxxx

# UNCOMMENT NEXT LINES IF EDGE CONNECTS TO CE 'DEMO.THINGSBOARD.IO' SERVER:
export CLOUD_RPC_HOST=xxxxxx

# UNCOMMENT NEXT LINES IF YOU CHANGED DEFAULT CLOUD RPC HOST/PORT SETTINGS:
# export CLOUD_RPC_HOST=xxxxxx
# export CLOUD_RPC_PORT=7070

# UNCOMMENT NEXT LINES IF YOU ARE RUNNING EDGE ON THE SAME MACHINE WHERE THINGSBOARD SERVER IS RUNNING:
# export HTTP_BIND_PORT=18080
# export MQTT_BIND_PORT=11883
# export COAP_BIND_PORT=15683

# UNCOMMENT NEXT LINES IF YOU HAVE CHANGED DEFAULT POSTGRESQL DATASOURCE SETTINGS:
# export SPRING_DATASOURCE_URL=jdbc:postgresql://localhost:5432/tb_edge
export SPRING_DATASOURCE_USERNAME=postgres
export SPRING_DATASOURCE_PASSWORD=xxxxxx

AdrienAdB commented 2 years ago

The device is now sending a "keepAlive" attribute every minute. I'll let you know tomorrow...
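
A minimal sketch of what such a periodic "keepAlive" sender could look like, assuming the device publishes client attributes over MQTT on the `v1/devices/me/attributes` topic from the ThingsBoard MQTT device API. The `publish` callable is injected so the loop stays independent of any MQTT library; `build_keepalive` and `run_keepalive` are hypothetical names, not part of ThingsBoard.

```python
import json
import time

# Assumption: ThingsBoard MQTT device API topic for publishing client attributes.
ATTRIBUTES_TOPIC = "v1/devices/me/attributes"


def build_keepalive(now_ms):
    """Return the (topic, payload) pair for one keepAlive attribute update."""
    return ATTRIBUTES_TOPIC, json.dumps({"keepAlive": now_ms})


def run_keepalive(publish, interval_s=60.0, iterations=None,
                  sleep=time.sleep, clock=time.time):
    """Call publish(topic, payload) every interval_s seconds.

    iterations=None loops forever; a number stops after that many sends.
    Returns how many messages were handed to publish.
    """
    sent = 0
    while iterations is None or sent < iterations:
        topic, payload = build_keepalive(int(clock() * 1000))
        publish(topic, payload)
        sent += 1
        if iterations is None or sent < iterations:
            sleep(interval_s)
    return sent


if __name__ == "__main__":
    # Stand-in publisher that only prints; swap in a real MQTT client's publish.
    run_keepalive(lambda topic, payload: print(topic, payload),
                  interval_s=1.0, iterations=2)
```

With an MQTT client library such as paho-mqtt, the connected client's `publish` method could be passed as the `publish` argument.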

AdrienAdB commented 2 years ago

Hi, sending the "keepAlive" attribute every minute didn't fix the issue.

volodymyr-babak commented 2 years ago

Hello @AdrienAdB

Thanks for the updates. A pull request that should fix this issue has been created: https://github.com/thingsboard/thingsboard-pe/pull/897

It should be available in the next release. I'm going to validate this use case separately and let you know the results before the release. Note that the fix will work only if the device sends "keepAlive" events to keep its session active on the cloud.

truongvanhuy2000 commented 1 year ago

Have you found a fix for this?

volodymyr-babak commented 1 year ago

@truongvanhuy2000

Please provide additional details on your issue:

  1. Are you seeing the 'RPC Error: NO_ACTIVE_CONNECTION' error in the logs?
  2. Do you actively send any data from the edge to the cloud? Or are there pauses in sending data?

Any additional data will help us reproduce and fix the issue.

AndreMaz commented 1 year ago

> Hi @AdrienAdB, this is most probably related to the active status of the device on the cloud. It's a known bug that is targeted to be fixed in the next release. Because the device is connected to the edge, its status on the cloud is not properly updated and the RPC call fails. I'm going to review it in detail later and give you feedback on the exact cause.

Hi @volodymyr-babak, I'm on TB Edge 3.4.4 and just got exactly the same problem. I'm using MQTT between the TB Edge and devices. Here's what I get in the Audit Logs of the device:

image

After the "unassign -> assign" the problem is gone.

volodymyr-babak commented 1 year ago

@AndreMaz

Can you kindly check if there's an RPC_CALL event logged immediately before or after you notice the NO_ACTIVE_CONNECTION error? This will help us ascertain whether the RPC_CALL request is being sent to the edge, or if it's not leaving the cloud.

image

Also, it would be insightful to determine if there are any RPC_CALL cloud events being logged on the edge. If the RPC_CALL is being sent from the cloud but is not being received at the edge, it may signify network issues or problems with the edge's ability to process the RPC_CALL.
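
The two checks above amount to a small decision table, sketched here for illustration only. `locate_rpc_failure` is a hypothetical helper, not part of ThingsBoard; its inputs are what you observe manually in the cloud and edge event views, and the last branch (both sides logged the event) is an assumption that goes beyond what this comment states.

```python
def locate_rpc_failure(cloud_logged_rpc_call, edge_logged_rpc_call):
    """Map the two manual observations to the likely location of the failure.

    cloud_logged_rpc_call: an RPC_CALL event appears on the cloud around the error.
    edge_logged_rpc_call: an RPC_CALL cloud event appears on the edge.
    """
    if not cloud_logged_rpc_call:
        # Nothing was logged on the cloud side, so the request never left it.
        return "not leaving the cloud"
    if not edge_logged_rpc_call:
        # Sent but never seen on the edge.
        return "sent but not received: network issue or edge cannot process RPC_CALL"
    # Assumption: if both sides logged it, the remaining suspect is the edge-to-device leg.
    return "received at the edge: check the edge-to-device leg"


if __name__ == "__main__":
    print(locate_rpc_failure(cloud_logged_rpc_call=True, edge_logged_rpc_call=False))
```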

Your observations on these points will be very valuable for us to pinpoint the issue and help you further.

I look forward to your response. If you have any additional questions or need further clarification, please don't hesitate to ask.

akseerali commented 1 year ago

Hi, after the new update to ThingsBoard, the RPC Call Request at Cloud is showing No Active Connection. From the Edge it can send the RPC request.

Kindly share any hints, thanks.

image

akseerali commented 1 year ago

The problem is resolved. Please refer to this issue for more details.