thingsboard / thingsboard-gateway

Open-source IoT Gateway - integrates devices connected to legacy and third-party systems with ThingsBoard IoT Platform using Modbus, CAN bus, BACnet, BLE, OPC-UA, MQTT, ODBC and REST protocols
https://thingsboard.io/docs/iot-gateway/what-is-iot-gateway/
Apache License 2.0

[BUG] IoTgateway Memory storage is full and never recovers #1483

Closed ashdam closed 3 months ago

ashdam commented 3 months ago

Describe the bug

The IoT Gateway lost its connection to TB Edge PE after the gateway was restarted. After about 8 hours the gateway's memory storage filled up. We restarted Edge with new parameters (an increased MQTT message size limit), but the gateway never returned to a normal state.

Steps followed:
1) Loaded a large OPC-UA (asyncua) connector with 1000 variables.
2) Lost synchronization with TB Edge PE.
3) Waited 7-8 hours; the IoT Gateway reported that memory storage was full (in-memory storage configuration).
4) On TB Edge PE:
   4.1) It showed an error about the MQTT message size.
   4.2) Increased the NETTY_MAX_PAYLOAD_SIZE environment variable and restarted.
   4.3) It recovered.
5) The IoT Gateway did not recover and kept throwing the "Memory storage is full" error.
6) Restarted the IoT Gateway and it worked fine.

Our concern is about recovery: we have found many situations where communication/synchronization is not restored and the gateway keeps throwing ERRORs.

Connector name (if the bug is in a specific connector): OPC-UA asyncua Connector

Error traceback (Gateway):

2024-08-06 05:51:27 - |INFO| - [tb_device_mqtt.py] - tb_device_mqtt - _on_connect - 352 - MQTT client <paho.mqtt.client.Client object at 0x74b24c8a8590> - Connected!
2024-08-06 05:51:29 - |ERROR| - [memory_event_storage.py] - memory_event_storage - put - 37 - Memory storage is full!
2024-08-06 05:51:29 - |ERROR| - [tb_gateway_service.py] - tb_gateway_service - __send_data_pack_to_storage - 1137 - '[727e540a-4526-4644-9918-77501a379e14] 'Data from the device "MB107_UDC1" cannot be saved, connector name is MB107_UDC1_connector.
2024-08-06 05:51:29 - |WARNING| - [tb_device_mqtt.py] - tb_device_mqtt - _on_disconnect - 343 - MQTT client was disconnected with reason code 7 (The connection was lost.)
2024-08-06 05:51:29 - |WARNING| - [tb_device_mqtt.py] - tb_device_mqtt - _publish_data - 745 - Waiting for connection to be established before sending data to ThingsBoard!
2024-08-06 05:51:29 - |WARNING| - [tb_device_mqtt.py] - tb_device_mqtt - _publish_data - 745 - Waiting for connection to be established before sending data to ThingsBoard!

.... repeated error 

2024-08-06 05:52:04 - |INFO| - [tb_device_mqtt.py] - tb_device_mqtt - _on_connect - 352 - MQTT client <paho.mqtt.client.Client object at 0x74b24c8a8590> - Connected!
2024-08-06 05:52:04 - |ERROR| - [memory_event_storage.py] - memory_event_storage - put - 37 - Memory storage is full!
2024-08-06 05:52:04 - |ERROR| - [tb_gateway_service.py] - tb_gateway_service - __send_data_pack_to_storage - 1137 - '[727e540a-4526-4644-9918-77501a379e14] 'Data from the device "MB107_UDC1" cannot be saved, connector name is MB107_UDC1_connector.
2024-08-06 05:52:04 - |INFO| - [tb_gateway_mqtt.py] - tb_gateway_mqtt - gw_subscribe_to_attribute - 236 - Subscribed to *|* with id 1040 for device *
2024-08-06 05:52:06 - |ERROR| - [memory_event_storage.py] - memory_event_storage - put - 37 - Memory storage is full!
2024-08-06 05:52:06 - |ERROR| - [tb_gateway_service.py] - tb_gateway_service - __send_data_pack_to_storage - 1137 - '[727e540a-4526-4644-9918-77501a379e14] 'Data from the device "MB107_UDC1" cannot be saved, connector name is MB107_UDC1_connector.
2024-08-06 05:52:06 - |ERROR| - [memory_event_storage.py] - memory_event_storage - put - 37 - Memory storage is full!
2024-08-06 05:52:06 - |ERROR| - [tb_gateway_service.py] - tb_gateway_service - __send_data_pack_to_storage - 1137 - '[727e540a-4526-4644-9918-77501a379e14] 'Data from the device "MB107_UDC1" cannot be saved, connector name is MB107_UDC1_connector.
2024-08-06 05:52:06 - |ERROR| - [memory_event_storage.py] - memory_event_storage - put - 37 - Memory storage is full!
2024-08-06 05:52:06 - |ERROR| - [tb_gateway_service.py] - tb_gateway_service - __send_data_pack_to_storage - 1137 - '[727e540a-4526-4644-9918-77501a379e14] 'Data from the device "MB107_UDC1" cannot be saved, connector name is MB107_UDC1_connector.
2024-08-06 05:52:06 - |ERROR| - [memory_event_storage.py] - memory_event_storage - put - 37 - Memory storage is full!
2024-08-06 05:52:06 - |ERROR| - [tb_gateway_service.py] - tb_gateway_service - __send_data_pack_to_storage - 1137 - '[727e540a-4526-4644-9918-77501a379e14] 'Data from the device "MB107_UDC1" cannot be saved, connector name is MB107_UDC1_connector.
2024-08-06 05:52:06 - |ERROR| - [memory_event_storage.py] - memory_event_storage - put - 37 - Memory storage is full!
2024-08-06 05:52:06 - |ERROR| - [tb_gateway_service.py] - tb_gateway_service - __send_data_pack_to_storage - 1137 - '[727e540a-4526-4644-9918-77501a379e14] 'Data from the device "MB107_UDC1" cannot be saved, connector name is MB107_UDC1_connector.
2024-08-06 05:52:06 - |WARNING| - [tb_device_mqtt.py] - tb_device_mqtt - _on_disconnect - 343 - MQTT client was disconnected with reason code 7 (The connection was lost.)
2024-08-06 05:52:06 - |ERROR| - [memory_event_storage.py] - memory_event_storage - put - 37 - Memory storage is full!
2024-08-06 05:52:06 - |ERROR| - [tb_gateway_service.py] - tb_gateway_service - __send_data_pack_to_storage - 1137 - '[727e540a-4526-4644-9918-77501a379e14] 'Data from the device "MB107_UDC1" cannot be saved, connector name is MB107_UDC1_connector.
2024-08-06 05:52:06 - |WARNING| - [tb_device_mqtt.py] - tb_device_mqtt - _publish_data - 745 - Waiting for connection to be established before sending data to ThingsBoard!

2024-08-06 05:52:06 - |WARNING| - [tb_device_mqtt.py] - tb_device_mqtt - _publish_data - 745 - Waiting for connection to be established before sending data to ThingsBoard!
2024-08-06 05:52:16 - |WARNING| - [tb_device_mqtt.py] - tb_device_mqtt - _publish_data - 745 - Waiting for connection to be established before sending data to ThingsBoard!
2024-08-06 05:52:16 - |WARNING| - [tb_device_mqtt.py] - tb_device_mqtt - _publish_data - 745 - Waiting for connection to be established before sending data to ThingsBoard!
2024-08-06 05:52:16 - |INFO| - [tb_device_mqtt.py] - tb_device_mqtt - _on_connect - 352 - MQTT client <paho.mqtt.client.Client object at 0x74b24c8a8590> - Connected!
2024-08-06 05:52:16 - |ERROR| - [tb_gateway_service.py] - tb_gateway_service - __read_data_from_storage - 1223 - Error while sending data to ThingsBoard, it will be resent.
Traceback (most recent call last):
  File "/thingsboard_gateway/gateway/tb_gateway_service.py", line 1217, in __read_data_from_storage
    success = event.get() == event.TB_ERR_SUCCESS
              ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/tb_device_mqtt.py", line 191, in get
    self.message_info.wait_for_publish(timeout=1)
  File "/usr/local/lib/python3.11/site-packages/paho/mqtt/client.py", line 362, in wait_for_publish
    raise RuntimeError('Message publish failed: %s' % (error_string(self.rc)))
RuntimeError: Message publish failed: The client is not currently connected.
2024-08-06 05:52:17 - |ERROR| - [memory_event_storage.py] - memory_event_storage - put - 37 - Memory storage is full!

..... a lil later

2024-08-06 06:24:36 - |ERROR| - [memory_event_storage.py] - memory_event_storage - put - 37 - Memory storage is full!
2024-08-06 06:24:36 - |ERROR| - [tb_gateway_service.py] - tb_gateway_service - __send_data_pack_to_storage - 1137 - '[727e540a-4526-4644-9918-77501a379e14] 'Data from the device "MB107_UDC1" cannot be saved, connector name is MB107_UDC1_connector.
2024-08-06 06:24:36 - |ERROR| - [tb_gateway_service.py] - tb_gateway_service - __read_data_from_storage - 1223 - Error while sending data to ThingsBoard, it will be resent.
Traceback (most recent call last):
  File "/thingsboard_gateway/gateway/tb_gateway_service.py", line 1217, in __read_data_from_storage
    success = event.get() == event.TB_ERR_SUCCESS
              ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/tb_device_mqtt.py", line 191, in get
    self.message_info.wait_for_publish(timeout=1)
  File "/usr/local/lib/python3.11/site-packages/paho/mqtt/client.py", line 358, in wait_for_publish
    raise ValueError('Message is not queued due to ERR_QUEUE_SIZE')
ValueError: Message is not queued due to ERR_QUEUE_SIZE
2024-08-06 06:24:36 - |ERROR| - [memory_event_storage.py] - memory_event_storage - put - 37 - Memory storage is full!
2024-08-06 06:24:36 - |ERROR| - [tb_gateway_service.py] - tb_gateway_service - __send_data_pack_to_storage - 1137 - '[727e540a-4526-4644-9918-77501a379e14] 'Data from the device "MB107_UDC1" cannot be saved, connector name is MB107_UDC1_connector.
2024-08-06 06:24:36 - |ERROR| - [tb_gateway_service.py] - tb_gateway_service - __read_data_from_storage - 1223 - Error while sending data to ThingsBoard, it will be resent.
Traceback (most recent call last):
  File "/thingsboard_gateway/gateway/tb_gateway_service.py", line 1217, in __read_data_from_storage
    success = event.get() == event.TB_ERR_SUCCESS
              ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/tb_device_mqtt.py", line 191, in get
    self.message_info.wait_for_publish(timeout=1)
  File "/usr/local/lib/python3.11/site-packages/paho/mqtt/client.py", line 358, in wait_for_publish
    raise ValueError('Message is not queued due to ERR_QUEUE_SIZE')
ValueError: Message is not queued due to ERR_QUEUE_SIZE
2024-08-06 06:24:36 - |ERROR| - [memory_event_storage.py] - memory_event_storage - put - 37 - Memory storage is full!
2024-08-06 06:24:36 - |ERROR| - [tb_gateway_service.py] - tb_gateway_service - __read_data_from_storage - 1223 - Error while sending data to ThingsBoard, it will be resent.

Versions (please complete the following information):

imbeacon commented 3 months ago

Hi @ashdam,

The ERR_QUEUE_SIZE error means that the MQTT client was not able to deliver all messages to the platform. The messages go unsent because the platform disconnects the gateway. Before ThingsBoard 3.7.0, the platform often returned reason code 7 and closed the connection for a variety of reasons. The most common cause we have seen for this error is exceeding the transport rate limits for the device or tenant. To find out why TB disconnects the gateway, check the platform logs. I think this can also be checked on TB Edge.
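To illustrate why the "Memory storage is full!" error keeps repeating while the connection is unstable, here is a minimal sketch of a capped in-memory buffer that is only emptied after a successful publish. It is not the gateway's actual code: `BoundedMemoryStorage`, `drain`, and the `publish` callback are hypothetical names used only for this example.

```python
from collections import deque


class BoundedMemoryStorage:
    """Hypothetical stand-in for an in-memory event buffer with a hard cap."""

    def __init__(self, max_records):
        self.max_records = max_records
        self._events = deque()

    def put(self, event):
        # Reject new events once the cap is reached (the "storage is full" case).
        if len(self._events) >= self.max_records:
            return False
        self._events.append(event)
        return True

    def peek_batch(self, count):
        # Read the oldest events without removing them.
        return list(self._events)[:count]

    def ack_batch(self, count):
        # Drop events only after they were delivered successfully.
        for _ in range(min(count, len(self._events))):
            self._events.popleft()

    def __len__(self):
        return len(self._events)


def drain(storage, publish, batch_size=2):
    """Send batches until publish fails; failed batches stay in storage."""
    sent = 0
    while len(storage):
        batch = storage.peek_batch(batch_size)
        if not publish(batch):
            break  # link is down or the broker rejected us: retry later
        storage.ack_batch(len(batch))
        sent += len(batch)
    return sent


# Assumed scenario: connectors keep producing while nothing can be published.
storage = BoundedMemoryStorage(max_records=3)
accepted = [storage.put(i) for i in range(5)]
sent_offline = drain(storage, publish=lambda batch: False)
sent_online = drain(storage, publish=lambda batch: True)
```

In this sketch the buffer stays full for as long as every publish attempt fails, so each new `put` is rejected. Space is freed only once a publish succeeds, which is why a platform that keeps disconnecting the gateway can leave the storage full indefinitely.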

kuailezhiyuan commented 3 months ago

I'm also experiencing this problem. I get this error some time after connecting a large number of devices over Modbus. The code from the end of July works fine.

ashdam commented 3 months ago

This is related to a performance issue we currently have. Closing.