Open ManuelAlvesDtx opened 1 year ago
Hi, I don't know if this is helpful for your case, but it may be an explanation for the behavior you describe.
We noticed in our application that if the connection goes down while we are using "success = result.get() == TBPublishInfo.TB_ERR_SUCCESS" to check the transmission (result.get() is documented as a blocking call), it hangs indefinitely.
Our current fix is as follows:
Patch the _on_disconnect method in TBDeviceMqttClient so that self.disconnect() is called to perform a clean disconnect. This probably removes the client's ability to reconnect automatically, but it helped here as a workaround:

    ## fix the TBDeviceMqttClient _on_disconnect function
    from tb_device_mqtt import TBDeviceMqttClient, log

    def _on_disconnect_fix(self, client, userdata, result_code):
        prev_level = log.level
        log.setLevel("DEBUG")
        log.debug("Disconnected client: %s, user data: %s, result code: %s", str(client), str(userdata),
                  str(result_code))
        log.setLevel(prev_level)
        # only the following line is added compared to the original fn
        self.disconnect()

    TBDeviceMqttClient._on_disconnect = _on_disconnect_fix

- Remove the result.get() call.
- Instead, check client._is_connected() to see whether the client is really connected. Note that if a connection is lost, the default timeout is 120 s, after which the _is_disconnected method is called (this can be adjusted in the TBDeviceMqttClient.connect call).
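
Checking the connection state instead of blocking can be sketched as a small polling loop with a deadline. This is only an illustration: `wait_until` is a hypothetical helper, not part of the SDK, and it accepts any zero-argument callable that returns a truthy value once the client is connected.

```python
import time


def wait_until(predicate, timeout_s=5.0, poll_s=0.1):
    """Poll predicate() until it returns True or timeout_s elapses.

    Hypothetical helper, not part of tb_device_mqtt. Returns True if the
    predicate became true in time, False otherwise.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

The connection check mentioned above would be passed as the predicate, so the caller gets False after a bounded wait instead of hanging.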
I see that you are connecting and disconnecting the client in each thread, so it may not be connection related. However, this situation may still happen in your case if the connection breaks right after it was established (client.connect) for some reason, or if it never worked at all; the success call then simply blocks, producing the reported behavior.
A more general solution would be to add a timeout to the result.get() function and to improve the disconnect behavior of the MQTT client.
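
Until the library offers such a timeout, one possible workaround is to run the blocking call in a daemon thread and stop waiting after a deadline. The sketch below is an assumption-laden illustration: `call_with_timeout` is a hypothetical helper, and it does not cancel the blocked call, it only stops the caller from waiting on it.

```python
import threading


def call_with_timeout(blocking_call, timeout_s):
    """Run blocking_call() in a daemon thread and wait at most timeout_s.

    Hypothetical helper, not part of tb_device_mqtt. Returns
    (True, value) if the call finished in time and (False, None) if it
    is still blocked. The worker thread is NOT cancelled; it stays
    blocked in the background until the call returns or the process exits.
    """
    outcome = {}

    def runner():
        try:
            outcome["value"] = blocking_call()
        except Exception as exc:  # re-raised in the caller below
            outcome["error"] = exc

    worker = threading.Thread(target=runner, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        return False, None
    if "error" in outcome:
        raise outcome["error"]
    return True, outcome["value"]
```

With this, something like `finished, rc = call_with_timeout(result.get, 10.0)` would return after at most ten seconds, and `finished` tells you whether `rc` is meaningful.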
Hi, thanks for your reply. I've managed to get it running by using paho and making a few changes (I was missing client.loop_start()). Now it runs with multiple threads without any issue. I'll leave it here for anyone trying to do the same:
```python
import paho.mqtt.client as mqtt
import random, json
....

def write_device_attribute(self) -> int:
    client = mqtt.Client()
    client.username_pw_set(self.device_token)
    client.connect(mqttBroker)
    attribute = {"withoutHistory": random.randint(-1000000, 1000000)}
    attribute = json.dumps(attribute)
    client.loop_start()
    info = client.publish(ATTRIBUTES_TOPIC, attribute)
    info.wait_for_publish()
    client.disconnect()
    return info.rc

def write_device_telemetry(self) -> int:
    client = mqtt.Client()
    client.username_pw_set(self.device_token)
    client.connect(mqttBroker)
    telemetry = {"withHistory": random.randint(-1000000, 1000000)}
    telemetry = json.dumps(telemetry)
    client.loop_start()
    info = client.publish(TELEMETRY_TOPIC, telemetry)
    info.wait_for_publish()
    client.disconnect()
    return info.rc
```
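
A possible hardening of the same pattern is sketched below. It assumes paho-mqtt 1.6 or later, where `wait_for_publish` accepts a `timeout` argument. `publish_once` is a hypothetical helper name, and the use of QoS 1 is my own assumption. The client is passed in already connected, so the helper itself does not import paho.

```python
def publish_once(client, topic, payload, timeout_s=10.0):
    """Publish one message with a bounded wait, then always clean up.

    Hypothetical helper. `client` is expected to be an already connected
    paho.mqtt.client.Client (or any object with the same methods).
    Returns True if the message was reported as published in time.
    """
    client.loop_start()
    try:
        info = client.publish(topic, payload, qos=1)  # QoS 1 is an assumption
        # Bounded wait: returns after timeout_s even if the link is dead.
        info.wait_for_publish(timeout=timeout_s)
        return info.is_published()
    finally:
        # Runs on success and on error, so no network thread is left behind.
        client.disconnect()
        client.loop_stop()
```

Calling `loop_stop()` in the `finally` block means each call joins its network thread before returning, which matters when thousands of short-lived clients are created in a load test. Note that in paho the wait and `is_published()` can raise if the publish was rejected, so a caller may want to catch that.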
I've tested the ThingsBoard API (HTTP, using JMeter) with very good results. Now I've been asked to do the same using MQTT. I'm using the docker image described in https://thingsboard.io/docs/user-guide/install/docker/
For starters, the examples in https://thingsboard.io/docs/reference/python-client-sdk/ do not work with the latest versions of tb-mqtt-client. By trial and error, I managed to make it work with version 1.1, the only one that worked for sending attributes and telemetry (python -m pip install tb-mqtt-client==1.1). Using the MQTT Python client to create devices never succeeds in provisioning more than 10,000 devices; it hangs before that without any error. Since what I was asked to test was sending data (attributes and telemetry), I resorted to pycurl to provision the devices and get their tokens so I could send data to each device. Following the example in https://github.com/thingsboard/thingsboard-python-client-sdk/blob/master/examples/device/send_telemetry_and_attr.py I created these two functions to send data in a custom MQTT class of my own, where properties like the server address and device token are populated when the device is provisioned over HTTP.
And the main function: …
…
This code starts out running as expected, but after a few hundred attribute writes it gets stuck. Checking the process with ps -u, I see that it is in an interruptible sleep (waiting for an event to complete), specifically state "Sl+".
Any clue as to why this works fine for low numbers but gets stuck on a long run? The server is almost idle at 2% CPU usage, with lots of free memory and disk.
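
One way to find out where such a process is blocked is Python's standard `faulthandler` module, which can dump the stack of every thread if a section of code runs longer than expected. The sketch below is only a diagnostic aid. `hang_watchdog` is a hypothetical helper name, and the timeout is whatever counts as "too long" for a single write.

```python
import faulthandler
import sys
from contextlib import contextmanager


@contextmanager
def hang_watchdog(timeout_s, file=sys.stderr):
    """Dump all thread tracebacks to `file` if the body exceeds timeout_s.

    Hypothetical helper built on the standard faulthandler module. With
    repeat=True the dump is written again every timeout_s seconds for as
    long as the body keeps running. `file` must have a real file descriptor.
    """
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=file)
    try:
        yield
    finally:
        # Disarm the watchdog as soon as the body finishes.
        faulthandler.cancel_dump_traceback_later()
```

Wrapping each write in `with hang_watchdog(30):` should show the exact line each thread is waiting on once the run gets stuck, for example a blocking get() on a publish result.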