randomsync / robotframework-mqttlibrary

MQTT Keyword Library for Robot Framework
Apache License 2.0

Publish failing with error code 4 #25

Open juha-ylikoski opened 3 years ago

juha-ylikoski commented 3 years ago

Sometimes (maybe 1/4 of the time) when I try to publish a message to my MQTT broker (mosquitto version 1.6.9) I just get

FAIL Error publishing: 4

this is caused by:

result, mid = self._mqttc.publish(topic, message, int(qos), retain)
if result != 0:
    raise RuntimeError('Error publishing: %s' % result)

in MQTTKeywords.publish returning result == 4. (4 == MQTT_ERR_NO_CONN).

according to https://github.com/eclipse/paho.mqtt.python/blob/master/src/paho/mqtt/client.py

rc is MQTT_ERR_SUCCESS to indicate success or MQTT_ERR_NO_CONN if the
client is not currently connected.  mid is the message ID for the
publish request. The mid value can be used to track the publish request
by checking against the mid argument in the on_publish() callback if it
is defined.

If the client is not currently connected, should this library try to reconnect, or raise an error about it from _on_disconnect?

I cannot find the reason for the disconnect in my mosquitto logs (I don't see anything in the broker's log with a matching timestamp) or in the robot logs. I have connected multiple other devices to this broker with different software (using the python2 paho_mqtt library) and they don't seem to disconnect randomly.
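One possible shape for the "try to reconnect" option is sketched below. This is only an illustration, not what the library does: `publish_with_reconnect` and `FlakyClient` are hypothetical names, the constants are redefined locally with the same numeric values paho uses, and it assumes the client's `publish()` result unpacks to `(result, mid)` as in the snippet above and that the client exposes paho's `reconnect()`.

```python
MQTT_ERR_SUCCESS = 0  # same numeric values as in paho.mqtt.client
MQTT_ERR_NO_CONN = 4


def publish_with_reconnect(client, topic, payload, qos=0, retain=False, retries=1):
    """Publish; on MQTT_ERR_NO_CONN, reconnect and retry up to `retries` times."""
    attempts = 0
    while True:
        result, mid = client.publish(topic, payload, qos, retain)
        if result == MQTT_ERR_SUCCESS:
            return mid
        # Any other error, or too many reconnect attempts: fail like the library does.
        if result != MQTT_ERR_NO_CONN or attempts >= retries:
            raise RuntimeError('Error publishing: %s' % result)
        attempts += 1
        client.reconnect()  # paho reuses the host/port passed to connect()


class FlakyClient:
    """Hypothetical stand-in for a paho client that starts out disconnected."""

    def __init__(self):
        self.connected = False
        self.reconnects = 0
        self._mid = 0

    def publish(self, topic, payload, qos=0, retain=False):
        if not self.connected:
            return MQTT_ERR_NO_CONN, 0
        self._mid += 1
        return MQTT_ERR_SUCCESS, self._mid

    def reconnect(self):
        self.connected = True
        self.reconnects += 1
```

With a real paho client the network loop still has to run after `reconnect()` for the connection handshake to complete, so this alone may not be enough.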

ralienpp commented 3 years ago

I encountered a similar problem. In my case the disconnect occurs because keywords call custom functions that block the main thread for some time. This prevents the MQTT loop from interacting with the broker at regular intervals to let it know that the client is still alive.

By default robotframework-mqttlibrary runs this in the main thread, but you can call subscribe with a timeout of 0, which would run the MQTT loop in a separate thread, so these disconnect issues should not occur.
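To put rough numbers on this: the MQTT 3.1.1 specification has the broker close a connection if it receives nothing from the client within one and a half times the keep-alive interval. A toy model of that rule is below. `broker_drops_client` is a hypothetical helper, and the 60 second default is an assumption taken from paho's `connect()` default; the value used in your setup may differ.

```python
def broker_drops_client(blocked_seconds, keepalive=60):
    """Return True if a silence of `blocked_seconds` exceeds the broker's grace period.

    Models the MQTT 3.1.1 rule: the broker disconnects a client that has sent
    no packet for more than 1.5 times the keep-alive interval.
    """
    if keepalive == 0:
        # A keep-alive of 0 switches the timeout off.
        return False
    return blocked_seconds > 1.5 * keepalive
```

So with a 60 second keep-alive, a keyword that blocks the main thread for 100 seconds would be enough for the broker to drop the client, while a 30 second block would not.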

Can you give it a try and let me know if it worked for you?

juha-ylikoski commented 3 years ago

I have encountered the problem both ways. I think the problem has something to do with the looping of this library. Since posting this issue I have written my own MQTT library with a background loop and haven't encountered the issue since. Unfortunately the library is not my IP and has a lot of proprietary things in it, so I won't be publishing it.

I can however show the difference in looping mechanism and explain more in detail how I encountered this issue later today or tomorrow.

ralienpp commented 3 years ago

I'd greatly appreciate it if you could share your insights, because I ran into the same issue and I am considering whether I should do a devil's dance around it, or add an auto_reconnect parameter, or do something else entirely.

juha-ylikoski commented 3 years ago

How I've tried to use this library:

1) I used the subscribe keyword to get all messages from one topic for a few minutes. This fails about half of the time, returning nothing when there should be messages.
2) I used subscribe + listen to asynchronously listen for messages. During this time I tried sending messages and got error 4.

I believe the looping mechanism used by this library somehow causes these errors. I tried building my own library for the company I work for. I made the library a lot simpler and included pretty much nothing beyond asynchronously listening for messages while sending them, and just listening for messages. I have not gotten any of these errors with my library.

The biggest difference between my library and this one is that I used mqttc.loop_start(), which starts a background loop, instead of mqttc.loop(). I think using the synchronous loop() somehow causes these random disconnects.
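A minimal sketch of the background-loop approach, assuming a paho-style client with `loop_start()` and `loop_stop()`. `background_loop` and `RecordingClient` are hypothetical names used only for illustration; this is not the code from my library or from this one.

```python
import contextlib


@contextlib.contextmanager
def background_loop(client):
    """Run the client's network loop in a background thread for the `with` body."""
    client.loop_start()  # paho: spawns a thread that services the connection
    try:
        yield client
    finally:
        client.loop_stop()  # paho: stops that thread again


class RecordingClient:
    """Hypothetical stand-in for a paho client that records the calls it gets."""

    def __init__(self):
        self.calls = []

    def loop_start(self):
        self.calls.append('loop_start')

    def loop_stop(self):
        self.calls.append('loop_stop')
```

With a real client, publishing and long-running work would go inside the `with background_loop(mqttc):` body, so keep-alive traffic continues even while the main thread is busy.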

I have not tried modifying the library to use auto_reconnect and I have no idea if it will help.

I do not know why I am having these difficulties. I usually run my robot on a Raspberry Pi, but I have also encountered this problem on my laptop (Linux Mint), so I do not think that load on the OS is causing the problem.

I hope you can figure out some kind of solution to your problem.

ralienpp commented 3 years ago

Thanks for sharing your experience. I too made a custom function that solves my particular problem, though ideally it would be great if we could also make a general tweak in this library so it becomes more flexible.

I think the issue is at an ideological level. I believe the decision whether to use loop or loop_start (the one that starts a background thread for MQTT communications) should be made in the connect function, rather than in the subscribe function (as it is now). However, we need some input from @randomsync to understand why the current design is the way it is. Maybe it takes care of some effects we have not foreseen.

p.s. for future archaeologists, an easy workaround to this problem is to use the Connect keyword again after running a test instruction that is known to block the main thread for a long time (to be determined empirically).

randomsync commented 3 years ago

Thank you both for the discussion and feedback. I can take a look at the issues you're encountering as time permits, but if you have a PR or a branch that replicates the issue and/or has a fix, that would speed things up.

juha-ylikoski commented 3 years ago

I might have time next weekend to create a PR with background looping instead of the current looping mechanism.

However, I do not know how I would validate that this fixes the inconsistent behavior. I do not think I could write any acceptance tests for it, since the inconsistent behavior is (allegedly) caused by Robot Framework keyword execution, Python or the OS.

juha-ylikoski commented 3 years ago

It seems the acceptance tests are failing, maybe due to this problem: https://travis-ci.org/github/randomsync/robotframework-mqttlibrary/jobs/758261351 https://travis-ci.org/github/randomsync/robotframework-mqttlibrary/jobs/758261350

axi92 commented 2 years ago

I have the same issue over and over. Is this still being worked on? @juha-ylikoski :smiley:

juha-ylikoski commented 2 years ago

@axi92 As per #27, I created a fork of this repo and opened a pull request, but some test cases were failing and we weren't really sure whether to change them or the implementation, so it was forgotten. For my use case I created my own library with similar looping to what the PR has (for internal usage at my company) and it has worked well. (We needed some extensions over this library / a wrapper around it anyway.)

I believe you can install the fork with: pip install git+https://github.com/juha-ylikoski/robotframework-mqttlibrary.git@background-looping

randomsync commented 2 years ago

I admit that I have neglected this due to work and other commitments. And I likely can't commit to this in near future either, but this is definitely in my backlog. I hope the fork @juha-ylikoski (🙇🏽 ) provided works for now.

axi92 commented 2 years ago

Thank you for your fork @juha-ylikoski, but I still get the disconnect sometimes. Right now there is no reconnect implemented, or is there?

Edit: After some days of testing, @juha-ylikoski's fork is way more stable. All keywords that I use are working most of the time. There is sometimes a connection loss, but not as often as before.

piotrZ-commit commented 12 months ago

Hi @randomsync, could you give us a status update on this issue? We are still facing it.

[EDIT] Even when I use the loop_start() method in the subscribe() and publish() methods the issue still appears, but not as often as without my modification.

randomsync commented 10 months ago

Hi folks, I'm not able to commit to this project anymore. Last I looked, it needs some effort to upgrade it to use the latest paho client. I hope someone can take over this project. Feel free to reach out if you can and I'll be happy to discuss what needs to be done.