njh / ruby-mqtt

Pure Ruby gem that implements the MQTT protocol, a lightweight protocol for publish/subscribe messaging.
http://www.rubydoc.info/gems/mqtt
MIT License
541 stars 135 forks source link

Synchronize writes and blocking socket reads #105

Closed camlow325 closed 5 years ago

camlow325 commented 6 years ago

When testing out the ruby-mqtt library on JRuby 9k, I ran into a problem where overlapping blocking reads and writes made to the same MQTT socket could intermittently lead to a deadlock. I have not yet encountered a similar deadlock in any testing that I've done so far with MRI Ruby versions. For JRuby 9k, I filed a related issue at https://github.com/jruby/jruby/issues/4854. I suspect that the work to address this in JRuby could be fairly involved and may not be done anytime soon. In the meantime, I've seen that by synchronizing all calls which do blocking reads on the socket -- i.e., for the calls inside of MQTT::Packet.read which do several blocking byte reads -- along with any writes to the socket that the deadlocks go away. I put up this PR with that change for reference, but I'd understand any hesitance to take this change considering the outstanding work that could be done to address the underlying issue in JRuby. Where the deadlock is timing-related and seemingly Ruby interpreter-specific, I couldn't think of a good spec test to include with this PR but I'd be happy to add one on if someone can think of one.

njh commented 6 years ago

I suspect that it is unlikely that this change is likely to cause performance problems, but it seems a mistake not to keep reads and writes asynchronous.

Would an alternative be to do a conditional include JRuby::Synchronized (as suggested in jruby/jruby#4854) when running inside jruby?

camlow325 commented 6 years ago

I think the JRuby::Synchronized approach would be a reasonable alternative for having the read/write synchronization be done for JRuby only, assuming you are okay with having JRuby-aware code in this project.

I initially tried doing the include JRuby::Synchronized into the MQTT:Client class. That resulted in really bad performance for a particular test case that I was exercising. The test involves publishing and subscribing to receive back 1000 messages from a broker, writing each received message to stdout. When running under JRuby, I'm used to that test taking no more than a couple of seconds to complete. With include JRuby::Synchronized in the MQTT:Client class the test case did not encounter any deadlocks but took several minutes to run.

As an alternative, I tried just extending the socket instance that the MQTT::Client class creates to use JRuby::Synchronized, as is described here. I ran my same test example through about 100 times with no deadlocks occurring. Typically, I'd see the deadlock maybe 10 - 20% of the time when running the test, with the first occurrence after no more than 10 iterations. The performance for the test was also similar with JRuby::Synchronized as with the original locking approach that I had included on this PR. In theory, it seems like doing a synchronize around every method on the socket instance could result in poorer performance than what I had in the original commit for the PR since the lock would need to be taken for every byte read when parsing a packet. Maybe this is practically inexpensive enough, though, that the difference wouldn't matter.

I modified the PR with a commit, 368ffed, that extends the socket instance with JRuby::Synchronized. If you think performance might be a concern with this, I could also go back to something like the original approach I used of locking around the whole MQTT::Packet.read call in MQTT::Client.receive_packet but only have the lock be taken when running under JRuby so that reads and writes can remain asynchronous on MRI and other Ruby interpreters.

Thanks again for your help and for looking at this.

camlow325 commented 5 years ago

The issue that I had filed to the JRuby team, https://github.com/jruby/jruby/issues/4854, was fixed in JRuby 9.1.15.0. I have re-run my tests against JRuby 9.2.0.0 without the synchronization logic added to this PR and can no longer reproduce the hangs that I had previously been seeing. Given that, I'm going to close out this PR. Thanks again for your comments and for taking a look at this.