iande / onstomp

A STOMP messaging client library for Ruby
http://mathish.com/projects/onstomp.html
Other
23 stars 11 forks source link

Fix an issue with ssl connection failures #36

Open l8nite opened 5 years ago

l8nite commented 5 years ago

The behavior I witnessed was the connection proceeding as normal until it would suddenly terminate with an unexpected error. The error was just "read would block".

I found some documentation on the fix here: http://ruby-doc.org/stdlib-2.4.1/libdoc/openssl/rdoc/OpenSSL/Buffering.html#method-i-read_nonblock

Edit: Build failure appears to be jruby. I see there's already some limited support for varying the nonblock method approach based on the underlying socket library, so maybe that needs to be extended to check for the appropriate ruby runtime?

iande commented 5 years ago

I’ll need to take a closer look at this in the morning. I was going to chalk this up in changes to OpenSSL between 1.9.3 and now, but ruby-doc.org has proven me wrong. The ready_for_read? and ready_for_write? methods were originally written to address this sort of thing, but they may not have detected certain edge cases or underlying IO handling may have changed in the years since I did all this.

At any rate, I just want to review this in a bit more depth and maybe construct a test scenario around it before merging it but it should be done by tomorrow afternoon (in case you’re participating in hacktoberfest.)

As for the failing JRuby specs, I think those are somewhat flakey tests (or a problem between my serialization assumptions and jruby/java IO processing.) I’ve seen those failures before and it’s not something I’m too concerned about at this time.

l8nite commented 5 years ago

Heads up, I think there's some bug in this with message sizes larger than MAX_BYTES_PER_READ - ran into it today. The read_buffer just hangs waiting for more data and never gets a complete frame to dispatch.

l8nite commented 5 years ago

@iande I'm at a loss on this one (not super familiar with socket I/O in Ruby), but for the moment I hacked around our local problems by forking and setting the MAX_BYTES_PER_READ to a much higher value.

iande commented 5 years ago

I think I need to spend a little more time with this one than I expected. You might be onto something with the higher value of MAX_BYTES_PER_READ. I'm going to set aside time this weekend to revisit the IO code and make sure I understand why I'm doing some of the things I am doing.

iande commented 5 years ago

Also, I want to address the flakiness of these frame ordering tests under jruby.

iande commented 5 years ago

I may have an explanation for what is going on. I need to work out a test for it, but my suspicion based on your report about messages larger than MAX_BYTES_PER_READ makes me think that my non-blocking approach is wrong.

Speculation follows: I think what happens is that after a read, if the bytes that remain aren't enough, then reading returns EWOULDBLOCK (raising WaitReadable) and we get stuck in a loop that can only be broken by more data coming in. Depending on message flows, that may take a long time, or may no ever happen.

l8nite commented 5 years ago

Check this explanation on stackoverflow out that I found for how SSL sockets work: https://stackoverflow.com/a/36816869/447868

I think this is what I was running into.

Edit: Maybe not, it seems like read_nonblock in openssl takes care of some of these details, but like I said my ruby socket-fu is weak :)

iande commented 5 years ago

I think that may be related. While read_nonblock may take care of this, the use of IO.select may still be a contributing factor. I believe the behavior outlined in that stackoverflow response could produce a situation where the reader thread hangs indefinitely due to IO.select blocking when no timeout is specified, or where reads are simply skipped continuously because a call to IO.select with a timeout is returning nil.

I think this gives me enough to work from to improve SSL reads. In the meantime, I think your local patching of MAX_BYTES_PER_READ to a size >= your typical message payload will serve as a workaround.

I've got to juggle this with work and some appointments, but I plan on having this fixed by the end of this week at the latest.

l8nite commented 5 years ago

No hurry from my side. Thanks for looking into it!