Do not destroy object variables if _SSL_opening is true

toddr commented 7 years ago

In situations where the first start_SSL call (server side) fails due to SSL_WANT_READ, we're supposed to be able to re-try. Clearing all object variables on a retry can be disastrous as it triggers unwanted DESTROY events.

Added unit test to show this. It will sometimes succeed though 10 iterations seems to be enough to trigger it most times.

toddr commented 7 years ago

This is a rough approximation of the bug we found and mis-reported the other day. It's def a regression as a result of the change in 7ee0ba3ed24c7c2d47a3258cad09c4153528d69b. The joke is it only triggered in on of our unit tests, not even production code. Feel free to merge and or alter the test to meet your standards.

If it's not appropriate to re-run start_SSL in this circumstance, then we need to understand what we should instead be doing.

Thanks!

noxxi commented 7 years ago

I'm not sure why you call start_SSL multiple times if the first one failed. The way non-blocking accept and connect should be done is to call start_SSL once to upgrade the socket and then call accept_SSL or connect_SSL as long you get EAGAIN. If there is any documentation or example code out there which suggests that you should use multiple calls of start_SSL instead please let me know.

toddr commented 7 years ago

The unit test I included shows a pretty decent example of how it is going wrong for us. Does this help?

In my example I think you are suggesting that after the first fail for start_SSL I should switch to calling accept?

toddr commented 7 years ago

SSL_WANT_READ rather than EAGAIN Seems to be my use case

noxxi commented 7 years ago

Since you are trying to do a non-blocking connect you should first upgrade the socket object using start_SSL once and then call connect_SSL repeatedly as long $! is EAGAIN. And based on $SSL_ERROR you should either wait for readability or writeability on the socket for example with select instead of busy waiting.

toddr commented 7 years ago

In my case, when I call start_SSL, if it fails, then the object is downgraded before I get it back. Also $! is not set so testing EAGAIN doesn't work. All I can do is check this to see if a retry is possible.

$IO::Socket::SSL::SSL_ERROR =~ m{read first}

So a few questions have come up as we've been going through this:

As a result of the changes in 7ee0ba3ed24c7c2d47a3258cad09c4153528d69b, We get a failure now when we call start_SSL a second time when the first call failed. If your intention is for the sub to not be called a second time on the socket, then IMO it should die if _SSL_opening is set since it's clearly not in a stable state?
It's clear that start_SSL then accept afterwards doesn't work because of the object downgrade back to its previous state as a GLOB. Can you give me some example code of how you think this process should be working?
I keep thinking about the message "make sure to start with a fresh %{*self} in configure_SSL". Do you know of an example of where this happens? I can't seem to simulate it except in the case where start_SSL itself downgrades it.

Thanks for your time. Todd

noxxi commented 7 years ago

As a result of the changes in 7ee0ba3, We get a failure now when we call start_SSL a second time when the first call failed. If your intention is for the sub to not be called a second time on the socket, then IMO it should die if _SSL_opening is set since it's clearly not in a stable state?

Maybe one could add this additional check. But, one should not run into this condition if the module is properly used.

It's clear that start_SSL then accept afterwards doesn't work because of the object downgrade back to its previous state as a GLOB. Can you give me some example code of how you think this process should be working?

Let me cite from the documentation of start_SSL:

Note that if start_SSL() fails in SSL negotiation, $socket will remain blessed in its original class. For non-blocking sockets you better just upgrade the socket to IO::Socket::SSL and call accept_SSL or connect_SSL and the upgraded object. To just upgrade the socket set SSL_startHandshake explicitly to 0.

Currently you don't follow the documented behavior, i.e. you don't set SSL_startHandshake to 0 to just do the upgrade but instead start already with the SSL handshake and fail, which causes the downgrade of the socket. If you would follow the documentation instead it should work.

I keep thinking about the message "make sure to start with a fresh %{*self} in configure_SSL". Do you know of an example of where this happens? I can't seem to simulate it except in the case where start_SSL itself downgrades it.

Please see #56 for details on how this problem got triggered.

toddr commented 7 years ago

Thanks for your help. This makes more sense now. Closing.

noxxi / p5-io-socket-ssl

Do not destroy object variables if _SSL_opening is true #58