python-hyper / hyper

HTTP/2 for Python.
http://hyper.rtfd.org/en/latest/
MIT License
1.05k stars 191 forks source link

self._sock is None inside of _single_read #256

Closed 72squared closed 8 years ago

72squared commented 8 years ago

I opened this with the apns2 package maintainer since that is where I discovered the bug but I think this is really a problem inside of hyper:

https://github.com/Pr0Ger/PyAPNs2/issues/11

self._sock is None when it hits this line:

https://github.com/Lukasa/hyper/blob/development/hyper/http20/connection.py#L644

It is a rare occurance but it does happen. Any thoughts on what could be causing it?

Lukasa commented 8 years ago

So the _sock is set to None if the connection is by the remote peer, either cleanly or not.

This is definitely a behaviour I don't love: if the remote peer closes the connection with error code NO_ERROR, hyper doesn't handle it very gracefully. The ideal behaviour would be to quiesce the connection. That's quite a substantial change, however.

In the meantime, can I get a feel for how you're using the code? Are you running multithreaded, or from a single thread?

Lukasa commented 8 years ago

In the short term the easiest fix is probably to raise an exception in all connection termination cases: at least that way your code would be able to spot this and handle it as appropriate.

72squared commented 8 years ago

I am using the library through pyapns2:

https://github.com/Pr0Ger/PyAPNs2

We schedule an apple push notification to an iOS device as a background job, and a thread from a thread pool in a background job handles the request and uses pyapns2 (which uses hyper) to send a push notification through apple's http2 api backend.

I am going to try to see if it makes a difference if I use Greenlet thread workers rather than python native threads. But because I see it happen simultaneously on different servers I suspect it is apple's API dropping connections.

In that case, an exception on connection termination would be fine by me because it would be much easier for my application to correctly handle. A TypeError exception is hard to do anything intelligent about.

72squared commented 8 years ago

Could you make it raise a Connection exception if the connection closes and there are still open stream requests associated with the connection? Not sure if that is possible. Because I think if the connection gets closed and there are no outstanding requests there is no real problem. The next request would cause a new connection to be made. It's only if you are waiting on a response or in the middle of sending a request that there is a problem.

72squared commented 8 years ago

Looking at the code, here's where I can see a problem:

https://github.com/Lukasa/hyper/blob/development/hyper/http20/connection.py#L678-L682

If you have multiple streams outstanding and the connection terminates, you close the connection, making self._sock = None. then if other threads are in progress reading from different streams on the same connection at the same time they could run into a problem in lots of other spots.

Lukasa commented 8 years ago

The answer is yes, we can make it throw exceptions if there are active streams. I think, in fact, that there is an issue here related to that. Certainly, however, I think that's the best way to go. =)

72squared commented 8 years ago

since you are inside a read_lock, seems like you may just need to check if self._sock is None:

https://github.com/Lukasa/hyper/blob/development/hyper/http20/connection.py#L743-L744

And maybe handle by stashing the connection error on close() so that you can re-use that reason in the stream context.

72squared commented 8 years ago

The race condition I am seeing is two threads arrive at this line in the code simultaneously: https://github.com/Lukasa/hyper/blob/development/hyper/http20/connection.py#L735

one obtains the read lock. it goes inside of _single_read and hits this line, closing the connection:

https://github.com/Lukasa/hyper/blob/development/hyper/http20/connection.py#L682

Then the lock is released and the second thread obtains a read lock and moves into _single_read only now self._sock is None and nothing else checks it.

Lukasa commented 8 years ago

Yup, that seems sensible to me.

Are you interested in writing a PR for this?

72squared commented 8 years ago

Yeah I can take a stab at it. I don't have a lot of context because I never even looked at the hyper library source code until last night. I doubt I could write a great test for it to prove the issue is fixed. But I will give it a try and see how it goes.

Lukasa commented 8 years ago

Thanks! =) I'm happy to help you through it.

72squared commented 8 years ago

Looking at the design, I see one other potential problem.

Imagine this scenario:

request A and B are in flight. request A obtains the lock. It hits some kind of connection error and we close(). Then request C comes in and makes a request BEFORE request B tries to get the response. This causes a new _sock connection to be populated. At that point, request B has a stream id that wasn't issued by the old _sock connection. What would the behavior be in that case?

It seems like what we need is for streams to be tied to a given _sock and once the sock is closed it won't get paired up with a different sock accidentally.

72squared commented 8 years ago

I also realized that raising an exception on close is not enough because if it is a threaded application, the exception will raise in the thread that first encounters it and the other threads won't hit the exception. So we'd need to also raise exceptions when the stream read happens in the other threads. Which means saving the association between thread and sock and the state associated with that sock.

Lukasa commented 8 years ago

@72squared So the reason the socket goes away is, in essence, to reset the state of the connection object so it can be transparently re-used. That's not necessarily a good thing to do automatically. It may be better to keep a kind of 'close state' around and let users manually reset the state if they need to. That would allow us to throw the exceptions from multiple threads if needed.

72squared commented 8 years ago

The way I would probably solve this is to have a lightweight object wrapper around self._sock. This object would be passed into to a stream wrapper constructor at the time of the outbound request so that when we go to read the stream we use that same sock wrapper that was used when the stream id was issued. And this will also allow us to make sure if the connection is closed, that the stream continues to see that state, even if a new self._sock object has been created.

I need to study the abstraction layer around streams first though.

72squared commented 8 years ago

What if on close you do something like this:

            elif isinstance(event, h2.events.ConnectionTerminated):
                # If we get GoAway with error code zero, we are doing a
                # graceful shutdown and all is well. Otherwise, throw an
                # exception.
                for stream in self.streams.values():
                    stream.receive_end_stream(event)
                self.close()

then inside of _recv_cb method where we are waiting with the read lock, once we obtain it, we make sure the stream isn't closed and that the stream id is still inside of the self.streams. Actually, I think self.streams is reset to an empty set on close() so maybe just checking to see if stream_id is in self.streams ?

Lukasa commented 8 years ago

We probably want a way to signal to the stream that it has been forcibly closed, but yeah, this could work. =)

72squared commented 8 years ago

Review the unit tests. I think this will solve it. let me know what you think.