vmagamedov / grpclib

Pure-Python gRPC implementation for asyncio
http://grpclib.readthedocs.io
BSD 3-Clause "New" or "Revised" License

AttributeError: 'H2Connection' object has no attribute '_frame_dispatch_table' #156

Open libka-b opened 2 years ago

libka-b commented 2 years ago

From time to time, in our applications that make client calls to gRPC APIs, we see this kind of error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/asyncio/sslproto.py", line 546, in data_received
    self._app_protocol.data_received(chunk)
  File "/usr/local/lib/python3.8/site-packages/grpclib/protocol.py", line 712, in data_received
    events = self.connection.feed(data)
  File "/usr/local/lib/python3.8/site-packages/grpclib/protocol.py", line 189, in feed
    return self._connection.receive_data(data)  # type: ignore
  File "/usr/local/lib/python3.8/site-packages/h2/connection.py", line 1463, in receive_data
    events.extend(self._receive_frame(frame))
  File "/usr/local/lib/python3.8/site-packages/h2/connection.py", line 1487, in _receive_frame
    frames, events = self._frame_dispatch_table[frame.__class__](frame)
AttributeError: 'H2Connection' object has no attribute '_frame_dispatch_table'

I found these lines in grpclib that seem to be responsible for the above error.

When I set a breakpoint on the failing line, the frame passed to the _receive_frame method was:

frame = HeadersFrame(stream_id=9, flags=['END_HEADERS']): exclusive=False, depends_on=0, stream_weight=0, data=<hex:887689aa6355e580ae16...>

I don't know much about HTTP/2, but according to this, the frame is valid. Is it possible that the connection's internal state (_frame_dispatch_table) was deleted prematurely, while data was still arriving?
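To make the suspected failure mode concrete, here is a minimal sketch that does not use grpclib or h2 at all. FakeH2Connection and FakeConnection are hypothetical stand-ins, and the cleanup in close() is an assumption modelled on the attribute removal described above. It only shows that any data fed after such a cleanup fails with the same kind of AttributeError.

```python
class FakeH2Connection:
    """Hypothetical stand-in for an HTTP/2 state machine (not the real h2 class)."""

    def __init__(self):
        # Bound methods stored on the instance create a reference cycle
        # (instance -> dict -> bound method -> instance).
        self._frame_dispatch_table = {str: self._handle_text}

    def _handle_text(self, frame):
        return [f"event:{frame}"]

    def receive_data(self, frame):
        # Same lookup shape as the failing line in the traceback.
        return self._frame_dispatch_table[frame.__class__](frame)


class FakeConnection:
    """Hypothetical wrapper that owns the HTTP/2 state and cleans it up on close."""

    def __init__(self):
        self._connection = FakeH2Connection()

    def feed(self, data):
        return self._connection.receive_data(data)

    def close(self):
        # Assumed cleanup: drop the table to break the reference cycle.
        if hasattr(self._connection, "_frame_dispatch_table"):
            del self._connection._frame_dispatch_table


conn = FakeConnection()
first = conn.feed("headers")      # works while the connection is open
conn.close()
try:
    conn.feed("late-frame")       # data arriving after close()
    late_error = None
except AttributeError as exc:
    late_error = str(exc)

print(first)
print(late_error)
```

If this model is accurate, the frame itself can be perfectly valid; the error depends only on whether close() ran before the data was processed.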

grpclib == 0.4.2, Python 3.8+

Let me know if more debug info is needed; I will try to provide as much of it as possible.

vmagamedov commented 2 years ago

Can you search your logs for the reason why the connection was closed? Possible reasons:

Maybe you can comment out the lines where grpclib removes the _frame_dispatch_table attribute to get more specific errors.

libka-b commented 2 years ago

The reason is a lost TCP connection, but that seems to be OK in most cases, and the client recovers. Sometimes, however, it fails with the above exception. Could it be a race condition?

When I commented out the removal of _frame_dispatch_table and added counters to the Connection.close() method, I saw that it is called many more times (about 100x) than the H2Protocol.connection_lost() method, but eventually everything works fine.
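For anyone who wants to repeat this measurement, here is a rough sketch of one way to count calls without editing library code. The counted decorator, DemoConnection and DemoProtocol are hypothetical names used only for illustration; they are not part of grpclib.

```python
import functools
from collections import Counter

# Global tally of calls, keyed by the qualified name of the wrapped function.
call_counts = Counter()


def counted(func):
    """Wrap a function so that every call increments its counter."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call_counts[func.__qualname__] += 1
        return func(*args, **kwargs)
    return wrapper


class DemoConnection:
    """Hypothetical stand-in for a connection wrapper."""

    @counted
    def close(self):
        pass


class DemoProtocol:
    """Hypothetical stand-in for a protocol that owns a connection."""

    def __init__(self):
        self.connection = DemoConnection()

    @counted
    def connection_lost(self, exc):
        # Losing the transport also closes the connection wrapper.
        self.connection.close()


proto = DemoProtocol()
proto.connection.close()       # closed explicitly, e.g. by application code
proto.connection.close()       # closed again; close() is called repeatedly
proto.connection_lost(None)    # transport lost, which closes once more

print(dict(call_counts))
```

The same wrapper could presumably be applied to real methods by assignment, for example `SomeClass.close = counted(SomeClass.close)`, before the client starts. That keeps the instrumentation out of site-packages.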

Gobot1234 commented 2 years ago

I've also been experiencing this on BetterProto's CI. https://github.com/danielgtaylor/python-betterproto/runs/6114120556?check_suite_focus=true and https://github.com/danielgtaylor/python-betterproto/runs/6114156605?check_suite_focus=true

vmagamedov commented 2 years ago

@Gobot1234 Fixed your case in https://github.com/vmagamedov/grpclib/commit/d02ef84e6057cf889ed19f396da9325d77ad67b5. As you can see, it was caused by the ChannelFor testing utility calling Protocol.data_received after the connection was closed.
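As an illustration of the ordering problem only, and not of what the linked commit actually changes, here is a small sketch of a protocol that tolerates data delivered after the connection is gone. GuardedProtocol and its attributes are hypothetical; only the asyncio.Protocol callbacks data_received and connection_lost are real API.

```python
import asyncio


class GuardedProtocol(asyncio.Protocol):
    """Hypothetical protocol that ignores data arriving after connection loss."""

    def __init__(self):
        self.received = []   # chunks processed while the connection was open
        self.dropped = 0     # chunks that arrived too late
        self._closed = False

    def data_received(self, data):
        if self._closed:
            # Internal state may already be torn down, so do not touch it.
            self.dropped += 1
            return
        self.received.append(data)

    def connection_lost(self, exc):
        self._closed = True


proto = GuardedProtocol()
proto.data_received(b"a")      # normal delivery
proto.connection_lost(None)    # connection goes away
proto.data_received(b"b")      # late delivery, e.g. from a test harness

print(proto.received, proto.dropped)
```

A real event loop is not expected to call data_received after connection_lost, so a guard like this mostly matters for test utilities or wrappers that drive the protocol by hand.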

Gobot1234 commented 2 years ago

Thank you!