Open emilkhshiboun opened 1 year ago
I am not able to reproduce this bug, so could you share some details? When running quit
, the client should already be closing the stream. If the P4Runtime server is implemented correctly, the for loop in stream_recv
should break.
Are you using simple_switch_grpc?
These are the steps I used:
1) Start simple_switch_grpc with simple_switch_grpc --no-p4 --log-console
2) Start the p4runtime-shell with python3 -m p4runtime_sh --device-id 0 --election-id 0,1 --config simple_router.proto.txt,simple_router.json
3) Type quit
in the shell; in my case it exited immediately
I tried the steps above a couple of times.
antonin@ubuntu-22:~$ simple_switch_grpc --no-p4 --log-console
Calling target program-options parser
Server listening on 0.0.0.0:9559
[14:03:55.511] [bmv2] [I] [thread 7705] Starting Thrift server on port 9090
[14:03:55.512] [bmv2] [I] [thread 7705] Thrift server was started
[14:03:57.481] [bmv2] [D] [thread 7723] Set default default entry for table 'ipv4_lpm': NoAction -
[14:03:57.481] [bmv2] [D] [thread 7723] Set default default entry for table 'forward': NoAction -
[14:03:57.481] [bmv2] [D] [thread 7723] Set default default entry for table 'send_frame': NoAction -
[14:03:57.482] [bmv2] [D] [thread 7723] simple_switch target has been notified of a config swap
[14:04:00.392] [bmv2] [D] [thread 7724] Set default default entry for table 'ipv4_lpm': NoAction -
[14:04:00.392] [bmv2] [D] [thread 7724] Set default default entry for table 'forward': NoAction -
[14:04:00.392] [bmv2] [D] [thread 7724] Set default default entry for table 'send_frame': NoAction -
[14:04:00.393] [bmv2] [D] [thread 7724] simple_switch target has been notified of a config swap
(venv) antonin@ubuntu-22:~$ python3 -m p4runtime_sh --device-id 0 --election-id 0,1 --config simple_router.proto.txt,simple_router.json
*** Welcome to the IPython shell for P4Runtime ***
P4Runtime sh >>> quit
(venv) antonin@ubuntu-22:~$ python3 -m p4runtime_sh --device-id 0 --election-id 0,1 --config simple_router.proto.txt,simple_router.json
*** Welcome to the IPython shell for P4Runtime ***
P4Runtime sh >>> quit
(venv) antonin@ubuntu-22:~$
I am not able to reproduce this bug, so could you share some details? When running
quit
, the client should already be closing the stream. If the P4Runtime server is implemented correctly, the for loop instream_recv
should break. Are you using simple_switch_grpc?These are the steps I used:
- Start simple_switch_grpc with
simple_switch_grpc --no-p4 --log-console
- Start the p4runtime-shell with
python3 -m p4runtime_sh --device-id 0 --election-id 0,1 --config simple_router.proto.txt,simple_router.json
- Type
quit
in the shell; in my case it exited immediatelyI tried the steps above a couple of times.
antonin@ubuntu-22:~$ simple_switch_grpc --no-p4 --log-console Calling target program-options parser Server listening on 0.0.0.0:9559 [14:03:55.511] [bmv2] [I] [thread 7705] Starting Thrift server on port 9090 [14:03:55.512] [bmv2] [I] [thread 7705] Thrift server was started [14:03:57.481] [bmv2] [D] [thread 7723] Set default default entry for table 'ipv4_lpm': NoAction - [14:03:57.481] [bmv2] [D] [thread 7723] Set default default entry for table 'forward': NoAction - [14:03:57.481] [bmv2] [D] [thread 7723] Set default default entry for table 'send_frame': NoAction - [14:03:57.482] [bmv2] [D] [thread 7723] simple_switch target has been notified of a config swap [14:04:00.392] [bmv2] [D] [thread 7724] Set default default entry for table 'ipv4_lpm': NoAction - [14:04:00.392] [bmv2] [D] [thread 7724] Set default default entry for table 'forward': NoAction - [14:04:00.392] [bmv2] [D] [thread 7724] Set default default entry for table 'send_frame': NoAction - [14:04:00.393] [bmv2] [D] [thread 7724] simple_switch target has been notified of a config swap
(venv) antonin@ubuntu-22:~$ python3 -m p4runtime_sh --device-id 0 --election-id 0,1 --config simple_router.proto.txt,simple_router.json *** Welcome to the IPython shell for P4Runtime *** P4Runtime sh >>> quit (venv) antonin@ubuntu-22:~$ python3 -m p4runtime_sh --device-id 0 --election-id 0,1 --config simple_router.proto.txt,simple_router.json *** Welcome to the IPython shell for P4Runtime *** P4Runtime sh >>> quit (venv) antonin@ubuntu-22:~$
Hi Anton, I'm not using simple_switch_grpc, i am using custom p4 server.
I might be missing something here, but I could not find any information on the p4 specification regarding client disconnecting, what do you mean the p4server implemented incorrectly ?
From the Controller code in tear_down(), no message/signal is sent to the p4 server, so the server can not know when the client disconnected (correct me if i am wrong)
Here is my output: (running with -v)
DEBUG:root:Creating P4Runtime client
DEBUG:root:Connecting to device 1000 at 127.0.0.1:9559
DEBUG:root:Using insecure channel
DEBUG:root:Session established, client is 'primary'
DEBUG:root:Setting forwarding pipeline config
DEBUG:root:Retrieving P4Info file
DEBUG:root:Parsing P4Info message
DEBUG:asyncio:Using selector: EpollSelector
*** Welcome to the IPython shell for P4Runtime ***
DEBUG:asyncio:Using selector: EpollSelector
P4Runtime sh >>> quit()
DEBUG:root:Cleaning up stream
The above stuck forever
I'm not using simple_switch_grpc, i am using custom p4 server.
That could explain it. Probably a bug in the implementation of the server.
I might be missing something here, but I could not find any information on the p4 specification regarding client disconnecting, what do you mean the p4server implemented incorrectly ?
I assume you mean P4Runtime specification here, and not P4 specification.
This is out-of-scope of P4Runtime, or rather it is not specific to P4Runtime. This is about correctly implementing the P4Runtime service - and more specifically the StreamChannel
RPC) in your gRPC server. So it's more about gRPC bi-directional streams, and less about p4Runtime. The correct implementation will depend on 1) the programming language you use for your gRPC server, and 2) whether you use synchronous or asynchronous gRPC APIs in the server.
The reference P4Runtime server (C++) can be found here: https://github.com/p4lang/PI/blob/main/proto/server/pi_server.cpp
In my experience, it is easier to use synchronous gRPC, while asynchronous APIs require more care when dealing with streams. I don't know which one you are using in your custom P4 server.
During the teardown process (i.e when issuing quit() command), the controller tries to close all resources including the StreamChannel (StreamChannel is started on controller initaliazation).
StreamChannel events are handled by separated thread, which listens for gRPC messages/replies from the P4Agent, when there is no requests, the StreamChannel blocks, which in turn blocks the thread. As part of the teardown, the controller join() the StreamChannel thread, which does not exit as there are no requests from the P4Agent and thus the controller hangs forever.
The fix uses TimeoutIterator to query StreamChannel events with given timeout, if no events are received the loops continues and thus can check for program termination
This commit adds dependecy on 'iterators' module that implments the TimeoutIterator