quinn-rs / quinn

Async-friendly QUIC implementation in Rust
Apache License 2.0

quinn_proto hangs in connection.handle_event() method #1475

Closed jodiserla closed 1 year ago

jodiserla commented 1 year ago

Hello, I am integrating quinn_proto into Kompact, a distributed actor framework that has its own event loop (the reason I chose quinn_proto instead of quinn_rs), as part of my master's thesis.

I am encountering a couple of problems with the API that I am having a hard time figuring out. The test case I am trying to execute does the following: it sets up two KompactSystems with 2x Pinger and Ponger actors and QUIC as the network protocol. One Ponger is registered by UUID, the other by a custom name. One Pinger communicates with the UUID-registered Ponger, the other with the named Ponger. Both pairs are expected to exchange PING_COUNT ping-pong messages. In my case, I send 10 ping-pongs between the actors. This is the test: https://github.com/jodiserla/kompact/blob/5836a4697b04309ffc407c45ece8ebf5a9d959ff/core/tests/dispatch_integration_tests.rs#L339

However, when I execute the test case, the actors exchange a varying number of ping-pongs and then the test starts to "hang" until it times out by itself. It seems that QUIC stops in the handle_event() function and does not get any further, causing the test to hang and time out. Here is the code that calls the API methods (https://github.com/jodiserla/kompact/blob/5836a4697b04309ffc407c45ece8ebf5a9d959ff/core/src/net/quic_endpoint.rs#L142). I also noticed that endpoint.poll_transmit() doesn't seem to get called either, which confuses me as well (https://github.com/jodiserla/kompact/blob/5836a4697b04309ffc407c45ece8ebf5a9d959ff/core/src/net/quic_endpoint.rs#L128). Any feedback on this would be greatly appreciated!

jodiserla commented 1 year ago

Correction: it does not "hang" in handle_event(); the call actually executes, but nothing else happens afterwards. I guess nothing then gets queued for writing to the socket, and therefore everything seems to hang. This leads me to wonder why endpoint.poll_transmit() doesn't get triggered at any point.

Ralith commented 1 year ago

In QuicEndpoint::poll_and_handle, you overwrite self.timeout each time you process a connection, so only the value from the last connection processed is retained. I also don't see any logic that might schedule connections to be polled again after the timeout is reached.
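One way to avoid losing deadlines when several connections are processed in a loop is to store one deadline per connection and wake at the earliest of them. The sketch below is a minimal illustration of that idea, not code from quinn or Kompact: `Timers`, `ConnId`, `update`, `next_wakeup`, and `expired` are hypothetical names, and `ConnId` merely stands in for `quinn_proto::ConnectionHandle`.

```rust
use std::collections::HashMap;
use std::time::Instant;

/// Hypothetical stand-in for `quinn_proto::ConnectionHandle`.
pub type ConnId = usize;

/// Keeps one pending deadline per connection instead of a single shared field.
#[derive(Default)]
pub struct Timers {
    deadlines: HashMap<ConnId, Instant>,
}

impl Timers {
    /// Record the value most recently returned by a connection's `poll_timeout()`.
    /// `None` means the connection currently has no timer armed.
    pub fn update(&mut self, conn: ConnId, deadline: Option<Instant>) {
        match deadline {
            Some(t) => {
                self.deadlines.insert(conn, t);
            }
            None => {
                self.deadlines.remove(&conn);
            }
        }
    }

    /// Earliest deadline across all connections: the next time the loop must wake.
    pub fn next_wakeup(&self) -> Option<Instant> {
        self.deadlines.values().copied().min()
    }

    /// Connections whose deadline has passed and that need `handle_timeout(now)`.
    pub fn expired(&self, now: Instant) -> Vec<ConnId> {
        let mut ids: Vec<ConnId> = self
            .deadlines
            .iter()
            .filter(|(_, t)| **t <= now)
            .map(|(id, _)| *id)
            .collect();
        ids.sort_unstable();
        ids
    }
}
```

With a single connection per endpoint the map holds at most one entry, so the same structure also covers the simpler case.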

jodiserla commented 1 year ago

Now I am a bit confused; I have been following the provided examples of using the API. Isn't self.timeout overwritten here as well, then? https://github.com/quinn-rs/quinn/blob/49cdfa553ee86eaefb0fceb88aac05240306d1bc/quinn-proto/src/tests/util.rs#L315 "I also don't see any logic that might schedule connections to be polled again after the timeout is reached." This part is also a bit unclear to me: when I reach the timeout I call the handle_timeout() method, and afterwards I call all the polling methods in the order listed here: https://docs.rs/quinn-proto/0.9.2/quinn_proto/struct.Connection.html. I am also still confused about why the self.endpoint.poll_transmit() method never returns Some in my case.



Ralith commented 1 year ago

Isn't the self.timeout overwritten here as well then?

That test code supports only one connection per endpoint.

this part is also a bit unclear to me, when I reach the timeout I call the handle_timeout

poll_timeout tells you a time in the future at which you need to drive the connection again. You need to arrange for that to happen.
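As a rough illustration of "arranging for that to happen", the sketch below fires the timer when its deadline has passed and then reports how long the event loop should wait before driving the connection again. It is a simplified model under stated assumptions: `TimerDriven`, `Next`, `drive`, and `FakeConn` are hypothetical names, and the trait only mirrors the shape of the `poll_timeout` and `handle_timeout` methods discussed above rather than using quinn-proto itself.

```rust
use std::time::{Duration, Instant};

/// Hypothetical trait mirroring the two timer methods discussed in this thread.
pub trait TimerDriven {
    fn poll_timeout(&mut self) -> Option<Instant>;
    fn handle_timeout(&mut self, now: Instant);
}

/// What the event loop should do next for this connection.
#[derive(Debug, PartialEq)]
pub enum Next {
    /// No timer armed; wait for network or application input.
    Idle,
    /// Arm the framework's timer for this long, then call `drive` again.
    WakeIn(Duration),
}

pub fn drive<C: TimerDriven>(conn: &mut C, now: Instant) -> Next {
    // Fire the timer if its deadline has already passed.
    if let Some(deadline) = conn.poll_timeout() {
        if deadline <= now {
            conn.handle_timeout(now);
        }
    }
    // A real driver would drain transmits and events here.
    // Re-read the deadline: handling a timeout may arm a new one.
    match conn.poll_timeout() {
        Some(deadline) => Next::WakeIn(deadline.saturating_duration_since(now)),
        None => Next::Idle,
    }
}

/// Hypothetical test double: one armed deadline that re-arms at most once.
pub struct FakeConn {
    pub deadline: Option<Instant>,
    pub rearm: Option<Duration>,
    pub fired: u32,
}

impl TimerDriven for FakeConn {
    fn poll_timeout(&mut self) -> Option<Instant> {
        self.deadline
    }

    fn handle_timeout(&mut self, now: Instant) {
        self.fired += 1;
        self.deadline = self.rearm.take().map(|d| now + d);
    }
}
```

In an actor framework, `Next::WakeIn` would typically translate into scheduling a timer message to the component that owns the connection, so that `drive` runs again even when no packet arrives.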

jodiserla commented 1 year ago

Hello again! I am sorry, I was not very clear in the beginning. This project is part of a master's thesis, and my prototype also only needs to support one connection per endpoint. What I am trying to do is set up two Kompact systems (endpoints) that connect to each other on the same ConnectionHandle; thereafter, I will send various ping/pong messages between actors within the Kompact systems. Streams should be opened for the actors, so the actors communicate on the same connection but with different stream IDs. Hence, the prototype doesn't need to be perfect; it only needs to work for evaluation purposes 💩 I will look into poll_timeout, but I am still curious about why endpoint.poll_transmit() always returns None: https://github.com/jodiserla/kompact/blob/5836a4697b04309ffc407c45ece8ebf5a9d959ff/core/src/net/quic_endpoint.rs#L128



djc commented 1 year ago

@jodiserla have you already looked at the source code for the quinn-proto functions you reference? IIRC the top-level functions you mention should be fairly readable (if not, we're happy to answer further questions) and will maybe help you improve your understanding of how the different components are supposed to work together.

jodiserla commented 1 year ago

Hello Dirkjan, I did look at some of it initially, but to be honest, I am fairly new to Rust, and my studies have mainly focused on distributed systems so far (actor systems, in the case of the thesis), so this is the first time I have worked with such a low-level networking API. However, this prototype should be fairly simple: it does not need to handle multiple connections or certificates that aren't self-signed, etc. I have been struggling for quite some time to make the prototype work and am falling a bit behind schedule on handing in the thesis, so I just wanted to reach out to see whether the problems with my tests originate in quinn-proto or whether I need to look further into its integration in Kompact. When I log the contents of the endpoint, the transmits always show an empty array, which I find weird: when I call the handle function on the endpoint, the transmits should be enqueued, and I should be able to poll it and get Some(transmit) (unless I am greatly misunderstanding how the API works). I just want to get that cleared up, since I tried to implement the API following the examples and docs provided (which should be sufficient for this solution).

Thank you for your fast responses, and sorry about my confusion! It is a bit hard to wrap your head around these things when you are working on them by yourself 😂



Ralith commented 1 year ago

Transmitting data often requires a timeout to elapse first, e.g. for pacing. Your code does not make any obvious attempt to schedule the connection to be polled again at that time.
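The effect of pacing on a driver that never reschedules itself can be shown with a toy model. The sketch below is an illustration only: `PacedConn`, `send_without_timer`, and `send_with_timer` are hypothetical, the pacer is far simpler than a real QUIC pacer, and the simulated clock stands in for a real timer callback. It contrasts a loop that gives up at the first `None` with one that waits for the reported deadline and polls again.

```rust
use std::time::{Duration, Instant};

/// Hypothetical paced sender: releases one queued packet per `interval`.
pub struct PacedConn {
    pub queued: u32,
    pub interval: Duration,
    pub next_send: Instant,
}

impl PacedConn {
    /// Returns `None` while the pacer is holding packets back.
    /// The payload is just the number of packets still queued.
    pub fn poll_transmit(&mut self, now: Instant) -> Option<u32> {
        if self.queued == 0 || now < self.next_send {
            return None;
        }
        self.queued -= 1;
        self.next_send = now + self.interval;
        Some(self.queued)
    }

    /// When to poll again, if data is still queued.
    pub fn poll_timeout(&mut self) -> Option<Instant> {
        (self.queued > 0).then(|| self.next_send)
    }
}

/// Polls once per wake-up and never reschedules: stalls after the first `None`.
pub fn send_without_timer(conn: &mut PacedConn, now: Instant) -> u32 {
    let mut sent = 0;
    while conn.poll_transmit(now).is_some() {
        sent += 1;
    }
    sent
}

/// Advances a simulated clock to each deadline, as a timer callback would.
pub fn send_with_timer(conn: &mut PacedConn, mut now: Instant) -> u32 {
    let mut sent = 0;
    loop {
        while conn.poll_transmit(now).is_some() {
            sent += 1;
        }
        match conn.poll_timeout() {
            // "Sleep" until the deadline, then poll for transmits again.
            Some(deadline) => now = now.max(deadline),
            None => return sent,
        }
    }
}
```

In this model the first driver sends a single packet and then waits forever for an event that never comes, which resembles the symptom described in this thread; the second driver drains the whole queue.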

Ralith commented 1 year ago

Closing for lack of response. Feel free to reopen if something remains unclear.