n0-computer / iroh

peer-2-peer that just works
https://iroh.computer
Apache License 2.0
2.63k stars 166 forks source link

iroh-gossip: connections and subscriptions are not cleaned up #2962

Open dignifiedquire opened 1 week ago

dignifiedquire commented 1 week ago

When dropping the subscription stream, neither connections nor subscriptions are fully cleaned up.

from @Frando

I found two bugs likely:

1 - there's no notification once a receiver stream on the user api side is dropped. we need to notify the actor about that. there is already logic in the actor to quit a topic once all event receivers and command senders are dropped; however it only triggers on sender drops and when failing to send an event, but not once the event receiver stream is dropped. likely we will have to add something like a oneshot sender plus atomic bool that notifies the actor, so that it can remove its event sender and check if the topic is ripe for quitting.

2 - more importantly, after a topic is quit the connection is not actually terminated. here we're back to "how to terminate a quic connection gracefully" territory, and I fear that this is not properly solvable without a wire breaking change, because as we know quic connections may only be closed by the receiver of the last message. however there is no clear last message; the protocol is message oriented and currently the logic would be "close the connection after the last message is sent" which is not possible in quic (hyparview/plumtree was designed with raw udp in mind). without a wire change we could resort to timeouts: clear connections 10s after the terminal message was sent. with wire breaking, we can add proper closing, e.g. with the function I wrote for iroh-willow (which uses uni streams to gracefully close).

Frando commented 4 days ago

Some more details from when I had a quick look at this last week and wrote the above: