vacp2p / nim-libp2p

libp2p implementation in Nim
https://vacp2p.github.io/nim-libp2p/docs/
MIT License
240 stars 52 forks source link

fix: stop services after transports #1138

Closed diegomrsantos closed 4 days ago

diegomrsantos commented 5 days ago

https://github.com/vacp2p/nim-libp2p/pull/843 Services stop duplication, but services should be stopped after Transports, as they are higher level.

kaiserd commented 5 days ago

Looking a the errors Unhandled defect: An attempt was made to complete a Future more than once., it seems the services close resources the transport tries to close again.

I agree it would make sense to close services after transports though. Maybe the services close resources they should not?

diegomrsantos commented 5 days ago

Looking a the errors Unhandled defect: An attempt was made to complete a Future more than once., it seems the services close resources the transport tries to close again.

I agree it would make sense to close services after transports though. Maybe the services close resources they should not?

I'm surprised that the tests failed.

diegomrsantos commented 5 days ago

@lchenut onReservation is being called when the switch is stopped with an empty list of addresses https://github.com/vacp2p/nim-libp2p/blob/100f3188ed536f13408a9d7ecd8f911afd319e84/libp2p/services/autorelayservice.nim#L99

diegomrsantos commented 5 days ago

@lchenut onReservation is being called when the switch is stopped with an empty list of addresses

https://github.com/vacp2p/nim-libp2p/blob/100f3188ed536f13408a9d7ecd8f911afd319e84/libp2p/services/autorelayservice.nim#L99

this fixes the tests, but not sure if it's the right way of doing it:

if not self.onReservation.isNil() and self.relayAddresses.len > 0 and switch.started:
  self.onReservation(concat(toSeq(self.relayAddresses.values)))
lchenut commented 4 days ago

It looks good to me. It should probably be necessary to write a similar condition here aswell https://github.com/vacp2p/nim-libp2p/blob/100f3188ed536f13408a9d7ecd8f911afd319e84/libp2p/services/autorelayservice.nim#L57

diegomrsantos commented 4 days ago

Why is onReservation being called? No reservations happened. Switch's started field is private and thus not used anywhere else in the system.

diegomrsantos commented 4 days ago

self.relayAddresses.len > 0 fixes two tests, but without switch.started, "Three relays connections" still fails.

lchenut commented 4 days ago

https://github.com/vacp2p/nim-libp2p/blob/100f3188ed536f13408a9d7ecd8f911afd319e84/libp2p/services/autorelayservice.nim#L57-L58 Here onReservation is called because we just connect someone and made a reservation, so we send an up to date list of address. https://github.com/vacp2p/nim-libp2p/blob/100f3188ed536f13408a9d7ecd8f911afd319e84/libp2p/services/autorelayservice.nim#L98-L99 Here onReservation is called because some remote peer just disconnect or close connection, so we send an up to date list of address.

lchenut commented 4 days ago

And Three relay connections still fails because you first stop the Transport, thus the service will receive the disconnection and try to update and send the newly list of addresses without those who disconnects.

One thing you could do to prevent this behavior is to add a fifth state in Three relay connections, something like TestFinished and it'll work just fine...

But to be perfectly honest, I'm not 100% certain this PR is useful. My gut feeling says that we should stop Services (which indirectly uses the Transport) before closing the Transports.

diegomrsantos commented 4 days ago

But to be perfectly honest, I'm not 100% certain this PR is useful. My gut feeling says that we should stop Services (which indirectly uses the Transport) before closing the Transports.

I think you're right and I'll probably close it.

Here onReservation is called because some remote peer just disconnect or close connection, so we send an up to date list of address.

So onReservation name seems misleading and a renaming might be good. Additionally, shouldn't the Autorelay Service and its tests handle this situation correctly and not fail if unrelated things are moving around?

lchenut commented 4 days ago

So onReservation name seems misleading and a renaming might be good. Additionally, shouldn't the Autorelay Service and its tests handle this situation correctly and not fail if unrelated things are moving around?

I agree that onReservation name could be misleading.

Autorelay Service's test handles this specific case, it's the Relay2UnreservedAndRelay1Reserved state in Three relays connections. What I haven't anticipated is that doing await switchClient.stop() will cause onReservation to be called even though it's setup to be a service of switchClient. Which is, in my opinion, a bit weird.

diegomrsantos commented 4 days ago

What I haven't anticipated is that doing await switchClient.stop() will cause onReservation to be called even though it's setup to be a service of switchClient. Which is, in my opinion, a bit weird.

Could you elaborate on what is weird in your opinion?

lchenut commented 4 days ago

Let's take an example:

I find it weird because we are stopping SwitchA. So we know we are disconnecting. I'm not sure it's a relevant point to indicate, after you decide to stop that, you indeed stopped. It's redundant. I might be wrong, though. But if we accept to stop services after stopping transports, I will prefer a re-write of the autorelay test instead of adding and self.relayAddresses.len > 0 and switch.started (even though I said it looked good two hours ago, I changed my mind about that)

diegomrsantos commented 4 days ago

I'm going to close this PR, we should stop the services first. One way to think about it is we start "things" and add them to an imaginary stack. When we want to stop them, we pop and stop, i.e. we stop in the reserve order that they were added. See https://github.com/vacp2p/nim-libp2p/pull/1095#discussion_r1595158121.

Even after this PR is closed and the tests pass, the service remains brittle if it behaves unexpectedly when unrelated changes occur.