stakwork / sphinx-key

Lightning hardware signer on ESP32

Reconnect dance gets stuck if signer crashes upon receiving dance steps 1 or 2 #96

Closed irriden closed 12 months ago

irriden commented 1 year ago

Right now, if the signer crashes while processing step 1 or 2 of the reconnect dance, it reconnects with a new client ID, and the broker gets stuck in an infinite loop trying to send that step to the previous client ID.

The broker will also not perform the reconnect dance with any client that reconnects afterwards, because the pub_and_wait function below never returns. As a result, the broker never processes any subsequent messages sent on the channel.

https://github.com/stakwork/sphinx-key/blob/9d7e8b751f8ae49a12516588ae0add0ca640e75f/broker/src/mqtt.rs#L64-L68

This happens because, when sending to a specific client ID (as the reconnect dance steps do), pub_and_wait keeps resending the same message to that client ID until some message is returned.

If that message causes a crash on the signer side, the signer reconnects with a different client ID, so the broker never gets a response.
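One possible way to break the loop is to bound the number of resends. The following is only a minimal sketch of that idea, not the code in broker/src/mqtt.rs and not necessarily how this was eventually fixed. The names pub_and_wait_bounded and PubOutcome, the publish callback, and the use of a std mpsc channel for replies are all assumptions made for illustration.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Hypothetical result type: either a reply arrived, or we stopped retrying.
#[derive(Debug, PartialEq)]
pub enum PubOutcome {
    Reply(Vec<u8>),
    GaveUp { attempts: u32 },
}

/// Hypothetical bounded variant of a publish-and-wait loop.
/// `publish` stands in for whatever sends `payload` to `client_id`;
/// `replies` stands in for the channel on which responses are delivered.
pub fn pub_and_wait_bounded(
    mut publish: impl FnMut(&str, &[u8]),
    replies: &Receiver<Vec<u8>>,
    client_id: &str,
    payload: &[u8],
    max_attempts: u32,
    per_attempt: Duration,
) -> PubOutcome {
    let mut attempts = 0;
    while attempts < max_attempts {
        attempts += 1;
        // Resend the same message to the same client ID.
        publish(client_id, payload);
        match replies.recv_timeout(per_attempt) {
            // Some message came back: the wait is over.
            Ok(reply) => return PubOutcome::Reply(reply),
            // No reply yet: try again, up to the attempt limit.
            Err(RecvTimeoutError::Timeout) => {}
            // The reply side is gone: retrying cannot succeed.
            Err(RecvTimeoutError::Disconnected) => break,
        }
    }
    // Return control to the caller instead of looping forever, so it can
    // go on to handle later messages (for example, a new client connecting).
    PubOutcome::GaveUp { attempts }
}
```

With a limit in place, a signer that crashed and came back under a new client ID would cost the broker at most max_attempts resends to the stale ID before it returns to processing the channel. The right limit and timeout values would depend on how long the signer normally takes to answer a dance step.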

irriden commented 1 year ago

See also #97