near / mpc


Mitigate any protocol timeouts upon node going offline #837

Open ChaoticTempest opened 2 months ago

ChaoticTempest commented 2 months ago

Currently, when a node goes offline, its triple/presignature/signature protocols will time out. But these timeouts can be very long. Ideally, nodes should have enough information to short-circuit the timeout. Something like checking in-progress protocols against the currently active/stable participants should do the trick, but there might be better ways of checking this.
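
A rough sketch of what that check could look like, not tied to the actual codebase: `ParticipantId`, `ProtocolId`, `InProgress`, and `protocols_to_abort` are all hypothetical names made up for illustration. It assumes each in-progress protocol records the participants it depends on, and that the node keeps a set of currently active participants.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical types for illustration only; not taken from the near/mpc code.
type ParticipantId = u32;
type ProtocolId = u64;

/// An in-progress protocol and the participants it depends on.
struct InProgress {
    participants: HashSet<ParticipantId>,
}

/// Returns the ids of protocols that involve at least one participant
/// missing from the active set, sorted in ascending order.
fn protocols_to_abort(
    in_progress: &HashMap<ProtocolId, InProgress>,
    active: &HashSet<ParticipantId>,
) -> Vec<ProtocolId> {
    let mut stale: Vec<ProtocolId> = in_progress
        .iter()
        // A protocol is stale if its participants are not all still active.
        .filter(|(_, p)| !p.participants.is_subset(active))
        .map(|(id, _)| *id)
        .collect();
    stale.sort_unstable();
    stale
}
```

Protocols returned here could be dropped or restarted right away instead of waiting for the full timeout. Whether a brief disconnect should count as "offline" is a separate question this sketch does not address.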

volovyks commented 2 months ago

Is it a popular issue? I guess that nodes are mostly online and rarely go offline.

ChaoticTempest commented 1 month ago

Yes, nodes rarely go offline, but this issue is to mitigate worst cases that can potentially happen during a black swan event, such as multiple nodes going offline.

In such cases, triple generation as a whole stalls for the full triple timeout period, which can be long (10 minutes or longer according to config).

volovyks commented 1 month ago

Triples can wait, since we should have plenty of them in the stockpile. However, such an event can cause issues with the signature protocol, since it is required to be operational at all times. For the signature protocol, generation_timeout is 45s. It's not great, but it's not the bottleneck for now.