[feature] Retry sending of failed outgoing activities

superseriousbusiness / gotosocial

Fast, fun, small ActivityPub server.

https://docs.gotosocial.org

GNU Affero General Public License v3.0


[feature] Retry sending of failed outgoing activities #428

Open tsmethurst opened 2 years ago

tsmethurst commented 2 years ago

We still don't have any mechanism for retrying outgoing federation messages when they fail to be delivered.

This should be fixed, so that if a remote instance goes down, it can still receive the messages GoToSocial sent while it was down once it comes back up.

To implement this, we need:

- To track when and why delivery of a message fails, and to schedule a redelivery attempt if appropriate.
- To implement a backoff mechanism so that retries of the same message are spaced out.

And we should:

- Allow instance admins to see how many undelivered messages they have pending, and for which instances delivery has definitively 'failed' (i.e., run out of retry attempts).
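The admin view described here is essentially an aggregation over queued deliveries grouped by remote instance. The Go sketch below shows one way to compute it. It assumes each queued delivery carries its target inbox URL and a flag for "out of retries". The types and the `summarize` function are hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
)

// pendingDelivery is a hypothetical queued redelivery; Exhausted is set
// once the delivery has used up all of its retry attempts.
type pendingDelivery struct {
	TargetInbox string
	Exhausted   bool
}

// instanceStatus summarises undelivered messages for one remote instance.
type instanceStatus struct {
	Host      string
	Pending   int // deliveries still being retried
	Exhausted int // deliveries that have run out of retries
}

// summarize groups deliveries by target host for an admin status page.
func summarize(deliveries []pendingDelivery) []instanceStatus {
	byHost := make(map[string]*instanceStatus)
	for _, d := range deliveries {
		u, err := url.Parse(d.TargetInbox)
		if err != nil {
			continue // this sketch simply skips malformed inbox URLs
		}
		s, ok := byHost[u.Host]
		if !ok {
			s = &instanceStatus{Host: u.Host}
			byHost[u.Host] = s
		}
		if d.Exhausted {
			s.Exhausted++
		} else {
			s.Pending++
		}
	}
	out := make([]instanceStatus, 0, len(byHost))
	for _, s := range byHost {
		out = append(out, *s)
	}
	// Sort by host for a stable, readable listing.
	sort.Slice(out, func(i, j int) bool { return out[i].Host < out[j].Host })
	return out
}

func main() {
	statuses := summarize([]pendingDelivery{
		{TargetInbox: "https://a.example/inbox"},
		{TargetInbox: "https://a.example/users/x/inbox"},
		{TargetInbox: "https://b.example/inbox", Exhausted: true},
	})
	for _, s := range statuses {
		fmt.Printf("%s: %d pending, %d failed\n", s.Host, s.Pending, s.Exhausted)
	}
}
```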

NyaaaWhatsUpDoc commented 2 years ago

I think even after #564 is merged, we should keep this open, since longer backoffs and a proper queuing system for failed deliveries are still open questions. It would require a bit of a rework of how we batch deliveries and deal with expected errors, but it would also make the final item here, an "undelivered messages" status page, much easier.

tsmethurst commented 2 years ago

Agreed! I think we can take it off the current milestone since we have a stopgap implementation, and then we can have a longer discussion in a future release milestone about how to do this. Does that sound OK?

NyaaaWhatsUpDoc commented 2 years ago

Sounds great to me :)

igalic commented 2 years ago

Retries should use exponential backoff. Retries also need to be grouped / mapped by target server, or even by its IPv4/IPv6 address.

I know it's not our job to monitor / healthcheck our peers, but it is our job not to DoS them.