w3c / activitypub

http://w3c.github.io/activitypub/

Timeouts & Retries #365

Open FagnerMartinsBrack opened 1 year ago

FagnerMartinsBrack commented 1 year ago

SMTP allows for timeouts based on the size of the message. Are there any timeout rules for ActivityPub, or is this entirely left to the discretion of implementers? Also, what about retries?

Not sure if this is the right forum. If this is not the right forum to ask questions about the protocol, where should I go instead?

Also, pointing to past discussions about this would be much appreciated.

Thank you all 👍

evanp commented 4 months ago

There is no fixed timeout period in the ActivityPub federation protocol or in the ActivityPub social API; this is left up to implementers. Because ActivityPub is a RESTful API built on top of HTTP, best practices for that type of API are recommended. If you need a heuristic, I think a 30-second timeout is probably plenty for an ActivityPub client making a new POST or GET request, and a similar timeout works for delivering activities to an inbox.
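As a concrete sketch of that heuristic: a client-side delivery helper that POSTs an activity to an inbox with a 30-second timeout and treats timeouts and connection failures as retryable. The inbox URL, the `deliver_activity` name, and the 30-second default are assumptions for illustration, not anything mandated by the spec.

```python
import socket
import urllib.error
import urllib.request

# 30 seconds, per the heuristic suggested above -- an assumption, not a spec value.
DELIVERY_TIMEOUT = 30

def deliver_activity(inbox_url, payload, timeout=DELIVERY_TIMEOUT):
    """POST an activity to a remote inbox.

    Returns the HTTP status on success, or None when the attempt timed out
    or the connection failed, so the caller can schedule a retry.
    """
    req = urllib.request.Request(
        inbox_url,
        data=payload,
        headers={"Content-Type": "application/activity+json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except (socket.timeout, urllib.error.URLError):
        # Timeout or network failure: signal "retry later" to the caller.
        return None
```

A real implementation would distinguish permanent HTTP errors (4xx) from transient ones before retrying; this sketch collapses them for brevity.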

I'll add a primer page at https://www.w3.org/wiki/ActivityPub/Primer to give good heuristics and suggestions for retries, like exponential backoff.
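Exponential backoff as mentioned above is straightforward to sketch. The function below computes a delay schedule that doubles per attempt, caps at a maximum, and optionally applies full jitter to avoid synchronized retry storms; the parameter values are illustrative assumptions, not recommendations from the spec.

```python
import random

def backoff_schedule(attempts, base=1.0, cap=300.0, jitter=True):
    """Return a list of retry delays in seconds.

    Delay for attempt n is min(cap, base * 2**n); with jitter=True each
    delay is drawn uniformly from [0, delay] ("full jitter").
    """
    delays = []
    for n in range(attempts):
        delay = min(cap, base * (2 ** n))
        if jitter:
            delay = random.uniform(0, delay)
        delays.append(delay)
    return delays
```

For example, `backoff_schedule(4, jitter=False)` yields `[1.0, 2.0, 4.0, 8.0]`, and the cap keeps long schedules from growing without bound.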

FagnerMartinsBrack commented 4 months ago

@evanp Just to be clear, my suggestion wasn't a fixed timeout, nor a fixed retry mechanism, as that wouldn't solve any problems today. The suggestion was to have server nodes send retry and timeout directives in their responses (headers and bodies), since they are the only ones who know their own processing capacity, not the clients. HTTP already has the Retry-After header as a standard mechanism for exactly that purpose, for example.
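Respecting Retry-After on the client side is mostly a parsing problem: the header value is either a number of seconds or an HTTP-date. A minimal sketch of a parser a delivering server could use before rescheduling an attempt (the function name is a hypothetical helper, not from any ActivityPub implementation):

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def retry_after_seconds(header_value, now=None):
    """Convert an HTTP Retry-After header into a delay in seconds.

    Accepts both forms defined by HTTP: delay-seconds ("120") and an
    HTTP-date ("Wed, 21 Oct 2015 07:28:00 GMT"). Returns None when the
    header is absent or unparseable.
    """
    if header_value is None:
        return None
    # Form 1: a non-negative integer number of seconds.
    try:
        return max(0.0, float(header_value))
    except ValueError:
        pass
    # Form 2: an HTTP-date; the delay is the time remaining until then.
    try:
        when = parsedate_to_datetime(header_value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

A delivering server would call this on a 429 or 503 response and sleep for the returned interval instead of using its own fixed schedule, which is the server-driven control being proposed here.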

Also, the target URLs for fetching resources should come from servers, not from clients, as that lets servers distribute load across copies of the server running on different hosts (distributed) without the need for an actual load balancer (centralised). It's like A/B testing, where the server hands out URLs pointing to different servers within its own cluster. Maybe this is already implemented somehow?
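To make the idea concrete: when generating object URLs in its responses, a server could deterministically spread them across its own replicas. Everything here is hypothetical (the replica hostnames, the `object_location` helper, the hashing scheme); it only illustrates the "server hands out the URL" proposal.

```python
import hashlib

# Hypothetical replica pool for one logical ActivityPub server.
REPLICAS = [
    "https://a.example.social",
    "https://b.example.social",
    "https://c.example.social",
]

def object_location(object_path, replicas=REPLICAS):
    """Pick a stable replica URL for an object.

    Hashing the path makes the choice deterministic, so repeated fetches
    of the same object land on the same replica, while different objects
    spread across the pool -- the server, not the client, decides where
    a resource is fetched from.
    """
    digest = hashlib.sha256(object_path.encode("utf-8")).digest()
    replica = replicas[digest[0] % len(replicas)]
    return replica + object_path
```

The server would embed these URLs in `id` fields and collection pages it returns, so clients naturally follow them without any client-side balancing logic.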

Those mechanisms would allow the protocol to scale indefinitely, constrained only by the number and size of servers within a given cluster, and would let servers load-control clients by owning the timeout/retry directives the clients respect.

I haven't implemented the protocol yet to get hands-on experience with it, but from a preliminary reading combined with deep HTTP knowledge, I strongly suspect the protocol isn't scaling because it lacks controls like these mandated server directives (and maybe the URL mechanism).

That's the reason I asked if there's any retry/timeout specification. Does that make sense?