Handling/retry of 500 errors?

amarand commented 4 years ago

Does your proposal relate to...

[X] something else

Is your feature request related to a problem? Please describe.

My Mastodon service provider sometimes puts his server into a "maintenance mode", and while it's in that mode it returns a 521 error. This happens fairly often. ephemetoot appears to give up when it gets this error (which may be what the protocol requires for a 500/521 error?), but is there a way to either A) add a switch that makes it keep retrying after a 500-series error, with a safe wait (maybe a few minutes) between attempts, or B) build that behavior in by default, possibly with a switch to turn it off?

Describe the solution you'd like

Ideally this would be an opt-in switch, because I think 500 errors may not be retryable by convention. When it received a 500 error, the script would back off, wait a set amount of time (5 minutes? 10? 15? Possibly set on the command line, with a default), and then retry.
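A minimal sketch of the opt-in behavior described above, in Python (the language ephemetoot is written in). It assumes a hypothetical `ServerError` exception standing in for whatever the API client raises on a 5xx response; `call_with_retry`, its parameters, and the 5-minute default are illustrative, not part of ephemetoot. The `sleep` parameter is injectable so the loop can be tested without actually waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ServerError(Exception):
    """Hypothetical stand-in for the client's exception on a 5xx response."""


def call_with_retry(
    func: Callable[[], T],
    wait_seconds: float = 300.0,  # assumed default: 5 minutes
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(); on ServerError, wait and try again, up to max_attempts calls in total."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except ServerError as err:
            if attempt == max_attempts:
                raise  # out of attempts: give up, as the script does today
            print(f"Server error ({err}) on attempt {attempt}; "
                  f"waiting {wait_seconds:g} s before retrying")
            sleep(wait_seconds)
    raise AssertionError("unreachable")  # the loop always returns or raises
```

Anything other than `ServerError` (an authentication failure, for example) propagates immediately, so only server-side trouble triggers the wait.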

Would you like to write the code yourself?

[X] I would like someone else to write the code

Describe alternatives you've considered

When I see that it throws this error, I just restart the script and it usually runs right away. If I don't get to it for a few hours, I miss a few hours' worth of deletions.

Additional context

Thanks!

hughrun commented 4 years ago

Thanks for logging your issue!

You're right, generally a 5xx error indicates the server is misconfigured, so it would be typical to assume that retrying 'soon' won't work. However, I can see your use case, and your proposed solution is the right way to do it: something like an --always_retry flag. We already have similar functionality for network errors (i.e. at your end). So we'd need to control two variables: how long to wait between attempts, and how many attempts to make before the script gives up. Would you expect control over the second? I'd probably default to 5 attempts.
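The two variables could be exposed as command-line options. This is a sketch using the standard `argparse` module; `--always_retry` is the name floated above, while `--retry_mins`, `--retry_attempts`, and the 1-minute default interval are hypothetical and not ephemetoot's actual interface. The attempt cap defaults to 5, as suggested.

```python
import argparse


def positive_int(text: str) -> int:
    """argparse type: accept only integers >= 1."""
    value = int(text)  # a ValueError here becomes a normal argparse error message
    if value < 1:
        raise argparse.ArgumentTypeError(f"must be at least 1, got {value}")
    return value


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser showing only the retry-related options."""
    parser = argparse.ArgumentParser(description="Delete old toots (retry options only)")
    parser.add_argument(
        "--always_retry",
        action="store_true",
        help="keep retrying after 5xx server errors instead of exiting",
    )
    parser.add_argument(
        "--retry_mins",
        type=positive_int,
        default=1,
        help="minutes to wait between attempts (default: %(default)s)",
    )
    parser.add_argument(
        "--retry_attempts",
        type=positive_int,
        default=5,
        help="attempts before giving up (default: %(default)s)",
    )
    return parser
```

Passing `--always_retry --retry_mins 10` would give a 10-minute wait with the default five attempts, and a value such as `0` is rejected with a usage error.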

amarand commented 4 years ago

If I could control the former (the duration between attempts, in minutes), a fixed five attempts would be fantastic. In my case I would set the wait to, say, ten minutes, which would give me 50 minutes' worth of retries. That should be more than enough for most of the "maintenance" outages I see on my unfederated instance. (Thank you so much!)
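The 50-minute figure assumes each of the five retries is preceded by one full wait, and it ignores the time the requests themselves take. Under that assumption, a hypothetical helper listing when each retry fires puts the last one 50 minutes after the first failure:

```python
def retry_offsets(wait_mins: int, retries: int) -> list[int]:
    """Minutes after the first failure at which each retry fires.

    Assumes a fixed wait before every retry and zero request time.
    """
    return [wait_mins * n for n in range(1, retries + 1)]


offsets = retry_offsets(wait_mins=10, retries=5)
print(offsets)      # [10, 20, 30, 40, 50]
print(offsets[-1])  # 50 minutes of coverage
```

If the five attempts instead include the original failed call, there are only four waits and the window shrinks to 40 minutes, so it is worth stating which interpretation the option uses.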

hughrun commented 4 years ago

Hey @amarand I've started looking at this but I now realise I don't know which Mastodon error this is throwing. When your server is returning a 521 error, do you remember what ephemetoot currently does with that? i.e. is it:

📡 ephemetoot cannot connect to the server - are you online?

or

🙅 User and/or access token does not exist or has been deleted

or something else?

amarand commented 4 years ago

Usually it’s the “User and/or access token does not exist...” error. On the instance I use, the 500-series errors appear when the admin is doing short maintenance, and restarting a few minutes later usually works. My concern is that if I start it at, say, 2000, and it runs until 2100 and then fails, nothing happens overnight. So any back-off (15/30/60 minutes) is better than an outright exit to the command line.
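The 15/30/60-minute progression is a doubling back-off. The sketch below generates it with a cap and turns the waits into wall-clock retry times; the function names, the 60-minute cap, the count of five retries, and the example date are illustrative assumptions. Starting from a failure at 21:00, the retries would fire at 21:15, 21:45, 22:45, 23:45, and 00:45, so the script would still be trying after midnight rather than stopping at 21:00.

```python
from datetime import datetime, timedelta


def doubling_waits(start_mins: int, cap_mins: int, count: int) -> list[int]:
    """Waits that double each time (15, 30, 60, ...) but never exceed cap_mins."""
    waits = []
    wait = start_mins
    for _ in range(count):
        waits.append(min(wait, cap_mins))
        wait *= 2
    return waits


def retry_times(first_failure: datetime, waits_mins: list[int]) -> list[datetime]:
    """Wall-clock time of each retry; each wait starts when the previous
    attempt fails (request time is ignored)."""
    times = []
    current = first_failure
    for wait in waits_mins:
        current += timedelta(minutes=wait)
        times.append(current)
    return times


waits = doubling_waits(start_mins=15, cap_mins=60, count=5)
print(waits)  # [15, 30, 60, 60, 60]
for t in retry_times(datetime(2020, 1, 1, 21, 0), waits):  # arbitrary example date
    print(t.strftime("%H:%M"))
```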

amarand commented 4 years ago

Oh, and I realized the reason I never see the "📡 ephemetoot cannot connect to the server - are you online?" error: the instance I use goes through Cloudflare, so the front-end connection never fails (hence no "are you online?"), but the authentication/token check on the back end is rejected (because Cloudflare isn't passing back anything other than a "failure" message). Hope that clarifies?
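Since the proxy can turn maintenance into a response that ephemetoot then reports as a token problem, one option is to branch on the HTTP status code rather than on which message was printed. A minimal sketch follows; the `Failure` categories and `classify` function are hypothetical, and treating only 401 as a bad-token signal is an assumption about how the instance reports rejected credentials. Any 5xx status, including the 521 and 522 seen in this thread, is treated as worth retrying.

```python
from enum import Enum


class Failure(Enum):
    RETRY = "retry"          # server or proxy trouble: waiting may help
    BAD_TOKEN = "bad_token"  # credentials rejected: retrying will not help
    FATAL = "fatal"          # anything else: stop and report


def classify(status: int) -> Failure:
    """Map an HTTP status code to what the script should do next."""
    if 500 <= status <= 599:
        return Failure.RETRY      # covers Cloudflare-style 52x codes too
    if status == 401:
        return Failure.BAD_TOKEN  # assumption: the instance signals a bad token with 401
    return Failure.FATAL


for code in (500, 521, 522, 401, 404):
    print(code, classify(code).value)
```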

amarand commented 4 years ago

Ahh, here we go... found one from today:

🛑 ERROR deleting toot - 100923825263144264 - ('Mastodon API returned error', 522, '', None)
Waiting 1 minute before re-trying
Attempting delete again
🛑 ERROR deleting toot - 100923825263144264 ('Mastodon API returned error', 500, 'Internal Server Error', None)
Exiting due to error.

🙅 User and/or access token does not exist or has been deleted
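The tuples in that log look like the exception's arguments: a message, the HTTP status, the reason phrase, and a detail field. Assuming the client's exceptions keep that layout, a hypothetical helper can pull out the status so the retry decision does not depend on which handler happened to catch the error:

```python
from typing import Optional


def status_from_error(exc: BaseException) -> Optional[int]:
    """Return the HTTP status from an exception whose args look like
    ('Mastodon API returned error', status, reason, detail), or None.

    The argument layout is an assumption based on the logged tuples.
    """
    args = exc.args
    if len(args) >= 2 and isinstance(args[1], int) and not isinstance(args[1], bool):
        return args[1]
    return None


def is_retryable(exc: BaseException) -> bool:
    """True when the error carries a 5xx status."""
    status = status_from_error(exc)
    return status is not None and 500 <= status <= 599


first = Exception("Mastodon API returned error", 522, "", None)
second = Exception("Mastodon API returned error", 500, "Internal Server Error", None)
print(status_from_error(first), is_retryable(first))    # 522 True
print(status_from_error(second), is_retryable(second))  # 500 True
```

Under this check, both errors in the log would have led to another wait rather than "Exiting due to error."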

hughrun commented 4 years ago

Perfect thanks!

Mastodon.py raises different exceptions for different types of error, so I just need to make sure I'm catching the right one. This is tricky to test because I need to emulate a 5xx error without, you know, shutting down my own site.
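One way to emulate a 5xx error without taking a real site down is to swap the client for a `unittest.mock.Mock` whose `status_delete` method has a `side_effect` list: each call consumes the next item, raising it if it is an exception and returning it otherwise. In this sketch `FakeServerError` is a local stand-in for the library's server-error exception, the `{}` success value is a placeholder, and `delete_with_retry` and `make_flaky_client` are hypothetical helpers, not ephemetoot code.

```python
import time
from unittest.mock import Mock


class FakeServerError(Exception):
    """Local stand-in for the client library's 5xx exception (assumption)."""


def delete_with_retry(client, toot_id, attempts=5, wait_seconds=60.0, sleep=time.sleep):
    """Try client.status_delete(toot_id) up to `attempts` times, sleeping between tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return client.status_delete(toot_id)
        except FakeServerError:
            if attempt == attempts:
                raise  # still failing after the last attempt: give up
            sleep(wait_seconds)


def make_flaky_client(failures: int) -> Mock:
    """A fake client whose status_delete raises `failures` server errors, then succeeds."""
    client = Mock()
    errors = [
        FakeServerError("Mastodon API returned error", 500, "Internal Server Error", None)
        for _ in range(failures)
    ]
    client.status_delete.side_effect = errors + [{}]  # {} is a placeholder success value
    return client
```

Passing `sleep=list.append` style callables in tests records the waits instead of actually sleeping, so a five-attempt run finishes instantly.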

hughrun / ephemetoot

Handling/retry of 500 errors? #41