500 and 429 responses on /api/issue?

greggles commented 3 years ago

Question

We are testing out sending out codes to phone numbers in batches (happy to discuss the motivation and use case separately from this issue).

I've built out error handling for the various response messages to expect in the API.md but find

More 500 responses than I expect - I'm starting to guess these are numbers on a block list from Twilio? Is there any way to know how/why these are happening?
Getting 429 responses. These are not documented, so I'm really curious about them. Currently the script calls the API in a pretty tight loop, issuing ~100 codes within a few seconds. Should we implement a sleep in the loop to avoid some request threshold? This does not have to be a high throughput process.

sethvargo commented 3 years ago

Hi @greggles,

1. More 500 responses than I expect...

Are you getting a response body? It should include an errors key. If not, can you let me know which server you're using and I can investigate and see if there's anything in the server logs. Timestamps would be helpful too.

2. Getting 429 responses...

The possibility of a rate limited is documented in api.md, but you're correct that we do not document the specific rate limits.

Rate limiting can occur at the following levels, in order:

API keys - If an API key is provided, the system rate limits by the realm associated with the API key, by client IP address. API keys are specified as the X-API-Key HTTP header, but not all endpoints require that header.
User ID - If a user is authenticated to the system (only applies to the UI server), they are rate limited by user ID. Since a user can be a member of multiple realms, the enforcement mechanism here is per-user, not per-realm.
IP address - If no authentication information is present, rate limiting is applied by the requestor's IP address.

When a request is rate limited, it will return a 429 Too Many Requests. The response headers will include an X-Retry-After header, which indicates when it is safe to retry the request.

The current global rate limit is 60 tokens per minute (note this is NOT the same as 1 token per second, although that is one possible use), however we reserve the right to change this rate limit as-needed.

It's unlikely we'll every drop below 60 tokens/minute, so I would build around that metric, and then bundle any requests that were rate limited into a retry.

greggles commented 3 years ago

I'll double check, but the 500s seem to have an empty body. I will investigate some more and let you know what I find. I'll provide some timestamps if I need more help.

Sorry for missing that part of the doc. I guess I was reading the earlier error code tables and didn't realize general codes are at the bottom of the doc. That's great advice about 60 tokens per minute. I'll try that out.

sethvargo commented 3 years ago

If they have an empty body, that's a bug. The body should at least contain {}. I just looked at the code and we do return an error if the SMS fails, however we don't currently tell you why - it's just "failed to send sms".

I do, however, think I see a bug which I'll patch up in a moment.

sethvargo commented 3 years ago

https://github.com/google/exposure-notifications-verification-server/pull/1031

mikehelmick commented 3 years ago

In our reference setup - I think we can relax the rate limit a bit for the adminapi server

@greggles - I looked at the logs and the 500s here are indeed from Twilio, which once #1031 rolls out, that will propagate back to you

""The 'To' phone number: `XXXX is not currently reachable using the 'From' phone number: XXXX via SMS." "

greggles commented 3 years ago

Super helpful, thanks, mike!

The rate limit is fine by me. I put a sleep in the calling code and the throughput is acceptable for us.

mikehelmick commented 3 years ago

Here is the Twilio page that documents that error: https://www.twilio.com/docs/api/errors/21612

greggles commented 3 years ago

I think this question might be resolved if you agree. If you want me to verify anything before it's closed I can try to do that.

You've been super helpful. Thank you!

mikehelmick commented 3 years ago

great - I'll mark this as closed. Let us know if anything else comes up.

/close

google-oss-robot commented 3 years ago

@mikehelmick: Closing this issue.

In response to [this](https://github.com/google/exposure-notifications-verification-server/issues/1028#issuecomment-725500854): >great - I'll mark this as closed. Let us know if anything else comes up. > >/close Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

google / exposure-notifications-verification-server

500 and 429 responses on /api/issue? #1028

Question