Open nutrina opened 2 months ago
It looks like we are currently storing the response message even when errors are thrown.
The query below returns the ten API key IDs with the most 429 (rate limit) errors, along with their associated account IDs. It identifies rate-limited requests by matching the text "limited" in the stored response:
SELECT
    aaa.api_key_id,
    aak.account_id, -- This will get the Account's ID
    COUNT(*) AS rate_limit_count
FROM
    account_accountapikeyanalytics aaa
    JOIN account_accountapikey aak ON aaa.api_key_id = aak.id
WHERE
    response::text LIKE '%limited%'
GROUP BY
    aaa.api_key_id,
    aak.account_id
ORDER BY
    rate_limit_count DESC
LIMIT 10;
https://github.com/passportxyz/passport-scorer/pull/709 should save all status codes to the analytics table. Once it is deployed, it will be easy to update the above query so that it aggregates by status code and API key ID.
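A rough sketch of what the updated query might look like once status codes are stored. It assumes the PR adds an integer column named `status_code` to `account_accountapikeyanalytics`; that column name is a guess and may differ from what actually ships. The three counts are meant to line up with the successful, 429, and other-error lists the dashboard needs:

```sql
-- Assumption: status_code is an integer column on the analytics table
-- (hypothetical name; adjust to whatever the PR actually adds).
SELECT
    aaa.api_key_id,
    aak.account_id,
    -- Successful requests (2xx)
    COUNT(*) FILTER (WHERE aaa.status_code BETWEEN 200 AND 299) AS success_count,
    -- Rate-limited requests
    COUNT(*) FILTER (WHERE aaa.status_code = 429) AS rate_limit_count,
    -- All other client and server errors
    COUNT(*) FILTER (WHERE aaa.status_code >= 400 AND aaa.status_code <> 429) AS other_error_count
FROM
    account_accountapikeyanalytics aaa
    JOIN account_accountapikey aak ON aaa.api_key_id = aak.id
GROUP BY
    aaa.api_key_id,
    aak.account_id
ORDER BY
    rate_limit_count DESC
LIMIT 10;
```

Changing the `ORDER BY` column to `success_count` or `other_error_count` would give the other two orderings. This also avoids the `LIKE '%limited%'` text match on the response body.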
The API Key Usage dashboard has been updated based on the AC. Note that we only have status code data through last week.
User Story:
As a DevOps responsible, I want a dashboard showing the most active partners, as well as the most common error cases encountered by the API handler (including rate limits), so that I have a detailed picture of who is using our API and how, and can reach out to the respective teams.
Note: this issue focuses mostly on the error and rate-limiting aspects, as we are trying to shift the monitoring of these from the engineering team to DevOps.
Acceptance Criteria:
GIVEN I am a DevOps responsible
WHEN I go to the API Key Analytics Dashboard
THEN I can see a list of partners ordered by number of successful API key requests
AND another list of partners ordered by the HTTP 429 error code rate
AND another list of partners ordered by other HTTP error code rates
Product & Design Links: