passportxyz / passport

Passport allows users to prove their identity through a secure, decentralized UI

Enable better assessment of API errors through the API analytics #2923

Open nutrina opened 2 months ago

nutrina commented 2 months ago

User Story:

As a DevOps engineer,
I want a dashboard showing the most active partners, as well as the most common error cases encountered by the API handler (including rate limits),
so that I have a detailed picture of who is using our API and how, and can reach out to the respective teams.

Note: this issue focuses mostly on the error and rate-limiting aspects, as we are trying to shift the monitoring of these from the engineering team to DevOps.

Acceptance Criteria

GIVEN I am a DevOps engineer
WHEN I go to the API Key Analytics Dashboard
THEN I can see a list of partners ordered by number of successful API key requests
AND another list of partners ordered by the HTTP 429 error code rate
AND another list of partners ordered by other HTTP error code rates

Product & Design Links:

Tech Details:

- This depends on the fixes from the following ticket: https://github.com/passportxyz/core-infra/issues/31

Open Questions:

Notes/Assumptions:

tim-schultz commented 1 month ago

It looks like we are currently storing the response message even when errors are thrown.

This query returns the 10 API keys with the most rate-limited (HTTP 429) responses, along with their associated account IDs. It finds them by matching "limited" in the stored response text:

SELECT 
    aaa.api_key_id,
    aak.account_id,  -- This will get the Account's ID
    COUNT(*) as rate_limit_count
FROM 
    account_accountapikeyanalytics aaa 
    JOIN account_accountapikey aak ON aaa.api_key_id = aak.id
WHERE 
    aaa.response::text LIKE '%limited%'  -- matches the rate-limit message stored in the response
GROUP BY 
    aaa.api_key_id,
    aak.account_id
ORDER BY 
    rate_limit_count DESC
LIMIT 10;

tim-schultz commented 1 month ago

https://github.com/passportxyz/passport-scorer/pull/709 should save all status codes to the analytics table. Once it is deployed, it will be easy to update the query above so that it aggregates by status code and API key ID.
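
As a rough sketch of what that aggregation could look like once status codes are stored, the snippet below runs a per-key breakdown against an in-memory SQLite database. The `status_code` column name and the simplified two-table schema are assumptions for illustration only; the real analytics table may name or type the column differently, and the production query would run on Postgres rather than SQLite.

```python
import sqlite3

# Hypothetical, simplified schema. The `status_code` column name is an
# assumption; only the table names and join keys come from the query above.
SCHEMA = """
CREATE TABLE account_accountapikey (
    id INTEGER PRIMARY KEY,
    account_id INTEGER
);
CREATE TABLE account_accountapikeyanalytics (
    id INTEGER PRIMARY KEY,
    api_key_id INTEGER REFERENCES account_accountapikey(id),
    status_code INTEGER
);
"""

# One row per API key with counts for the three lists in the acceptance
# criteria: successful requests, HTTP 429 responses, and other HTTP errors.
QUERY = """
SELECT
    aaa.api_key_id,
    aak.account_id,
    SUM(CASE WHEN aaa.status_code BETWEEN 200 AND 299 THEN 1 ELSE 0 END) AS success_count,
    SUM(CASE WHEN aaa.status_code = 429 THEN 1 ELSE 0 END) AS rate_limit_count,
    SUM(CASE WHEN aaa.status_code >= 400 AND aaa.status_code <> 429 THEN 1 ELSE 0 END) AS other_error_count,
    COUNT(*) AS total_requests
FROM
    account_accountapikeyanalytics aaa
    JOIN account_accountapikey aak ON aaa.api_key_id = aak.id
GROUP BY
    aaa.api_key_id,
    aak.account_id
ORDER BY
    rate_limit_count DESC,
    aaa.api_key_id
LIMIT 10;
"""


def top_rate_limited(conn):
    """Return per-key status code counts, most rate-limited keys first."""
    return conn.execute(QUERY).fetchall()


def demo():
    """Build a tiny made-up dataset and run the aggregation on it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO account_accountapikey (id, account_id) VALUES (?, ?)",
        [(1, 100), (2, 200)],
    )
    conn.executemany(
        "INSERT INTO account_accountapikeyanalytics (api_key_id, status_code) VALUES (?, ?)",
        [(1, 200), (1, 429), (1, 429), (2, 200), (2, 200), (2, 500)],
    )
    return top_rate_limited(conn)
```

Dividing `rate_limit_count` or `other_error_count` by `total_requests` gives the per-key error rates, and changing the `ORDER BY` column yields each of the three orderings the dashboard needs.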

tim-schultz commented 3 weeks ago

The API Key Usage dashboard has been updated based on the acceptance criteria (AC). Note that we only have status code data through last week.