In this Stack Overflow answer (https://stackoverflow.com/a/30579888/15027348), I discovered we can look up the username associated with a user ID by going to https://api.github.com/user/:id. For example:
https://api.github.com/user/34923065
{
"login": "phet-dev",
...
}
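For completeness, here's that same lookup as a curl command (a minimal sketch; it's unauthenticated, and the -i flag includes the rate-limit headers in the output):

curl --request GET \
  --url "https://api.github.com/user/34923065" \
  --header "X-GitHub-Api-Version: 2022-11-28" \
  -i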
Why is @phet-dev's rate limit 60? Is it not authenticated?
Here's another wrinkle: GitHub has two types of tokens. There are classic tokens (if I had to guess I would say @phet-dev's personal access token is classic) and then there are new "fine-grained" tokens.
To answer the earlier question ("Why is @phet-dev's rate limit 60? Is it not authenticated?"): it's because we weren't providing the PAT in the curl command. When we provide the PAT, the limit is 5000 requests per hour.
@jbphet and I figured this out. We discovered that @phet-dev's GitHub personal access token (hereafter referred to as "the PAT") is specifically for Rosetta. It is not used for anything other than Rosetta. We also discovered that the PAT has a rate limit of 5000 requests per hour. We figured this out by entering the following command:
curl --request GET \
  --url "https://api.github.com/users/phet-dev" \
  --header "Authorization: Bearer <PAT goes here>" \
  --header "X-GitHub-Api-Version: 2022-11-28" \
  -i
The -i option is important. According to the curl man page, -i (or --include) does the following:
Include the HTTP response headers in the output. The HTTP response headers can include things like server name, cookies, date of the document, HTTP version and more...
One of the headers in the response is x-ratelimit-reset, which gives an epoch timestamp. It doesn't seem to reset at the start of a new hour, so we think it's a rolling one-hour window.
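For reference, that reset timestamp can be converted to a readable time, and GitHub also has a dedicated rate-limit endpoint that reports the limit, remaining requests, and reset time without counting against the quota. A minimal sketch (the PAT placeholder is the same one used above):

# Query the dedicated rate-limit endpoint (does not count against the quota).
curl --request GET \
  --url "https://api.github.com/rate_limit" \
  --header "Authorization: Bearer <PAT goes here>" \
  --header "X-GitHub-Api-Version: 2022-11-28"

# Convert the x-ratelimit-reset value to a readable time
# (GNU date; on macOS use `date -r <epoch timestamp>` instead).
date -d @<epoch timestamp>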
It would seem Rosetta is somehow exhausting all 5000 of its requests in an hour. @jbphet and I were skeptical that it could be making that many requests without some sort of bug. However, upon further investigation, we discovered it is indeed possible for Rosetta to exhaust its 5000 GitHub API requests in one hour.
For a typical PhET sim, Rosetta needs to perform 8 GitHub API requests to create a translation report object with total translated strings and total strings. If there are 100 sims, a full translation report requires approximately 800 GitHub API requests. Since we are limited to 5000 GitHub API requests per hour, we can only generate the translation report for about 6 different locales per hour. Of course, once a translation report has been generated, it is cached in Rosetta's memory.
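As a quick sanity check on that arithmetic (the 8-requests-per-sim and 100-sim figures are the ones from the paragraph above):

# ~8 GitHub API requests per sim * ~100 sims = ~800 requests per translation report.
# With a budget of 5000 requests per hour, that's about 6 reports (locales) per hour:
echo $(( 5000 / (8 * 100) ))   # prints 6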
The proper solution to this problem is to use MongoDB to store the translated strings. However, tomorrow is my last day of work at PhET, and it is unlikely anyone else will have the time to do this. @jbphet and I decided to do the following:
When Rosetta gets an API request for a translation report, it will check how many requests it has left in the hour. If that number is lower than, say, 900, then we send a 429 status code to the client-side code saying "sorry, we've reached GitHub's API rate limit". The client-side code will then put up some sort of banner saying "sorry, the translation stats aren't available right now, but you can still translate" and we will put "--" or something where the stats would usually go.
The implementation ended up being slightly different from this: we set up an API endpoint to check how many requests we have left, and if that number was below 900, we told the client-side code not to display translation stats.
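For the record, the core of the check is just comparing GitHub's reported number of remaining requests against the threshold. Here's a rough shell sketch of that logic, not the actual implementation (which lives in Rosetta's server code); the use of jq and the PAT placeholder are just for illustration:

# Ask GitHub how many core API requests remain for this token.
# The /rate_limit endpoint itself does not count against the quota.
remaining=$(curl --silent \
  --url "https://api.github.com/rate_limit" \
  --header "Authorization: Bearer <PAT goes here>" \
  --header "X-GitHub-Api-Version: 2022-11-28" \
  | jq '.resources.core.remaining')

# If we're close to the limit, tell the client-side code not to display translation stats.
if [ "$remaining" -lt 900 ]; then
  echo "should show stats: false"
else
  echo "should show stats: true"
fi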
This is now deployed.
We had a meeting this week to talk about priorities in Rosetta, and during that meeting we brought up the interface and noticed that the throttling message was showing. I thought I'd take a look at the logs and see whether this is occurring a lot. The answer is "not a lot, but it is happening". The logs go back to May 22, 2023, so almost 4 weeks, and it has happened 3 times over that period. I think one of those times - the one on June 13 - was due to my testing related to https://github.com/phetsims/rosetta/issues/412#issuecomment-1590162919, so I think that one can be ignored. The longest one was on June 14, started at 13:12:39, and lasted around 20 minutes, which doesn't seem too bad.
Bottom line: So far this seems to be working reasonably well and isn't causing too many problems.
Here is the raw data. This was generated by getting the Rosetta log and using the command grep -B 2 "should show stats: false" rlog.txt. The threshold for the number of remaining requests is currently set at 900.
EDIT: Rosetta has been in maintenance mode since May 24th at ~5pm mountain time.
Last night, I noticed Rosetta was erroring.
Apparently, we have exceeded the GitHub API rate limit.
This GitHub doc says:
Later in that doc, it provides a way to check your rate limit:
When I ran this command for the GitHub user @phet-dev, who I suspect owns the personal access token we use in Rosetta, I saw:
Am I misreading that doc? I thought we should have 5,000 requests per hour, assuming @phet-dev is an authenticated user with a personal access token. But apparently we only have 60, which I assume is the default for unauthenticated requests.
The error I saw provided a user ID. Maybe I can use it to look up the user who owns the personal access token.