haskell-github / github

The github API for Haskell
https://hackage.haskell.org/package/github
BSD 3-Clause "New" or "Revised" License

As a developer I want to see the rate limit so I don't go over the limit #8

Open mike-burns opened 12 years ago

mike-burns commented 12 years ago

Each response from the underlying JSON API contains rate limiting information: how many requests we've made, and how many remain. It would be good to use this information.

Some proposals:

a) Produce the rate limiting data as part of every response data type, including Error.
b) Provide a function that makes a request then produces just the rate limiting data.
c) Track the rate limit internally and automatically slow requests as the limit approaches.

Since this API is rather low level still, it makes sense to provide access to this data. (c) should be in a different library, perhaps named github-convenience or so.

In common use I'd think (a) is most useful; otherwise each request would need to be followed by another request immediately.
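
A rough sketch of what (a) might look like, with hypothetical names that are not part of the library:

-- Hypothetical sketch for proposal (a): every call would produce its payload
-- together with the rate-limit information reported by the API.
data RateLimitInfo = RateLimitInfo
  { rateLimitLimit     :: Int  -- ^ the X-RateLimit-Limit header
  , rateLimitRemaining :: Int  -- ^ the X-RateLimit-Remaining header
  } deriving (Eq, Show)

data WithRateLimit a = WithRateLimit
  { payload   :: a
  , rateLimit :: RateLimitInfo
  } deriving (Eq, Show)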

joeyh commented 12 years ago

Only including it in Error (or rather, as a special, detectable class of Error) would be another option. Some programs (such as mine) only really care about detecting when they've gone over the rate limit vs encountered some other problem. They don't need to worry about avoiding going over it.

Maybe put it in Error and also provide hooks that could be used to write a hypothetical github-convenience? One way would be, for each existing API call Foo :: a, to provide a Foo' :: (a, RateLimitInfo)
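
A rough sketch of that detectable class, borrowing the Error/HTTPError shape and the rate-limit headers that come up later in this thread (the helper name is made up):

{-# LANGUAGE OverloadedStrings #-}
import qualified GitHub as G
import Network.HTTP.Client (HttpException (..), HttpExceptionContent (..), responseHeaders)

-- Treat an HTTP error whose X-RateLimit-Remaining header is 0 as "rate
-- limited"; anything else is some other problem.
isRateLimited :: G.Error -> Bool
isRateLimited (G.HTTPError (HttpExceptionRequest _ (StatusCodeException resp _))) =
  lookup "X-RateLimit-Remaining" (responseHeaders resp) == Just "0"
isRateLimited _ = False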

joeyh commented 12 years ago

Hmm, on second thought, Foo' is not needed. A github-convenience could simply provide a catchRateLimit :: IO (Either Error a) -> IO (Either Error a), that makes any arbitrary API request, checks if the Error indicates rate limiting, and if so waits, retrying the request periodically until it's let through.

It would not be possible for github-convenience to slow things down to preemptively avoid getting rate limited, but I am unsure of the value of that strategy anyway. If a program is going to make 1000 API requests and will succeed without any delays, then inserting delays just makes the program slower to no purpose. If it can only make the first 500 before being rate limited, then it will need to wait for an hour (say) to get back enough API credit to finish, so it will probably take just over an hour to run with either strategy.

It might still be helpful to have (b). A program could, for example, make one check of the current rate-limit state on startup, and choose to run in a mode that makes many API calls, or a mode that makes fewer.
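
For instance, a minimal sketch of such a startup check, going straight at the /rate_limit endpoint with http-client rather than through this library (the helper name and the threshold are made up):

{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Char8 as B8
import Network.HTTP.Client
import Network.HTTP.Client.TLS (tlsManagerSettings)

-- Ask GitHub's /rate_limit endpoint (which itself does not count against the
-- limit) how many requests are left in the current window.
remainingRequests :: IO (Maybe Int)
remainingRequests = do
  manager <- newManager tlsManagerSettings
  initReq <- parseRequest "https://api.github.com/rate_limit"
  let req = initReq { requestHeaders = [("User-Agent", "rate-limit-check")] }
  resp <- httpLbs req manager
  return $ fst <$> (B8.readInt =<< lookup "X-RateLimit-Remaining" (responseHeaders resp))

main :: IO ()
main = do
  remaining <- remainingRequests
  case remaining of
    Just n | n > 1000 -> putStrLn "plenty of credit left: run the API-heavy mode"
    _                 -> putStrLn "low or unknown credit: run the frugal mode"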

joeyh commented 12 years ago

A few observations on rate limiting...

I have to run quite a few parallel loops to seriously deplete the limit. In other words, 5000 HTTPS requests per hour is quite a lot for a reasonable client to sustain. My github-backup also does some git clones of the forks it finds, and that was plenty of delay to let it run for hours on end without going over the rate limit. It's also possible github adds some delay on their side. I've done some bad, bad things, but never gone over the rate limit until I set out to do so on purpose.

Here's what you see if you do go over. :)

HTTP/1.1 403 Forbidden
Server: nginx/1.0.4
Date: Mon, 30 Jan 2012 04:40:26 GMT
Content-Type: application/json; charset=utf-8
Connection: keep-alive
Status: 403 Forbidden
X-RateLimit-Limit: 5000
ETag: "5893e29af3b9b0e65a433076a38ff5b6"
X-RateLimit-Remaining: 0
Content-Length: 59

{
  "message": "API Rate Limit Exceeded for xx.xx.xx.xx"
}

Finally, once I did exceed the rate limit, I had sort of expected to regain the ability to make at least a few API requests within a few minutes, on the model that the limit replenishes at a constant rate. This did not happen. Indications are that once the rate limit reaches 0, you have to wait a whole hour, and then it's reset to 5000.

phadej commented 8 years ago

I'll think about how to provide this information through https://github.com/phadej/github/issues/127. I'm not promising it will be implemented, only that it's possible.

esjmb commented 7 years ago

Hi,

I've been working on this issue for a use case that involves hitting rate limits. I've implemented a solution as follows:

  1. I modified Request.hs in GitHub/Data:

-- | 'PagedQuery' returns just some results; using this data we can specify how
-- many pages we want to fetch. 'RateLimiting' indicates whether we should pause
-- processing on a rate-limit error: we either don't handle rate limiting, handle
-- it as the default (anonymous) user, or handle it with an authorized
-- token/secret combo if one is passed.
type Token = IBS.ByteString
type Secret = IBS.ByteString
data RateLimiting = NoLimiting | AnonLimiting | AuthorizedLimiting {token :: Token, secret :: Secret}
                     deriving (Eq, Ord, Read, Show, Generic, Typeable, Hashable, Binary, NFData)
data FetchCount = FetchAtLeast !Int | FetchAll
                     deriving (Eq, Ord, Read, Show, Generic, Typeable)

  2. I then support setting RateLimiting on most of the relevant API calls, with handling of rate-limit failures that backs off until the rate limit is renewed. The handling lives in GitHub/Request.hs [httpLibs'] as you would expect. The solution also adds the token and secret to calls when these are passed as part of AuthorizedLimiting.

For example:

-- | List languages.
-- See <https://developer.github.com/v3/repos/#list-languages>
languagesForR :: Name Owner -> Name Repo -> RateLimiting -> Request k Languages
languagesForR user repo limiting =
    query ["repos", toPathPart user, toPathPart repo, "languages"] []
    -- (in the full patch, 'limiting' is threaded through the modified 'query' helper)

It works pretty well, but I'm still playing around with it. One thing I want to add is a preemptive delay so as to avoid hitting the rate limit (and thus be a nice consumer :) ). At present, it reacts to hitting the rate limit by backing off at that point.
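
A hedged sketch of what such a preemptive delay could look like (this is not the actual patch): spread the remaining budget over the time left until the limit resets.

import Control.Concurrent (threadDelay)

-- Sleep between calls so the remaining budget lasts until the reset time;
-- if nothing is left, sleep through to the reset.
preemptiveDelay :: Int  -- ^ X-RateLimit-Remaining
                -> Int  -- ^ seconds until X-RateLimit-Reset
                -> IO ()
preemptiveDelay remaining secondsToReset
  | remaining <= 0 = threadDelay (secondsToReset * 1000000)
  | otherwise      = threadDelay ((secondsToReset * 1000000) `div` remaining)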

If you're happy to wait a week or so, I can extract the code out from my other changes and push out for review.

phadej commented 7 years ago

Modifying Request is semantically wrong. The mock (or cached) endpoint handling of Request doesn't need that information at all. You could wrap Request into a RateLimitedRequest or similar, if you need to handle different requests differently (or you could reflect it directly from the Request value).
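
Roughly, something along these lines (all names here are illustrative, not a committed design; Request is assumed to come from the package's top-level GitHub module):

import GitHub (Request)

-- The rate-limit policy rides alongside the Request instead of inside it,
-- so a mock or cached executor can simply ignore it.
data RateLimitPolicy = NoBackoff | BackoffUntilReset
  deriving (Eq, Show)

data RateLimitedRequest k a = RateLimitedRequest
  { rlrPolicy  :: RateLimitPolicy
  , rlrRequest :: Request k a
  }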

"Unfortunately" we don't have rate-limiting problems at work, so I don't have time to write a proper tutorial for an approach I have in mind.

esjmb commented 7 years ago

or like FetchAllRequest? :)

np. your gig.

For anyone looking for a workable external solution, here's a sketch...

-- | use like so: a <- handleRateLimit $ userRepos' uauth usr RepoPublicityAll
--
-- assumes OverloadedStrings plus imports along these lines:
--   import qualified GitHub as G
--   import Control.Concurrent (threadDelay)
--   import Data.ByteString.Char8 (readInteger)
--   import Data.Time.Clock (diffUTCTime, getCurrentTime)
--   import Data.Time.Clock.POSIX (posixSecondsToUTCTime)
--   import Network.HTTP.Client
--     (HttpException (..), HttpExceptionContent (..), responseHeaders)
handleRateLimit :: IO (Either G.Error a) -> IO (Either G.Error a)
handleRateLimit act = do
  res <- act
  case res of
    r@(Right _) -> return r
    e@(Left (G.HTTPError (HttpExceptionRequest _ (StatusCodeException resp _)))) ->
      case ( readInteger =<< lookup "X-RateLimit-Remaining" (responseHeaders resp)
           , readInteger =<< lookup "X-RateLimit-Reset" (responseHeaders resp) ) of
        (Just remaining, Just reset)
          | fst remaining > 0 -> return e  -- an HTTP error, but not due to the rate limit
          | otherwise -> do
              now <- getCurrentTime
              let resetTime    = posixSecondsToUTCTime (fromIntegral (fst reset))
                  delaySeconds = round (diffUTCTime resetTime now)
              threadDelay $ (delaySeconds + 10) * 1000000  -- add 10 seconds for good measure
              handleRateLimit act
        _ -> return e
    other -> return other

Note that you are not going to get authorized rate limiting this way; that's a matter of associating rate limiting with the request so that client_id and client_secret are added as URL params.
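
For the curious, a hedged sketch of what adding those params could look like at the http-client level (the helper name is made up; a real version would sit wherever the library builds its HTTP request):

{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString (ByteString)
import Network.HTTP.Client (Request, queryString, setQueryString)
import Network.HTTP.Types.URI (parseQuery)

-- Merge client_id/client_secret into whatever query string the request
-- already carries.
addClientCredentials :: ByteString -> ByteString -> Request -> Request
addClientCredentials clientId clientSecret req =
  setQueryString (parseQuery (queryString req)
                  ++ [ ("client_id",     Just clientId)
                     , ("client_secret", Just clientSecret) ]) req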