markusahlstrand / cloudworker-proxy

An API gateway for Cloudflare Workers
MIT License

Rate limit question #80

Open · donchev7 opened this issue 3 years ago

donchev7 commented 3 years ago

Hey, nice lib you got here!

I have a question regarding the rate limit implementation

Isn't it true that you store the state in the memory of the worker, so when a new worker is created the rate limiting is useless? Am I missing something?

Thanks.

markusahlstrand commented 3 years ago

Hi @donchev7,

It is true that the state is stored in the memory of the worker, but global variables are persisted between invocations, so it actually works pretty well. The rate limit is per edge location, but since traffic from a given client is typically routed the same way for all requests, it doesn't really matter much. If you have a strict rate limit based on an API key rather than an IP, this might not be ideal, as the customer could for instance use a VPN to hit different edge locations.
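
For illustration, here is a minimal sketch of the idea (my own example, not the proxy's actual code): a module-scope variable survives between requests served by the same worker isolate at an edge location, but is lost whenever the isolate is recycled.

// Minimal sketch, not cloudworker-proxy's implementation: a module-scope Map
// survives between requests served by the same worker isolate, but is lost
// whenever Cloudflare spins up or recycles an isolate at an edge location.
const ipCounters = new Map<string, { count: number; resetAt: number }>()

const checkIpLimit = (ip: string, limit: number, intervalMs: number): boolean => {
  const now = Date.now()
  const entry = ipCounters.get(ip)
  if (!entry || now >= entry.resetAt) {
    ipCounters.set(ip, { count: 1, resetAt: now + intervalMs })
    return true
  }
  entry.count += 1
  return entry.count <= limit
}

// Usage inside a fetch handler (CF-Connecting-IP is the header Cloudflare sets
// with the client IP); the limit and interval below are just examples:
// const ip = request.headers.get('cf-connecting-ip') || 'unknown'
// if (!checkIpLimit(ip, 100, 60_000)) return new Response('Too many requests', { status: 429 })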

I typically add an IP-based rate limit to all APIs/services to make sure some client isn't flooding them with requests.

You can check how it works on https://proxy.cloudproxy.io/. It adds headers for what the current count is, what the limit is, and when the interval resets.

Not sure if that answered your question? :)

donchev7 commented 3 years ago

Hey, yeah, you kind of answered it. I must admit I'm surprised Cloudflare doesn't purge the workers after a response.

I'll do some tests now to verify what happens.

Cheers.

donchev7 commented 3 years ago

Hi Markus,

I did some tests and it seems it only works for per-minute rate limiting. I implemented something like 10 requests per hour, e.g.:

// Note: RateLimitError is assumed to be defined elsewhere, e.g. a small Error
// subclass carrying an HTTP status, a retryable flag and a message.
const rateLimiter = (limitPerHour: number) => {
  // The Date object only returns the current date if we use it during the event handler of a request!
  const getCurrentMin = () => Math.trunc(Date.now() / (1000 * 60)) // minutes since epoch
  let limitTime = 0 // minute at which the current window ends
  let count = 0     // requests counted in the current window

  const reset = () => {
    limitTime = getCurrentMin() + 60
    count = 0
  }

  return {
    check: () => {
      // first run, initialize limitTime
      if (limitTime === 0) limitTime = getCurrentMin() + 60

      const currentMinute = getCurrentMin()
      if (currentMinute > limitTime) reset()

      // >= so that at most limitPerHour requests are allowed per window
      if (count >= limitPerHour) throw new RateLimitError(429, false, `Retry again in ${limitTime - currentMinute} min`)

      count = count + 1
    }
  }
}

but Cloudflare resets the worker, and thus the counter and limitTime value, after 2-3 minutes, sometimes after 5 minutes.

Since your implementation does per-minute limiting, I guess it's fine. But if somebody wanted to implement something more granular, they'd need to watch out for this caveat.

markusahlstrand commented 3 years ago

Interesting... I wonder if it's due to inactivity, so that if you had more traffic it would keep it in memory?

I've been thinking about whether it's possible to implement a better and stricter rate limiter based only on Cloudflare components. KV storage would be interesting, but it has a lag between the different edge locations and almost requires a central queue to get the writes to work properly.

One interesting option would be to use the cache as storage. It's still per edge location, but it should make sure that the data isn't lost on worker process restarts. I just did some work on the cache module to make it possible to use custom cache keys, so I don't think it would be too hard. Maybe we could set a global variable with a timestamp to keep track of when the process started, so we know whether we need to look in the cache or not. I'm guessing the trickiest part will be keeping track of parallel requests when writing to the cache. The safest bet would probably be to always write to the cache, so it's kept up to date in case the process restarts for whatever reason.
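
For concreteness, a rough sketch of what storing the counter in the edge cache could look like (my own illustration using the Workers Cache API via caches.default; the key namespace URL and helper names are made up, not code from the proxy):

// Hypothetical helpers, assuming the Cloudflare Workers Cache API (caches.default).
// The URL below is just a synthetic cache-key namespace, not a real endpoint.
const RATE_LIMIT_KEY_PREFIX = 'https://ratelimit.internal/'

const readCachedCount = async (clientKey: string): Promise<number> => {
  const cache = (caches as any).default
  const hit = await cache.match(new Request(RATE_LIMIT_KEY_PREFIX + clientKey))
  return hit ? parseInt(await hit.text(), 10) : 0
}

const writeCachedCount = async (clientKey: string, count: number, ttlSeconds: number) => {
  const cache = (caches as any).default
  await cache.put(
    new Request(RATE_LIMIT_KEY_PREFIX + clientKey),
    new Response(String(count), { headers: { 'Cache-Control': `max-age=${ttlSeconds}` } })
  )
}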

What do you think?

donchev7 commented 3 years ago

Unfortunately for us, I doubt it will be atomic. Will it be good enough? No way to know until we've implemented and tried it out :)

markusahlstrand commented 3 years ago

I think it could be implemented to be atomic; you just need to keep track of any updates that are in flight.

So, basically we keep an up-to-date counter in memory. When a request comes in, we start a timer that waits for x seconds and then sends an update to the cache with the sum of requests for the time period. If a request is received and we have no in-memory count for this IP (or API key...), we check the cache for older counts. As an optimisation, we could keep a timestamp for when the process started to avoid unnecessary calls to the cache.
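
Roughly, the scheme could look like this (an untested sketch of my reading of it, reusing the hypothetical readCachedCount/writeCachedCount helpers from above and assuming the Workers FetchEvent type):

// Untested sketch: keep the live counter in memory, fall back to the cache
// after a restart, and flush to the cache on a delay instead of every request.
const inMemoryCounts = new Map<string, number>()
let flushScheduled = false

const countRequest = async (clientKey: string, event: FetchEvent): Promise<number> => {
  let count = inMemoryCounts.get(clientKey)
  if (count === undefined) {
    // No in-memory state, so the isolate probably restarted: fall back to the cache.
    count = await readCachedCount(clientKey)
  }
  count += 1
  inMemoryCounts.set(clientKey, count)

  if (!flushScheduled) {
    flushScheduled = true
    // Flush the accumulated counts after a few seconds; waitUntil keeps the
    // worker alive so the deferred write can finish after the response is sent.
    event.waitUntil(
      new Promise<void>((resolve) => setTimeout(resolve, 5000)).then(async () => {
        for (const [key, value] of inMemoryCounts) {
          await writeCachedCount(key, value, 3600)
        }
        flushScheduled = false
      })
    )
  }
  return count
}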

I can't see any reason why this wouldn't work... Is there some edge case I've missed?

markusahlstrand commented 3 years ago

The only thing is that it will still be per edge location... So it depends on whether that's a limitation you could live with?

donchev7 commented 3 years ago

When a request comes in, we start a timer that waits for x seconds and then sends an update to the cache with the sum of requests for the time period

Interesting....

Yes, the per-edge-location limitation will always be there, but in my opinion that's fine.

markusahlstrand commented 3 years ago

Hi @donchev7 !

Saw that Cloudflare released a beta of Durable Objects: https://blog.cloudflare.com/introducing-workers-durable-objects/

This would be a perfect fit for a rate limiter, right? Is that something that would be interesting for you?
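
For example, a rate-limiter Durable Object could look roughly like this (a hypothetical sketch, not part of cloudworker-proxy; assumes @cloudflare/workers-types for DurableObjectState). Routing each client key to its own object instance would give one globally consistent counter instead of one per edge location:

// Hypothetical Durable Object sketch; limit and window values are illustrative.
export class RateLimiterDurableObject {
  constructor(private state: DurableObjectState) {}

  async fetch(request: Request): Promise<Response> {
    const limit = 100                // requests allowed per window
    const windowMs = 60 * 60 * 1000  // one-hour window
    const now = Date.now()

    // Load the current window from durable storage, or start a fresh one.
    let window = await this.state.storage.get<{ count: number; resetAt: number }>('window')
    if (!window || now >= window.resetAt) {
      window = { count: 0, resetAt: now + windowMs }
    }
    window.count += 1
    await this.state.storage.put('window', window)

    const allowed = window.count <= limit
    return new Response(JSON.stringify({ allowed, count: window.count, resetAt: window.resetAt }), {
      status: allowed ? 200 : 429,
    })
  }
}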

alfaproject commented 3 years ago

That sounds very interesting to me :D

markusahlstrand commented 3 years ago

Yay! Durable Objects are now available. I'll see if we can get a first version working.