mholt / caddy-ratelimit

HTTP rate limiting module for Caddy 2
Apache License 2.0

I am sure that rate limit is not working well #24

Closed ElegantSoft closed 1 year ago

ElegantSoft commented 1 year ago

I have a Next.js app behind a Caddy server and I'm facing DDoS attacks. I've set up rate limiting in my Caddyfile.

rate_limit {
    distributed
    zone ip_rate {
        key    {remote_host}
        events 250
        window 300s
    }
    zone ip_rate_min {
        key    {remote_host}
        events 70
        window 100s
    }
}

And in the Next.js app I added an application-level rate limiter:

    const ip = headers["x-forwarded-for"] as string;
    try {
      await limiter.check(70, ip);
    } catch (e) {
      console.log("will block ip", ip);
      appContext.ctx.req?.destroy();
    }

And I still see more than 100 lines in the log.

ElegantSoft commented 1 year ago

UPDATE: I found 1000+ concurrent requests passed through to the application layer, and I don't know how they got past caddy-ratelimit. I suppose they arrive at the same time, before the rate limiter stores the number of connections.

I don't know how, but I am sure there is a problem with this package.

mohammed90 commented 1 year ago

Ideally, we need to be able to reproduce the bug in the most minimal way possible. This allows us to write regression tests to verify the fix is working. If we can't reproduce it, then you'll have to test our changes for us until it's fixed -- and then we can't add test cases, either.

I've attached a template below that will help make this easier and faster! This will require some effort on your part -- please understand that we will be dedicating time to fix the bug you are reporting if you can just help us understand it and reproduce it easily.

This template will ask for some information you've already provided; that's OK, just fill it out the best you can. :+1: I've also included some helpful tips below the template. Feel free to let me know if you have any questions!

Thank you again for your report, we look forward to resolving it!

Template

## 1. Environment

### 1a. Operating system and version

```
paste here
```

### 1b. Caddy version (run `caddy version` or paste commit SHA)

```
paste here
```

### 1c. Go version (if building Caddy from source; run `go version`)

```
paste here
```

## 2. Description

### 2a. What happens (briefly explain what is wrong)

### 2b. Why it's a bug (if it's not obvious)

### 2c. Log output

```
paste terminal output or logs here
```

### 2d. Workaround(s)

### 2e. Relevant links

## 3. Tutorial (minimal steps to reproduce the bug)

Helpful tips

  1. Environment: Please fill out your OS and Caddy versions, even if you don't think they are relevant. (They are always relevant.) If you built Caddy from source, provide the commit SHA and specify your exact Go version.

  2. Description: Describe at a high level what the bug is. What happens? Why is it a bug? Not all bugs are obvious, so convince readers that it's actually a bug.

    • 2c) Log output: Paste terminal output and/or complete logs in a code block. DO NOT REDACT INFORMATION except for credentials.
    • 2d) Workaround: What are you doing to work around the problem in the meantime? This can help others who encounter the same problem, until we implement a fix.
    • 2e) Relevant links: Please link to any related issues, pull requests, docs, and/or discussion. This can add crucial context to your report.
  3. Tutorial: What are the minimum required specific steps someone needs to take in order to experience the same bug? Your goal here is to make sure that anyone else can have the same experience with the bug as you do. You are writing a tutorial, so make sure to carry it out yourself before posting it. Please:

    • Start with an empty config. Add only the lines/parameters that are absolutely required to reproduce the bug.
    • Do not run Caddy inside containers.
    • Run Caddy manually in your terminal; do not use systemd or other init systems.
    • If making HTTP requests, avoid web browsers. Use a simpler HTTP client instead, like curl.
    • Do not redact any information from your config (except credentials). Domain names are public knowledge and often necessary for quick resolution of an issue!
    • Note that ignoring this advice may result in delays, or even in your issue being closed. 😞 Only actionable issues are kept open, and if there is not enough information or clarity to reproduce the bug, then the report is not actionable.

Example of a tutorial:

Create a config file:

```
{ ... }
```

Open a terminal and run Caddy:

```
$ caddy ...
```

Make an HTTP request:

```
$ curl ...
```

Notice that the result is ___ but it should be ___.

mholt commented 1 year ago

Note that any rate limiter using the remote IP as the zone key won't stop DDoS attacks because you'll have a lot of different IPs.
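(For reference, one way to cap total volume regardless of source is a zone keyed on a constant string, so every request lands in the same bucket. This is only an illustrative sketch using the same zone syntax as above; the zone name and numbers are made up, not a recommendation.)

```
rate_limit {
    # A constant key puts every request in one shared bucket, capping
    # total throughput across all client IPs. Values are illustrative.
    zone global_cap {
        key    static
        events 5000
        window 60s
    }
}
```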

Once we have the information needed to reproduce the bug we'll take a look!

ElegantSoft commented 1 year ago

1. Environment

1a. Operating system and version

ubuntu 22.10

1b. Caddy version (run caddy version or paste commit SHA)

v2.7.4 h1:J8nisjdOxnYHXlorUKXY75Gr6iBfudfoGhrJ8t7/flI=

1c. Go version (if building Caddy from source; run go version)

go version go1.20.7 linux/amd64

2. Description

2a. What happens (briefly explain what is wrong)

I have a rate limit of at most 70 requests per IP in 90 seconds, and it isn't being enforced. I know this because I implemented a rate limiter in the application layer (Next.js/Node.js) that logs the IPs that exceed the limit and get blocked. I found 1000+ requests coming in 10 seconds from a single IP logged at the application layer, and I don't know how they escaped the rate limit at the Caddy layer.

I don't know how you can reproduce this, because I only see it during a DDoS attack and I don't know how the attacker does it, but I can share my Caddyfile config:

domain.com {

rate_limit {
    distributed
    zone ip_rate {
        key    {remote_host}
        events 250
        window 300s
    }
    zone ip_rate_min {
        key    {remote_host}
        events 70
        window 100s
    }
}
ElegantSoft commented 1 year ago

Note that any rate limiter using the remote IP as the zone key won't stop DDoS attacks because you'll have a lot of different IPs.

Once we have the information needed to reproduce the bug we'll take a look!

I know that, but it will help a lot if the attacker has to use proxies (good luck to him paying $5 per GB of bandwidth).

The problem here is that he uses a small number of servers from OVH and Azure and produces huge traffic and request volume from the same IP address.

I am not here to solve the DDoS issue, but to understand how requests got past the rate limit at the Caddy layer and reached the application layer.

mohammed90 commented 1 year ago

I don't see where the proxying to your backend occurs. Please share full config.

ElegantSoft commented 1 year ago
*.domain.com {

tls /var/lib/caddy/.local/share/caddy/star-net-bundle.crt  /var/lib/caddy/.local/share/caddy/star-net-key.key

rate_limit {
    zone ip_rate {
        key    {remote_host}
        events 250
        window 300s
    }
    zone ip_rate_min {
        key    {remote_host}
        events 70
        window 100s
    }
}
reverse_proxy * {
    to 10.114.0.6:9090
    to 10.114.0.7:9090

    lb_policy ip_hash
    lb_try_duration 1s
    lb_try_interval 500ms

    health_path     /thanks # Backend health check path
    # health_port   80 # Default same as backend port
    health_interval 10s
    health_timeout  2s
    health_status   200
}
}
ElegantSoft commented 1 year ago

I don't know if this helps, but what if 1000 requests come from the same IP at the same time? I think the rate limiter will check the limit and pass the request without getting a chance to increment the request count for that IP. Maybe it's a sort of race condition or something like that.
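For what it's worth, the kind of check-then-increment race being hypothesized can be demonstrated in isolation. The sketch below is a standalone Go illustration of the hypothesis only, not caddy-ratelimit's actual code; all function names are made up.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// naiveAllow checks the counter and increments it in two separate steps.
// Between the check and the increment, other goroutines can slip through,
// so under heavy concurrency more than `limit` requests may be admitted.
func naiveAllow(count *int64, limit int64) bool {
	if atomic.LoadInt64(count) >= limit { // step 1: check
		return false
	}
	atomic.AddInt64(count, 1) // step 2: increment (not atomic with the check)
	return true
}

// atomicAllow reserves a slot first and rolls back on overflow, so it
// never admits more than `limit` requests.
func atomicAllow(count *int64, limit int64) bool {
	if atomic.AddInt64(count, 1) > limit {
		atomic.AddInt64(count, -1)
		return false
	}
	return true
}

// run fires 1000 concurrent "requests" against a limit of 70 and
// reports how many were admitted.
func run(allow func(*int64, int64) bool) int64 {
	var count, admitted int64
	var wg sync.WaitGroup
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if allow(&count, 70) {
				atomic.AddInt64(&admitted, 1)
			}
		}()
	}
	wg.Wait()
	return admitted
}

func main() {
	fmt.Println("naive admitted:", run(naiveAllow))   // may exceed 70
	fmt.Println("atomic admitted:", run(atomicAllow)) // never exceeds 70
}
```

Whether the naive version actually overshoots depends on scheduling, but it is allowed to; the atomic version is not.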

mohammed90 commented 1 year ago

I don't know if this helps, but what if 1000 requests come from the same IP at the same time? I think the rate limiter will check the limit and pass the request without getting a chance to increment the request count for that IP. Maybe it's a sort of race condition or something like that.

Nope. You're thinking too hard about it. There's no race condition here. Your case is simpler than that: you didn't tell Caddy where to order the directive. See the instructions in the README of this repo, and the Caddy docs for more about directive order.

ElegantSoft commented 1 year ago

No, I already added the order, and the rate limit works if requests come one by one:

{
    on_demand_tls {
        ask http://localhost:8080/api/v1/stores/domain
    }
    order rate_limit before basicauth
}
mholt commented 1 year ago

Sorry, I'm still a little confused, which config are you using, exactly? Of the 3 you posted with a rate_limit directive, there are 2 versions. Is the config distributed? That makes a big difference. If it's a distributed config, you need to make sure storage is properly shared across all instances, otherwise you'll experience more requests than you're expecting.
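For context on the shared-storage point: caddy-ratelimit's `distributed` mode coordinates instances through Caddy's configured storage, so every instance must point at the same storage backend. A minimal sketch using file-system storage on a shared mount (the path is hypothetical):

```
{
    # All instances must use the same storage location for distributed
    # rate limiting to share state; /mnt/shared/caddy is hypothetical.
    storage file_system /mnt/shared/caddy
}
```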

ElegantSoft commented 1 year ago

I have one server running Caddy, which acts as a load balancer for two servers running the front-end app (Next.js). Here is the full config file:

{
    on_demand_tls {
        ask http://localhost:8080/api/v1/stores/domain
    }
    order rate_limit before basicauth
}

*.domain.com {

tls /var/lib/caddy/.local/share/caddy/star-net-bundle.crt  /var/lib/caddy/.local/share/caddy/star-net-key.key

rate_limit {
    zone ip_rate {
        key    {remote_host}
        events 250
        window 300s
    }
    zone ip_rate_min {
        key    {remote_host}
        events 70
        window 100s
    }
}
reverse_proxy * {
    to 10.114.0.6:9090
    to 10.114.0.7:9090

    lb_policy ip_hash
    lb_try_duration 1s
    lb_try_interval 500ms

    health_path     /thanks # Backend health check path
    # health_port   80 # Default same as backend port
    health_interval 10s
    health_timeout  2s
    health_status   200
}
}
mohammed90 commented 1 year ago

I'm not able to reproduce it. I used Vegeta for the load testing, and I don't see requests passing once the rate limit is hit. Here are the results I witnessed.

Run 1: Without distributed

~ $ cat Caddyfile
{
    debug
    order rate_limit before basicauth
}
localhost {
    log
    rate_limit {
        zone ip_rate {
            key {remote_host}
            events 250
            window 300s
        }
        zone ip_rate_min {
            key {remote_host}
            events 70
            window 100s
        }
    }
    respond "Hi"
}
~ $ echo "GET https://localhost/" | vegeta attack -insecure -duration=40s | tee results.bin | vegeta report
Requests      [total, rate, throughput]         2000, 50.02, 1.75
Duration      [total, attack, wait]             39.981s, 39.98s, 652.577µs
Latencies     [min, mean, 50, 90, 95, 99, max]  496.442µs, 1.237ms, 1.048ms, 1.732ms, 2.116ms, 5.205ms, 21.413ms
Bytes In      [total, mean]                     140, 0.07
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           3.50%
Status Codes  [code:count]                      200:70  429:1930
Error Set:
429 Too Many Requests

Run 2: with distributed

~ $ cat Caddyfile
{
    debug
    order rate_limit before basicauth
}
localhost {
    log
    rate_limit {
        distributed
        zone ip_rate {
            key {remote_host}
            events 250
            window 300s
        }
        zone ip_rate_min {
            key {remote_host}
            events 70
            window 100s
        }
    }
    respond "Hi"
}
~ $ echo "GET https://localhost/" | vegeta attack -insecure -duration=40s | tee results.bin | vegeta report                                         
Requests      [total, rate, throughput]         2000, 50.02, 1.75
Duration      [total, attack, wait]             39.982s, 39.981s, 1.323ms
Latencies     [min, mean, 50, 90, 95, 99, max]  505.503µs, 1.402ms, 988.338µs, 1.979ms, 2.937ms, 8.912ms, 59.632ms
Bytes In      [total, mean]                     140, 0.07
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           3.50%
Status Codes  [code:count]                      200:70  429:1930
Error Set:
429 Too Many Requests

Sample Caddy log line on rate-limit:

ERROR   http.log.access handled request {"request": {"remote_ip": "127.0.0.1", "remote_port": "57005", "client_ip": "127.0.0.1", "proto": "HTTP/2.0", "method": "GET", "host": "localhost", "uri": "/", "headers": {"X-Vegeta-Seq": ["1999"], "Accept-Encoding": ["gzip"], "User-Agent": ["Go-http-client/2.0"]}, "tls": {"resumed": false, "version": 772, "cipher_suite": 4865, "proto": "h2", "server_name": "localhost"}}, "bytes_read": 0, "user_id": "", "duration": 0.000067283, "size": 0, "status": 429, "resp_headers": {"Server": ["Caddy"], "Alt-Svc": ["h3=\":443\"; ma=2592000"], "Retry-After": ["0"]}}

If your config has more than what's shared, please share the full details without redaction. Also check if you're truly running the configuration you think you're running.

ElegantSoft commented 1 year ago

Thanks for your time and care. The config file doesn't have anything else. I will try to reproduce the issue. Are you sure the above command makes the requests in parallel? Vegeta's default -rate value ("Number of requests per time unit [0 = infinity]", default 50/1s) means 50 per second, which may be sent one after another rather than at the same time.

ElegantSoft commented 1 year ago

I just tried https://www.npmjs.com/package/loadtest with `loadtest -c 1000 --rps 10000 https://site` and couldn't reproduce the issue, but the attacker could still get past Caddy's rate limiter and reach the app-level rate limiter. I added some logic to my app to log an abuser's IP and store it in the DB. When I ran the test, my own IP didn't get past Caddy and was never logged or stored in the DB, but 3 IPs from the attack did get through, and I found them in the log and in the DB.

ElegantSoft commented 1 year ago

I really understand that you need a way to reproduce the issue to be able to solve it. I don't know how to reproduce it, but I'm 100% sure the issue is real. Thanks a lot @mholt @mohammed90

mholt commented 1 year ago

Thanks for understanding. And thank you @mohammed90 for trying to reproduce it!

Since you're the one able to observe it, @ElegantSoft, if you want to try debugging it to gather more facts or narrow it down, that would be extremely helpful.

ElegantSoft commented 1 year ago

Unfortunately, I couldn't reproduce it. But I can see attackers' IPs passing the web server's rate limit and reaching the application-layer rate limiter.

I will try to find a way to reproduce it.

mholt commented 1 year ago

Ok, thanks -- if you do, we can reopen this.