UPDATE: I found 1000+ concurrent requests passed through to the application layer, and I don't know how they got past caddy-ratelimit. I suppose they arrive at the same time, before the rate limiter stores the connection count.
I don't know how, but I am sure there is some problem with this package.
Ideally, we need to be able to reproduce the bug in the most minimal way possible. This allows us to write regression tests to verify the fix is working. If we can't reproduce it, then you'll have to test our changes for us until it's fixed -- and then we can't add test cases, either.
I've attached a template below that will help make this easier and faster! This will require some effort on your part -- please understand that we will be dedicating time to fix the bug you are reporting if you can just help us understand it and reproduce it easily.
This template will ask for some information you've already provided; that's OK, just fill it out the best you can. :+1: I've also included some helpful tips below the template. Feel free to let me know if you have any questions!
Thank you again for your report, we look forward to resolving it!
## 1. Environment
### 1a. Operating system and version
```
paste here
```
### 1b. Caddy version (run `caddy version` or paste commit SHA)
```
paste here
```
### 1c. Go version (if building Caddy from source; run `go version`)
```
paste here
```
## 2. Description
### 2a. What happens (briefly explain what is wrong)
### 2b. Why it's a bug (if it's not obvious)
### 2c. Log output
```
paste terminal output or logs here
```
### 2d. Workaround(s)
### 2e. Relevant links
## 3. Tutorial (minimal steps to reproduce the bug)
Environment: Please fill out your OS and Caddy versions, even if you don't think they are relevant. (They are always relevant.) If you built Caddy from source, provide the commit SHA and specify your exact Go version.
Description: Describe at a high level what the bug is. What happens? Why is it a bug? Not all bugs are obvious, so convince readers that it's actually a bug.
Tutorial: What are the minimum required specific steps someone needs to take in order to experience the same bug? Your goal here is to make sure that anyone else can have the same experience with the bug as you do. You are writing a tutorial, so make sure to carry it out yourself before posting it. Please use `curl` whenever possible.
Example of a tutorial:
Create a config file:
```
{ ... }
```
Open terminal and run Caddy:
```
$ caddy ...
```
Make an HTTP request:
```
$ curl ...
```
Notice that the result is ___ but it should be ___.
Note that any rate limiter using the remote IP as the zone key won't stop DDoS attacks because you'll have a lot of different IPs.
Once we have the information needed to reproduce the bug we'll take a look!
Operating system and version:
```
Ubuntu 22.10
```
Caddy version:
```
v2.7.4 h1:J8nisjdOxnYHXlorUKXY75Gr6iBfudfoGhrJ8t7/flI=
```
Go version:
```
go version go1.20.7 linux/amd64
```
I have a rate limit of max 70 requests per IP in 90 seconds, and it isn't being enforced. I know this because I have implemented rate limiting at the application layer (Next.js/Node.js) and log and block the IPs that exceed the limit. I found 1000+ requests coming in 10 seconds from a single IP logged at the application layer, and I don't know how they escaped the rate limit at the Caddy layer.
I don't know how you can reproduce that, because I face it as a DDoS attack and I don't know how the attacker does it, but I can share my Caddyfile config:
```
domain.com {
	rate_limit {
		distributed
		zone ip_rate {
			key {remote_host}
			events 250
			window 300s
		}
		zone ip_rate_min {
			key {remote_host}
			events 70
			window 100s
		}
	}
}
```
> Note that any rate limiter using the remote IP as the zone key won't stop DDoS attacks because you'll have a lot of different IPs.
> Once we have the information needed to reproduce the bug we'll take a look!
I know that, but it will help a lot if the attacker uses proxies (good luck to him paying $5 per GB of bandwidth). The problem here is that he uses a small number of servers from OVH and Azure and produces huge traffic and request volume from the same IP address.
I am not here to solve the DDoS issue, but to understand how the requests bypassed the rate limit at the Caddy layer and reached the application layer.
I don't see where the proxying to your backend occurs. Please share full config.
```
*.domain.com {
	tls /var/lib/caddy/.local/share/caddy/star-net-bundle.crt /var/lib/caddy/.local/share/caddy/star-net-key.key
	rate_limit {
		zone ip_rate {
			key {remote_host}
			events 250
			window 300s
		}
		zone ip_rate_min {
			key {remote_host}
			events 70
			window 100s
		}
	}
	reverse_proxy * {
		to 10.114.0.6:9090
		to 10.114.0.7:9090
		lb_policy ip_hash
		lb_try_duration 1s
		lb_try_interval 500ms
		health_path /thanks # Backend health check path
		# health_port 80 # Default same as backend port
		health_interval 10s
		health_timeout 2s
		health_status 200
	}
}
```
I don't know if this helps, but what if 1000 requests come from the same IP at the same time? I think the rate limiter checks the limit and passes each request before it has a chance to increment the request count for that IP. Maybe it's some sort of race condition.
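To illustrate the pattern I'm imagining, here's a minimal TypeScript sketch (not caddy-ratelimit's actual code, just a hypothetical limiter) where the limit check and the counter increment are separated by an async gap, so a simultaneous burst passes the check before any increment lands:
```
// Hypothetical limiter -- NOT caddy-ratelimit's implementation.
// The check (1) and the increment (3) are separated by an async
// boundary (2), so concurrent requests can all observe count == 0.
const LIMIT = 70;
let count = 0;

async function handle(): Promise<boolean> {
  if (count >= LIMIT) return false;          // 1. check the limit
  await new Promise((r) => setImmediate(r)); // 2. simulated I/O or storage hop
  count++;                                   // 3. increment -- too late
  return true;
}

async function main() {
  const results = await Promise.all(Array.from({ length: 1000 }, () => handle()));
  const allowed = results.filter(Boolean).length;
  console.log(`allowed ${allowed} of 1000 (limit was ${LIMIT})`); // ~1000
}

main();
```
If the check and increment happen in one atomic step instead, the burst can't slip through; I'm wondering whether anything like step 2 exists between the check and the increment.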
> I don't know if this helps, but what if 1000 requests come from the same IP at the same time? I think the rate limiter checks the limit and passes each request before it has a chance to increment the request count for that IP. Maybe it's some sort of race condition.
Nope. You're thinking too hard about it. There's no race condition here. Your case is simpler than that: you didn't tell Caddy the order in which to apply the directive. See the instructions in the README of the repo. More about directive order here.
No, I already added the order, and the rate limit works if requests come one by one:
```
{
	on_demand_tls {
		ask http://localhost:8080/api/v1/stores/domain
	}
	order rate_limit before basicauth
}
```
Sorry, I'm still a little confused: which config are you using, exactly? Of the 3 you posted with a `rate_limit` directive, there are 2 versions. Is the config distributed? That makes a big difference. If it's a distributed config, you need to make sure storage is properly shared across all instances; otherwise you'll experience more requests than you're expecting.
I have one server running Caddy, which acts as a load balancer for two servers running the front-end app (Next.js). Here's the full config file:
```
{
	on_demand_tls {
		ask http://localhost:8080/api/v1/stores/domain
	}
	order rate_limit before basicauth
}
*.domain.com {
	tls /var/lib/caddy/.local/share/caddy/star-net-bundle.crt /var/lib/caddy/.local/share/caddy/star-net-key.key
	rate_limit {
		zone ip_rate {
			key {remote_host}
			events 250
			window 300s
		}
		zone ip_rate_min {
			key {remote_host}
			events 70
			window 100s
		}
	}
	reverse_proxy * {
		to 10.114.0.6:9090
		to 10.114.0.7:9090
		lb_policy ip_hash
		lb_try_duration 1s
		lb_try_interval 500ms
		health_path /thanks # Backend health check path
		# health_port 80 # Default same as backend port
		health_interval 10s
		health_timeout 2s
		health_status 200
	}
}
```
I'm not able to reproduce it. I used Vegeta for the load testing, and I don't see requests passing once the rate limit is hit. Here are the results I witnessed.
Without `distributed`:
```
~ $ cat Caddyfile
{
	debug
	order rate_limit before basicauth
}
localhost {
	log
	rate_limit {
		zone ip_rate {
			key {remote_host}
			events 250
			window 300s
		}
		zone ip_rate_min {
			key {remote_host}
			events 70
			window 100s
		}
	}
	respond "Hi"
}
~ $ echo "GET https://localhost/" | vegeta attack -insecure -duration=40s | tee results.bin | vegeta report
Requests      [total, rate, throughput]         2000, 50.02, 1.75
Duration      [total, attack, wait]             39.981s, 39.98s, 652.577µs
Latencies     [min, mean, 50, 90, 95, 99, max]  496.442µs, 1.237ms, 1.048ms, 1.732ms, 2.116ms, 5.205ms, 21.413ms
Bytes In      [total, mean]                     140, 0.07
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           3.50%
Status Codes  [code:count]                      200:70  429:1930
Error Set:
429 Too Many Requests
```
With `distributed`:
```
~ $ cat Caddyfile
{
	debug
	order rate_limit before basicauth
}
localhost {
	log
	rate_limit {
		distributed
		zone ip_rate {
			key {remote_host}
			events 250
			window 300s
		}
		zone ip_rate_min {
			key {remote_host}
			events 70
			window 100s
		}
	}
	respond "Hi"
}
~ $ echo "GET https://localhost/" | vegeta attack -insecure -duration=40s | tee results.bin | vegeta report
Requests      [total, rate, throughput]         2000, 50.02, 1.75
Duration      [total, attack, wait]             39.982s, 39.981s, 1.323ms
Latencies     [min, mean, 50, 90, 95, 99, max]  505.503µs, 1.402ms, 988.338µs, 1.979ms, 2.937ms, 8.912ms, 59.632ms
Bytes In      [total, mean]                     140, 0.07
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           3.50%
Status Codes  [code:count]                      200:70  429:1930
Error Set:
429 Too Many Requests
```
Sample Caddy log line on rate-limit:
```
ERROR http.log.access handled request {"request": {"remote_ip": "127.0.0.1", "remote_port": "57005", "client_ip": "127.0.0.1", "proto": "HTTP/2.0", "method": "GET", "host": "localhost", "uri": "/", "headers": {"X-Vegeta-Seq": ["1999"], "Accept-Encoding": ["gzip"], "User-Agent": ["Go-http-client/2.0"]}, "tls": {"resumed": false, "version": 772, "cipher_suite": 4865, "proto": "h2", "server_name": "localhost"}}, "bytes_read": 0, "user_id": "", "duration": 0.000067283, "size": 0, "status": 429, "resp_headers": {"Server": ["Caddy"], "Alt-Svc": ["h3=\":443\"; ma=2592000"], "Retry-After": ["0"]}}
```
If your config has more than what's shared, please share the full details without redaction. Also check if you're truly running the configuration you think you're running.
Thanks for your time and care. The config file doesn't contain anything else. I will try to reproduce the issue. Are you sure the above command sends the requests in parallel? Vegeta's default `-rate` value is "Number of requests per time unit [0 = infinity] (default 50/1s)", so 50 per second may be sent one after another, not at the same time.
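To rule that out, here's a minimal Node/TypeScript sketch that fires all requests at once in a single burst rather than pacing them (assuming Caddy is serving https://localhost with the config above; run with `NODE_TLS_REJECT_UNAUTHORIZED=0` to accept the local certificate):
```
// Fires N requests concurrently and tallies the status codes.
// If the limiter holds, expect ~70 responses with 200 and the rest 429.
const N = 1000;
const TARGET = "https://localhost/"; // adjust to the host under test

async function main() {
  const statuses = await Promise.all(
    Array.from({ length: N }, async () => {
      try {
        return (await fetch(TARGET)).status;
      } catch {
        return 0; // connection or TLS error
      }
    })
  );
  const tally: Record<number, number> = {};
  for (const s of statuses) tally[s] = (tally[s] ?? 0) + 1;
  console.log(tally); // e.g. { "200": 70, "429": 930 }
}

main();
```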
I just tried https://www.npmjs.com/package/loadtest: `loadtest -c 1000 --rps 10000 https://site`
The issue didn't reproduce, but the attacker could get past Caddy's rate limiter and reach the app-level rate limiter.
I added some logic to my app to log and store an abuser's IP in the DB. When I ran the test, my IP didn't get past Caddy and wasn't logged or stored in the DB, but 3 IPs from the attack did get past, and I found them in the log and in the DB.
I really understand that you need a way to reproduce the issue to be able to solve it. I don't know how to reproduce it, but I'm 100% sure the issue is real. Thanks a lot @mholt @mohammed90
Thanks for understanding. And thank you @mohammed90 for trying to reproduce it!
As you're able to reproduce it, @ElegantSoft, if you want to try debugging it to gather more facts or narrow it down, that would be extremely helpful.
Unfortunately, I couldn't reproduce it. But I can see attackers' IPs getting past the web server's rate limit and reaching the application-layer rate limiter.
I will try to find a way to reproduce it.
Ok, thanks -- if you do, we can reopen this.
I have a Next.js app behind a Caddy server, and I am facing DDoS attacks. I've set up rate limiting in the Caddyfile.
And in the Next.js app I built an app-level rate limiter.
And I still get more than 100 lines in the log.
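For reference, here's a minimal sketch of the kind of app-level per-IP limiter described above (fixed window, in-memory; the names and thresholds are illustrative, not the actual app's code):
```
// Fixed-window, per-IP counter. Node runs handlers on a single thread
// and this check-and-increment is one synchronous step, so a burst from
// one IP can't race past it the way hypothesized above.
const WINDOW_MS = 90_000; // 90-second window
const MAX_EVENTS = 70;    // max requests per IP per window

const hits = new Map<string, { count: number; windowStart: number }>();

export function allow(ip: string, now = Date.now()): boolean {
  const entry = hits.get(ip);
  if (!entry || now - entry.windowStart >= WINDOW_MS) {
    hits.set(ip, { count: 1, windowStart: now });
    return true;
  }
  entry.count += 1;
  return entry.count <= MAX_EVENTS;
}
```
A middleware would call `allow()` with the forwarded client IP and respond with 429 (and log the IP) when it returns false, which is how the over-limit IPs ended up in the app's log and DB.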