mwantia opened this issue 5 months ago
Some updates:
I tinkered around with my setup, mostly adjusting the `LogLevel`, `AccessLog` and `directives`, but it now seems to run smoothly at between 2.5 and 2.8 GB.
I'm currently unsure why it behaves like this, so I will have to experiment with my settings again later to see whether there are any noticeable changes.
I also set the `LogLevel` to debug, since none of the other Coraza options seem to change anything, and noticed that the following output is repeated every few requests:
```text
2024-05-31T08:26:19Z DBG github.com/traefik/traefik/v3/pkg/logs/wasm.go:31 > Initializing WAF with directives:
SecRuleEngine On
SecDebugLog /dev/stdout
SecDebugLogLevel 2
Include @crs-setup.conf.example
SecRule REQUEST_URI "@streq /xyz" "id:101,phase:1,log,deny,status:403"
```
Isn't this part of the main function and only run during initialization? I'm not that knowledgeable in Go programming, and even less so when it comes to Traefik plugins, but I would expect to see this log only at startup, not during every request.
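One plausible explanation, offered only as an assumption and not as a confirmed diagnosis: the host keeps several instances of the Wasm guest and creates a new one whenever no idle instance is free, and every new instance runs the guest's startup code again, which would print the directives again. Each such instance would also hold its own copy of the parsed rules. The Go sketch below models that pooling pattern; `wafInstance` and `instancePool` are hypothetical names, not types from the plugin or from http-wasm.

```go
package main

import (
	"fmt"
	"sync"
)

// wafInstance is a hypothetical stand-in for one guest module instance
// holding its own parsed copy of the WAF directives.
type wafInstance struct {
	id int
}

// instancePool hands out idle instances and creates a new one whenever
// none is idle, running initialization once per newly created instance.
type instancePool struct {
	mu    sync.Mutex
	idle  []*wafInstance
	inits int // number of times initialization has run
}

func (p *instancePool) get() *wafInstance {
	p.mu.Lock()
	defer p.mu.Unlock()
	if n := len(p.idle); n > 0 {
		inst := p.idle[n-1]
		p.idle = p.idle[:n-1]
		return inst
	}
	// No idle instance: create and initialize a new one. Under the assumption
	// above, this is where an "Initializing WAF with directives" line appears.
	p.inits++
	fmt.Printf("initializing instance %d\n", p.inits)
	return &wafInstance{id: p.inits}
}

func (p *instancePool) put(inst *wafInstance) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.idle = append(p.idle, inst)
}

func main() {
	pool := &instancePool{}

	// Three overlapping requests each hold an instance at the same time,
	// so three instances get initialized.
	a, b, c := pool.get(), pool.get(), pool.get()
	pool.put(a)
	pool.put(b)
	pool.put(c)

	// Later sequential requests reuse the idle instances: no new initialization.
	for i := 0; i < 5; i++ {
		pool.put(pool.get())
	}
	fmt.Println("total initializations:", pool.inits) // prints: total initializations: 3
}
```

If the real host behaves like this, the number of "Initializing WAF" lines would track the peak number of concurrent requests rather than the total number of requests.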
Additionally, this is the configuration Traefik is currently running with. I will check over the weekend whether there are any noticeable changes or spikes in usage.
```yaml
experimental:
  plugins:
    traefik-real-ip:
      modulename: github.com/soulbalz/traefik-real-ip
      version: v1.0.3
    geoblock:
      moduleName: github.com/PascalMinder/GeoBlock
      version: v0.2.2
    coraza:
      moduleName: github.com/jcchavezs/coraza-http-wasm-traefik
      version: v0.2.1
entrypoints:
  websecure:
    address: ':443'
    forwardedHeaders:
      insecure: true
    http:
      tls: true
      middlewares:
        - 'realip@file'
        - 'geoblock-de@file'
        - 'waf@file'
global:
  sendAnonymousUsage: false
  checkNewVersion: false
api:
  dashboard: true
  insecure: true
metrics:
  prometheus:
    addRoutersLabels: true
    addServicesLabels: true
ping: {}
log:
  level: DEBUG
accessLog: {}
providers:
  file:
    directory: /local/config
  consulcatalog:
    endpoint:
      address: 'consul.service.consul:8501'
      scheme: https
      token: '${CONSUL_TOKEN}'
      tls:
        insecureSkipVerify: false
    connectAware: true
    connectByDefault: true
    exposedByDefault: false
    defaultRule: 'Host(`{{ .Name }}.${DOMAIN}`)'
    constraints: 'TagRegex(`cloudflare.enable=true`)'
http:
  middlewares:
    waf:
      plugin:
        coraza:
          directives:
            - SecRuleEngine On
            - SecDebugLog /dev/stdout
            - SecDebugLogLevel 9
            - Include @crs-setup.conf.example
            - SecRule REQUEST_URI "@streq /xyz" "id:101,phase:1,log,deny,status:403" # Testing
            # - Include @owasp_crs/**.conf
            - Include /directives/*.conf
```
Can confirm this issue.
With

```yaml
waf:
  plugin:
    coraza:
      directives:
        - SecRuleEngine On
        - SecDebugLog /dev/stdout
        - SecDebugLogLevel 9
        - SecRule REQUEST_URI "@streq /wp-admin" "id:101,phase:1,log,deny,status:403"
```

the Traefik process uses a bit under 2 GB; without it, about 100 MB.
Hello, I noticed similar behavior using Traefik in binary mode with the Coraza middleware. Here is my experience in case it is useful to someone.
The system's RAM usage before starting Traefik with Coraza is 2.4G and after starting it 2.7G.
First test: I put a Python server behind Traefik and, using another Python script, sent 100 requests (which reach the Python server) to the Traefik entrypoint, with a 100 ms sleep between requests. After this test, RAM usage increased to 4.8G. What is interesting, in my opinion, is that it doesn't seem to go back down to 2.7G: after waiting 10 minutes without sending any traffic, it only came down to 3.9G. I ran another test with 200 requests instead of 100, and this didn't seem to affect RAM usage; it went up to 4.8G again.
Second test: I changed my Python script to send 100 requests to each of 5 different URLs (1 URL that reaches the Python server and 4 URLs that are blocked by the Coraza middleware), one URL after the other, for a total of 500 requests, again with a 100 ms sleep between requests. After running the script three times, this is what I got:
- First time: RAM 6.0G
- Second time: RAM 7.3G
- Third time: RAM 8.3G

After waiting 10 minutes without sending traffic, the RAM came down to 6.0G.
I ran the same tests without the Coraza middleware and the RAM didn't even budge: it stayed at 2.4G before starting Traefik, after starting Traefik, and during the traffic tests.
Here is my config:
```yaml
waf:
  plugin:
    coraza:
      directives:
        - SecRuleEngine On
        - SecDebugLog /dev/stdout
        - SecDebugLogLevel 9
        - Include @crs-setup.conf.example
        - Include @owasp_crs/**.conf
```
Hi everyone, thanks for coming by this repository.
The problem seems very similar to what we experienced in https://github.com/corazawaf/coraza-proxy-wasm/issues/249. Although the code is different, what they have in common is the GC, and that could be the issue.
One way to slice and dice this issue is, first, to distinguish between requests with and without a payload and, second, to set `SecRequestBodyAccess Off` in the directives before `Include @owasp_crs/**.conf`, because the main hunch here is that the space we allocate for request bodies is the source of the problem.
In the meantime, I released https://github.com/jcchavezs/coraza-http-wasm-traefik/releases/tag/v0.2.2, which attempts to introduce minor performance improvements. It would be amazing if any of you could test it.
We're currently facing the same issue while testing the Coraza Traefik plugin.
The Traefik Coraza plugin leads to very high memory usage on our servers.
The memory used by the Traefik container grows over the container's lifetime until the server runs out of memory (16 GB) and Docker restarts the container. This currently happens roughly every hour.
Versions: Traefik v3.0, coraza-http-wasm-traefik v0.2.2
We run the following directives:
```yaml
- SecRuleEngine On
- SecDebugLog /dev/stdout
- SecDebugLogLevel 3
- SecRequestBodyAccess On
- SecResponseBodyAccess Off
# set default error handling
- SecDefaultAction "phase:1,log,auditlog,deny,status:403"
- SecDefaultAction "phase:2,log,auditlog,deny,status:403"
# whitelist a trusted server used for end-to-end testing
- SecRule REMOTE_ADDR "@ipMatch 100.100.100.100" "id:1237,phase:1,allow"
# block access to specific paths
- SecRule REQUEST_URI "@rx \/web\/database\/.*" "id:1239,phase:1,log,deny,status:403,msg:'Access Denied'"
# Limit the size of the request body
- SecRequestBodyLimit 5242880 #5M
- SecRequestBodyNoFilesLimit 1048576 #1M
- SecRequestBodyInMemoryLimit 1048576 #1M
# Block SQL injection and XSS attacks
- SecRule ARGS "@detectSQLi" "id:1234,phase:2,log,deny,status:403,msg:'SQL Injection Detected'"
- SecRule ARGS "@detectXSS" "id:1235,phase:2,log,deny,status:403,msg:'XSS Attack Detected'"
# Block upload of files with dangerous extensions
- SecRule FILES_TMPNAMES "@rx \.(exe|bat|cmd|sh|php|pl|py)$" "id:1236,phase:2,log,deny,status:403,msg:'File Type Denied'"
```
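For reference, the byte values match their inline comments: 5242880 = 5 × 2^20 bytes (5 MiB) and 1048576 = 2^20 bytes (1 MiB). The Go sketch below checks this and computes a rough back-of-the-envelope bound by multiplying the in-memory limit by a number of concurrent requests. The premise that each in-flight request buffers at most `SecRequestBodyInMemoryLimit` bytes is a hypothetical simplification that does not model how Coraza or the Wasm runtime actually allocates; under it, even 100 concurrent requests would account for about 100 MiB, far from the 16 GB observed.

```go
package main

import "fmt"

const mib = 1 << 20 // bytes in one MiB

// Values from the directives above.
const (
	requestBodyLimit         = 5242880 // SecRequestBodyLimit
	requestBodyNoFilesLimit  = 1048576 // SecRequestBodyNoFilesLimit
	requestBodyInMemoryLimit = 1048576 // SecRequestBodyInMemoryLimit
)

// worstCaseBuffered returns a rough upper bound, in bytes, on request-body
// data held in memory, assuming (hypothetically) that each in-flight request
// buffers at most requestBodyInMemoryLimit bytes.
func worstCaseBuffered(inFlight int) int {
	return inFlight * requestBodyInMemoryLimit
}

func main() {
	// Each comparison checks a limit against its "#5M" / "#1M" comment.
	fmt.Println(requestBodyLimit == 5*mib, requestBodyNoFilesLimit == mib, requestBodyInMemoryLimit == mib)
	for _, n := range []int{10, 100, 1000} {
		fmt.Printf("%4d in-flight requests: up to %d MiB\n", n, worstCaseBuffered(n)/mib)
	}
}
```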
We tried the following measures:
- `SecRequestBodyAccess Off`
- Removing all phase 2 rules (`SecDefaultAction "phase:2[...]"`, `SecRule ARGS "@detectSQLi"`, `SecRule ARGS "@detectXSS"`, `SecRule FILES_TMPNAMES`)

None of the measures above led to Traefik leveling off below 16 GB of memory usage, although disabling request body access and all phase 2 rules made the container gain memory less quickly (1 hour between container restarts, compared to about 30 minutes with request body access enabled).
`/proc/meminfo` indicates that a lot of memory is reserved but inactive.
We're wondering if there's a connection between the maximum body size and the reserved memory.
Any thoughts on the issue?
Thanks for your dedication to the project!
Same issue here: memory usage grows even with minimal traffic until the pod is killed.
I tested v0.2.2; it has the same issue.
I plan to use this plugin. Is the memory leak still an issue?
Yes, unfortunately the problem hasn't been found or fixed.
There is an interesting idea for a workaround here: https://github.com/traefik/yaegi/issues/1590#issuecomment-2270703913
Thanks, this works:
https://github.com/madebymode/traefik-modsecurity-plugin?tab=readme-ov-file
This PR is up: https://github.com/http-wasm/http-wasm-host-go/pull/86, and hopefully it will help here.
Hey @jcchavezs, how can I get the fix running?
Do you have a timeline for releasing the new version (0.2.3) of the Coraza WAF plugin with this fix?
We tested the newest version, v0.3.0, on a small infrastructure. Typically, Traefik uses less than 100 MB of RAM in this setup. Once we enabled the plugin and configured the CRS and OWASP rules, it seemed to exhibit the same behaviour, with significantly higher memory usage, going up to 2.5 GB with only a few HTTP requests reaching the reverse proxy. I assume this means the memory issue still persists?
Adding some details for reference:
- Traefik: v3.2.0
- Plugin: v0.3.0
- Platform: Kubernetes
```yaml
plugin:
  coraza:
    crsEnabled: true
    directives:
      - Include @coraza.conf-recommended
      - Include @crs-setup.conf.example
      - Include @owasp_crs/*.conf
      - SecRuleEngine On
```
I'm not very experienced with Go or memory profiling, so this is the extent of what I can do. But I wouldn't mind testing again once there are more fixes!
I am not quite sure if it is related, but when I tested plugin v0.3.0 with Traefik v3.2.0, the Coraza plugin significantly increased response time and CPU usage. Without the plugin, a normal request takes about 10 ms (95th percentile); with the plugin, requests take about 1000 ms (95th percentile). CPU usage is around 10% without the plugin, but with the plugin it spikes to around 70% while requests are being processed. (I tested on a virtual machine with 12 cores.)
Memory is also increasing, which is why I think it might all be related in some way. Interestingly, when requesting the same URL several times in a row, the response time decreases (is there some sort of caching?), and when trying again a few minutes later it is back to 1000 ms.
Details:
- Traefik: v3.2.0
- Plugin: v0.3.0
- Platform: Docker
```yaml
plugin:
  coraza-waf:
    directives:
      # - SecDebugLog /dev/stdout
      # - SecDebugLogLevel 9
      - SecRule REQUEST_URI "@streq /admin" "id:101,phase:1,log,deny,status:403"
      # Allow some additional HTTP methods:
      # - SecAction "id:900200,phase:1,pass,t:none,nolog,setvar:'tx.allowed_methods=GET HEAD POST OPTIONS PUT PATCH DELETE CHECKOUT COPY LOCK MERGE MKACTIVITY MKCOL MOVE PROPFIND PROPPATCH UNLOCK REPORT'"
      # Allow some additional request content-types:
      - SecAction "id:900220,phase:1,pass,t:none,nolog,setvar:'tx.allowed_request_content_type=|application/x-www-form-urlencoded| |multipart/form-data| |multipart/related| |text/xml| |application/xml| |application/soap+xml| |application/json| |application/cloudevents+json| |application/cloudevents-batch+json| |text/plain| |application/proto|'"
      - SecRequestBodyAccess Off #Fix according to https://github.com/jcchavezs/coraza-http-wasm-traefik/issues/9#issuecomment-2146919384
      - Include @coraza.conf-recommended
      - Include @crs-setup.conf.example
      - Include @owasp_crs/**.conf
      - SecRuleEngine On
```
I am currently trying to integrate the Coraza plugin into Traefik, which sits behind a Cloudflare tunnel for external access.
As soon as I activate the middleware for the services, Traefik starts using a lot of memory. I increased Traefik's memory limit to 4 GB, which was immediately consumed after navigating twice. After the third time, Traefik fails with an OOM error and restarts.
I can't imagine that this kind of memory usage is expected. There also seems to be an ongoing discussion about the same topic here, where Traefik even seems to consume about 32 GB of memory.
I removed most of the other configuration, since it shouldn't be relevant, but this is the config Traefik is running with:
I intentionally removed the other two include directives, but even with such a barebones setup I get an OOM after a handful of requests.