Closed: splitice closed this issue 8 years ago.
Hey, thanks for checking it out! Super exciting to see people interested in this!
One option is wrapping `_log()` calls in an if statement (instead of having the if check in the function itself). I may also look into that, but it seems like a lot of work for not much gain (I'm a sysadmin, I'm lazy ;)). If you're really looking to squeeze out performance, you can strip out the debug calls entirely:

```
sed -i '/\s*_log(/d' fw.lua
```
But I know that's not the best way to do things ;)
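The call-site guarding mentioned above can be sketched in a few lines of Lua (names here are illustrative, not FreeWAF's actual internals). Checking the debug flag before the call skips the function-call overhead entirely when debugging is off:

```lua
-- Count how often the logger actually runs (stand-in for ngx.log)
local calls = 0

local function _log(msg)
	calls = calls + 1
	-- ngx.log(ngx.DEBUG, msg) in the real module
end

local debug_enabled = false

-- instead of calling _log("...") unconditionally, guard it:
if debug_enabled then _log("processing rule 42") end

print(calls) -- 0: the call never happens when debugging is disabled
```

The trade-off is noisier call sites, which is why stripping the calls with `sed` for production builds is the lazier (and arguably cleaner) alternative.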
I've also thought about adding a test suite with `Test::Nginx::Socket`, but frankly, that idea bores me (again, I'm a sysadmin, I'm lazy). Contributions are welcome if you want to add some testing to the project, or if you do some research on your own and want to share, I would love to hear it :)

Please feel free to poke me with any more questions you have, I'm always happy to chat!
Hehe, sysadmin here too. I very much agree with your sentiment; pending your answer, I was planning on just running a command like that to remove them :)
A few tests, even just as a module that can be called, would be a good idea from my perspective, to make sure I don't break too much in the early stages while poking at it. I'm happy to write a Travis CI project to CI it.
Currently I am looking into a few areas:
Awesome, if you do end up writing a CI project let me know :)
Currently I have a lot of ideas and it's only been a day. My main experience with Lua in Nginx is writing a Layer 7 DDoS mitigation environment, so I've had to do a lot of work in the performance area.
Given our environment, we need it to scale to some pretty big extremes: many server blocks (with corresponding enabled and disabled rules), and on some pretty bad days we expect to see many thousands of req/s. We don't want to take too much CPU and RAM away from the rest of the system should anything leak past mitigation.
So far, a few things come to mind:

- The `_table_` methods taking a reference to the main WAF seems a bit backwards. I'd like to look at a code cleanup in that area. Given that table creation isn't cheap, it might be possible to reduce the dependency on these functions anyway.
- Exposing the `lookup` tables as a member of the module so I can add/remove entries. Nothing complex needed.
- `Test::Nginx::Socket` tests; if a basic test was defined I could take it from there. Blatant bribery there, Perl, urgh.

I am starting with some of the easier stuff at the moment (performance-tweaking branch): small wins that don't impact much (functions, code quality, LOC, etc.). I figure that's the easiest way to get a full understanding. I fully expect you to have problems with some of it, though, so don't worry. Just tell me I am silly if I change something you did on purpose or disagree with, I won't bite.
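The "lookup tables as a module member" idea could look something like this (a hypothetical sketch; the names are illustrative, not FreeWAF's actual API):

```lua
-- Expose lookup tables as plain module members so callers can add
-- or remove entries without patching the module source.
local _M = {}

_M.lookup = {
	transform = {
		lowercase = function(s) return string.lower(s) end,
	},
}

-- a caller can now extend the table directly:
_M.lookup.transform.trim = function(s)
	return (s:gsub("^%s+", ""):gsub("%s+$", ""))
end

-- ...or remove an entry it doesn't want:
-- _M.lookup.transform.lowercase = nil

print(_M.lookup.transform.trim("  HeLLo  ")) -- "HeLLo"
```

Since these tables are already plain Lua tables internally, exposing them is mostly a matter of attaching them to the module table rather than keeping them as locals.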
I fully acknowledge our requirements are not yours, and I'm sure not all of it will be what you want, or even entirely the right direction (reverting is learning too). Where possible I'll try to get it to a state that you can merge in, since it's less for me to maintain in the long run as a patch. I'm going to put some serious hours in this week and next to get a PoC solution for testing internally in our system as a whole.
Sounds like you've got a lot in mind ;) I'll address a few things that popped out:
On transformation caching: when a collection is transformed (e.g. `REQUEST_ARGS`, which is used throughout the XSS/SQLi rulesets), it's only transformed once, and my comments have reflected the need to cache this data so we don't waste cycles doing the same operations 100 times in a single request. I've found that, during regular processing, the vast majority of CPU time is spent on regex processing, which is handled by the Nginx Lua API, not our own code. I'd be interested in perhaps looking at some optimized transformations in C via FFI; I'm not sure how we could move transformed data from a separate C-level module (my C isn't very good, which is why this is written in Lua :) ).

Let me know if you have any questions about the logic or the code itself. I know reading someone else's work isn't always easy (and this has really just been a fun project for me, not serious development work, so I'm sure there are many people more knowledgeable than I who can contribute).
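The caching described above amounts to memoizing each collection/transform pair for the lifetime of a request. A minimal sketch (names are assumptions, not the module's real internals):

```lua
-- Count how many times the expensive transform actually runs
local transform_calls = 0

local function do_transform(name, value)
	transform_calls = transform_calls + 1
	if name == "lowercase" then return string.lower(value) end
	return value
end

local cache = {} -- reset per request in the real module

local function transform_cached(name, key, value)
	local ck = name .. ":" .. key
	if cache[ck] == nil then
		cache[ck] = do_transform(name, value)
	end
	return cache[ck]
end

-- 100 rules asking for the same transformed value...
for _ = 1, 100 do
	transform_cached("lowercase", "args.a", "FooBar")
end

print(transform_calls) -- 1: the work happens once per request
```

(A real implementation would also need to handle transforms that legitimately return nil, e.g. with a sentinel value.)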
Progress so far:

- Started with: Requests per second: 5227.52 #/sec
- Now: Requests per second: 6230.39 #/sec

Calculated using `ab -k -c 20 -n 100000 http://192.168.56.102/` from the same machine (a virtual machine restricted to a single i7 core).
The biggest single improvement is the offloading of work into multiple contexts. The configuration for that looks like:

```
init_worker_by_lua '
    local FreeWAF = require "FreeWAF.fw"

    FreeWAF.preload({ 10000, 11000, 20000, 21000, 35000, 40000, 41000, 42000, 90000, 99000 })

    -- setup FreeWAF to deny requests that match a rule
    FreeWAF:set_option("mode", "ACTIVE")
    --FreeWAF:set_option("debug", true)
';

server {
    ...

    access_by_lua '
        local FreeWAF = require "FreeWAF.fw"

        -- instantiate a new instance of the module
        local fw = FreeWAF:new()

        -- run the firewall
        fw:exec()
    ';
}
```
So far:
Next I'll look at the rule structure; some wins can be found (i.e. function pointers instead of table lookups of operators) by "pre-compiling" if it's preloaded. I'm not sure I'll be able to do that easily, however, without inducing a bit of a hit for non-preloaded rules.

Still, there's much room for improvement when actually executing a more serious number of rules and transformations.
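The "function pointers instead of table lookups" idea can be sketched like this: at preload time, resolve each rule's operator name to a direct function reference, so the per-request hot path skips the hash lookup entirely (the rule shape here is hypothetical, not FreeWAF's actual format):

```lua
-- Operator name -> function mapping, consulted only at preload time
local operators = {
	EQUALS = function(a, b) return a == b end,
	EXISTS = function(a) return a ~= nil end,
}

local rules = {
	{ id = 10001, operator = "EQUALS", pattern = "admin" },
}

-- one-time "compile" step at preload:
for _, rule in ipairs(rules) do
	rule.op_fn = operators[rule.operator]
end

-- hot path: direct call, no lookup in the operators table
local matched = rules[1].op_fn("admin", rules[1].pattern)
print(matched) -- true
```

Non-preloaded rules would still fall back to the name lookup, which is where the "bit of a hit" concern comes from.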
```
# ab -k -c 20 -n 100000 http://192.168.56.102/?a=aa
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 192.168.56.102 (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Completed 100000 requests
Finished 100000 requests

Server Software:        openresty/1.7.10.1
Server Hostname:        192.168.56.102
Server Port:            80

Document Path:          /?a=aa
Document Length:        612 bytes

Concurrency Level:      20
Time taken for tests:   100.243 seconds
Complete requests:      100000
Failed requests:        0
Write errors:           0
Keep-Alive requests:    99001
Total transferred:      85595005 bytes
HTML transferred:       61200000 bytes
Requests per second:    997.58 [#/sec] (mean)
Time per request:       20.049 [ms] (mean)
Time per request:       1.002 [ms] (mean, across all concurrent requests)
Transfer rate:          833.87 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       3
Processing:     0   20 187.3      1    2571
Waiting:        0   20 187.3      1    2571
Total:          0   20 187.3      1    2571

Percentage of the requests served within a certain time (ms)
  50%      1
  66%      1
  75%      1
  80%      1
  90%      2
  95%      2
  98%     19
  99%     24
 100%   2571 (longest request)
```
To give you an idea of the potential gains of pre-processing: commit fd812587e1a41f9c9a80ab2bd8c07d27dcca74b1 increased rule processing performance by 25%, and locally I've got another 25% being worked on :)

```
Requests per second:    1399.23 [#/sec] (mean)
```
Pre-processing the entire rule into a Lua function call is my ultimate aim, although it's difficult with the current structure, so I'll focus on the individual components for now.
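Compiling a whole rule into a single closure could look roughly like this (a hypothetical structure, not the current FreeWAF rule format). The closure captures the transform and operator choices at preload time, so per-request execution is just one call:

```lua
-- Compile a rule table into a single callable closure
local function compile_rule(rule)
	local transform = rule.transform or function(v) return v end
	local op        = rule.operator
	local pattern   = rule.pattern
	return function(value)
		return op(transform(value), pattern)
	end
end

local check = compile_rule({
	transform = string.lower,
	operator  = function(a, b) return a == b end,
	pattern   = "union select",
})

print(check("UNION SELECT")) -- true
print(check("harmless"))     -- false
```

Chains of transforms per rule would need the compile step to compose them, which is where the current structure makes this awkward.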
Last update from me for the night: ee3b781f1c6683d31d13d6abc0f0a8a08705a325
```
# ab -k -c 20 -n 10000 http://192.168.56.102/?a=aa
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 192.168.56.102 (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests

Server Software:        openresty/1.7.10.1
Server Hostname:        192.168.56.102
Server Port:            80

Document Path:          /?a=aa
Document Length:        612 bytes

Concurrency Level:      20
Time taken for tests:   6.871 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Keep-Alive requests:    9901
Total transferred:      8559505 bytes
HTML transferred:       6120000 bytes
Requests per second:    1455.40 [#/sec] (mean)
Time per request:       13.742 [ms] (mean)
Time per request:       0.687 [ms] (mean, across all concurrent requests)
Transfer rate:          1216.55 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       3
Processing:     0   14 111.8      1    1343
Waiting:        0   14 111.8      1    1343
Total:          0   14 111.8      1    1343

Percentage of the requests served within a certain time (ms)
  50%      1
  66%      1
  75%      1
  80%     12
  90%     13
  95%     14
  98%     15
  99%     19
 100%   1343 (longest request)
```
Sorry, been busy with life the last few days (and will be for a few more, but got a small slice of free time). Some of your branches look interesting, I would be interested to see how these changes compare using tools like https://github.com/openresty/nginx-systemtap-toolkit and https://github.com/openresty/stapxx
So would I, but alas I have never been able to get systemtap/dtrace to execute; it always freezes on me, very similar to what another person experienced on OpenResty's mailing list recently. To be investigated when I have time.
Instead, currently I am using a stopwatch module to time areas of interest, and ApacheBench for benchmarking.
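The stopwatch approach is simple enough to sketch (this is my own illustration, not the module actually used); `os.clock()` measures CPU time, which is usually what matters for hot-path work:

```lua
-- Minimal stopwatch: record a start time, report elapsed CPU time
local stopwatch = {}

function stopwatch.new()
	return { start = os.clock() }
end

function stopwatch.elapsed(sw)
	return os.clock() - sw.start
end

-- time a region of interest:
local sw = stopwatch.new()
local sum = 0
for i = 1, 100000 do sum = sum + i end
print(string.format("loop took %.6fs", stopwatch.elapsed(sw)))
```

Inside OpenResty, `ngx.now()` (wall-clock, cached per event loop iteration) would be the more idiomatic choice, at the cost of coarser resolution.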
If you have a stable branch/commit you would like to submit I can clone it into my test harness and generate a flame graph and lua exec time reports.
My latest commit is https://github.com/splitice/FreeWAF/commit/ee3b781f1c6683d31d13d6abc0f0a8a08705a325
If any issues are identified in a flame graph, I would be happy to fix them.
Sorry I haven't responded in a while, I've been very busy with work and outside responsibilities. The amount of time I'll be able to commit to the project in the near future will be diminished significantly. I'll still try to get to profiling your commits at some point soon, and if you have a stable and solid set of changes you'd like to submit I'd be willing to take a look.
Going to close this as stale. There were good ideas presented here, and we can use this as a discussion base in the future.
Hey,
This project looks quite interesting. It's obvious you have put in quite a lot of hard work already :+1: The feature set is also pretty remarkable given the size of the code. I'm working on understanding it all currently.
I will continue reading, but I am interested in getting this up and running, and of course contributing back. Feel free to ignore any questions that seem a bit ignorant; I haven't got it running yet, so I might be mistaken on a point or two.
At a glance, this does most of what I need it to do already. And hopefully it will be able to meet our performance requirements (it looks to be very, very close already).
Thanks in advance for reading all this :)