zeek / packet-bricks

A netmap-based packet layer for distributing and filtering traffic.

no way to keep daemon running #4

Open StewGoin opened 9 years ago

StewGoin commented 9 years ago

I've tried a bunch of ways to keep the bricks-server up and running, but it seems that as soon as it finishes running the file specified with -f, the bricks process exits (since there is no drop to a bricks console).

If I toss a loop into the Lua script, it pegs a core completely and the packet engine doesn't load balance.

shirkdog commented 9 years ago

I have seen the same issue on FreeBSD 10/11-CURRENT.

ajamshed commented 9 years ago

I tested bricks-server (daemon-mode). It did not give me any issues. I have a few follow-up questions:

1- In your script file (startup-one-thread.lua), did you uncomment lines 140 and 142? If you did not, then the bricks server will simply parse the file (function definitions) and then will gracefully terminate the process. Please do not uncomment lines 144 and 146 since this will destroy the engine and kill the bricks daemon.

2- If you are correctly following step 1, then something else is causing the problem. Can you please edit src/main.c line 200:

if ((fd = open("/dev/null", O_RDWR)) >= 0) {

to:

if ((fd = open("/tmp/bricks.log", O_CREAT | O_RDWR, 0644)) >= 0) {

and then recompile the entire program with debugging flags (gmake clean; gmake debug)? This will create a debugging log file named /tmp/bricks.log (caution: it will dump a LOT of lines). Please email me the log.
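For anyone who wants to see the redirection idea in isolation, here is a minimal sketch of pointing a descriptor at a log file instead of /dev/null. It is not the code in src/main.c; the helper name `redirect_fd_to_file` is made up for illustration. One detail worth keeping in mind is that open() needs a third (mode) argument whenever O_CREAT is passed.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of packet-bricks): make target_fd refer to
 * the file at path, creating the file if needed and appending to it.
 * Returns 0 on success, -1 on failure.
 */
int redirect_fd_to_file(int target_fd, const char *path)
{
    /* O_CREAT requires an explicit permission mode as the third argument. */
    int fd = open(path, O_CREAT | O_WRONLY | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    /* dup2() closes target_fd if it is open, then makes it a copy of fd. */
    if (dup2(fd, target_fd) < 0) {
        close(fd);
        return -1;
    }

    /* The temporary descriptor is no longer needed once duplicated. */
    if (fd != target_fd)
        close(fd);
    return 0;
}
```

A daemon would typically call this for STDOUT_FILENO and STDERR_FILENO right after forking, for example `redirect_fd_to_file(STDERR_FILENO, "/tmp/bricks.log")`.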

shirkdog commented 9 years ago

On FreeBSD 11 (technically hardenedbsd 11):

Everything looks fine in /tmp/bricks.log, until the end:

 <<< [share_packets(): 331] 
 <<< [flush_all_cnodes(): 418]  
 <<< [netmap_callback(): 546] 
 >>> [status_print(): 218] 
 >>> [is_pktengine_online(): 487] 
 >>> [pktengines_list_stats(): 452] 
 <<< [pktengines_list_stats(): 477] 
 <<< [status_print(): 240] 
 [start_listening_reqs(): 160]>> ERROR!!: kqueue error
 <<< [start_listening_reqs(): 160] 

I will email you the bricks.log file. I made your changes to make sure init() and start() run, with the following configured for the simple load balancer (interface em1):

function C:simple_lbconfig(pe)
         local lb = Brick.new("LoadBalancer", 2)
         lb:connect_input("em1")
         lb:connect_output("em1{0", "em1{1", "em1{2", "em1{3")
         -- now link it!
         pe:link(lb)
end

debug output on command line:

./bin/bricks-server -f scripts/startup-one-thread.lua                                                                                                                                                                                            
>>> [      main(): 280] 
[      main(): 310] _debuginfo_: Taking file scripts/startup-one-thread.lua as startup
>>> [load_lua_file(): 130] 
<<< [load_lua_file(): 138] 
>>> [init_modules(): 114] 
[init_modules(): 116] _debuginfo_: Initializing the engines module 
>>> [pktengine_init(): 160] 
<<< [pktengine_init(): 165] 
>>> [interface_init():  40] 
<<< [interface_init():  45] 
>>> [initBricks(): 137] 
<<< [initBricks(): 155] 
<<< [init_modules(): 121] 
>>> [do_daemonize(): 190] 
<<< [do_daemonize(): 208] 
>>> [print_status_file(): 247] 
<<< [print_status_file(): 250] 
<<< [      main(): 365] 
>>> [clean_exit():  81] 
<<< [clean_exit():  89] 

I also turned on netmap verbose mode and there are no errors reported, only that the pipes are disconnected from em1:

Sep 30 15:01:17 ev1lbr0 kernel: [1282] 677.454384 [1727] netmap_interp_ringid      em1: tx [0,1) rx [0,1) id 0
Sep 30 15:01:17 ev1lbr0 kernel: [1282] 677.454519 [ 464] nm_mem_assign_group       iommu_group 0
Sep 30 15:01:17 ev1lbr0 kernel: [1282] 677.493839 [2998] netmap_reset              em1 TX0 hwofs 0 -> 0, hwtail 255 -> 255
Sep 30 15:01:17 ev1lbr0 kernel: [1282] 677.493863 [ 678] freebsd_selwakeup         on knote 0xfffff80002697848
Sep 30 15:01:17 ev1lbr0 kernel: [1282] 677.632906 [2998] netmap_reset              em1 RX0 hwofs 0 -> 0, hwtail 0 -> 0
Sep 30 15:01:17 ev1lbr0 kernel: [1282] 677.632930 [ 678] freebsd_selwakeup         on knote 0xfffff80002697c48
Sep 30 15:01:17 ev1lbr0 kernel: [1282] 677.633266 [ 562] netmap_mmap_single        cdev 0xfffff80002390400 foff 0 size 343019520 objp 0xfffffe003cea49d8 prot 3
Sep 30 15:01:17 ev1lbr0 kernel: [1282] 677.633298 [ 470] netmap_dev_pager_ctor     handle 0xfffff80002508980 size 343019520 prot 3 foff 0
Sep 30 15:01:17 ev1lbr0 kernel: [1282] 677.633432 [3050] netmap_common_irq         received RX queue 0
Sep 30 15:01:17 ev1lbr0 kernel: [1282] 677.633451 [ 678] freebsd_selwakeup         on knote 0xfffff80002697c48
Sep 30 15:01:17 ev1lbr0 kernel: [1282] 677.633469 [3050] netmap_common_irq         received TX queue 0
Sep 30 15:01:17 ev1lbr0 kernel: [1282] 677.633485 [ 678] freebsd_selwakeup         on knote 0xfffff80002697848
Sep 30 15:01:17 ev1lbr0 kernel: [1282] 677.633530 [1727] netmap_interp_ringid      em1: tx [0,1) rx [0,1) id 0
Sep 30 15:01:17 ev1lbr0 kernel: [1282] 677.633548 [ 464] nm_mem_assign_group       iommu_group 0
Sep 30 15:01:17 ev1lbr0 kernel: [1282] 677.815425 [3050] netmap_common_irq         received RX queue 0
Sep 30 15:01:17 ev1lbr0 kernel: [1282] 677.815441 [ 678] freebsd_selwakeup         on knote 0xfffff80002697c48
Sep 30 15:01:17 ev1lbr0 kernel: [1282] 677.815453 [3050] netmap_common_irq         received TX queue 0
Sep 30 15:01:17 ev1lbr0 kernel: [1282] 677.815463 [ 678] freebsd_selwakeup         on knote 0xfffff80002697848
Sep 30 15:01:17 ev1lbr0 kernel: [1283] 677.864432 [3050] netmap_common_irq         received RX queue 0
Sep 30 15:01:17 ev1lbr0 kernel: [1283] 677.864475 [ 678] freebsd_selwakeup         on knote 0xfffff80002697c48
Sep 30 15:01:17 ev1lbr0 kernel: [1283] 677.864515 [ 678] freebsd_selwakeup         on knote 0xfffff80002697848  
Sep 30 15:01:17 ev1lbr0 kernel: [1283] 677.864758 [ 678] freebsd_selwakeup         on knote 0xfffff80002697c48
Sep 30 15:01:17 ev1lbr0 kernel: [1283] 677.864790 [ 678] freebsd_selwakeup         on knote 0xfffff80002697848  
Sep 30 15:01:18 ev1lbr0 kernel: [1283] 677.965118 [ 678] freebsd_selwakeup         on knote 0xfffff80002697c48
Sep 30 15:01:18 ev1lbr0 kernel: [1283] 677.965135 [ 678] freebsd_selwakeup         on knote 0xfffff80002697848  
Sep 30 15:01:18 ev1lbr0 kernel: [1283] 678.165516 [3050] netmap_common_irq         received RX queue 0
Sep 30 15:01:18 ev1lbr0 kernel: [1283] 678.165531 [ 678] freebsd_selwakeup         on knote 0xfffff80002697c48
Sep 30 15:01:18 ev1lbr0 kernel: [1283] 678.165543 [3050] netmap_common_irq         received TX queue 0
Sep 30 15:01:18 ev1lbr0 kernel: [1283] 678.165553 [ 678] freebsd_selwakeup         on knote 0xfffff80002697848
Sep 30 15:01:18 ev1lbr0 kernel: [1283] 678.179629 [3050] netmap_common_irq         received RX queue 0
Sep 30 15:01:18 ev1lbr0 kernel: [1283] 678.179645 [ 678] freebsd_selwakeup         on knote 0xfffff80002697c48
Sep 30 15:01:18 ev1lbr0 kernel: [1283] 678.179657 [3050] netmap_common_irq         received TX queue 0
Sep 30 15:01:18 ev1lbr0 kernel: [1283] 678.179666 [ 678] freebsd_selwakeup         on knote 0xfffff80002697848
Sep 30 15:01:18 ev1lbr0 kernel: [1283] 678.312217 [3050] netmap_common_irq         received RX queue 0

..snip..

Sep 30 15:01:19 ev1lbr0 kernel: [1284] 679.437508 [ 678] freebsd_selwakeup         on knote 0xfffff80002697c48
Sep 30 15:01:19 ev1lbr0 kernel: [1284] 679.437529 [3050] netmap_common_irq         received TX queue 0
Sep 30 15:01:19 ev1lbr0 kernel: [1284] 679.437539 [ 678] freebsd_selwakeup         on knote 0xfffff80002697848
Sep 30 15:01:19 ev1lbr0 kernel: [1284] 679.663747 [1727] netmap_interp_ringid      em1{0: tx [0,1) rx [0,1) id 0
Sep 30 15:01:19 ev1lbr0 kernel: [1284] 679.663771 [ 464] nm_mem_assign_group       iommu_group 0
Sep 30 15:01:19 ev1lbr0 kernel: [1284] 679.663854 [1727] netmap_interp_ringid      em1{1: tx [0,1) rx [0,1) id 1
Sep 30 15:01:19 ev1lbr0 kernel: [1284] 679.663865 [ 464] nm_mem_assign_group       iommu_group 0
Sep 30 15:01:19 ev1lbr0 kernel: [1284] 679.663946 [1727] netmap_interp_ringid      em1{2: tx [0,1) rx [0,1) id 2
Sep 30 15:01:19 ev1lbr0 kernel: [1284] 679.663957 [ 464] nm_mem_assign_group       iommu_group 0
Sep 30 15:01:19 ev1lbr0 kernel: [1284] 679.664028 [1727] netmap_interp_ringid      em1{3: tx [0,1) rx [0,1) id 3
Sep 30 15:01:19 ev1lbr0 kernel: [1284] 679.664039 [ 464] nm_mem_assign_group       iommu_group 0
Sep 30 15:01:19 ev1lbr0 kernel: [1284] 679.665091 [3050] netmap_common_irq         received RX queue 0
Sep 30 15:01:19 ev1lbr0 kernel: [1284] 679.665102 [ 678] freebsd_selwakeup         on knote 0xfffff80002697c48
Sep 30 15:01:19 ev1lbr0 kernel: [1284] 679.665113 [ 678] freebsd_selwakeup         on knote 0xfffff80002697848
Sep 30 15:01:19 ev1lbr0 kernel: [1284] 679.685204 [ 678] freebsd_selwakeup         on knote 0xfffff80002697c48

..snip..

Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.283306 [3050] netmap_common_irq         received TX queue 0
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.283315 [ 678] freebsd_selwakeup         on knote 0xfffff80002697848
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.283346 [ 678] freebsd_selwakeup         on knote 0xfffff80002697c48
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.285098 [3050] netmap_common_irq         received RX queue 0
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.285113 [ 678] freebsd_selwakeup         on knote 0xfffff80002697c48
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.285127 [3050] netmap_common_irq         received TX queue 0
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.285137 [ 678] freebsd_selwakeup         on knote 0xfffff80002697848
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.285170 [ 678] freebsd_selwakeup         on knote 0xfffff80002697c48
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.285327 [ 678] freebsd_selwakeup         on knote 0xfffff80002697c48
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.285428 [ 678] freebsd_selwakeup         on knote 0xfffff80002697c48
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.285483 [3050] netmap_common_irq         received RX queue 0
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.285493 [ 678] freebsd_selwakeup         on knote 0xfffff80002697c48
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.285505 [ 678] freebsd_selwakeup         on knote 0xfffff80002697848
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.286095 [ 678] freebsd_selwakeup         on knote 0xfffff80002697c48
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.286110 [ 678] freebsd_selwakeup         on knote 0xfffff80002697848
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.450805 [ 678] freebsd_selwakeup         on knote 0xfffff80002697c48
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.450823 [ 678] freebsd_selwakeup         on knote 0xfffff80002697848
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.450845 [ 678] freebsd_selwakeup         on knote 0xfffff80002697c48
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.469163 [ 678] freebsd_selwakeup         on knote 0xfffff80002697c48
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.469181 [ 678] freebsd_selwakeup         on knote 0xfffff80002697848
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.469213 [ 678] freebsd_selwakeup         on knote 0xfffff80002697c48
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.572907 [1607] netmap_mem_global_deref   active = 5
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.572920 [ 914] netmap_do_unregif         deleting last instance for em1{0
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.572943 [1607] netmap_mem_global_deref   active = 4
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.572952 [ 914] netmap_do_unregif         deleting last instance for em1{1
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.572971 [1607] netmap_mem_global_deref   active = 3
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.572980 [ 914] netmap_do_unregif         deleting last instance for em1{2
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.573001 [1607] netmap_mem_global_deref   active = 2
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.573010 [ 621] netmap_close              dev 0xfffff80002390400 fflag 0x3 devtype 8192 td 0xfffff80002f4f9c0
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.573026 [ 914] netmap_do_unregif         deleting last instance for em1{3
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.573042 [1607] netmap_mem_global_deref   active = 1
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.573148 [ 486] netmap_dev_pager_dtor     handle 0xfffff80002508980  
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.573158 [ 914] netmap_do_unregif         deleting last instance for em1
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.690914 [1607] netmap_mem_global_deref   active = 0

shirkdog commented 9 years ago

I did an install of FreeBSD 10, with the same test setup from your recommendation, and it appears to daemonize and function with tcpdump.

However, in testing with tcpdump on em1{0 and em1{1, there is a state in which detaching causes a kernel panic (on 10-STABLE; possibly a separate issue).

I verified that this kqueue error occurs on FreeBSD 11 and HardenedBSD 11 only.

shirkdog commented 9 years ago

Note: all testing involved the single-thread Lua scripts.

ajamshed commented 9 years ago

Okay, thanks! It seems that netmap I/O does not play well with the epoll()/kqueue() system calls. I may end up having to revise the networking backend of bricks soon. For the time being, I suggest that you change line 362 of src/main.c from:

if (pc_info.daemonize && rc == 0) start_listening_reqs();

to:

if (pc_info.daemonize && rc == 0) {while(1) sleep(1);}

This is a dirty hack but I do plan to start revising the bricks framework in the coming days.
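If the sleep loop has to stay in place for a while, a slightly tidier variant is to let SIGTERM or SIGINT end it so the process can still run its cleanup path. The following is only a sketch under that assumption, not the packet-bricks code; `install_stop_handlers` and `keep_alive_until_signal` are made-up names.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <unistd.h>

/* Set from the signal handler; sig_atomic_t makes the store signal-safe. */
static volatile sig_atomic_t stop_requested = 0;

static void on_stop_signal(int signo)
{
    (void)signo;
    stop_requested = 1;
}

/* Install the handler for SIGTERM and SIGINT. Returns 0 on success, -1 on error. */
int install_stop_handlers(void)
{
    struct sigaction sa;

    sa.sa_handler = on_stop_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;

    if (sigaction(SIGTERM, &sa, NULL) != 0)
        return -1;
    if (sigaction(SIGINT, &sa, NULL) != 0)
        return -1;
    return 0;
}

/*
 * Park the calling thread without spinning. sleep() returns early when a
 * handled signal arrives, so the flag is rechecked promptly; in the worst
 * case (signal lands between the check and the sleep) the delay is 1 second.
 */
void keep_alive_until_signal(void)
{
    while (!stop_requested)
        sleep(1);
}
```

Unlike a busy loop in the Lua script, the thread here spends its time blocked in sleep(), so it should not peg a core. After `keep_alive_until_signal()` returns, the caller can fall through to its normal exit path.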

sethhall commented 9 years ago

Netmap should play well with kqueue. The netmap team has indicated that their epoll support isn't complete yet though.

Luigi will be in Berkeley next month and we can discuss these sorts of troubles more then.

ajamshed commented 9 years ago

Okay. I will narrow down the issue. Let me check if the same problem occurs on a Linux machine as well.

shirkdog commented 9 years ago

I tried your fix on FreeBSD 11 (HardenedBSD)

if (pc_info.daemonize && rc == 0) {while(1) sleep(1);}

and it resolves the issue. The single thread brick ran for an hour, and then I tested running bro with 4 interfaces on an under-powered VM and things seem stable. The multi-thread setup starts and runs, but seems to only use 2 threads in the brick process. I have also restarted bro several times, to ensure I am not seeing kernel panics when leaving netmap mode.

I sent a pull request to fix the FreeBSD compile issue, but let me know if you need testing for additional fixes.

Thanks for your help and clarification on this.

ajamshed commented 9 years ago

I updated the backend code recently. bricks-server on FreeBSD should now work fine. Let me know if you see any other issues.