Open StewGoin opened 9 years ago
I have seen the same issue on FreeBSD 10/11-Current
I tested bricks-server (daemon-mode). It did not give me any issues. I have a few follow-up questions:
1- In your script file (startup-one-thread.lua), did you uncomment lines 140 and 142? If you did not, then the bricks server will simply parse the file (function definitions) and then will gracefully terminate the process. Please do not uncomment lines 144 and 146 since this will destroy the engine and kill the bricks daemon.
2- If you are correctly following step 1, then something else is causing the problem. Can you please edit src/main.c line 200:
if ((fd = open("/dev/null", O_RDWR)) >= 0) {
to:
if ((fd = open("/tmp/bricks.log", O_CREAT | O_RDWR)) >= 0) {
and then recompile the entire program with debugging flags (gmake clean; gmake debug)? This will create a debugging log file named /tmp/bricks.log (caution: it will dump a LOT of lines). Please email me the log.
On FreeBSD 11 (technically hardenedbsd 11):
Everything looks fine in /tmp/bricks.log, until the end:
<<< [share_packets(): 331]
<<< [flush_all_cnodes(): 418]
<<< [netmap_callback(): 546]
>>> [status_print(): 218]
>>> [is_pktengine_online(): 487]
>>> [pktengines_list_stats(): 452]
<<< [pktengines_list_stats(): 477]
<<< [status_print(): 240]
[start_listening_reqs(): 160]>> ERROR!!: kqueue error
<<< [start_listening_reqs(): 160]
I will email you the bricks.log file. I made your changes to make sure init() and start() run, with the following configured for the simple load balancer (interface em1):
function C:simple_lbconfig(pe)
local lb = Brick.new("LoadBalancer", 2)
lb:connect_input("em1")
lb:connect_output("em1{0", "em1{1", "em1{2", "em1{3")
-- now link it!
pe:link(lb)
end
debug output on command line:
./bin/bricks-server -f scripts/startup-one-thread.lua
>>> [ main(): 280]
[ main(): 310] _debuginfo_: Taking file scripts/startup-one-thread.lua as startup
>>> [load_lua_file(): 130]
<<< [load_lua_file(): 138]
>>> [init_modules(): 114]
[init_modules(): 116] _debuginfo_: Initializing the engines module
>>> [pktengine_init(): 160]
<<< [pktengine_init(): 165]
>>> [interface_init(): 40]
<<< [interface_init(): 45]
>>> [initBricks(): 137]
<<< [initBricks(): 155]
<<< [init_modules(): 121]
>>> [do_daemonize(): 190]
<<< [do_daemonize(): 208]
>>> [print_status_file(): 247]
<<< [print_status_file(): 250]
<<< [ main(): 365]
>>> [clean_exit(): 81]
<<< [clean_exit(): 89]
I also turned on netmap verbose mode and there are no errors reported, only that the pipes are disconnected from em1:
Sep 30 15:01:17 ev1lbr0 kernel: [1282] 677.454384 [1727] netmap_interp_ringid em1: tx [0,1) rx [0,1) id 0
Sep 30 15:01:17 ev1lbr0 kernel: [1282] 677.454519 [ 464] nm_mem_assign_group iommu_group 0
Sep 30 15:01:17 ev1lbr0 kernel: [1282] 677.493839 [2998] netmap_reset em1 TX0 hwofs 0 -> 0, hwtail 255 -> 255
Sep 30 15:01:17 ev1lbr0 kernel: [1282] 677.493863 [ 678] freebsd_selwakeup on knote 0xfffff80002697848
Sep 30 15:01:17 ev1lbr0 kernel: [1282] 677.632906 [2998] netmap_reset em1 RX0 hwofs 0 -> 0, hwtail 0 -> 0
Sep 30 15:01:17 ev1lbr0 kernel: [1282] 677.632930 [ 678] freebsd_selwakeup on knote 0xfffff80002697c48
Sep 30 15:01:17 ev1lbr0 kernel: [1282] 677.633266 [ 562] netmap_mmap_single cdev 0xfffff80002390400 foff 0 size 343019520 objp 0xfffffe003cea49d8 prot 3
Sep 30 15:01:17 ev1lbr0 kernel: [1282] 677.633298 [ 470] netmap_dev_pager_ctor handle 0xfffff80002508980 size 343019520 prot 3 foff 0
Sep 30 15:01:17 ev1lbr0 kernel: [1282] 677.633432 [3050] netmap_common_irq received RX queue 0
Sep 30 15:01:17 ev1lbr0 kernel: [1282] 677.633451 [ 678] freebsd_selwakeup on knote 0xfffff80002697c48
Sep 30 15:01:17 ev1lbr0 kernel: [1282] 677.633469 [3050] netmap_common_irq received TX queue 0
Sep 30 15:01:17 ev1lbr0 kernel: [1282] 677.633485 [ 678] freebsd_selwakeup on knote 0xfffff80002697848
Sep 30 15:01:17 ev1lbr0 kernel: [1282] 677.633530 [1727] netmap_interp_ringid em1: tx [0,1) rx [0,1) id 0
Sep 30 15:01:17 ev1lbr0 kernel: [1282] 677.633548 [ 464] nm_mem_assign_group iommu_group 0
Sep 30 15:01:17 ev1lbr0 kernel: [1282] 677.815425 [3050] netmap_common_irq received RX queue 0
Sep 30 15:01:17 ev1lbr0 kernel: [1282] 677.815441 [ 678] freebsd_selwakeup on knote 0xfffff80002697c48
Sep 30 15:01:17 ev1lbr0 kernel: [1282] 677.815453 [3050] netmap_common_irq received TX queue 0
Sep 30 15:01:17 ev1lbr0 kernel: [1282] 677.815463 [ 678] freebsd_selwakeup on knote 0xfffff80002697848
Sep 30 15:01:17 ev1lbr0 kernel: [1283] 677.864432 [3050] netmap_common_irq received RX queue 0
Sep 30 15:01:17 ev1lbr0 kernel: [1283] 677.864475 [ 678] freebsd_selwakeup on knote 0xfffff80002697c48
Sep 30 15:01:17 ev1lbr0 kernel: [1283] 677.864515 [ 678] freebsd_selwakeup on knote 0xfffff80002697848
Sep 30 15:01:17 ev1lbr0 kernel: [1283] 677.864758 [ 678] freebsd_selwakeup on knote 0xfffff80002697c48
Sep 30 15:01:17 ev1lbr0 kernel: [1283] 677.864790 [ 678] freebsd_selwakeup on knote 0xfffff80002697848
Sep 30 15:01:18 ev1lbr0 kernel: [1283] 677.965118 [ 678] freebsd_selwakeup on knote 0xfffff80002697c48
Sep 30 15:01:18 ev1lbr0 kernel: [1283] 677.965135 [ 678] freebsd_selwakeup on knote 0xfffff80002697848
Sep 30 15:01:18 ev1lbr0 kernel: [1283] 678.165516 [3050] netmap_common_irq received RX queue 0
Sep 30 15:01:18 ev1lbr0 kernel: [1283] 678.165531 [ 678] freebsd_selwakeup on knote 0xfffff80002697c48
Sep 30 15:01:18 ev1lbr0 kernel: [1283] 678.165543 [3050] netmap_common_irq received TX queue 0
Sep 30 15:01:18 ev1lbr0 kernel: [1283] 678.165553 [ 678] freebsd_selwakeup on knote 0xfffff80002697848
Sep 30 15:01:18 ev1lbr0 kernel: [1283] 678.179629 [3050] netmap_common_irq received RX queue 0
Sep 30 15:01:18 ev1lbr0 kernel: [1283] 678.179645 [ 678] freebsd_selwakeup on knote 0xfffff80002697c48
Sep 30 15:01:18 ev1lbr0 kernel: [1283] 678.179657 [3050] netmap_common_irq received TX queue 0
Sep 30 15:01:18 ev1lbr0 kernel: [1283] 678.179666 [ 678] freebsd_selwakeup on knote 0xfffff80002697848
Sep 30 15:01:18 ev1lbr0 kernel: [1283] 678.312217 [3050] netmap_common_irq received RX queue 0
..snip..
Sep 30 15:01:19 ev1lbr0 kernel: [1284] 679.437508 [ 678] freebsd_selwakeup on knote 0xfffff80002697c48
Sep 30 15:01:19 ev1lbr0 kernel: [1284] 679.437529 [3050] netmap_common_irq received TX queue 0
Sep 30 15:01:19 ev1lbr0 kernel: [1284] 679.437539 [ 678] freebsd_selwakeup on knote 0xfffff80002697848
Sep 30 15:01:19 ev1lbr0 kernel: [1284] 679.663747 [1727] netmap_interp_ringid em1{0: tx [0,1) rx [0,1) id 0
Sep 30 15:01:19 ev1lbr0 kernel: [1284] 679.663771 [ 464] nm_mem_assign_group iommu_group 0
Sep 30 15:01:19 ev1lbr0 kernel: [1284] 679.663854 [1727] netmap_interp_ringid em1{1: tx [0,1) rx [0,1) id 1
Sep 30 15:01:19 ev1lbr0 kernel: [1284] 679.663865 [ 464] nm_mem_assign_group iommu_group 0
Sep 30 15:01:19 ev1lbr0 kernel: [1284] 679.663946 [1727] netmap_interp_ringid em1{2: tx [0,1) rx [0,1) id 2
Sep 30 15:01:19 ev1lbr0 kernel: [1284] 679.663957 [ 464] nm_mem_assign_group iommu_group 0
Sep 30 15:01:19 ev1lbr0 kernel: [1284] 679.664028 [1727] netmap_interp_ringid em1{3: tx [0,1) rx [0,1) id 3
Sep 30 15:01:19 ev1lbr0 kernel: [1284] 679.664039 [ 464] nm_mem_assign_group iommu_group 0
Sep 30 15:01:19 ev1lbr0 kernel: [1284] 679.665091 [3050] netmap_common_irq received RX queue 0
Sep 30 15:01:19 ev1lbr0 kernel: [1284] 679.665102 [ 678] freebsd_selwakeup on knote 0xfffff80002697c48
Sep 30 15:01:19 ev1lbr0 kernel: [1284] 679.665113 [ 678] freebsd_selwakeup on knote 0xfffff80002697848
Sep 30 15:01:19 ev1lbr0 kernel: [1284] 679.685204 [ 678] freebsd_selwakeup on knote 0xfffff80002697c48
..snip..
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.283306 [3050] netmap_common_irq received TX queue 0
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.283315 [ 678] freebsd_selwakeup on knote 0xfffff80002697848
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.283346 [ 678] freebsd_selwakeup on knote 0xfffff80002697c48
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.285098 [3050] netmap_common_irq received RX queue 0
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.285113 [ 678] freebsd_selwakeup on knote 0xfffff80002697c48
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.285127 [3050] netmap_common_irq received TX queue 0
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.285137 [ 678] freebsd_selwakeup on knote 0xfffff80002697848
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.285170 [ 678] freebsd_selwakeup on knote 0xfffff80002697c48
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.285327 [ 678] freebsd_selwakeup on knote 0xfffff80002697c48
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.285428 [ 678] freebsd_selwakeup on knote 0xfffff80002697c48
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.285483 [3050] netmap_common_irq received RX queue 0
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.285493 [ 678] freebsd_selwakeup on knote 0xfffff80002697c48
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.285505 [ 678] freebsd_selwakeup on knote 0xfffff80002697848
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.286095 [ 678] freebsd_selwakeup on knote 0xfffff80002697c48
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.286110 [ 678] freebsd_selwakeup on knote 0xfffff80002697848
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.450805 [ 678] freebsd_selwakeup on knote 0xfffff80002697c48
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.450823 [ 678] freebsd_selwakeup on knote 0xfffff80002697848
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.450845 [ 678] freebsd_selwakeup on knote 0xfffff80002697c48
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.469163 [ 678] freebsd_selwakeup on knote 0xfffff80002697c48
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.469181 [ 678] freebsd_selwakeup on knote 0xfffff80002697848
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.469213 [ 678] freebsd_selwakeup on knote 0xfffff80002697c48
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.572907 [1607] netmap_mem_global_deref active = 5
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.572920 [ 914] netmap_do_unregif deleting last instance for em1{0
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.572943 [1607] netmap_mem_global_deref active = 4
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.572952 [ 914] netmap_do_unregif deleting last instance for em1{1
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.572971 [1607] netmap_mem_global_deref active = 3
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.572980 [ 914] netmap_do_unregif deleting last instance for em1{2
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.573001 [1607] netmap_mem_global_deref active = 2
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.573010 [ 621] netmap_close dev 0xfffff80002390400 fflag 0x3 devtype 8192 td 0xfffff80002f4f9c0
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.573026 [ 914] netmap_do_unregif deleting last instance for em1{3
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.573042 [1607] netmap_mem_global_deref active = 1
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.573148 [ 486] netmap_dev_pager_dtor handle 0xfffff80002508980
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.573158 [ 914] netmap_do_unregif deleting last instance for em1
Sep 30 15:01:29 ev1lbr0 kernel: [1294] 689.690914 [1607] netmap_mem_global_deref active = 0
I did an install of FreeBSD 10, with the same test setup from your recommendation, and it appears to daemonize and function with tcpdump.
However, in testing with tcpdump with em1{0 and em1{1, there is a state when detaching causes a kernel panic (on 10 stable, separate issue maybe).
I verified that this kqueue error occurs on FreeBSD 11 and HardenedBSD 11 only.
note: all testing involved the single-thread lua scripts.
Okay, Thanks! It seems that the netmap I/O does not play well with epoll()/kqueue() system calls. I may have to end up revising the networking backend of the bricks soon. For the time being, I suggest that you replace line 362 of src/main.c:
if (pc_info.daemonize && rc == 0) start_listening_reqs();
to:
if (pc_info.daemonize && rc == 0) {while(1) sleep(1);}
This is a dirty hack but I do plan to start revising the bricks framework in the coming days.
Netmap should play well with kqueue. The netmap team has indicated that their epoll support isn't complete yet though.
Luigi will be in Berkeley next month and we can discuss these sorts of troubles more then.
Okay. I will narrow down the issue. Let me check if the same problem occurs on a Linux machine as well.
I tried your fix on FreeBSD 11 (HardenedBSD)
if (pc_info.daemonize && rc == 0) {while(1) sleep(1);}
and it resolves the issue. The single thread brick ran for an hour, and then I tested running bro with 4 interfaces on an under-powered VM and things seem stable. The multi-thread setup starts and runs, but seems to only use 2 threads in the brick process. I have also restarted bro several times, to ensure I am not seeing kernel panics when leaving netmap mode.
I sent a pull request to fix the FreeBSD compile issue, but let me know if you need testing for additional fixes.
Thanks for your help and clarification on this.
I updated the backend code recently. bricks-server in FreeBSD should now work fine. Let me know if you see any other issues
I've tried a bunch of ways to keep the bricks-server up and running, but it seems like as soon as it finishes running the file specific in -f it exits the bricks process (since there is no drop to bricks console).
If I toss a loop into the lua script, it pegs a core completely and the packet engine doesn't load balance.