luigirizzo / netmap

Automatically exported from code.google.com/p/netmap
BSD 2-Clause "Simplified" License

netmap_ring_reinit problems when running Zeek with netmap pipes #953

Open kimshrier opened 8 months ago

kimshrier commented 8 months ago

I am running Linux kernel version 4.19.303.

I have built and installed the kernel module from commit d75ef42c on the master branch, along with the lb application.

I have lb configured to load balance between 18 netmap pipes using the following command: /usr/bin/lb -i eth1 -p zeek:18 -B 131072 -b 2048

When there is little traffic on eth1, everything seems to run fine. However, when traffic is heavy (more than 1 Mpps), I see the following kernel messages:

Mar 19 15:06:51 info kernel: [  249.792024] 011.256496 [1750] nm_txsync_prologue        zeek{13 TX0: fail 'head > kring->rtail && head < kring->rhead' h 686 c 2 t 2 rh 1016 rc 1016 rt 2 hc 1016 ht 2
Mar 19 15:06:51 info kernel: [  249.792026] 011.256500 [1853] netmap_ring_reinit        called for zeek{13 TX0
Mar 19 15:06:51 info kernel: [  249.823322] 011.287795 [1750] nm_txsync_prologue        zeek{12 TX0: fail 'head > kring->rtail && head < kring->rhead' h 432 c 8 t 8 rh 1022 rc 1022 rt 8 hc 1022 ht 8
Mar 19 15:06:51 info kernel: [  249.823324] 011.287798 [1853] netmap_ring_reinit        called for zeek{12 TX0
Mar 19 15:06:51 info kernel: [  249.833910] 011.298382 [1750] nm_txsync_prologue        zeek{2 TX0: fail 'head > kring->rtail && head < kring->rhead' h 436 c 6 t 6 rh 1020 rc 1020 rt 6 hc 1020 ht 6
Mar 19 15:06:51 info kernel: [  249.833911] 011.298385 [1853] netmap_ring_reinit        called for zeek{2 TX0
Mar 19 15:06:51 info kernel: [  249.842695] 011.307168 [1853] netmap_ring_reinit        called for zeek{12 TX0
Mar 19 15:06:51 info kernel: [  249.856049] 011.320522 [1853] netmap_ring_reinit        called for zeek{2 TX0
Mar 19 15:06:51 info kernel: [  249.863774] 011.328246 [1853] netmap_ring_reinit        called for zeek{0 TX0
Mar 19 15:06:51 info kernel: [  249.881713] 011.346185 [1750] nm_txsync_prologue        zeek{12 TX0: fail 'head > kring->rtail && head < kring->rhead' h 249 c 6 t 6 rh 1020 rc 1020 rt 6 hc 1020 ht 6
Mar 19 15:06:51 info kernel: [  249.881714] 011.346187 [1853] netmap_ring_reinit        called for zeek{12 TX0
Mar 19 15:06:51 info kernel: [  249.888441] 011.352912 [1750] nm_txsync_prologue        zeek{13 TX0: fail 'head > kring->rtail && head < kring->rhead' h 166 c 4 t 4 rh 1018 rc 1018 rt 4 hc 1018 ht 4
Mar 19 15:06:51 info kernel: [  249.888442] 011.352915 [1853] netmap_ring_reinit        called for zeek{13 TX0
Mar 19 15:06:51 info kernel: [  249.897439] 011.361911 [1750] nm_txsync_prologue        zeek{0 TX0: fail 'head > kring->rtail && head < kring->rhead' h 226 c 2 t 2 rh 1016 rc 1016 rt 2 hc 1016 ht 2
Mar 19 15:06:51 info kernel: [  249.897440] 011.361913 [1853] netmap_ring_reinit        called for zeek{0 TX0
Mar 19 15:06:51 info kernel: [  249.897446] 011.361918 [1750] nm_txsync_prologue        zeek{2 TX0: fail 'head > kring->rtail && head < kring->rhead' h 94 c 1 t 1 rh 1015 rc 1015 rt 1 hc 1015 ht 1
Mar 19 15:06:51 info kernel: [  249.897447] 011.361920 [1853] netmap_ring_reinit        called for zeek{2 TX0
Mar 19 15:06:51 info kernel: [  249.932957] 011.397428 [1750] nm_txsync_prologue        zeek{12 TX0: fail 'head > kring->rtail && head < kring->rhead' h 504 c 4 t 4 rh 1018 rc 1018 rt 4 hc 1018 ht 4
Mar 19 15:06:51 info kernel: [  249.932958] 011.397431 [1853] netmap_ring_reinit        called for zeek{12 TX0
Mar 19 15:06:51 info kernel: [  249.943351] 011.407822 [1853] netmap_ring_reinit        called for zeek{0 TX0
Mar 19 15:06:51 info kernel: [  249.965217] 011.429688 [1853] netmap_ring_reinit        called for zeek{2 TX0
Mar 19 15:06:51 info kernel: [  249.968823] 011.433294 [1853] netmap_ring_reinit        called for zeek{12 TX0
Mar 19 15:06:51 info kernel: [  249.989778] 011.454249 [1853] netmap_ring_reinit        called for zeek{12 TX0
Mar 19 15:06:51 info kernel: [  249.992789] 011.457260 [1853] netmap_ring_reinit        called for zeek{0 TX0
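
To make the failing check easier to read, here is a minimal Python sketch that pulls the ring indices out of one nm_txsync_prologue line and re-evaluates the condition quoted in the message. The helper names (parse_indices, head_in_forbidden_gap) are made up for illustration and are not part of netmap. The reading of the fields is an assumption based on the message text: h/c/t are taken to be the head/cur/tail values passed in from userspace, and rh/rt the kernel's saved rhead/rtail.

```python
import re

# Field layout copied from the nm_txsync_prologue failure lines above.
FIELDS = re.compile(
    r"\bh (?P<h>\d+) c (?P<c>\d+) t (?P<t>\d+) "
    r"rh (?P<rh>\d+) rc (?P<rc>\d+) rt (?P<rt>\d+) "
    r"hc (?P<hc>\d+) ht (?P<ht>\d+)"
)

def parse_indices(line):
    """Return the ring indices from one failure line as a dict of ints, or None."""
    m = FIELDS.search(line)
    return {k: int(v) for k, v in m.groupdict().items()} if m else None

def head_in_forbidden_gap(idx):
    """Re-evaluate the logged condition: head > kring->rtail && head < kring->rhead."""
    return idx["h"] > idx["rt"] and idx["h"] < idx["rh"]

# First failure line from the log above (syslog prefix trimmed).
sample = ("zeek{13 TX0: fail 'head > kring->rtail && head < kring->rhead' "
          "h 686 c 2 t 2 rh 1016 rc 1016 rt 2 hc 1016 ht 2")

idx = parse_indices(sample)
print(idx["h"], idx["rt"], idx["rh"], head_in_forbidden_gap(idx))
# prints: 686 2 1016 True
```

For this line the kernel last saw rhead 1016 and rtail 2, and the head it was handed (686) lies strictly between rtail and rhead, which is exactly the condition the message reports as a failure. Note that cur and tail (both 2) match the saved rtail, so only head looks out of place.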

The code in Zeek that reads from the netmap pipe is somewhat dated. Looking at https://github.com/zeek/zeek-netmap.git I see that nothing substantial has changed since 2017.

I modified zeek-netmap to link against libnetmap and use nmport_d, nmport_open, nmport_close, etc. instead of nm_desc, nm_open, nm_close, etc. My thinking was that zeek-netmap was out of sync with what the kernel module and lb currently do. However, these changes had no apparent effect on the problem: under light load things seem to work, but under heavier load the netmap_ring_reinit messages come back.