snabbco / snabb

Snabb: Simple and fast packet networking
Apache License 2.0

RawSocket usage #928

Open adw555 opened 8 years ago

adw555 commented 8 years ago

Back in October I modified the example_spray program to use a RawSocket as the input rather than the PcapReader. All worked well, but when I run the exact same code with the latest version of snabb switch, I get an initial burst of packets and then none after that. See below for output. I have the program running in a loop and outputting the stats every 60 seconds.

October version of snabb switch - you can see the packet counts increasing every minute.

test:/tmp$ sudo ./snabb example_spray eth0 /tmp/test3.pcap
Main Report:
link report:
               1,583 sent on capture.tx -> spray_app.input (loss rate: 0%)
                 791 sent on spray_app.output -> output_file.input (loss rate: 0%)
Main Report:
link report:
               3,106 sent on capture.tx -> spray_app.input (loss rate: 0%)
               1,553 sent on spray_app.output -> output_file.input (loss rate: 0%)
Main Report:
link report:
               4,451 sent on capture.tx -> spray_app.input (loss rate: 0%)
               2,225 sent on spray_app.output -> output_file.input (loss rate: 0%)

Latest version of snabb switch: you can see the counts do not increase and are smaller than above.

test:/tmp$ sudo ./snabb example_spray eth0 /tmp/test4.pcap
Main Report:
link report:
                  27 sent on capture.tx -> spray_app.input (loss rate: 0%)
                  13 sent on spray_app.output -> output_file.input (loss rate: 0%)
Main Report:
link report:
                  27 sent on capture.tx -> spray_app.input (loss rate: 0%)
                  13 sent on spray_app.output -> output_file.input (loss rate: 0%)
Main Report:
link report:
                  27 sent on capture.tx -> spray_app.input (loss rate: 0%)
                  13 sent on spray_app.output -> output_file.input (loss rate: 0%)

Has the way I'm supposed to use RawSocket changed? I can see it was rewritten to use ljsyscall in November.

dpino commented 8 years ago

@adw555 Yes, I rewrote RawSocket to use ljsyscall in January, although I think it has been patched a few times since. When I rewrote it, I only verified that the selftest still passed, but I didn't try example_spray. I will take a look. Thanks for reporting!

dpino commented 8 years ago

@adw555 I have confirmed that the number of transmitted packets is significantly lower in the May release than in the October release. I ran example_spray reading from a pcap file, instead of an interface, and writing to a file.

In my tests I used a pcap file of 40K packets:

$ capinfos v4v6.pcap | grep "Number"
Number of packets:   40 k

Then I ran example_spray on each release from October to May.

$ sudo ./snabb example_spray v4v6.pcap /tmp/output.pcap

Here are my results:

v2015.10
link report:
              40,000 sent on capture.output -> spray_app.input (loss rate: 0%)
              20,000 sent on spray_app.output -> output_file.input (loss rate: 0%)
v2015.11
link report:
              40,000 sent on capture.output -> spray_app.input (loss rate: 0%)
              20,000 sent on spray_app.output -> output_file.input (loss rate: 0%)
v2015.12
link report:
              40,000 sent on capture.output -> spray_app.input (loss rate: 0%)
              20,000 sent on spray_app.output -> output_file.input (loss rate: 0%)
v2016.01
link report:
              36,465 sent on capture.output -> spray_app.input (loss rate: 0%)
              18,232 sent on spray_app.output -> output_file.input (loss rate: 0%)
v2016.02
link report:
              39,780 sent on capture.output -> spray_app.input (loss rate: 0%)
              19,890 sent on spray_app.output -> output_file.input (loss rate: 0%)
v2016.03
link report:
              40,000 sent on capture.output -> spray_app.input (loss rate: 0%)
              20,000 sent on spray_app.output -> output_file.input (loss rate: 0%)
v2016.04
link report:
              40,000 sent on capture.output -> spray_app.input (loss rate: 0%)
              20,000 sent on spray_app.output -> output_file.input (loss rate: 0%)
v2016.05
link report:
               9,690 sent on capture.output -> spray_app.input (loss rate: 0%)
               4,845 sent on spray_app.output -> output_file.input (loss rate: 0%)

Could you give the April release a try? If it's working OK, then something has changed in v2016.05.
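
To compare the releases at a glance, the capture counts from the link reports above can be expressed as a share of the input file. The following is a minimal sketch in plain Lua, not part of Snabb: the names `results`, `total`, and `percent_read` are illustrative, and it assumes the file holds exactly 40,000 packets, which matches the runs that read the whole file.

```lua
-- Packets counted on capture.output -> spray_app.input for each release,
-- copied from the link reports in this comment.
-- Assumption: the pcap file holds exactly 40,000 packets.
local total = 40000
local results = {
   {release = "v2015.10", sent = 40000},
   {release = "v2015.11", sent = 40000},
   {release = "v2015.12", sent = 40000},
   {release = "v2016.01", sent = 36465},
   {release = "v2016.02", sent = 39780},
   {release = "v2016.03", sent = 40000},
   {release = "v2016.04", sent = 40000},
   {release = "v2016.05", sent = 9690},
}

-- Share of the input file that reached the first link, in percent.
local function percent_read (sent)
   return 100 * sent / total
end

-- Print one line per release: name, packet count, percentage.
for _, r in ipairs(results) do
   print(string.format("%s  %6d  %5.1f%%",
                       r.release, r.sent, percent_read(r.sent)))
end
```

Read this way, v2016.05 stands out as the only release that gets through less than a quarter of the file within the run.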

eugeneia commented 8 years ago

That is interesting since only the selftest changed between 04 and 05, so it must be a change in a dependency.

% git diff v2016.04...v2016.05 apps/socket/
diff --git a/src/apps/socket/raw.lua b/src/apps/socket/raw.lua
index f7e6dc7..a6066b3 100644
--- a/src/apps/socket/raw.lua
+++ b/src/apps/socket/raw.lua
@@ -80,16 +80,20 @@ end
 function selftest ()
    -- Send a packet over the loopback device and check
    -- that it is received correctly.
-   -- XXX Beware of a race condition with unrelated traffic over the
-   -- loopback device.
    local datagram = require("lib.protocol.datagram")
    local ethernet = require("lib.protocol.ethernet")
    local ipv6 = require("lib.protocol.ipv6")
-
-   -- Initialize RawSocket.
-   local lo = RawSocket:new("lo")
-   lo.input, lo.output = {}, {}
-   lo.input.rx, lo.output.tx = link.new("test1"), link.new("test2")
+   local Match = require("apps.test.match").Match
+
+   -- Initialize RawSocket and Match.
+   local c = config.new()
+   config.app(c, "lo", RawSocket, "lo")
+   config.app(c, "match", Match, {fuzzy=true})
+   config.link(c, "lo.tx->match.rx")
+   engine.configure(c)
+   local link_in, link_cmp = link.new("test_in"), link.new("test_cmp")
+   engine.app_table.lo.input.rx = link_in
+   engine.app_table.match.input.comparator = link_cmp
    -- Construct packet.
    local dg_tx = datagram:new()
    local src = ethernet:pton("02:00:00:00:00:01")
@@ -99,22 +103,16 @@ function selftest ()
                         dst = localhost,
                         next_header = 59, -- No next header.
                         hop_limit = 1}))
-   dg_tx:push(ethernet:new({src = src, 
-                            dst = dst, 
+   dg_tx:push(ethernet:new({src = src,
+                            dst = dst,
                             type = 0x86dd}))
-   -- Transmit packet.
-   link.transmit(lo.input.rx, dg_tx:packet())
-   lo:push()
-   -- Receive packet.
-   lo:pull()
-   local dg_rx = datagram:new(link.receive(lo.output.tx), ethernet)
-   -- Assert packet was received OK.
-   assert(dg_rx:parse({{ethernet, function(eth)
-      return(eth:src_eq(src) and eth:dst_eq(dst) and eth:type() == 0x86dd)
-   end }, { ipv6, function(ipv6)
-      return(ipv6:src_eq(localhost) and ipv6:dst_eq(localhost))
-   end } }), "loopback test failed")
-   lo:stop()
+   -- Transmit packets.
+   link.transmit(link_in, dg_tx:packet())
+   link.transmit(link_cmp, packet.clone(dg_tx:packet()))
+   engine.app_table.lo:push()
+   -- Run engine.
+   engine.main({duration = 0.01, report = {showapps=true,showlinks=true}})
+   assert(#engine.app_table.match:errors() == 0)
    print("selftest passed")

    -- XXX Another useful test would be to feed a pcap file with

adw555 commented 8 years ago

For me, working from the latest release backwards, using RawSocket I don't get correct execution until the snabb-2016.02 release. For all releases after February, the packet throughput is minimal for my application.

adw555 commented 8 years ago

Just some figures to illustrate. My application is listening on eth0 using a RawSocket and piping the packets to the input of my application. When it works correctly, I get just over 1,000 packets in a minute on this test server. You can see all is well in the February release, but not from March onwards:

snabb-2016.02

Main Report: link report: 1,038 sent on interface.tx -> amqp_app.input (loss rate: 0%)

snabb-2016.03

Main Report: link report: 0 sent on interface.tx -> amqp_app.input (loss rate: 0%)

snabb-2016.04

Main Report: link report: 0 sent on interface.tx -> amqp_app.input (loss rate: 0%)

snabb-2016.04.1

Main Report: link report: 15 sent on interface.tx -> amqp_app.input (loss rate: 0%)

snabb-2016.05

Main Report: link report: 15 sent on interface.tx -> amqp_app.input (loss rate: 0%)

dpino commented 8 years ago

I can't manage to run example_spray on eth0. Maybe I'm doing something wrong. This is how I try to run it:

(v2016.02) $ sudo ./snabb example_spray eth0 /tmp/output.pcap
lib/pcap/pcap.lua:56: Unable to open file: eth0
stack traceback:
        core/main.lua:126: in function <core/main.lua:124>
        [C]: in function 'error'
        lib/pcap/pcap.lua:56: in function 'records'
        apps/pcap/pcap.lua:13: in function 'new'

It works though on a tap interface.

$ sudo ip tuntap add tap0 mode tap
$ sudo ./snabb example_spray tap0 /tmp/output.pcap
link report:
                  21 sent on capture.output -> spray_app.input (loss rate: 0%)
                  10 sent on spray_app.output -> output_file.input (loss rate: 0%)

Any hints?

adw555 commented 8 years ago

I changed the standard example_spray code to use a RawSocket instead of a PcapReader, so that it does the same thing as my real application. Below you can see I've commented out the PcapReader and used a RawSocket instead. You can also see the difference in results between the 2016.02 and 2016.03 releases.

module(..., package.seeall)

local pcap = require("apps.pcap.pcap")
local sprayer = require("program.example_spray.sprayer")
local raw = require("apps.socket.raw")

function run (parameters)
   if not (#parameters == 2) then
      print("Usage: example_spray ")
      main.exit(1)
   end
   local input = parameters[1]
   local output = parameters[2]

   local c = config.new()
   --config.app(c, "capture", pcap.PcapReader, input)
   config.app(c, "capture", raw.RawSocket, input)
   config.app(c, "spray_app", sprayer.Sprayer)
   config.app(c, "output_file", pcap.PcapWriter, output)

   config.link(c, "capture.tx -> spray_app.input")
   config.link(c, "spray_app.output -> output_file.input")

   engine.configure(c)
   engine.main({duration=60, report = {showlinks=true}})
end

test:~/Workspace/snabb-2016.02$ sudo src/snabb example_spray eth0 /tmp/test1.pcap
link report:
               1,485 sent on capture.tx -> spray_app.input (loss rate: 0%)
                 742 sent on spray_app.output -> output_file.input (loss rate: 0%)

test:~/Workspace/snabb-2016.03$ sudo src/snabb example_spray eth0 /tmp/test5.pcap
link report:
                   0 sent on capture.tx -> spray_app.input (loss rate: 0%)
                   0 sent on spray_app.output -> output_file.input (loss rate: 0%)

dpino commented 8 years ago

I found out the reason for the "regression" between v2016.04 and v2016.05. The regression was introduced in #882 when max_packets increased from 1e5 to 1e6. 1e5 gets better results in this case.

v2016.04

(v2016.04) $ sudo ./snabb snsh -p example_spray v4v6.pcap /tmp/output.pcap
link report:
              40,000 sent on capture.output -> spray_app.input (loss rate: 0%)
              20,000 sent on spray_app.output -> output_file.input (loss rate: 0%)

v2016.05

(v2016.05) $ sudo ./snabb snsh  -p example_spray v4v6.pcap /tmp/output.pcap
link report:
              15,300 sent on capture.output -> spray_app.input (loss rate: 0%)
               7,650 sent on spray_app.output -> output_file.input (loss rate: 0%)

v4v6.pcap is a mix of IPv4 and IPv6 packets. It contains 40K packets. The file can be downloaded here: http://people.igalia.com/dpino/v4v6.pcap

I also increased the duration of example_spray to 10 seconds, to give the script time to process the whole file.