scylladb / seastar

High performance server-side application framework
http://seastar.io
Apache License 2.0

httpd with DPDK segmentation fault #99

Open aaalgo opened 8 years ago

aaalgo commented 8 years ago

seastar git commit: 5cffbabeafa94c3b3725b28b62bc3596b1e20f65
dpdk git commit: 3b60ce8cbb959d7a6839f94ad995a3594c07801e
OS: CentOS 7
NIC: Intel Corporation 82571EB Gigabit Ethernet Controller [8086:105e] (rev 06)
GCC: devtoolset-3-gcc-4.9

command: ./httpd --network-stack native --dpdk-pmd --dhcp 0 --host-ipv4-addr 10.0.0.20 --netmask-ipv4-addr 255.255.255.0 --collectd 0 --smp 2 --port 10000

The segmentation fault is triggered when another machine runs wget http://10.0.0.20:ANY_PORT. Any port number triggers it, not just the one httpd listens on.

Stack trace is this:

(gdb) bt
#0  operator() (__closure=<optimized out>) at net/net.cc:345
#1  forward_dst<net::interface::dispatch_packet(net::packet)::<lambda()> > (hashfn=<optimized out>,
    src_cpuid=<optimized out>, this=0x6000000ae1c0) at net/net.hh:284
#2  net::interface::dispatch_packet (this=0x60000035ba08, p=...) at net/net.cc:349
#3  0x0000000000589055 in operator() (p=..., __closure=<optimized out>) at net/net.cc:276
#4  std::_Function_handler<future<>(net::packet), net::interface::interface(std::shared_ptr<net::device>)::<lambda(net::packet)> >::_M_invoke(const std::_Any_data &, net::packet) (__functor=..., __args#0=...)
    at /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/functional:2025
#5  0x000000000047c81e in operator() (__args#0=..., this=<optimized out>)
    at /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/functional:2439
#6  do_void_futurize_apply<std::function<future<>(net::packet)>&, net::packet> (func=...) at ./core/future.hh:1150
#7  apply<std::function<future<>(net::packet)>&, net::packet> (func=...) at ./core/future.hh:1186
#8  produce (data#0=..., this=0x6000000c1bc8) at ./core/stream.hh:161
#9  net::device::l2receive (this=<optimized out>, p=...) at net/net.hh:266
#10 0x000000000048080d in dpdk::dpdk_qp<false>::process_packets (this=0x6000000c1b00, bufs=<optimized out>,
    count=<optimized out>) at net/dpdk.cc:2085
#11 0x000000000048093a in std::unique_ptr<reactor::pollfn, std::default_delete<std::unique_ptr> > reactor::make_pollfn<dpdk::dpdk_qp<false>::rx_start()::{lambda()#1}>(dpdk::dpdk_qp<false>::rx_start()::{lambda()#1}&&)::the_pollfn::poll() () at net/dpdk.cc:2108
#12 0x000000000050c4f9 in poll_once (this=0x6000001b9000) at core/reactor.cc:1689
#13 reactor::run (this=0x6000001b9000) at core/reactor.cc:1638
#14 0x0000000000568b63 in app_template::run_deprecated(int, char**, std::function<void ()>&&) (
    this=this@entry=0x7fffffffe360, ac=ac@entry=16, av=av@entry=0x7fffffffe598,
    func=func@entry=<unknown type in /home/wdong/seastar/build/release/apps/httpd/httpd, CU 0x1334ef0, DIE 0x13e10eb>) at core/app-template.cc:123
#15 0x0000000000418741 in main (ac=16, av=0x7fffffffe598) at apps/httpd/main.cc:89

The variable "data" is

$7 = {data = "\n\000\000\n\n\000\000\024\227%\003\350", '\000' <repeats 51 times>, end_idx = 12}

Other variables are optimized out.
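
For reference, the 12 bytes gdb prints for "data" can be decoded by hand. The sketch below assumes the buffer holds source IPv4 address, destination IPv4 address, source port and destination port, in that order and in network byte order. That layout is an assumption, although it is consistent with the host address 10.0.0.20 from the command line. The names flow_tuple, decode_forward_hash and gdb_bytes are made up for this illustration and are not seastar identifiers.

```cpp
#include <array>
#include <cstdint>
#include <string>

// Hypothetical holder for the decoded fields (not a seastar type).
struct flow_tuple {
    std::string src_ip;
    std::string dst_ip;
    uint16_t src_port;
    uint16_t dst_port;
};

// Format four bytes in network order as a dotted-quad string.
inline std::string dotted(const uint8_t* b) {
    return std::to_string(static_cast<unsigned>(b[0])) + "." +
           std::to_string(static_cast<unsigned>(b[1])) + "." +
           std::to_string(static_cast<unsigned>(b[2])) + "." +
           std::to_string(static_cast<unsigned>(b[3]));
}

// Assumed layout: src IP (4), dst IP (4), src port (2), dst port (2),
// all big-endian.
inline flow_tuple decode_forward_hash(const std::array<uint8_t, 12>& d) {
    flow_tuple t;
    t.src_ip = dotted(&d[0]);
    t.dst_ip = dotted(&d[4]);
    t.src_port = static_cast<uint16_t>((d[8] << 8) | d[9]);
    t.dst_port = static_cast<uint16_t>((d[10] << 8) | d[11]);
    return t;
}

// The bytes from the gdb output above: "\n\000\000\n\n\000\000\024\227%\003\350"
// ('\n' is 0x0a, '\024' is 0x14, '\227' is 0x97, '%' is 0x25, '\350' is 0xe8).
static const std::array<uint8_t, 12> gdb_bytes = {{
    0x0a, 0x00, 0x00, 0x0a,
    0x0a, 0x00, 0x00, 0x14,
    0x97, 0x25,
    0x03, 0xe8
}};
```

Under that assumed layout, decode_forward_hash(gdb_bytes) yields 10.0.0.10:38693 as the source and 10.0.0.20:1000 as the destination, so the hash input itself looks like an ordinary TCP flow tuple.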

I'm not able to build the debug version as I have trouble installing libasan.

I can get rid of the segmentation fault and make the program run normally with the following change, so it seems to me that either this or this->_dev is not correct.

diff --git a/net/net.cc b/net/net.cc
index 74bad22..c2cd2cf 100644
--- a/net/net.cc
+++ b/net/net.cc
@@ -342,7 +342,8 @@ void interface::forward(unsigned cpuid, packet p) {
                 } else {
                     forward_hash data;
                     if (l3.forward(data, p, sizeof(eth_hdr))) {
-                        return toeplitz_hash(rss_key(), data);
+                        //return toeplitz_hash(rss_key(), data);
+                        return toeplitz_hash(default_rsskey_40bytes, data);
                     }
                     return 0u;
                 }
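
As background for the change above: a Toeplitz hash reads its key for every input bit, so an invalid key reference would fault inside the hash call. The following is a minimal, generic sketch of a software Toeplitz hash, not seastar's toeplitz_hash implementation. The function name toeplitz_hash_sketch and the length check are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Generic Toeplitz hash sketch (hypothetical helper, not seastar code).
// For every set bit of the input, taken most significant bit first,
// XOR in the 32-bit window of the key that starts at that bit position.
inline uint32_t toeplitz_hash_sketch(const std::vector<uint8_t>& key,
                                     const std::vector<uint8_t>& data) {
    // The window is refilled from key[i + 4] while byte i is processed,
    // so the key must be at least 4 bytes longer than the input.
    if (key.size() < data.size() + 4) {
        throw std::invalid_argument("RSS key too short for input");
    }
    uint32_t hash = 0;
    // Initial window: the first 32 bits of the key, big-endian.
    uint32_t window = (uint32_t(key[0]) << 24) | (uint32_t(key[1]) << 16) |
                      (uint32_t(key[2]) << 8) | uint32_t(key[3]);
    for (std::size_t i = 0; i < data.size(); ++i) {
        for (int bit = 7; bit >= 0; --bit) {
            if (data[i] & (1u << bit)) {
                hash ^= window;
            }
            // Slide the window left by one bit, pulling in the next key bit.
            window = (window << 1) | ((key[i + 4] >> bit) & 1u);
        }
    }
    return hash;
}
```

With a 40-byte key such as the one the name default_rsskey_40bytes suggests, this scheme covers inputs of up to 36 bytes, which is more than the 12-byte IPv4/TCP tuple needs.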

Hope this information is helpful.

Best regards.

avikivity commented 8 years ago

@gleb-cloudius review please

gleb-cloudius commented 8 years ago

The surrounding code looks like this:

            auto fw = _dev->forward_dst(engine().cpu_id(), [&p, &l3, this] () {
                auto hwrss = p.rss_hash();
                if (hwrss) {
                    return hwrss.value();
                } else {
                    forward_hash data;
                    if (l3.forward(data, p, sizeof(eth_hdr))) {
                        return toeplitz_hash(rss_key(), data);
                    }
                    return 0u;
                }
            });

so we know _dev is valid and, according to the trace, is equal to 0x6000000ae1c0.