scylladb / seastar

High performance server-side application framework
http://seastar.io
Apache License 2.0
8.28k stars 1.54k forks source link

Memcached bad performance (both using DPDK and posix network stack) #383

Open nlfox opened 6 years ago

nlfox commented 6 years ago

When the load going up (about 10240 request per second) Seastar keep on printing out:

WARN 2018-01-10 13:01:07,196 [shard 1] seastar - Exceptional future ignored: std::overflow_error (sstring overflow), backtrace: 0x581e03, 0x581e61, 0x42b3ca, 0x432934, 0x57e747, 0x5d375a, 0x5d5090, 0x57b24d, 0x74dfa4, 0x7fa4fea76183, 0x7fa4fe7a3bec WARN 2018-01-10 13:01:09,914 [shard 0] seastar - Exceptional future ignored: std::overflow_error (sstring overflow), backtrace: 0x581e03, 0x581e61, 0x42b3ca, 0x432934, 0x57e747, 0x5d375a, 0x641062, 0x41431e, 0x7fa4fe6c7f44, 0x41693a

This issue appears when using both native and posix network stack. And compare with stock memcached, the performance is only 1/2.

What can be possible issue?

nlfox commented 6 years ago

$ sudo ./build/release/apps/memcached/memcached --dpdk-pmd --network-stack native --smp=2 -m 1G --dhcp 0 --host-ipv4-addr 192.168.23.2 --netmask-ipv4-addr 255.255.255.0 [sudo] password for s1679450: EAL: Detected 40 lcore(s) EAL: No free hugepages reported in hugepages-1048576kB EAL: Probing VFIO support... EAL: PCI device 0000:04:00.0 on NUMA socket 0 EAL: probe driver: 15b3:1013 net_mlx5 PMD: net_mlx5: PCI information matches, using device "mlx5_0" (SR-IOV: false, MPS: false) PMD: net_mlx5: 1 port(s) detected PMD: net_mlx5: port 1 MAC address is 24:8a:07:96:58:f2 ports number: 1 Port 0: max_rx_queues 65535 max_tx_queues 65535 Port 0: using 2 queues Port 0: RSS table size is 512 LRO is off RX checksum offload supported TX ip checksum offload supported TX TCP&UDP checksum offload supported Port 0 init ... PMD: net_mlx5: 0x9901c0: TX queues number update: 0 -> 2 PMD: net_mlx5: 0x9901c0: RX queues number update: 0 -> 2 done: Creating Tx mbuf pool 'dpdk_pktmbuf_pool0_tx' [1024 mbufs] ... Creating Rx mbuf pool 'dpdk_pktmbuf_pool0_rx' [1024 mbufs] ... Creating Tx mbuf pool 'dpdk_pktmbuf_pool1_tx' [1024 mbufs] ... Creating Rx mbuf pool 'dpdk_pktmbuf_pool1_rx' [1024 mbufs] ... Port 0: Enabling HW FC PMD: net_mlx5: 0x9901c0: filter type (8) not supported Checking link status Created DPDK device done Port 0 Link Up - speed 56000 Mbps - full-duplex seastar memcached v1.0

DPDK output

dorlaor commented 6 years ago

These overflows aren't right. What version/commit are you using? Please post a version with the symbols (debug build)

On Wed, Jan 10, 2018 at 5:48 AM, nlfox notifications@github.com wrote:

When the load going up (about 10240 request per second) Seastar keep on printing out:

WARN 2018-01-10 13:01:07,196 [shard 1] seastar - Exceptional future ignored: std::overflow_error (sstring overflow), backtrace: 0x581e03, 0x581e61, 0x42b3ca, 0x432934, 0x57e747, 0x5d375a, 0x5d5090, 0x57b24d, 0x74dfa4, 0x7fa4fea76183, 0x7fa4fe7a3bec WARN 2018-01-10 13:01:09,914 [shard 0] seastar - Exceptional future ignored: std::overflow_error (sstring overflow), backtrace: 0x581e03, 0x581e61, 0x42b3ca, 0x432934, 0x57e747, 0x5d375a, 0x641062, 0x41431e, 0x7fa4fe6c7f44, 0x41693a

This issue appears when using both native and posix network stack. And compare with stock memcached, the performance is only 1/2.

What can be possible issue?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/scylladb/seastar/issues/383, or mute the thread https://github.com/notifications/unsubscribe-auth/ABp6RYgmKMK1NwJGjEo5UMxu-Tb8Bghgks5tJL-kgaJpZM4RZWD7 .

nlfox commented 6 years ago

I'm using seastar-17.05-branch. The interesting thing is, if I switch to debug build, the problem disappears. This error (sstring overflow) will only be triggered if the rps (request per second) reach some big number (in my example, 10240 request per second). The reason why the debug build can't trigger the problem, I think might because debug build works too slow to trigger that overflow.

I'm now compiling the lastest code to check if it's fixed already.

nlfox commented 6 years ago

Problem also exists in master branch.

nlfox commented 6 years ago

Error log from release build

WARN 2018-01-10 17:42:38,043 [shard 0] seastar - Exceptional future ignored: std::overflow_error (sstring overflow), backtrace: 0x58a307 0x424efa 0x455eac 0x4a7ef7 0x588fe3 0x589266 0x5db79c 0x659560 0x41692f /lib/x86_64-linux-gnu/libc.so.6+0x21f44 0x41973a

WARN 2018-01-10 17:42:38,043 [shard 1] seastar - Exceptional future ignored: std::overflow_error (sstring overflow), backtrace: 0x58a307 0x424efa 0x455eac 0x4a7ef7 0x588fe3 0x589266 0x5db79c 0x5e1000 0x5856bd 0x83c574 /lib/x86_64-linux-gnu/libpthread.so.0+0x8183 /lib/x86_64-linux-gnu/libc.so.6+0xfdbec

penberg commented 6 years ago

Can you please resolve the stack stack trace? The instructions are here:

https://github.com/scylladb/scylla/wiki/How-to-resolve-backtrace

nlfox commented 6 years ago

seastar::report_failed_future(std::__exception_ptr::exception_ptr) at /home/s1679450/test/seastar/core/reactor.cc:4201

seastar::promise<>::abandoned() at /home/s1679450/test/seastar/core/future.hh:1240 _ZN7seastar7promiseIJEED4Ev at /home/s1679450/test/seastar/core/future.hh:458

(inlined by) _ZZN7seastar6futureIJEE12then_wrappedIZNS1_12finally_bodyIZZZN8memcache10tcp_server5startEvENKUlvE_clEvENUlNS_16connected_socketENS_14socket_addressEE_clES7_S8_EUlvE1_Lb1EEclEOS1_EUlT_E_S1_EET0_OSD_ENUlSG_E_D2Ev at /home/s1679450/test/seastar/core/future.hh:973 _ZN7seastar12continuationIZNS_6futureIJEE12then_wrappedIZNS2_12finally_bodyIZZZN8memcache10tcp_server5startEvENKUlvE_clEvENUlNS_16connected_socketENS_14socket_addressEE_clES8_S9_EUlvE1_Lb1EEclEOS2_EUlT_E_S2_EET0_OSE_EUlSH_E_JEED4Ev at /home/s1679450/test/seastar/core/future.hh:409

(inlined by) _ZN7seastar12continuationIZNS_6futureIJEE12then_wrappedIZNS2_12finally_bodyIZZZN8memcache10tcp_server5startEvENKUlvE_clEvENUlNS_16connected_socketENS_14socket_addressEE_clES8_S9_EUlvE1_Lb1EEclEOS2_EUlT_E_S2_EET0_OSE_EUlSH_E_JEED4Ev at /home/s1679450/test/seastar/core/future.hh:409

(inlined by) _ZN7seastar12continuationIZNS_6futureIJEE12then_wrappedIZNS2_12finally_bodyIZZZN8memcache10tcp_server5startEvENKUlvE_clEvENUlNS_16connected_socketENS_14socket_addressEE_clES8_S9_EUlvE1_Lb1EEclEOS2_EUlT_E_S2_EET0_OSE_EUlSH_E_JEE15run_and_disposeEv at /home/s1679450/test/seastar/core/future.hh:414

seastar::reactor::run_tasks(seastar::reactor::task_queue&) at /home/s1679450/test/seastar/core/reactor.cc:2487

seastar::reactor::run_some_tasks(std::chrono::time_point<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >&) at /home/s1679450/test/seastar/core/reactor.cc:2884

seastar::reactor::run_some_tasks(std::chrono::time_point<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >&) at /home/s1679450/test/seastar/core/future.hh:940 (inlined by) seastar::reactor::run() at /home/s1679450/test/seastar/core/reactor.cc:3028

seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) at /home/s1679450/test/seastar/core/app-template.cc:180 main at /home/s1679450/test/seastar/apps/memcached/memcache.cc:1461

The output from seastar-addr2line is listed as above

nyh commented 6 years ago

On Wed, Jan 10, 2018 at 10:18 PM, Pekka Enberg notifications@github.com wrote:

Can you please resolve the stack stack trace? The instructions are here:

https://github.com/scylladb/scylla/wiki/How-to-resolve-backtrace

I just sent a patch for adding to the Seastar tutorial a section on the "ignored exceptions" and about decoding the backtraces included in the message.

Nadav.

nlfox commented 6 years ago

(gdb) bt

0 basic_sstring<char, unsigned int, 15u>::basic_sstring (this=0x7ff7ef9fac10, x=0x0,

size=105759279597654) at ./core/sstring.hh:206

1 0x000000000042a0e8 in basic_sstring (e=, b=, this=0x7ff7ef9fac10)

at ./core/sstring.hh:231

2 sstring_builder::guard::mark_end (this=0x7ff7ef9faca0, p=) at ./core/ragel.hh:85

3 0x000000000042a3ac in memcache_ascii_parser::parse(char, char, char*)::{lambda()#1}::operator()() const (closure=closure@entry=0x7ff7ef9facb0) at apps/memcached/ascii.rl:138

4 0x000000000043558a in memcache_ascii_parser::parse (this=0x603000515348,

p=p@entry=0x6030004ac84c "get 2\r\nget 6\r\nget 10\r\nget 14\r\nget 18\r\nget 22\r\nget 26\r\nget 30\r\nget 34\r\nget 38\r\nget 42\r\nget 46\r\nget 50\r\nget 54\r\nget 58\r\nget 62\r\nget 66\r\nget 70\r\nget 74\r\nget 78\r\nget 82\r\nget 86\r\nget 90\r\nget 94\r\nget 98\r\nget 102\r\nget 106\r\nget 110\r\nget 114\r\nget 118\r\nget 122\r\nget 126\r\nget 130\r\nget 134\r\nget 138\r\nget 142\r\nget 146\r\nget 150\r\nget 154\r\nget 158\r\nget 162\r\nget 166\r\nget 170\r\nget 174\r\nget 178\r\nget 182\r\nget 186\r\nget 190\r\nget 194\r\nget 198\r\nget 202\r\nget 206\r\nget 210\r\nget 214\r\nget 218\r\nget 222\r\nget 226\r\nget 230\r\nget 234\r\nget 238\r\nget 242\r\nget 246\r\nget 250\r\nget 254\r\nget 258\r\nget 262\r\nget 266\r\nget 270\r\nget 274\r\nget 278\r\nget 282\r\nget 286\r\nget 290\r\nget 294\r\nget 298\r\nget 302\r\nget 306\r\nget 310\r\nget 314\r\nget 318\r\nget 322\r\nget 326\r\nget 330\r\nget 334\r\nget 338\r\nget 342\r\nget 346\r\nget 350\r\nget 354\r\nget 358\r\nget 362\r\nget 366\r\nget 370\r\nget 374\r\nget 378\r\nget 382\r\nget 386\r\nget 390\r\nget 394\r\nget 398\r\nget 402\r\nget 406\r\nget 410\r\nget 414\r\nget 418\r\nget 422\r\nget 426\r\nget 430\r\nget 434\r\nget 438\r\nget 442\r\nget 446\r\nget 450\r\nget 454\r\nget 458\r\nget 462\r\nget 466\r\nget 470\r\nget 474\r\nget 478\r\nget 482\r\nget 486\r\nget 490\r\nget 494\r\nget 498\r\n", pe=0x6030004acc96 "", eof=0x0)
at apps/memcached/ascii.rl:59

5 0x0000000000461d1f in operator() (

buf=<error reading variable: access outside bounds of object referenced via synthetic pointer>,
this=<optimised out>) at ./core/ragel.hh:131

6 future<> input_stream::consume(memcache_ascii_parser&)::{lambda()#1}::operator()() const (closure=0x7ff7ef9fb1e0, closure@entry=0x7ff7ef9fb0f0)

at ./core/iostream-impl.hh:204

7 0x0000000000461d1f in memcache::ascii_protocol::handle (this=, in=..., out=...)

8 0x000000000041986b in apply<input_stream::consume(Consumer&) [with Consumer = memcache_ascii_parser; CharType = char]::<lambda()>&> (func=...) at ./core/future.hh:1311

9 repeat<input_stream::consume(Consumer&) [with Consumer = memcache_ascii_parser; CharType = char]::<lambda()> >(<unknown type in /home/s1679450/seastar/build/release/apps/memcached/memcached, CU 0x0, DIE 0x3dbbfe>) (

action=action@entry=<unknown type in /home/s1679450/seastar/build/release/apps/memcached/memcached, CU 0x0, DIE 0x3dbbfe>) at ./core/future-util.hh:201

10 0x00000000004ba72b in consume (consumer=..., this=0x603000515290)

at ./core/iostream-impl.hh:235

11 memcache::ascii_protocol::handle (this=0x603000515338, in=..., out=...)

at apps/memcached/memcache.cc:1039

12 0x000000000041d2b1 in operator() (__closure=0x603000502f10) at apps/memcached/memcache.cc:1359

13 do_until_continued<memcache::tcp_server::start()::<lambda()>::<lambda(connected_socket, socket_address)> mutable::<lambda()>, memcache::tcp_server::start()::<lambda()>::<lambda(connected_socket, soc---Type to continue, or q to quit---

ket_address)> mutable::<lambda()>&>(memcache::tcp_server::<lambda()>::<lambda(connected_socket, socket_address)>::<lambda()> &, <unknown type in /home/s1679450/seastar/build/release/apps/memcached/memcached, CU 0x0, DIE 0x2ea448>, promise<>) (stop_cond=..., action=action@entry=<unknown type in /home/s1679450/seastar/build/release/apps/memcached/memcached, CU 0x0, DIE 0x2ea448>, p=...) at ./core/future-util.hh:153

14 0x000000000041e340 in operator() (fut=..., __closure=)

at ./core/future-util.hh:158

15 do_void_futurize_apply<do_until_continued(StopCondition&&, AsyncAction&&, promise<>) [with AsyncAction = memcache::tcp_server::start()::<lambda()>::<lambda(connected_socket, socket_address)> mutable::<lambda()>; StopCondition = memcache::tcp_server::start()::<lambda()>::<lambda(connected_socket, socket_address)> mutable::<lambda()>&]::<lambda(std::result_of_t<memcache::tcp_server::start()::<lambda()>::<lambda(connected_socket, socket_address)> mutable::<lambda()>()>)>, future<> >(<unknown type in /home/s1679450/seastar/build/release/apps/memcached/memcached, CU 0x0, DIE 0x52aae0>) (

func=func@entry=<unknown type in /home/s1679450/seastar/build/release/apps/memcached/memcached, CU 0x0, DIE 0x52aae0>) at ./core/future.hh:1246

16 0x000000000041e474 in apply<do_until_continued(StopCondition&&, AsyncAction&&, promise<>) [with AsyncAction = memcache::tcp_server::start()::<lambda()>::<lambda(connected_socket, socket_address)> mutable::<lambda()>; StopCondition = memcache::tcp_server::start()::<lambda()>::<lambda(connected_socket, socket_address)> mutable::<lambda()>&]::<lambda(std::result_of_t<memcache::tcp_server::start()::<lambda()>::<lambda(connected_socket, socket_address)> mutable::<lambda()>()>)>, future<> > (

func=<optimised out>) at ./core/future.hh:1294

17 operator()<future_state<> > (

---Type to continue, or q to quit--- state=<unknown type in /home/s1679450/seastar/build/release/apps/memcached/memcached, CU 0x0, DIE 0x2eae01>, __closure=0x603000502ef0) at ./core/future.hh:939

18 continuation<future::then_wrapped(Func&&) [with Func = do_until_continued(StopCondition&&, AsyncAction&&, promise<>) [with AsyncAction = memcache::tcp_server::start()::<lambda()>::<lambda(connected_socket, socket_address)> mutable::<lambda()>; StopCondition = memcache::tcp_server::start()::<lambda()>::<lambda(connected_socket, socket_address)> mutable::<lambda()>&]::<lambda(std::result_of_t<memcache::tcp_server::start()::<lambda()>::<lambda(connected_socket, socket_address)> mutable::<lambda()>()>)>; Result = future<>; T = {}]::<lambda(auto:2&&)> >::run(void) (this=0x603000502ee0)

at ./core/future.hh:390

19 0x000000000057e73e in reactor::run_tasks (this=this@entry=0x6030003c3000, tasks=...)

at core/reactor.cc:2304

20 0x00000000005d375b in reactor::run (this=0x6030003c3000) at core/reactor.cc:2762

21 0x00000000005d5091 in smp::<lambda()>::operator()(void) const (__closure=0x60000014a2c0)

at core/reactor.cc:3684

22 0x000000000057b24e in operator() (this=) at /usr/include/c++/5/functional:2267

23 dpdk_thread_adaptor (f=) at core/reactor.cc:3394

24 0x000000000074dfa5 in eal_thread_loop ()

25 0x00007ffff43ee184 in start_thread (arg=0x7ff7ef9fd700) at pthread_create.c:312

26 0x00007ffff411bbed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Here is the gdb backtrace, and we can see the long string that cause the issue

nlfox commented 6 years ago

This issue can be reproduced by parsing the packet content as we see in the stacktrace "get 2\r\nget 6\r\nget 10\r\nget 14\r\nget 18\r\nget 22\r\nget 26\r\nget 30\r\nget 34\r\nget 38\r\nget 42\r\nget 46\r\nget 50\r\nget 54\r\nget 58\r\nget 62\r\nget 66\r\nget 70\r\nget 74\r\nget 78\r\nget 82\r\nget 86\r\nget 90\r\nget 94\r\nget 98\r\nget 102\r\nget 106\r\nget 110\r\nget 114\r\nget 118\r\nget 122\r\nget 126\r\nget 130\r\nget 134\r\nget 138\r\nget 142\r\nget 146\r\nget 150\r\nget 154\r\nget 158\r\nget 162\r\nget 166\r\nget 170\r\nget 174\r\nget 178\r\nget 182\r\nget 186\r\nget 190\r\nget 194\r\nget 198\r\nget 202\r\nget 206\r\nget 210\r\nget 214\r\nget 218\r\nget 222\r\nget 226\r\nget 230\r\nget 234\r\nget 238\r\nget 242\r\nget 246\r\nget 250\r\nget 254\r\nget 258\r\nget 262\r\nget 266\r\nget 270\r\nget 274\r\nget 278\r\nget 282\r\nget 286\r\nget 290\r\nget 294\r\nget 298\r\nget 302\r\nget 306\r\nget 310\r\nget 314\r\nget 318\r\nget 322\r\nget 326\r\nget 330\r\nget 334\r\nget 338\r\nget 342\r\nget 346\r\nget 350\r\nget 354\r\nget 358\r\nget 362\r\nget 366\r\nget 370\r\nget 374\r\nget 378\r\nget 382\r\nget 386\r\nget 390\r\nget 394\r\nget 398\r\nget 402\r\nget 406\r\nget 410\r\nget 414\r\nget 418\r\nget 422\r\nget 426\r\nget 430\r\nget 434\r\nget 438\r\nget 442\r\nget 446\r\nget 450\r\nget 454\r\nget 458\r\nget 462\r\nget 466\r\nget 470\r\nget 474\r\nget 478\r\nget 482\r\nget 486\r\nget 490\r\nget 494\r\nget 498\r\n"