libos-nuse / net-next-nuse

Network Stack in Userspace

simple nuse crashes with wait_queue #45

Open thehajime opened 9 years ago

thehajime commented 9 years ago

TSIA (title says it all).

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `ping 192.168.49.58'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  __list_del (next=0x7f759c713e1a <lib_sock_poll+90>, prev=0x0) at ./include/linux/list.h:90
90              prev->next = next;
(gdb) bt
#0  __list_del (next=0x7f759c713e1a <lib_sock_poll+90>, prev=0x0) at ./include/linux/list.h:90
#1  __list_del_entry (entry=0x7fff7c482948) at ./include/linux/list.h:102
#2  list_del_init (entry=0x7fff7c482948) at ./include/linux/list.h:145
#3  autoremove_wake_function (wait=0x7fff7c482930, mode=<optimized out>, sync=<optimized out>, key=<optimized out>) at arch/lib/sched.c:173
#4  0x00007f759c714bc6 in __wake_up (q=0x1645318, mode=mode@entry=1, nr_exclusive=nr_exclusive@entry=1, key=key@entry=0xc3) at arch/lib/sched.c:242
#5  0x00007f759c714c05 in __wake_up_sync_key (q=<optimized out>, mode=mode@entry=1, nr_exclusive=nr_exclusive@entry=1, key=key@entry=0xc3) at arch/lib/sched.c:251
#6  0x00007f759c75a2a9 in sock_def_readable (sk=0x16474f8) at net/core/sock.c:2235
#7  0x00007f759c7590ed in sock_queue_rcv_skb (sk=sk@entry=0x16474f8, skb=skb@entry=0x7f75800011b8) at net/core/sock.c:474
#8  0x00007f759c7e863c in raw_rcv_skb (sk=sk@entry=0x16474f8, skb=skb@entry=0x7f75800011b8) at net/ipv4/raw.c:315
#9  0x00007f759c7e9924 in raw_rcv (sk=sk@entry=0x16474f8, skb=0x7f75800011b8) at net/ipv4/raw.c:334
#10 0x00007f759c7e9b11 in raw_v4_input (skb=skb@entry=0x7f758c0008c8, iph=0x7f758c0009e6, hash=hash@entry=1) at net/ipv4/raw.c:194
#11 0x00007f759c7e9b6c in raw_local_deliver (skb=skb@entry=0x7f758c0008c8, protocol=protocol@entry=1) at net/ipv4/raw.c:216
#12 0x00007f759c7bea99 in ip_local_deliver_finish (sk=sk@entry=0x0, skb=skb@entry=0x7f758c0008c8) at net/ipv4/ip_input.c:203
#13 0x00007f759c7bf092 in NF_HOOK_THRESH (thresh=-2147483648, okfn=0x7f759c7be9a0 <ip_local_deliver_finish>, out=0x0, in=<optimized out>, skb=0x7f758c0008c8, sk=0x0, hook=1, pf=2 '\002')
    at ./include/linux/netfilter.h:220
#14 NF_HOOK (okfn=0x7f759c7be9a0 <ip_local_deliver_finish>, out=0x0, in=<optimized out>, skb=0x7f758c0008c8, sk=0x0, hook=1, pf=2 '\002') at ./include/linux/netfilter.h:242
#15 ip_local_deliver (skb=0x7f758c0008c8) at net/ipv4/ip_input.c:256
#16 0x00007f759c7bf2f3 in NF_HOOK_THRESH (thresh=-2147483648, okfn=0x7f759c7bebd0 <ip_rcv_finish>, out=0x0, in=0x163f200, skb=0x7f758c0008c8, sk=0x0, hook=0, pf=2 '\002')
    at ./include/linux/netfilter.h:220
#17 NF_HOOK (okfn=0x7f759c7bebd0 <ip_rcv_finish>, out=0x0, in=0x163f200, skb=0x7f758c0008c8, sk=0x0, hook=0, pf=2 '\002') at ./include/linux/netfilter.h:242
#18 ip_rcv (skb=<optimized out>, dev=0x163f200, pt=<optimized out>, orig_dev=<optimized out>) at net/ipv4/ip_input.c:455
#19 0x00007f759c76d203 in __netif_receive_skb_core (skb=0x7f758c0008c8, pfmemalloc=<optimized out>) at net/core/dev.c:3895
#20 0x00007f759c76da74 in process_backlog (napi=0x7f759cc7be70 <softnet_data+112>, quota=64) at net/core/dev.c:4506
#21 0x00007f759c76d8be in napi_poll (n=0x7f759cc7be70 <softnet_data+112>, repoll=repoll@entry=0x7f759b63ce20) at net/core/dev.c:4744
#22 0x00007f759c76db98 in net_rx_action (h=<optimized out>) at net/core/dev.c:4809
#23 0x00007f759c713fe3 in do_softirq () at arch/lib/softirq.c:69
#24 0x00007f759c714048 in softirq_task_function (context=<optimized out>) at arch/lib/softirq.c:28
#25 0x00007f759c45624d in nuse_task_start_trampoline (context=0x15a6250) at nuse.c:175
#26 0x00007f759b84e182 in start_thread (arg=0x7f759b63d700) at pthread_create.c:312
#27 0x00007f759bf7d47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
# ./nuse ping 192.168.49.254

<5>Linux version 4.1.0-rc7+ (root@ubuntu14) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #0 Wed Jun 17 12:56:11 PDT 2015

thehajime commented 9 years ago

This is a tentative patch that avoids the issue, but it is not a generic solution, so it needs more work.

https://gist.github.com/thehajime/65e58a101f0c50a04764

pscollins commented 8 years ago

FWIW, I don't know how far you dug into this, but it appears to be caused by race conditions on the wait queue lists. Have you made any progress?
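
To illustrate the kind of race suspected here: in the backtrace, autoremove_wake_function calls list_del_init on a wait entry whose prev pointer is already NULL, which is what one would expect if the entry was modified or unlinked elsewhere without synchronization. The following is a minimal userspace sketch, not the LibOS code and not the patch from the gist. The names wait_queue_head, wait_entry, wq_add, wq_remove, wq_wake_all and wq_demo are hypothetical, and the pthread mutex stands in for whatever lock the real scheduler glue would use. It shows the usual remedy: every list operation happens under the queue lock, and removal is idempotent.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modeled on include/linux/list.h. */
struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Unlink the entry and make it point at itself, so a later
 * list_empty() check on the entry itself returns true. */
static void list_del_init(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    init_list_head(e);
}

/* Hypothetical wait queue: the lock protects every access to task_list. */
struct wait_queue_head { pthread_mutex_t lock; struct list_head task_list; };
struct wait_entry { int woken; struct list_head link; };

static void wq_init(struct wait_queue_head *q)
{
    pthread_mutex_init(&q->lock, NULL);
    init_list_head(&q->task_list);
}

static void wq_add(struct wait_queue_head *q, struct wait_entry *w)
{
    w->woken = 0;
    pthread_mutex_lock(&q->lock);
    list_add_tail(&w->link, &q->task_list);
    pthread_mutex_unlock(&q->lock);
}

/* Waiter-side removal: a no-op if the waker already unlinked the entry. */
static void wq_remove(struct wait_queue_head *q, struct wait_entry *w)
{
    pthread_mutex_lock(&q->lock);
    if (!list_empty(&w->link))
        list_del_init(&w->link);
    pthread_mutex_unlock(&q->lock);
}

/* Waker side: mark and unlink every entry; returns how many were woken. */
static int wq_wake_all(struct wait_queue_head *q)
{
    int n = 0;
    pthread_mutex_lock(&q->lock);
    while (!list_empty(&q->task_list)) {
        struct list_head *first = q->task_list.next;
        struct wait_entry *w = (struct wait_entry *)
            ((char *)first - offsetof(struct wait_entry, link));
        w->woken = 1;
        list_del_init(first);
        n++;
    }
    pthread_mutex_unlock(&q->lock);
    return n;
}

/* Single-threaded walk-through of the waker/waiter double removal. */
static int wq_demo(void)
{
    struct wait_queue_head q;
    struct wait_entry w;            /* lives on the stack, like the entry in the trace */

    wq_init(&q);
    wq_add(&q, &w);
    assert(wq_wake_all(&q) == 1);   /* waker unlinks the entry */
    wq_remove(&q, &w);              /* waiter's own removal is now harmless */
    assert(list_empty(&q.task_list));
    return 0;
}
```

Calling wq_demo() from a main function exercises the double-removal path. Without the lock, or with a plain list_del that leaves stale pointers behind, a waker and a waiter touching the same entry concurrently could corrupt the list in the way the backtrace suggests.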

thehajime commented 8 years ago

Once I confirm that this issue is fixed in LKL (https://github.com/lkl/linux), I will close it.

pscollins commented 8 years ago

I poked at LKL a tiny bit on Friday and saw some behavior that suggested there are concurrency-related issues there, but I can't say anything solid yet.

FWIW I think you should leave this open as an issue on LibOS unless you are merging LibOS into LKL --- it is important for people who are interested in forking/using LibOS to have an accurate idea of the state of this project on its own so long as it is a standalone project.

thehajime commented 8 years ago

@pscollins Agreed, I'll keep it open.

dd76 commented 8 years ago

Hi Hajime, to get userspace TCP/IP stack functionality, should we use LibOS or LKL? The goal is to integrate applications such as an HTTP server with a userspace TCP/IP stack library. Please comment.

thehajime commented 8 years ago

LKL. It has improved a great deal since then.