lkl / linux

Linux kernel source tree
https://lkl.github.io/

netperf UDP_STREAM server hangs #175

Open thehajime opened 8 years ago

thehajime commented 8 years ago

This is similar to #165, but this time for the UDP_STREAM case.

To reproduce:

$ LKL_HIJACK_NET_TAP=lkl_ptt1 LKL_HIJACK_NET_IP=192.168.20.2 LKL_HIJACK_NET_NETMASK_LEN=24 LKL_HIJACK_DEBUG=1 ./bin/lkl-hijack.sh netserver -D -f
$ for i in {1..100} ; do netperf -H 192.168.20.2 -t UDP_STREAM -l1 ; done

It seems that recvfrom (lkl_syscall) blocks once no more packets arrive from the netperf client. More specifically, __skb_wait_for_more_packets() appears to wait indefinitely to be notified.

I'm trying to find the cause, but no luck so far, which is why I filed this issue. Any input is really appreciated.

tavip commented 8 years ago

Can you give it a try with 7881bd305cc65378e6634ddf9da28a10378ba39e as HEAD?

thehajime commented 8 years ago

> Can you give it a try with 7881bd3 as HEAD?

Tried it; the result is the same as with the current HEAD of master (732c596ba4697405cd8efc7258f3852512e137ca).

liuyuan10 commented 8 years ago

I can reproduce the same problem. It is different from #165: LKL is still pingable, which means virtio and LKL are still functioning.

Looking at /proc/net/snmp, I found a clue:

Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors InCsumErrors IgnoredMulti
Udp: 112 0 3658 0 3658 0 0 0

It received 112 datagrams and dropped 3658 (all counted as RcvbufErrors). The hang occurs during the second netperf test. The first test reports receiving 53 messages, so I think the second test still received 112 - 53 = 59 messages.
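The arithmetic above can be reproduced with a small script. This is only a sketch: `parse_udp_counters` is a hypothetical helper (not part of LKL or netperf), and it assumes the text of /proc/net/snmp as seen by LKL has already been captured somehow; here the two Udp lines quoted above are used as sample input.

```python
def parse_udp_counters(snmp_text):
    """Return the Udp counters from /proc/net/snmp-style text as a dict."""
    # The file has one header line and one value line per protocol,
    # both prefixed with the protocol name ("Udp:").
    udp_lines = [line.split()[1:] for line in snmp_text.splitlines()
                 if line.startswith("Udp:")]
    if len(udp_lines) < 2:
        raise ValueError("expected a header line and a value line for Udp")
    names, values = udp_lines[0], udp_lines[1]
    return dict(zip(names, (int(v) for v in values)))


# Sample input: the counters quoted in this comment.
SAMPLE = """\
Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors InCsumErrors IgnoredMulti
Udp: 112 0 3658 0 3658 0 0 0
"""

counters = parse_udp_counters(SAMPLE)
first_test_msgs = 53  # value reported by the first netperf run
second_test_msgs = counters["InDatagrams"] - first_test_msgs
print(counters["InDatagrams"], counters["RcvbufErrors"], second_test_msgs)
# prints: 112 3658 59
```

Sampling the counters before and after each netperf run would give the per-test delivered and dropped counts directly, without relying on the number netperf reports.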

Then there are two possible issues:

  1. the 59 received messages are not delivered to user space and don't unblock recvfrom properly.
  2. after receiving those messages, netserver still calls recvfrom and hangs.

You may be able to add some logs in netserver to figure out which it is.

I think 2 is more likely, but I don't know how netserver knows when to stop calling recvfrom in UDP_STREAM.
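To illustrate what possibility 2 looks like, here is a minimal sketch on an ordinary host socket (not LKL, and not netserver's actual code). A UDP receiver has no in-band way to know that the sender has finished, so a plain blocking recvfrom after the last datagram waits forever. The hypothetical `drain_udp` helper below uses a receive timeout as the stop condition and counts what was delivered; the same kind of counter logged from netserver would show which of the two cases applies.

```python
import socket


def drain_udp(sock, idle_timeout=0.2):
    """Receive datagrams until none arrives for idle_timeout seconds.

    Returns the number of datagrams delivered to user space.
    """
    sock.settimeout(idle_timeout)
    received = 0
    while True:
        try:
            sock.recvfrom(65535)
        except socket.timeout:
            # Without a timeout, this is where the receiver would block
            # forever once the sender has stopped.
            return received
        received += 1


def demo(n=3):
    """Send n datagrams over loopback and count how many are received."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))  # let the OS pick a free port
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for _ in range(n):
            sender.sendto(b"x", receiver.getsockname())
        return drain_udp(receiver)
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print("datagrams received:", demo())
```

If the logged count in netserver matches InDatagrams and recvfrom is then called again, the hang is case 2; if the count is lower, the datagrams never reached user space (case 1).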