mtcp-stack / mtcp

mTCP: A Highly Scalable User-level TCP Stack for Multicore Systems

Problem of mtcp_accept (return -1 frequently) #214

Open zhilongzheng opened 5 years ago

zhilongzheng commented 5 years ago

Hi mTCP team,

I am currently using mtcp as the stack for some applications.

The basic workflow works well, and my applications can be accelerated by mtcp.

However, I am struggling to tune it for higher performance and better scalability than the stock Linux kernel stack.

  1. I found that when accepting a new connection, mtcp_accept always returns two values in succession: first the correct one (a positive fd), which is ALWAYS followed by a -1.

This has appeared in two of my applications. I also checked the debug output of the tutorial app (example/epserver.c), which shows the same -1 return.

  2. Another point of confusion is that fds do not seem to be recycled, which makes it difficult to reproduce the evaluation result in Figure 7(a) of your NSDI'14 paper: the application crashes when the fd value grows relatively large (e.g., 30K). I have inspected mtcp's source code, and it seems that fds COULD be recycled.

I have adjusted the max_concurrency value in the config file and the MAX_EVENTS value, but this does not seem to help.

My testbed uses a server with two Intel E5-v2690 (12 cores) CPUs and an Intel X710 NIC as the mtcp application server, and 8 servers as workload generators.

Hope for your reply! Thanks!

ajamshed commented 5 years ago

@zhilongzheng ,

The overall performance of a networked application can be affected by a number of factors. For example, you need to check whether the current bottleneck in your application is really the networking stack (and not the application workload itself).

1- All mTCP sockets are non-blocking, so it is not surprising that mtcp_accept() returns -1 immediately after you accept a connection. This likely indicates that the listening queue is empty at that moment. You can confirm this by reading errno, which should be EAGAIN.

2- I am not really sure what exactly the problem is. As far as I can see, fds can be recycled immediately if you set the timewait value to 0 (in the mtcp.conf file). Note that in this setup fds are reused almost immediately, which may confuse operations on a closing or already closed connection at such high concurrency levels.
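
To illustrate point 1, here is a minimal sketch of the usual accept loop on a non-blocking listening socket. It uses plain POSIX sockets as a stand-in so that it runs without mTCP; the helper names `make_nonblocking_listener` and `drain_accept_queue` are hypothetical and not part of mTCP. The assumption is that `mtcp_accept()` follows the same convention described above: once the listening queue is empty, it returns -1 with errno set to EAGAIN, which is an expected condition rather than a failure.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a non-blocking TCP listener on an ephemeral
 * loopback port. On success *addr holds the bound address.
 * Returns the listening fd, or -1 on failure. */
int make_nonblocking_listener(struct sockaddr_in *addr)
{
    socklen_t len = sizeof(*addr);
    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0)
        return -1;

    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr->sin_port = 0; /* let the kernel pick a free port */

    int flags = fcntl(s, F_GETFL, 0);
    if (flags < 0 ||
        bind(s, (struct sockaddr *)addr, sizeof(*addr)) < 0 ||
        listen(s, 128) < 0 ||
        getsockname(s, (struct sockaddr *)addr, &len) < 0 ||
        fcntl(s, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(s);
        return -1;
    }
    return s;
}

/* Hypothetical helper: accept until the queue is empty.
 * Returns the number of connections accepted, or -1 on a real error.
 * *last_errno receives the errno of the final failing accept() call. */
int drain_accept_queue(int listener, int *last_errno)
{
    int accepted = 0;

    assert(last_errno != NULL);
    *last_errno = 0;

    for (;;) {
        int fd = accept(listener, NULL, NULL);
        if (fd >= 0) {
            accepted++;
            close(fd); /* a real server would register fd with its event loop */
            continue;
        }
        *last_errno = errno;
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return accepted; /* queue drained: the expected "-1" */
        if (errno == EINTR)
            continue;        /* interrupted by a signal, retry */
        return -1;           /* anything else is a genuine failure */
    }
}
```

In an event loop, a function like `drain_accept_queue` would be called each time the listener is reported readable. A -1 that follows a successful accept then simply ends the loop, as long as errno is EAGAIN (or EWOULDBLOCK).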

zhilongzheng commented 5 years ago

Thank you for your comments!

I have done more extensive testing, and concluded that "P1" does not seem to be a problem for performance.

I also found that fds are recycled after I changed timewait to 0, following your suggestion.