Closed ChrisWint closed 6 years ago
Could you also try a larger backlog number passed to mtcp_listen(), possibly 4096? Currently it looks like the accept queue is full; it stores TCP streams that are established but not yet accepted by the application. mTCP may need larger backlogs since it processes packets in batches to minimize I/O and context-switching overheads.
Hi, thanks for the quick reply.
Sadly that does not solve the problem; I was already running it with a backlog of 16384. The only thing that improves the situation a bit is reducing the TCP timeout; it seems as if the connections aren't closed properly, but I can't find anything in my code that would cause that. I included the relevant parts below. I also reduced the packet processing to a minimal XOR to ensure the server is not taking too long to handle each packet and thereby causing the problem.
Is there any way to check/log what happens to the packets in the backlog?
Code: I removed irrelevant error handling after each operation (socket check < 0, etc.), as none of it is triggered when running the code.
Server code
int listener_socket_descriptor = 0, connection_socket_descriptor = 0;
char recv_buff[buffsize];
unsigned max_fds = 10000 * 3;
int core_limit = 1;

struct mtcp_conf mcfg;
mtcp_getconf(&mcfg);
mcfg.num_cores = core_limit;
mtcp_setconf(&mcfg);

int ret = mtcp_init("mtcp_server.conf");

mtcp_getconf(&mcfg);
mcfg.max_concurrency = max_fds;
mcfg.max_num_buffers = max_fds;
mtcp_setconf(&mcfg);

unsigned core = 0;
mtcp_core_affinitize(core);
mctx_t mctx = mtcp_create_context(core);

listener_socket_descriptor = mtcp_socket(mctx, AF_INET, SOCK_STREAM, 0);
int optval = 1;
mtcp_setsockopt(mctx, listener_socket_descriptor, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof(optval));
mtcp_bind(mctx, listener_socket_descriptor, reinterpret_cast<struct sockaddr*>(&server_address), sizeof(server_address));
mtcp_listen(mctx, listener_socket_descriptor, 16384);

// accept connecting clients
while (_is_running) {
    connection_socket_descriptor = mtcp_accept(mctx, listener_socket_descriptor, (struct sockaddr*)NULL, NULL);
    if (_is_running) {
        mtcp_read(mctx, connection_socket_descriptor, recv_buff, sizeof(recv_buff) - 1);
        recv_buff[0] = recv_buff[0] ^ 1; // symbolic for result processing
    }
    mtcp_close(mctx, connection_socket_descriptor);
}
Client Code
int socket_descriptor = socket(AF_INET, SOCK_STREAM, 0);
connect(socket_descriptor, (struct sockaddr *)&socket_address, sizeof(socket_address));
write(socket_descriptor, _send_buf, sizeof(_send_buf));
close(socket_descriptor);
Hi,
Do you also have an event-driven implementation of the application for mTCP? mTCP currently does not support blocking calls in its API (except for mtcp_epoll_wait()). The blocking features were implemented at first but are not maintained any more. It seems the app could accept flows at first, but fell asleep forever at some point when there was temporarily nothing to accept. We should have warned users about this.
mTCP sockets should be used in an event-driven, nonblocking way. You can find examples in our Wiki and sample applications.
Ok, thanks. That explains the error in the non-epoll implementation, but not why epoll is failing under heavy load. The problem with epoll seems to be of a different nature: the client fails to open new connections because the server is not closing the connections properly (connections stay in TIME_WAIT until timeout), whether they are closed server-side, client-side, or both. This eventually exhausts all free ports, so no new connections can be opened. I ported the same code to epoll with standard TCP, and there the problem does not show; you can find the mTCP epoll code below. This leads me to believe it has to be an issue in mTCP, as the same TCP logic works fine.
Epoll server (client identical to above):
listener_socket_descriptor = mtcp_socket(mctx, AF_INET, SOCK_STREAM, 0);
int optval = 1;
mtcp_setsockopt(mctx, listener_socket_descriptor, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof(optval));
mtcp_bind(mctx, listener_socket_descriptor, reinterpret_cast<struct sockaddr*>(&server_address), sizeof(server_address));
mtcp_listen(mctx, listener_socket_descriptor, MAX_EVENTS);

struct mtcp_epoll_event ev, events[MAX_EVENTS];
int epollfd = mtcp_epoll_create(mctx, MAX_EVENTS);
ev.events = EPOLLIN;
ev.data.sockid = listener_socket_descriptor;
mtcp_epoll_ctl(mctx, epollfd, EPOLL_CTL_ADD, listener_socket_descriptor, &ev);

int n_fds;
while (_is_running) {
    n_fds = mtcp_epoll_wait(mctx, epollfd, events, MAX_EVENTS, -1);
    for (int curr_event = 0; curr_event < n_fds; ++curr_event) {
        if (events[curr_event].data.sockid == listener_socket_descriptor) {
            int currentClientFd = mtcp_accept(mctx, listener_socket_descriptor, NULL, NULL);
            if (currentClientFd < 0) {
                LOG_ERROR("ERROR ON ACCEPT");
                cleanupAndExit(mctx, listener_socket_descriptor);
            }
            mtcp_setsock_nonblock(mctx, currentClientFd);
            ev.events = EPOLLIN;
            ev.data.sockid = currentClientFd;
            mtcp_epoll_ctl(mctx, epollfd, EPOLL_CTL_ADD, currentClientFd, &ev);
        } else {
            int r = mtcp_read(mctx, events[curr_event].data.sockid, recv_buff, sizeof(recv_buff) - 1);
            if (r == 0) {
                mtcp_close(mctx, events[curr_event].data.sockid);
            } else {
                // tried closing here as well, did not make a difference
                // mtcp_close(mctx, events[curr_event].data.sockid);
                recv_buff[0] = recv_buff[0] ^ 142; // symbolic result processing
            }
        }
    }
}
Then, do you see new connections being accepted after the timeout set in the configuration? I mean the tcp_timeout option. The server should be able to accept connections even if there are still some connections that aren't closed properly; as long as the number of concurrent connections stays below max_concurrency, it should still work. (mTCP will show errors if the concurrency reaches the maximum.)
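For reference, the knobs mentioned here live in the mTCP config file passed to mtcp_init(). A fragment might look roughly like this (the values are purely illustrative; check the sample configs shipped with mTCP, e.g. for epserver, for the exact option names and semantics):

```
# illustrative mTCP server config fragment (values are examples only)
max_concurrency = 30000     # errors appear once concurrency hits this cap
max_num_buffers = 30000
tcp_timeout = 30            # idle-connection timeout in seconds
tcp_timewait = 0            # TIME_WAIT duration in seconds
```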
Regarding your application, could you also try to make the listening socket nonblocking?
listener_socket_descriptor = mtcp_socket(mctx, AF_INET, SOCK_STREAM, 0);
mtcp_setsock_nonblock(mctx, listener_socket_descriptor);
...
Thanks
Hey, the problems that arise from this are not really server-side; the server per se is working fine, it is simply not closing connections properly. The trouble with the open connections is that they are also still open on the client, blocking ports there. If I want to open 65000 connections in a minute (reasonable for sensor data reporting), the client stalls because it can't find a free port to bind to. Since this happens only with the mTCP server implementation, not with standard TCP, I figured there has to be a problem in mTCP's connection handling, more specifically in the connection teardown. The connection sockets are closed on both sides, so there is no reason for mTCP to leave the connection open.
I will try setting the socket nonblocking anyway, but I can't see how that would help close the connection (i.e. send the FIN/ACK messages server-side).
Okay, I debugged the packets in detail and it turns out to be no fault of mTCP; my bad. In the TCP-to-TCP case the connection gets shut down with an RST from the server, e.g.

Client             Server
FIN,ACK    ->
           <-      RST,ACK

while mTCP uses the proper teardown. I will rewrite my application to reuse connections to avoid this problem.
Thank you so much for your help!
You're welcome. We also had a function, mtcp_abort(), that immediately closes the connection with RST for benchmarking purposes, but it isn't maintained any more. If you are interested, please look at mtcp/src/api.c and mtcp/src/include/mtcp_api.h. I cannot guarantee that it will work, but the implementation still exists in api.c; you can simply write the declaration for mtcp_abort in mtcp_api.h to bring it back.
I'll close this issue since it has been resolved. Please feel free to open this again or create a new issue if you have further questions. Thanks.
Hi,
while trying to integrate my application with mTCP I ran into issues when sending many small packets from a TCP client. After a certain load is reached, the connection on the client is lost with
Connection reset by peer
and the server logs errors. This is far below any throughput where I would expect this to happen (200 packets/s without epoll, 5000 packets/s with epoll); the same application run with a default TCP connection handles at least 40000 packets/s. Furthermore, running at a lower throughput for a longer time eventually hits the same errors.
After these errors first occurred, even 1 packet/s connections reproduce the error messages for some packets; it seems like the queue is not dequeued/cleared properly.
If it is relevant, here is my mtcp config