uNetworking / uSockets

Miniscule cross-platform eventing, networking & crypto for async applications
Apache License 2.0

Extremely low performance and double free #194

Closed Zabrane closed 1 year ago

Zabrane commented 1 year ago

While hammering echo_server.c (I deleted this line and set SSL=0), I got this error:

$ sw_vers
ProductName:        macOS
ProductVersion:     13.1
BuildVersion:       22C65

$ clang --version
Apple clang version 14.0.0 (clang-1400.0.29.202)

$ git clone https://github.com/uNetworking/uSockets.git
$ cd uSockets
$ make; make examples
$ ./echo_server
Listening on port 3000...
Client connected
echo_server(62921,0x7ff8503538c0) malloc: double free for ptr 0x7ff1e0008000
echo_server(62921,0x7ff8503538c0) malloc: *** set a breakpoint in malloc_error_break to debug
fish: Job 1, './echo_server' terminated by signal SIGABRT (Abort)

Performance is also poor:

Destination: [127.0.0.1]:3000
Total data sent:     10.2 MiB
Total data received: 8.4 MiB
Bandwidth per channel: 31.261⇅ Mbps
Test duration: 5.00187 s.

Is there a way to increase the receive and send buffer sizes?

We are trying to replace an old proprietary TCP server with uSockets. Here's what the old server already achieves:

Destination: [127.0.0.1]:3000
Total data sent:     19247.0 MiB
Total data received: 19245.2 MiB
Bandwidth per channel: 32256.206⇅ Mbps
Test duration: 5.00517 s.

This is about three orders of magnitude faster than uSockets.

More or less the same result under Linux (Ubuntu 20.04 LTS):

$ uname -a
Linux 5.4.0-137-generic #154-Ubuntu SMP Thu Jan 5 17:03:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

$ gcc --version
gcc (Ubuntu 10.4.0-1ubuntu1~20.04) 10.4.0

$ ./echo_server
Listening on port 3000...
Client connected
double free or corruption (top)
Aborted (core dumped)

Test result:

Destination: [127.0.0.1]:3000
Total data sent:     24.3 MiB
Total data received: 18.1 MiB
Bandwidth per channel: 71.169⇅ Mbps
Test duration: 5.00203 s.

uNetworkingAB commented 1 year ago

That example is doing malloc, memcpy and free every time it streams a chunk to kernel, so I wouldn't use it for benchmarking, esp. not with large data. And if you see double frees then it's pretty broken.

uNetworkingAB commented 1 year ago

You probably want a pre-allocated ring buffer to add/remove to/from if you benchmark
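The thread does not include such a buffer, so what follows is only a minimal sketch of what a pre-allocated ring buffer could look like in C. The names `ring_buffer`, `ring_init`, `ring_push`, `ring_pop` and the `RING_CAPACITY` value are all hypothetical and not part of uSockets. The point it illustrates is that the storage is allocated once, so buffering and draining data involves `memcpy` but never `malloc` or `free`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RING_CAPACITY 4096  /* hypothetical size; tune for the workload */

/* Fixed-size byte ring buffer. The storage lives inside the struct,
 * so pushing and popping never calls malloc or free. */
struct ring_buffer {
    unsigned char data[RING_CAPACITY];
    size_t head;  /* index of the oldest unread byte */
    size_t used;  /* number of bytes currently stored */
};

static void ring_init(struct ring_buffer *rb) {
    rb->head = 0;
    rb->used = 0;
}

/* Append up to len bytes; returns how many were actually stored. */
static size_t ring_push(struct ring_buffer *rb, const void *src, size_t len) {
    assert(rb->used <= RING_CAPACITY);
    size_t space = RING_CAPACITY - rb->used;
    if (len > space) len = space;
    size_t tail = (rb->head + rb->used) % RING_CAPACITY;
    size_t first = RING_CAPACITY - tail;  /* contiguous room before wrapping */
    if (first > len) first = len;
    memcpy(rb->data + tail, src, first);
    memcpy(rb->data, (const unsigned char *) src + first, len - first);
    rb->used += len;
    return len;
}

/* Remove up to len bytes into dst; returns how many were copied out. */
static size_t ring_pop(struct ring_buffer *rb, void *dst, size_t len) {
    if (len > rb->used) len = rb->used;
    size_t first = RING_CAPACITY - rb->head;  /* contiguous bytes before wrapping */
    if (first > len) first = len;
    memcpy(dst, rb->data + rb->head, first);
    memcpy((unsigned char *) dst + first, rb->data, len - first);
    rb->head = (rb->head + len) % RING_CAPACITY;
    rb->used -= len;
    return len;
}
```

In an echo server, one such struct per connection could hold bytes the kernel has not accepted yet: push what is left over after a short write, and pop from it when the socket becomes writable again.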

Zabrane commented 1 year ago

@uNetworkingAB could you please help us adapt the echo TCP example to use a pre-allocated ring buffer? Any test code would be more than welcome. I really want to get rid of this proprietary server.

Zabrane commented 1 year ago

> That example is doing malloc, memcpy and free every time it streams a chunk to kernel, so I wouldn't use it for benchmarking, esp. not with large data. And if you see double frees then it's pretty broken.

The benchmark consists of sending and receiving a single character as fast as possible. Thus, no large data is involved. Just a single char.

uNetworkingAB commented 1 year ago

void bsd_socket_nodelay(LIBUS_SOCKET_DESCRIPTOR fd, int enabled) {
    setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, (void *) &enabled, sizeof(enabled));
}

bsd_socket_nodelay(us_poll_fd((struct us_poll_t *)s), 0);

You probably want to run this in on_open to disable TCP_NODELAY - pretty sure your proprietary variant has TCP_NODELAY=false, we have it true by default
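When toggling this option it can help to read the value back and confirm the kernel applied it. The sketch below uses only plain POSIX socket calls. The helper names `set_nodelay` and `get_nodelay` are hypothetical, and it assumes the descriptor is an ordinary TCP socket, such as the one obtained through `us_poll_fd` in the snippet above.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: enable (1) or disable (0) TCP_NODELAY on fd.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
static int set_nodelay(int fd, int enabled) {
    assert(enabled == 0 || enabled == 1);
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &enabled, sizeof(enabled));
}

/* Hypothetical helper: returns 1 if TCP_NODELAY is on, 0 if it is off,
 * and -1 if the option could not be read. */
static int get_nodelay(int fd) {
    int value = 0;
    socklen_t len = sizeof(value);
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &value, &len) != 0) return -1;
    return value != 0;  /* some systems report a non-zero value other than 1 */
}
```

Calling `get_nodelay` right after `set_nodelay(fd, 0)` should return 0 if the change took effect.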

Zabrane commented 1 year ago

@uNetworkingAB that didn't change anything, even with a static buffer (no malloc involved, one big global allocation) and TCP_NODELAY set to false per your recommendation.

Linux:

Destination: [127.0.0.1]:3000
Total data sent:     11.7 MiB
Total data received: 6.0 MiB
Bandwidth per channel: 29.729⇅ Mbps
Test duration: 5.00832 s.

Zabrane commented 1 year ago

@uNetworkingAB Hi. How can I set the socket buffer sizes when using uSockets? I'd like the equivalent of:

setsockopt(s, SOL_SOCKET, SO_RCVBUF, &sz, sizeof(sz));
setsockopt(s, SOL_SOCKET, SO_SNDBUF, &sz, sizeof(sz));
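The thread does not show a uSockets-specific call for this. As a hedged sketch in plain POSIX, the two options can be set on any socket descriptor and then read back, which is worth doing because the kernel may grant a different size than requested (Linux, for example, doubles the value and clamps it to system limits). The helper name `set_socket_buffers` is hypothetical. Applying it to the descriptor returned by `us_poll_fd`, as the TCP_NODELAY snippet earlier in the thread does, is an assumption and not something the maintainer confirmed.

```c
#include <assert.h>
#include <sys/socket.h>

/* Hypothetical helper: request the same receive and send buffer size on fd,
 * then report what the kernel actually granted through rcv_out and snd_out.
 * Returns 0 on success, -1 if any call fails (errno is set). */
static int set_socket_buffers(int fd, int requested, int *rcv_out, int *snd_out) {
    assert(requested > 0);
    socklen_t len = sizeof(int);
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &requested, sizeof(requested)) != 0) return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &requested, sizeof(requested)) != 0) return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, rcv_out, &len) != 0) return -1;
    len = sizeof(int);  /* getsockopt overwrites len, so reset it */
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, snd_out, &len) != 0) return -1;
    return 0;
}
```

Comparing the granted sizes with the requested one shows whether a system-wide limit is capping the buffers.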

Does uSockets set the socket to non-blocking?

fcntl(s, F_SETFL, fcntl(s, F_GETFL, 0) | O_NONBLOCK);

Zabrane commented 1 year ago

@uNetworkingAB I've noticed that you perform a socket write in the on_echo_socket_data callback. Why? If I don't write during reads, the on_echo_socket_writable callback is never called.

struct us_socket_t *on_echo_socket_data(struct us_socket_t *s, char *data, int length) {
    struct echo_socket *es = (struct echo_socket *) us_socket_ext(SSL, s);
    /*  don't write, just buffer up the number of 'x' to send back  */
    es->length += length;
    return s;
}

Could you please shed some light on the underlying uSockets design?
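The thread does not explain uSockets' internals, so the following is only a generic POSIX sketch of a pattern many non-blocking event loops follow, not uSockets code. In that pattern, data is written as soon as it is available, and a "writable" notification only matters after a write came up short because the kernel's send buffer was full. That would be one plausible reading of the behavior described above. The helper names `make_nonblocking`, `send_now` and `is_writable` are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Put fd into non-blocking mode while preserving its other status flags. */
static int make_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Try to send right away. Returns the number of bytes the kernel did NOT
 * accept. Only when this is non-zero does the caller need to keep the
 * remainder and wait for the socket to become writable. */
static size_t send_now(int fd, const char *data, size_t len) {
    assert(data != NULL);
    size_t sent = 0;
    while (sent < len) {
        ssize_t n = write(fd, data + sent, len - sent);
        if (n > 0) { sent += (size_t) n; continue; }
        if (n < 0 && errno == EINTR) continue;  /* interrupted, retry */
        break;  /* EAGAIN/EWOULDBLOCK (send buffer full) or a real error */
    }
    return len - sent;
}

/* Returns 1 if fd can accept more data right now, 0 otherwise. */
static int is_writable(int fd) {
    struct pollfd p = { .fd = fd, .events = POLLOUT, .revents = 0 };
    return poll(&p, 1, 0) == 1 && (p.revents & POLLOUT) != 0;
}
```

With a one-byte echo, `send_now` would normally return 0 on every call, so under this pattern there would never be a reason to wait for writability. It only becomes relevant once the peer stops reading and the send buffer fills up.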

uNetworkingAB commented 1 year ago

If you want to pay for consulting time, you can email me and we can set it up. It's becoming obvious that you're benchmarking apples vs. carrots here, as the echo_server doesn't do what your alternative does. With some rough math you can infer that you would have to be doing about 4 billion messages (chars) per second, which is roughly 0.25 nanoseconds a pop, and that is not the case.
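The rough math can be reproduced from the figures reported earlier for the proprietary server: 19247.0 MiB sent in 5.00517 s, with one byte per message as the benchmark description states. The helper names below are hypothetical; the sketch only turns those two numbers into a message rate and a per-message time budget.

```c
#include <assert.h>

/* Converts a reported transfer into one-byte messages per second.
 * total_mib: mebibytes moved in one direction; seconds: test duration. */
static double messages_per_second(double total_mib, double seconds) {
    assert(seconds > 0.0);
    return total_mib * 1024.0 * 1024.0 / seconds;
}

/* Average time available per message, in nanoseconds. */
static double nanoseconds_per_message(double msgs_per_second) {
    assert(msgs_per_second > 0.0);
    return 1e9 / msgs_per_second;
}
```

Plugging in 19247.0 and 5.00517 gives roughly 4 billion one-byte messages per second, or about a quarter of a nanosecond each.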