Closed mrschyte closed 9 months ago
I'm trying to stream video through a Ziti service, and a couple of seconds after starting the stream, CPU usage in the tunneler jumps to 100%. I'm using ziti-edge-tunnel v0.22.7 on Linux and can reliably reproduce the issue.
Attaching strace to the tunneler process shows that most of the time is spent in epoll_wait, which libuv calls repeatedly in a tight loop. It appears that epoll_wait is invoked on an empty descriptor list with a 0 timeout, so it returns immediately. My libuv knowledge isn't the best, but I don't think epoll_wait should run in a hot loop without ever sleeping.
Although CPU utilization is high, everything else seems to work fine and I don't see any errors in the tunneler logs.
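To illustrate why this burns CPU (this is not the tunneler's actual code, just a minimal sketch): epoll_wait with a 0 timeout is a non-blocking poll, so a loop that keeps calling it when nothing is ready never yields:

    #include <sys/epoll.h>

    /* Illustration only: a zero timeout makes epoll_wait return immediately,
     * so a loop like this spins at 100% CPU even when no fd is ready. */
    void spin(int epfd) {
        struct epoll_event events[16];
        for (;;) {
            int n = epoll_wait(epfd, events, 16, 0 /* timeout in ms: non-blocking */);
            if (n == 0) continue; /* nothing ready; immediately poll again */
            /* ... dispatch ready events ... */
        }
    }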
The high CPU utilization only seems to happen when the stream is played with a media client that buffers only up to a certain point. Downloading the stream with, e.g., wget doesn't trigger the issue.
Adding a small sleep to the on_ziti_data function cuts the CPU usage, but I'm not sure this is the best way to resolve it.
static ssize_t on_ziti_data(ziti_connection conn, const uint8_t *data, ssize_t len) {
    struct io_ctx_s *io = ziti_conn_data(conn);
    ZITI_LOG(TRACE, "got %zd bytes from ziti", len);
    if (io == NULL) {
        ZITI_LOG(WARN, "null io. underlay connection possibly leaked. ziti_conn[%p] len[%zd]", conn, len);
        ziti_close(conn, NULL);
        return UV_ECONNABORTED;
    }
    ziti_io_context *ziti_io_ctx = io->ziti_io;
    if (len > 0) {
        ssize_t accepted = ziti_tunneler_write(io->tnlr_io, data, len);
        // sleep 0.1s if the client is not accepting data, to avoid a tight loop
        // on epoll_wait (usleep() requires #include <unistd.h>)
        if (accepted == 0) {
            usleep(100000);
        }
        if (accepted < 0) {
            ZITI_LOG(ERROR, "failed to write to client");
            ziti_sdk_c_close(io->ziti_io);
        }
        return accepted;
    } else if (len == ZITI_EOF) {
        ZITI_LOG(DEBUG, "ziti connection sent EOF (ziti_eof=%d, tnlr_eof=%d)", ziti_io_ctx->ziti_eof, ziti_io_ctx->tnlr_eof);
        ziti_io_ctx->ziti_eof = true; /* no more data will come from this connection */
        if (ziti_io_ctx->tnlr_eof) /* both sides are done sending now, so close both */ {
            ziti_close(conn, ziti_conn_close_cb);
        } else {
            // this ziti conn can still receive but it will not send any more, so
            // we will not write to the client any more. send FIN to the client.
            // eventually the client will send FIN and the tsdk will call ziti_sdk_c_close_write.
            ziti_tunneler_close_write(io->tnlr_io);
        }
    } else if (len < 0) {
        ZITI_LOG(DEBUG, "ziti connection is closed due to [%zd](%s)", len, ziti_errorstr(len));
        ziti_close(conn, ziti_conn_close_cb);
    }
    return len;
}
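One caveat with this patch: on_ziti_data runs on the libuv event-loop thread, so usleep() stalls every connection the tunneler is handling, not just the slow one. A less intrusive variant might defer the retry with a uv_timer instead, which keeps the loop responsive in the meantime. A minimal sketch under that assumption (struct retry_ctx, retry_write_cb and schedule_retry are hypothetical names, not part of the tunneler sources):

    #include <stdlib.h>
    #include <uv.h>

    /* Hypothetical retry context; not part of the tunneler sources. */
    struct retry_ctx {
        uv_timer_t timer;
        /* pointers to the pending buffer/connection would live here */
    };

    static void retry_write_cb(uv_timer_t *timer) {
        struct retry_ctx *ctx = (struct retry_ctx *) timer->data;
        (void) ctx;
        /* re-attempt the write here; restart the timer until the
         * client accepts the data */
    }

    static void schedule_retry(uv_loop_t *loop, struct retry_ctx *ctx) {
        uv_timer_init(loop, &ctx->timer);
        ctx->timer.data = ctx;
        /* one-shot 100 ms timer instead of usleep(100000): the event loop
         * stays free to service other connections while we wait */
        uv_timer_start(&ctx->timer, retry_write_cb, 100, 0);
    }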
It's a known issue. The problem is the lack of a back-pressure mechanism in the underlying SDK. It is being worked on and should be available in the tunneler relatively soon.
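For anyone unfamiliar with the term: back pressure means the producer is paused while the consumer's write queue is full, rather than spinning or buffering without bound. In plain libuv terms the pattern looks roughly like the sketch below; the watermark value and all names are illustrative, not the SDK's eventual design:

    #include <stdlib.h>
    #include <uv.h>

    #define HIGH_WATERMARK (64 * 1024) /* illustrative pause threshold */

    static uv_stream_t *src; /* fast producer (e.g. the ziti side) */
    static uv_stream_t *dst; /* slow consumer (e.g. the media client) */

    static void alloc_cb(uv_handle_t *h, size_t suggested, uv_buf_t *buf) {
        buf->base = malloc(suggested);
        buf->len = suggested;
    }

    static void write_cb(uv_write_t *req, int status);

    static void read_cb(uv_stream_t *s, ssize_t nread, const uv_buf_t *buf) {
        if (nread > 0) {
            uv_write_t *req = malloc(sizeof *req);
            uv_buf_t out = uv_buf_init(buf->base, (unsigned int) nread);
            uv_write(req, dst, &out, 1, write_cb);
            /* back pressure: stop reading while the consumer is behind */
            if (dst->write_queue_size > HIGH_WATERMARK) {
                uv_read_stop(s);
            }
        }
        /* EOF/error handling and buffer lifetime management elided */
    }

    static void write_cb(uv_write_t *req, int status) {
        free(req);
        /* consumer caught up: resume the producer */
        if (status == 0 && dst->write_queue_size <= HIGH_WATERMARK) {
            uv_read_start(src, alloc_cb, read_cb);
        }
    }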
Thanks, that's great news! I think we should keep this issue open until openziti/tlsuv#171 is fixed, since it wasn't obvious that this problem is already tracked.
The tlsuv issue was closed. Please reopen if reproducible with latest.