panjf2000 / gnet

🚀 gnet is a high-performance, lightweight, non-blocking, event-driven networking framework written in pure Go.
https://gnet.host
Apache License 2.0

Conn.AsyncWrite() performance tracking #423

Closed: panjf2000 closed this issue 8 months ago

panjf2000 commented 2 years ago

Tracking the performance of Conn.AsyncWrite()

Need some benchmarks and scenarios to help diagnose it.

gvitali commented 2 years ago

Test example: ~100 kb TCP packet

conn.Write(), 0.1 ms

log.Println("start")
bytesWritten, err = conn.Write(tcpBytes)
log.Println("stop")

conn.AsyncWrite(), 40 ms

log.Println("start")
err = conn.AsyncWrite(tcpBytes, func(c gnet.Conn) error {
    log.Println("stop")
    return nil
})

The time difference was measured between "start" and "stop".

panjf2000 commented 2 years ago

Test example: ~100 kb TCP packet

conn.Write(), 0.1 ms

log.Println("start")
bytesWritten, err = conn.Write(tcpBytes)
log.Println("stop")

conn.AsyncWrite(), 40 ms

log.Println("start")
err = conn.AsyncWrite(tcpBytes, func(c gnet.Conn) error {
  log.Println("stop")
  return nil
})

The time difference was measured between "start" and "stop".

Thanks for the info. However, there is little point in focusing on the latency of a single AsyncWrite() call: the method is asynchronous, so the write is pushed into a queue and only executed once the event loop takes it out. It is therefore no surprise that AsyncWrite() has a higher per-call latency than Write(). What we should really look at is the overall benchmark result. Have you run a benchmark against your whole system, and what does that overall result look like?

symsimmy commented 10 months ago

@panjf2000 Hi, I'm now seeing performance problems with asyncWrite when my game gateway forwards game data. Is there any thread-safe way to write back to the client in real time?

// init initializes the connection.
func (c *serverConn) init(conn gnet.Conn) error {
    c.conn = conn
    // Start a goroutine that consumes the data the game server stores in the
    // chWrite channel via the Push API below and pushes it back to the client.
    gopool.Go(c.write)

    return nil
}

// Push sends a message asynchronously; it receives all data that is about to be sent to the client.
func (c *serverConn) Push(msg []byte, msgType ...int) error {
    c.chWrite <- chWrite{typ: dataPacket, msg: msg}
    return nil
}

// write consumes the chWrite channel and returns the data to the client.
func (c *serverConn) write() {
    for {
        select {
        case write, ok := <-c.chWrite:
            if !ok {
                return
            }

            buf, err := pack(write.msg)
            if err != nil {
                log.Errorf("packet message error: %v", err)
                continue
            }

            if err = c.doWrite(buf); err != nil {
                log.Errorf("connection:[%v] uid:[%v] write message error: %v", c.ID(), c.UID(), err)
            }
        }
    }
}

// With 1000 connections, broadcasting the positions of all other players on the game server
// to each client produces a large volume of data. Timing the calls with the method below
// shows that the time from calling AsyncWrite to the write actually completing keeps growing,
// eventually reaching tens of seconds.
func (c *serverConn) doWrite(buf []byte) (err error) {
    if atomic.LoadInt32(&c.state) == int32(network.ConnClosed) {
        return
    }
    start := time.Now()
    err = c.conn.AsyncWrite(buf, func(conn gnet.Conn, err error) error {
        if err != nil {
            prom.GateServerWriteErrorCounter.WithLabelValues(util.ToString(c.id), util.ToString(c.uid), err.Error()).Inc()
        }
        spend := time.Since(start).Milliseconds()
        if spend > 10 {
            log.Warnf("cid:%+v uid:%+v asyncWrite spend:%+v ms", c.id, c.uid, spend)
        }
        return nil
    })

    return err
}

When the connection count grows to 1,000, with packets of roughly 512 B and a forwarding rate on the order of several hundred thousand packets per second, the latency grows steadily.

panjf2000 commented 10 months ago

gnet's internals currently use a lock-free architecture, so asyncWrite is at present the only concurrency-safe way to write data. If this approach really is a performance bottleneck, the architecture may indeed need to be reworked to support concurrent real-time write-back. But are you sure this is what causes the latency? Your QPS already looks quite high; have you run a comparison test with go net?

panjf2000 commented 10 months ago

Is there any thread-safe way to write back to the client in real time?

You can get the connection's fd via c.Fd() and then write data back directly with the syscall.Write system call. With that approach you have to manage the buffer yourself: keep any data that wasn't fully written and, on the next write-back, write the leftover data first. Also, if you take this route you can only use this one method; you cannot mix it with gnet's c.Write/c.AsyncWrite APIs.
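
For illustration, a minimal sketch of that approach, assuming gnet v2's c.Fd() accessor; the rawWriter type, its pending buffer, and the EAGAIN handling below are hypothetical and not part of gnet's API:

import (
    "sync"
    "syscall"

    "github.com/panjf2000/gnet/v2"
)

// rawWriter (hypothetical) writes straight to the connection's fd with
// syscall.Write and keeps whatever the kernel did not accept, so leftover
// bytes from the previous call are flushed before any new data.
type rawWriter struct {
    mu      sync.Mutex
    fd      int
    pending []byte // unsent bytes from earlier calls
}

func newRawWriter(c gnet.Conn) *rawWriter {
    return &rawWriter{fd: c.Fd()}
}

func (w *rawWriter) Write(buf []byte) error {
    w.mu.Lock()
    defer w.mu.Unlock()

    // Queue behind any leftovers so bytes reach the peer in order.
    w.pending = append(w.pending, buf...)
    for len(w.pending) > 0 {
        n, err := syscall.Write(w.fd, w.pending)
        if n > 0 {
            w.pending = w.pending[n:]
        }
        if err == syscall.EAGAIN {
            return nil // socket buffer is full; retry the remainder next time
        }
        if err != nil {
            return err
        }
    }
    return nil
}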

symsimmy commented 10 months ago

@panjf2000 My code scenario is the same as this issue: #236. Using plain Go TCP with synchronous writes there is currently no bottleneck; the trade-off is higher CPU consumption.

I originally wanted to call c.Write, which showed no write latency in my tests, but it really is not concurrency-safe: sending messages from another thread runs into the problem described in this issue about concurrent reads/writes on the underlying conn_map, #525.

someview commented 8 months ago

Is there any progress on this issue? We're hitting a similar situation: with 2,000 connections at a given QPS it is already fairly slow, and with 20,000 connections at the same QPS the latency shoots straight up.

IrineSistiana commented 8 months ago

Hitting this problem too, +1.

It looks like the single event-loop thread is saturated, so data is written out more slowly than the asynchronous writes produce it.

I found that increasing this value lowers the latency:

https://github.com/panjf2000/gnet/blob/ab69eecfc88050f53cca1878f99319e2bd16875f/internal/netpoll/epoll_events.go#L33

One question: what is MaxAsyncTasksAtOneTime for? With the MaxAsyncTasksAtOneTime limit, could high load cause async tasks to not be processed in time, so the queue piles up, latency rises, and in the worst case we hit an OOM?

panjf2000 commented 8 months ago

MaxAsyncTasksAtOneTime is utilized to prevent the event loops from being starved by asynchronous tasks.
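
To illustrate the idea, a simplified sketch (not gnet's actual poller code; the names and the 256 value are assumptions, see the linked source for the real constant): the event loop runs at most a fixed number of queued tasks per polling cycle and leaves the rest for the next cycle.

// task stands in for one queued asynchronous job, e.g. an AsyncWrite callback.
type task func()

// Assumed per-cycle cap, mirroring MaxAsyncTasksAtOneTime.
const maxAsyncTasksAtOneTime = 256

// drainSome runs at most maxAsyncTasksAtOneTime tasks and returns the rest,
// so the loop can get back to polling I/O instead of being monopolized by tasks.
func drainSome(queue []task) []task {
    n := maxAsyncTasksAtOneTime
    if len(queue) < n {
        n = len(queue)
    }
    for _, t := range queue[:n] {
        t()
    }
    return queue[n:]
}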

IrineSistiana commented 8 months ago

MaxAsyncTasksAtOneTime is utilized to prevent the event loops from being starved by asynchronous tasks.

But it can also starve the asynchronous tasks, if the application issues a large number of AsyncWrite() calls.

We simulated this internally: with MaxAsyncTasksAtOneTime removed and the load maxed out, the average AsyncWrite() latency drops from 1.8 s to 200 ms.

Could the limit be skipped when the task is an AsyncWrite?

panjf2000 commented 8 months ago

What's your specific scenario using gnet? More details would be helpful on this.

IrineSistiana commented 8 months ago

Simplified scenario: for every request received from a client, the server calls AsyncWrite many times (say 10).

Is there any progress on this issue? We're hitting a similar situation: with 2,000 connections at a given QPS it is already fairly slow, and with 20,000 connections at the same QPS the latency shoots straight up.

Our stress tests look much like what this guy describes: the latency just shoots up. Our main suspicion was queue pile-up, which is how we noticed asyncTaskQueue, then MaxAsyncTasksAtOneTime, and then started to suspect:

With the MaxAsyncTasksAtOneTime limit, could high load cause async tasks to not be processed in time, so the queue piles up, latency rises, and in the worst case we hit an OOM?

Then we considered the extreme case and decided we would rather let the event loop starve (this slows down accepting requests, slows clients sending requests, slows the server processing them, and finally slows the rate at which AsyncWrite calls are produced, reaching a self-balance, so it's not a big problem) than let AsyncWrite pile up in asyncTaskQueue (latency spikes, worst case OOM). So we removed the MaxAsyncTasksAtOneTime limit and tested again.

someview commented 8 months ago

What's your specific scenario using gnet? More details would be helpful on this.

We use gnet as a gateway for group chat, so one request maps to multiple asyncWrite calls. I'm curious about the event-loop code. From what I've seen, redis's event loop handles the tasks left over from the previous iteration first, then the tasks of the current iteration; node.js's event loop has a similar mechanism.

//  before this cycle's tasks
//  this cycle's tasks and timers
// after this cycle's tasks

The event-loop mechanism here seems a bit odd: in the "before" phase of the cycle, the write-back tasks should be handled first.

If it worked that way, the system could stay balanced overall even when the read/write ratio is uneven.

IrineSistiana commented 8 months ago

I'm curious about the event-loop code. From what I've seen, redis's event loop handles the tasks left over from the previous iteration first, then the tasks of the current iteration; node.js's event loop has a similar mechanism. The event-loop mechanism here seems a bit odd: in the "before" phase of the cycle, the write-back tasks should be handled first.

Our internal discussion and guesses are exactly the same as what this guy wrote (lll¬ω¬)... we also think it's a MaxAsyncTasksAtOneTime problem.

panjf2000 commented 8 months ago

To clarify, asyncTaskQueue is not used just for leftover tasks, because new tasks can be appended to it constantly while the event loops are processing incoming I/O events. Furthermore, asynchronous writes are not the only kind of event that can be added to asyncTaskQueue; there are others, such as "connection close". So it's not a mechanism for storing and processing leftovers like the equivalents in redis or node.js.

I didn't anticipate scenarios with this many asynchronous writes per read when I added AsyncWrite/AsyncWritev to gnet.Conn and decided to put these asynchronous-write tasks in asyncTaskQueue.

But I think it's worth considering whether it makes sense to put asynchronous-write events in urgentAsyncTaskQueue instead of asyncTaskQueue, which should mitigate your latency issue here. However, I'm a little concerned that this change might have an adverse impact on some existing scenarios.
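
Roughly, the difference being discussed could be sketched like this (an illustration reusing the task and drainSome helpers from the earlier sketch, not gnet's event-loop code): the urgent queue is drained completely every cycle, while the low-priority queue stays capped.

// runTasks drains the urgent queue completely each cycle, while the
// low-priority queue remains capped per cycle; unexecuted tasks are returned.
func runTasks(urgent, normal []task) (remaining []task) {
    for _, t := range urgent {
        t() // e.g. asynchronous writes, if they were moved to the urgent queue
    }
    return drainSome(normal)
}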

IrineSistiana commented 8 months ago

We're puzzled about the role of asyncTaskQueue.

If I read the code correctly, the event loop has no way to guarantee that asyncTaskQueue does not pile up; asyncTaskQueue is a list with no length limit. The worst case is an OOM, though in practice it is hard to trigger because the event loop does keep consuming tasks; what you actually observe is the queue piling up and latency spiking.

Whether a task in asyncTaskQueue is an AsyncWrite or some other event, even if it is not urgent it cannot simply be dropped, so every task has to be executed sooner or later, and on the event-loop thread. ~~So what is the point of asyncTaskQueue deliberately delaying task execution with MaxAsyncTasksAtOneTime?~~ To keep the loop from being starved by an unbounded stream of async tasks.

panjf2000 commented 8 months ago

I think I've made myself clear about why tasks in asyncTaskQueue are delayed: to avoid starving the event loops with low-priority tasks in the queue. Again, asyncTaskQueue is not for high-priority tasks, and:

I didn't anticipate scenarios with this many asynchronous writes per read when I added AsyncWrite/AsyncWritev to gnet.Conn and decided to put these asynchronous-write tasks in asyncTaskQueue.

panjf2000 commented 8 months ago

If you were trying to write multiple responses for one request, would using AsyncWritev instead of AsyncWrite solve your latency problem here?
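
For example, the responses for one request could be collected and sent with a single vectored write. The sketch below assumes gnet v2's AsyncWritev(bs [][]byte, cb AsyncCallback) signature; replyAll is a hypothetical helper, not part of gnet.

import (
    "log"

    "github.com/panjf2000/gnet/v2"
)

// replyAll batches every response for one request into a single vectored write
// instead of issuing one AsyncWrite per response.
func replyAll(c gnet.Conn, responses [][]byte) error {
    return c.AsyncWritev(responses, func(c gnet.Conn, err error) error {
        if err != nil {
            log.Printf("async writev failed: %v", err)
        }
        return nil
    })
}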

IrineSistiana commented 8 months ago

Sorry, I finally get it. In the other extreme case for asyncTaskQueue (one side keeps enqueueing without limit while the event loop keeps dequeuing until the queue is empty), the event loop would indeed be starved.

Maybe the simplest and safest approach is to make MaxAsyncTasksAtOneTime a configurable option and let users choose?

panjf2000 commented 8 months ago

Maybe the simplest and safest approach is to make MaxAsyncTasksAtOneTime a configurable option and let users choose?

This was an alternative in my mind. But adding a new configuration option that requires users to understand the rationale behind it might make things more complicated than they already are. Maybe the simpler and better approach is to put asynchronous-write events in urgentAsyncTaskQueue.

panjf2000 commented 8 months ago

@IrineSistiana @someview Could you try out #563 and see if it reduces the latencies for your use cases? Thanks!

IrineSistiana commented 8 months ago

Hitting this problem too, +1.

It looks like the single event-loop thread is saturated, so data is written out more slowly than the asynchronous writes produce it.

I found that increasing this value lowers the latency:

https://github.com/panjf2000/gnet/blob/ab69eecfc88050f53cca1878f99319e2bd16875f/internal/netpoll/epoll_events.go#L33

We simulated this internally: with MaxAsyncTasksAtOneTime removed and the load maxed out, the average AsyncWrite() latency drops from 1.8 s to 200 ms.

I believe setting MaxAsyncTasksAtOneTime=999999 has the same effect as #563. The pressure test looks good, but this is only a preliminary test with little practical value.

We have not encountered this problem in our production environment, perhaps because the load there is smaller.

panjf2000 commented 8 months ago

New thought: maybe introduce a new option AsyncWriteNoDelay and only if it's set to true, we put all asynchronous writes to urgentAsyncTaskQueue:

type Options struct {
    ...

    // AsyncWriteNoDelay indicates whether to ignore the MaxAsyncTasksAtOneTime limitation and write the data
    // to the peer immediately.
    AsyncWriteNoDelay bool
}

func WithAsyncWriteNoDelay(noDelay bool) Option {
    return func(opts *Options) {
        opts.AsyncWriteNoDelay = noDelay
    }
}
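
If the option were added, usage would presumably follow gnet's existing functional-options pattern; myHandler below is a placeholder for an existing gnet.EventHandler, and WithAsyncWriteNoDelay is only a proposal at this point.

// Hypothetical usage of the proposed option.
err := gnet.Run(myHandler, "tcp://:9000",
    gnet.WithMulticore(true),
    gnet.WithAsyncWriteNoDelay(true),
)
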
IrineSistiana commented 8 months ago

Thought: I think when talking about TCP writes, all Go users assume the operation has no delay (e.g. go/net sets TCPNoDelay by default). And when gnet users call AsyncWrite, they expect no delay as well. Providing a no-delay option, which implies the default has a delay, is a bit confusing.

I don't think using urgentAsyncTaskQueue will cause starving issues unless someone is sending tons of data via AsyncWrite() while the event-loop thread is already at full load. That scenario is rare; in that case, why not use go/net?

So I prefer putting AsyncWrite in urgentAsyncTaskQueue by default.

panjf2000 commented 8 months ago

At this point, I'm more worried about the precedence of events than I am about the starving issue. You see, there are some events like Wake that precede AsyncWrite currently, and some existing programs might have relied on that event sequence, meaning that we may break something somewhere if we change the default behaviors.

IrineSistiana commented 8 months ago

My understanding (from the code and our pressure tests, not from a production environment):

The latency issue we are facing arises because, when the load is high (but not yet at the limit), the queue can pile up in a very short period and not get dequeued for many event-loop cycles due to the MaxAsyncTasksAtOneTime limit. It is not because the write I/O is slow.

Although Wake and AsyncWrite have different priorities, under normal circumstances (CPU not at the limit, asyncTaskQueue cleared within one cycle, so no latency issue) the latency of Wake should be only slightly lower (<1 ms, event-loop cycles are fast) than that of AsyncWrite; this is a guess, not something we tested. Things only differ when the event loop is at the CPU limit and events are queued up, and if the event loop is starving you are already in big trouble. That may happen in tests but should not happen in a production environment; the only real fix is to switch to multiple threads. So I don't think we need to worry about the starving issue.

Also, there is no mechanism guaranteeing the execution order of these two calls, so a user cannot expect that a Wake call will always be invoked before an AsyncWrite.

Piling tasks into an unlimited queue with no enqueue restriction and no dequeue guarantee is risky. So I don't think we should worry about priority either; we should use https://github.com/panjf2000/gnet/issues/423#issuecomment-2026922742 (i.e. a single urgentAsyncTaskQueue) to avoid queue pile-up and for simplicity.

panjf2000 commented 8 months ago

OOM caused by the unlimited queue should be the last thing we care about. It's like memory allocators in any language or kernel: they don't account for the OOM of users' programs. If you need more memory, throw more physical memory at it, or get more machines and split your services. The same rule applies to any networking framework or library: if your network traffic is already so heavy that all memory has been taken up by your service, then you should "distributed-system" your service and do some load balancing to mitigate the memory consumption.

While I find your viewpoints on the event sequence reasonable, I'm not with you on the suggestion of a single task queue. We have only a few kinds of events in the task queues at the moment, but more might come along in the future; that isn't guaranteed, but it could happen and I'd like to keep that possibility open, in which case we would need hierarchical (priority) queuing. Thus, I have a more pragmatic solution: retain the current two queues, but enqueue all tasks into urgentAsyncTaskQueue up to a threshold, and once that threshold is reached, degrade all incoming tasks other than AsyncWrite/AsyncWritev to the low-priority asyncTaskQueue. That way we drain all high-priority tasks while preventing other I/O events from being delayed by low-priority tasks.
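
A rough sketch of that idea (illustrative only, reusing the task type and per-cycle cap from the earlier sketch; the names and the threshold value are made up, not gnet's implementation): everything is enqueued as urgent until a threshold is hit, after which only asynchronous writes stay urgent.

// Assumed threshold after which non-write tasks are degraded.
const urgentQueueThreshold = 1024

type taskQueues struct {
    urgent []task // drained fully every cycle
    normal []task // drained at most maxAsyncTasksAtOneTime per cycle
}

// enqueue keeps all tasks urgent below the threshold; beyond it, only
// asynchronous writes remain urgent and everything else is degraded.
func (q *taskQueues) enqueue(t task, isAsyncWrite bool) {
    if isAsyncWrite || len(q.urgent) < urgentQueueThreshold {
        q.urgent = append(q.urgent, t)
        return
    }
    q.normal = append(q.normal, t)
}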

someview commented 8 months ago

@IrineSistiana @someview Could you try out #563 and see if it reduces the latencies for your use cases? Thanks!

The test results are basically the same as in our earlier scenario; it feels like the root cause is still multiple connections queuing up to write data. With more connections, say on the order of ten thousand, asyncwritev latency goes up (payload 500 bytes) to around 200 ms. When multiple connections can write concurrently there is no such latency problem; we verified that with the netpoll library.