panjf2000 / gnet

🚀 gnet is a high-performance, lightweight, non-blocking, event-driven networking framework written in pure Go.
https://gnet.host
Apache License 2.0
9.69k stars 1.04k forks source link

[Bug]: the buff of eventloop.cache used cannot be released #659

Closed szza closed 1 week ago

szza commented 1 week ago

Actions I've taken before I'm here

What happened?

the buff of eventloop.cache always occupied

1

Major version of gnet

v2

Specific version of gnet

github.com/panjf2000/gnet/v2 v2.5.2

Operating system

Linux

OS version

Linux 6.2.0-39-generic x86_64

Go version

1.21.6

Relevant log output

1

Code snippets (optional)

1

How to Reproduce

the buff of eventloop.cache always occupied after no data communication

Does this issue reproduce with the latest release?

It can reproduce with the latest release

panjf2000 commented 1 week ago

The eventloop.cache is supposed to retain for the life of your gnet application, which is how this was designed.

Did it lead to any issues?

szza commented 1 week ago

The eventloop.cache is supposed to retain for the life of your gnet application, which is how this was designed.

Did it lead to any issues? the issue is: When the data traffic is very low, the memory previously occupied by eventloop.cache cannot be released and continues to occupy a large amount of memory.

panjf2000 commented 1 week ago

There is one buffer for each event loop and the number of the event loops is typically equal to the number of CPU. Besides, the buffer size is not expected to be large because it only caches the bytes every time you read data from the connection.

When you said "a large amount of memory", how many bytes were you referring to? Any statistics you can show me?

szza commented 1 week ago

There is one buffer for each event loop and the number of the event loops is typically equal to the number of CPU. Besides, the buffer size is not expected to be large because it only caches the bytes every time you read data from the connection.

When you said "a large amount of memory", how many bytes were you referring to? Any statistics you can show me?

71 % after data traffic decrease

3
panjf2000 commented 1 week ago

This is uncanny, normally eventloop.cache shouldn't grow this big. Could you show me the code you write in OnTraffic?

szza commented 1 week ago

reduce the code:

func (cm *connectionManager) OnTraffic(c gnet.Conn) (action gnet.Action) {
    for c.InboundBuffered() > 0 {
        // 1. decode
        if h, msg, err := ZeroCopyDecode(c); err != nil {
            if err == io.ErrShortBuffer {
                return
            }
            return gnet.Close
        }

        //... handle the msg
    }

    return gnet.None
}

// used for gent receive message
func ZeroCopyDecode(c gnet.Reader) (*Header, pb.Message, error) {
    total, err := c.Peek(MaxIntSize)
    if err != nil {
        return nil, nil, err
    }
    lenSize, length := DecodeInt(total)
    if lenSize > length {
        return nil, nil, proto.ErrorInvalidPacket
    }
    buff, err := c.Peek(length)
    if err != nil {
        return nil, nil, err
    }

    defer c.Discard(length)
    return DocodeBody(buff[lenSize:])
}

func DocodeBody(buff []byte) (h *Header, msg pb.Message, err error) {
    // 1. decode header
    h = FetchHeader(0)
    h.ReadFromBytes(buff)

    // 2. deocde body length
    bodySize, bodyLen := DecodeInt(buff[h.Size():])

    // 3. decode body
    msg = proto.NewMessage(h.Uri)
    if msg == nil || bodyLen == 0 {
        return
    }

    err = pb.Unmarshal(buff[h.Size()+bodySize:h.Size()+bodySize+bodyLen], msg)
    return
}
szza commented 1 week ago

This is uncanny, normally eventloop.cache shouldn't grow this big. Could you show me the code you write in OnTraffic? The code has been pasted above.

panjf2000 commented 1 week ago

total, err := c.Peek(MaxIntSize)

What's this MaxIntSize? Is it the maximum value of int type? If so, it should be the first thing you need to fix because that's the root cause of this issue. You should peek one packet at a time, otherwise the eventloop.cache will be bloated if you peek a large size.

szza commented 1 week ago

total, err := c.Peek(MaxIntSize)

What's this MaxIntSize? Is it the maximum value of int type? If so, it should be the first thing you need to fix because that's the root cause of this issue. You should peek one packet at a time, otherwise the eventloop.cache will be bloated if you peek a large size.

const ( MaxIntSize = 8 ).

In addition, the mem growed location is 'buff, err := c.Peek(length)'

szza commented 1 week ago

total, err := c.Peek(MaxIntSize)

What's this MaxIntSize? Is it the maximum value of int type? If so, it should be the first thing you need to fix because that's the root cause of this issue. You should peek one packet at a time, otherwise the eventloop.cache will be bloated if you peek a large size.

actually, we decode the one packet every time as the code pasted above:

  1. ZeroCopyDecode a packet to get a msg, and then
  2. handle the msg

dont return the OnTraffic func util meet the error io.ErrShortBuffer or read buff empty

panjf2000 commented 1 week ago

How big is a packet? Also, what's the version of gnet you're using? You didn't fill out the issue template right.

szza commented 1 week ago

How big is a packet? Also, what's the version of gnet you're using? You didn't fill out the issue template right.

Sorry,gent version is 'v2.5.2', mistakenly viewed the gnet version as the linux version.

The average size of packets should generally not exceed 32k; under certain exceptional circumstances, it may double, may be grow to a few megabytes.

panjf2000 commented 1 week ago

If you keep reading data until io.ErrShortBuffer in OnTracffic, the eventloop.cache shouldn't wind up to be bigger than your maximum packet size. Are you absolutely sure there is no some particularly large packet coming in?

panjf2000 commented 1 week ago

Like over one hundred megabytes?

szza commented 1 week ago

Like over one hundred megabytes?

impossible to exceed 10 M

szza commented 1 week ago

I think using bsPool to fetch memory and release it in Discard func might be a solution.

panjf2000 commented 1 week ago

I think using bsPool to fetch memory and release it in Discard func might be a solution.

This could work for Conn.Peek(), but not for Conn.Next().

I'll run some tests with big packets and see if it can reproduce your issue. Next move will depend on the test result.

panjf2000 commented 1 week ago

Updated

Never mind, I've tracked down the root cause and fixed it.


660 should fix this issue.

However, I discovered another uncanny "issue" when I did pprof on gnet transmitting 10MB packets, idle connections would have inuse memory of the elastic.RingBuffer, which should have been released: gnet-conn-mem-pprof Did you see this when you ran pprof with your gnet server? @szza

szza commented 6 days ago

Updated

Never mind, I've tracked down the root cause and fixed it.

660 should fix this issue.

However, I discovered another uncanny "issue" when I did pprof on gnet transmitting 10MB packets, idle connections would have inuse memory of the elastic.RingBuffer, which should have been released: gnet-conn-mem-pprof Did you see this when you ran pprof with your gnet server? @szza

Yes, i just have found this problem, I'm trying to slove it

szza commented 6 days ago

Idle connections still occupy memory, which the ringbuf cannot release.

5
szza commented 6 days ago

Enable ET mode is a temporary solution ?

szza commented 6 days ago

open a new issue about the ringbuf: https://github.com/panjf2000/gnet/issues/662