panjf2000 / gnet

🚀 gnet is a high-performance, lightweight, non-blocking, event-driven networking framework written in pure Go.
https://gnet.host
Apache License 2.0
9.51k stars 1.03k forks source link

[Bug]: gnet.Conn.Peek crashes slice bounds out of range #615

Closed serious-snow closed 2 months ago

serious-snow commented 2 months ago

Actions I've taken before I'm here

What happened?

gnet.Conn.Peek 发生崩溃 测试过 Next 一样

经过测试: 2.5.0+会出现这个问题,换到2.4.2正常

如果每次读取完数据也不会崩溃,但是目前我的需求就是想控制每个请求发包的数量,免得for循环占用太多时间,我想处理一定条数就关闭连接。

func (es *echoServer) OnTraffic(c gnet.Conn) gnet.Action {
    const headerLen = 2
    for {
        // header
        headerBuf, _ := c.Peek(headerLen)
        if len(headerBuf) < headerLen {
            break
        }

        bodyLen := binary.BigEndian.Uint16(headerBuf)
        msgLen := headerLen + int(bodyLen)

        data, _ := c.Peek(msgLen)
        if len(data) < msgLen {
            break
        }
        _, _ = c.Discard(msgLen)
        // clone data
        dd := string(data)
        go func() {
            // do something slowly
            time.Sleep(time.Millisecond)
            _ = c.AsyncWrite([]byte(dd), nil)
        }()
    }
    return gnet.None
}

Major version of gnet

v2

Specific version of gnet

v2.5.0 +

Operating system

Linux, macOS

OS version

Linux 5.15.0-105-generic x86_64,Darwin 23.5.0 arm64

Go version

go version go1.22.0 darwin/arm64

Relevant log output

[gnet] 2024-06-22T15:54:38.18572+08:00  INFO    logging/logger.go:256   Launching gnet with 10 event-loops, listening on: tcp://:53
2024/06/22 15:54:38 echo server with multi-core=true is listening on tcp://:53
panic: runtime error: slice bounds out of range [:-244]

goroutine 36 [running]:
github.com/panjf2000/gnet/v2.(*conn).Peek(0x14000512000, 0x10091e474?)
        /Users/wangjian/go/pkg/mod/github.com/panjf2000/gnet/v2@v2.5.3/connection_unix.go:366 +0x258
main.(*echoServer).OnTraffic(0x100875400?, {0x1009da0b0, 0x14000512000})
        /Users/wangjian/work/dns/gnet-examples/echo_tcp/echo.go:36 +0x60
github.com/panjf2000/gnet/v2.(*eventloop).read(0x14000176100, 0x14000512000)
        /Users/wangjian/go/pkg/mod/github.com/panjf2000/gnet/v2@v2.5.3/eventloop_unix.go:144 +0xf8
github.com/panjf2000/gnet/v2.(*conn).processIO(0x14000512000, 0x14000110ae0?, 0x11?, 0x1)
        /Users/wangjian/go/pkg/mod/github.com/panjf2000/gnet/v2@v2.5.3/connection_bsd.go:32 +0x64
github.com/panjf2000/gnet/v2.(*eventloop).orbit.func1(0x11, 0xffff, 0x1)
        /Users/wangjian/go/pkg/mod/github.com/panjf2000/gnet/v2@v2.5.3/reactor_default.go:66 +0x1a8
github.com/panjf2000/gnet/v2/internal/netpoll.(*Poller).Polling(0x14000162140, 0x14000492f10)
        /Users/wangjian/go/pkg/mod/github.com/panjf2000/gnet/v2@v2.5.3/internal/netpoll/poller_kqueue_default.go:121 +0x264
github.com/panjf2000/gnet/v2.(*eventloop).orbit(0x14000176100)
        /Users/wangjian/go/pkg/mod/github.com/panjf2000/gnet/v2@v2.5.3/reactor_default.go:54 +0x80
golang.org/x/sync/errgroup.(*Group).Go.func1()
        /Users/wangjian/go/pkg/mod/golang.org/x/sync@v0.7.0/errgroup/errgroup.go:78 +0x58
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 1
        /Users/wangjian/go/pkg/mod/golang.org/x/sync@v0.7.0/errgroup/errgroup.go:75 +0x98

Code snippets (optional)

package main

import (
    "encoding/binary"
    "flag"
    "fmt"
    "log"
    "time"

    "github.com/panjf2000/gnet/v2"
)

type echoServer struct {
    gnet.BuiltinEventEngine

    eng       gnet.Engine
    addr      string
    multicore bool
}

func (es *echoServer) OnBoot(eng gnet.Engine) gnet.Action {
    es.eng = eng
    log.Printf("echo server with multi-core=%t is listening on %s\n", es.multicore, es.addr)
    return gnet.None
}

func (es *echoServer) OnTraffic(c gnet.Conn) gnet.Action {
    const headerLen = 2
    // header
    headerBuf, _ := c.Peek(headerLen)
    if len(headerBuf) < headerLen {
        return gnet.None
    }
    bodyLen := binary.BigEndian.Uint16(headerBuf)
    msgLen := headerLen + int(bodyLen)

    data, _ := c.Peek(msgLen)
    if len(data) < msgLen {
        return gnet.None
    }

    _, _ = c.Discard(msgLen)
    // clone data
    dd := string(data)
    go func() {
        // do something slowly
        time.Sleep(time.Millisecond)
        _ = c.AsyncWrite([]byte(dd), nil)
    }()
    return gnet.None
}

func main() {
    var port int
    var multicore bool

    // Example command: go run echo.go --port 53 --multicore=true
    flag.IntVar(&port, "port", 53, "--port 53")
    flag.BoolVar(&multicore, "multicore", true, "--multicore true")
    flag.Parse()
    echo := &echoServer{addr: fmt.Sprintf("tcp://:%d", port), multicore: multicore}
    log.Fatal(gnet.Run(echo, echo.addr, gnet.WithMulticore(multicore)))
}

How to Reproduce

  1. 运行代码
  2. 使用dnsperf 压测

dnsperf -T 12 -c 12 -d records.txt -s 127.0.0.1 -l 60 -p 53 -m tcp

records.txt 内容

www29.test1.com.  A
www30.test1.com.  A
  1. 出现 panic

Does this issue reproduce with the latest release?

It can reproduce with the latest release

panjf2000 commented 2 months ago

I think the following code changes should fix this bug:

diff --git a/connection_unix.go b/connection_unix.go
index 68699f16..cfce3a92 100644
--- a/connection_unix.go
+++ b/connection_unix.go
@@ -325,13 +325,13 @@ func (c *conn) Next(n int) (buf []byte, err error) {
    }
    head, tail := c.inboundBuffer.Peek(n)
    defer c.inboundBuffer.Discard(n) //nolint:errcheck
-   if len(head) == n {
+   if len(head) >= n {
        return head[:n], err
    }
    c.loop.cache.Reset()
    c.loop.cache.Write(head)
    c.loop.cache.Write(tail)
-   if inBufferLen == n {
+   if inBufferLen >= n {
        return c.loop.cache.Bytes(), err
    }

@@ -352,13 +352,13 @@ func (c *conn) Peek(n int) (buf []byte, err error) {
        return c.buffer[:n], err
    }
    head, tail := c.inboundBuffer.Peek(n)
-   if len(head) == n {
+   if len(head) >= n {
        return head[:n], err
    }
    c.loop.cache.Reset()
    c.loop.cache.Write(head)
    c.loop.cache.Write(tail)
-   if inBufferLen == n {
+   if inBufferLen >= n {
        return c.loop.cache.Bytes(), err
    }

Could you verify if these code changes resolve this problem on your machines and return here? Thanks! If it does, you can open a pull request merging those code changes into gnet if you're interested in contributing code to this project.

panjf2000 commented 2 months ago

FYI, this bug should have been caused by these code changes.

serious-snow commented 2 months ago

I think the following code changes should fix this bug:我认为以下代码更改应该可以修复此错误:

diff --git a/connection_unix.go b/connection_unix.go
index 68699f16..cfce3a92 100644
--- a/connection_unix.go
+++ b/connection_unix.go
@@ -325,13 +325,13 @@ func (c *conn) Next(n int) (buf []byte, err error) {
  }
  head, tail := c.inboundBuffer.Peek(n)
  defer c.inboundBuffer.Discard(n) //nolint:errcheck
- if len(head) == n {
+ if len(head) >= n {
      return head[:n], err
  }
  c.loop.cache.Reset()
  c.loop.cache.Write(head)
  c.loop.cache.Write(tail)
- if inBufferLen == n {
+ if inBufferLen >= n {
      return c.loop.cache.Bytes(), err
  }

@@ -352,13 +352,13 @@ func (c *conn) Peek(n int) (buf []byte, err error) {
      return c.buffer[:n], err
  }
  head, tail := c.inboundBuffer.Peek(n)
- if len(head) == n {
+ if len(head) >= n {
      return head[:n], err
  }
  c.loop.cache.Reset()
  c.loop.cache.Write(head)
  c.loop.cache.Write(tail)
- if inBufferLen == n {
+ if inBufferLen >= n {
      return c.loop.cache.Bytes(), err
  }

Could you verify if these code changes resolve this problem on your machines and return here? Thanks! If it does, you can open a pull request merging those code changes into gnet if you're interested in contributing code to this project.您能否验证这些代码更改是否在您的计算机上解决了此问题并返回此处?谢谢!如果是这样,如果你有兴趣向这个项目贡献代码,可以打开一个拉取请求,将这些代码更改合并到其中 gnet

it works, thanks. If I'm going to create a pr, do I need to write additional test code for this bug?

panjf2000 commented 2 months ago

it works, thanks. If I'm going to create a pr, do I need to write additional test code for this bug?

It seems hard to write such a test that can reproduce this issue, but it would be nice if you could create one. But I'm OK to review a PR without that test, after all, it only reverts the code to the previous version.

serious-snow commented 2 months ago

it works, thanks. If I'm going to create a pr, do I need to write additional test code for this bug?

It seems hard to write such a test that can reproduce this issue, but it would be nice if you could create one. But I'm OK to review a PR without that test, after all, it only reverts the code to the previous version.

没有写测试用例,想到方法的时候再来这里补充吧。pr 在这里,https://github.com/panjf2000/gnet/pull/616, ps: 这是我第一次提交pr, 可能有什么不规范的地方。

panjf2000 commented 2 months ago

Fixed by #616