lrstanley / girc

:bomb: girc is a flexible IRC library for Go :ok_hand:
https://pkg.go.dev/github.com/lrstanley/girc
MIT License

bug: crash after reconnecting when not sending a message #63

Closed geekosaur closed 8 months ago

geekosaur commented 1 year ago

🌧 Describe the problem

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0xb23397]

goroutine 4248 [running]:
github.com/lrstanley/girc.(*Client).readLoop(0xc000454a80, {0x1722cb8, 0xc0005704b0})
    /home/allbery/go/pkg/mod/github.com/lrstanley/girc@v0.0.0-20230729130341-dd5853a5f1a6/conn.go:440 +0x297
github.com/lrstanley/girc/internal/ctxgroup.(*Group).Go.func1()
    /home/allbery/go/pkg/mod/github.com/lrstanley/girc@v0.0.0-20230729130341-dd5853a5f1a6/internal/ctxgroup/ctxgroup.go:58 +0x63
created by github.com/lrstanley/girc/internal/ctxgroup.(*Group).Go in goroutine 31
    /home/allbery/go/pkg/mod/github.com/lrstanley/girc@v0.0.0-20230729130341-dd5853a5f1a6/internal/ctxgroup/ctxgroup.go:55 +0x79

This is happening with matterbridge, both latest release and git versions, but doesn't appear to be related directly to matterbridge. It also doesn't appear to be the same problem as #14 because the bridge has been quiescent aside from the reconnect, and the line numbers/traceback differ.

⛅ Expected behavior

I expect the IRC client to successfully resume after reconnecting.

🔄 Minimal reproduction

No response

💠 Version: girc

v0.0.0-20230729130341-dd5853a5f1a6

🖥 Version: Operating system

linux/ubuntu

⚙ Additional context

I have not yet tried rebuilding matterbridge with the version on pkg.go.dev. If it matters, the specific Ubuntu release is 22.04.2.

๐Ÿค Requirements

lrstanley commented 1 year ago

I'm unable to replicate this on master using these steps:

  1. Connect(), with all settings at their defaults other than the basics (server, port, nick, user, name)
  2. Server-initiated disconnect
  3. Reconnect with Connect()

I don't see any issues in the above scenario.
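For readers following along, the three steps above can be sketched as a small retry loop. This is a minimal, stdlib-only illustration and not code from girc or matterbridge: `connectFunc`, `reconnectLoop`, and `failNTimes` are hypothetical names, and the sketch assumes the connect call blocks until the connection ends and returns the error that ended it (which is how the `.Connect()` error is described later in this thread).

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// connectFunc stands in for a blocking connect call that returns when the
// connection ends. Hypothetical type for this sketch; with girc one would
// presumably pass the client's Connect method here.
type connectFunc func() error

// reconnectLoop calls connect repeatedly, sleeping delay between attempts.
// It stops after maxAttempts calls or after the first nil error, and returns
// the number of attempts made together with the last error.
func reconnectLoop(connect connectFunc, delay time.Duration, maxAttempts int) (int, error) {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err = connect()
		if err == nil {
			return attempt, nil
		}
		fmt.Printf("attempt %d failed: %v\n", attempt, err)
		if attempt < maxAttempts {
			time.Sleep(delay)
		}
	}
	return maxAttempts, err
}

// failNTimes returns a fake connect function that fails n times, imitating
// dropped connections, and then returns nil as a clean shutdown would.
func failNTimes(n int) connectFunc {
	calls := 0
	return func() error {
		calls++
		if calls <= n {
			return errors.New("i/o timeout")
		}
		return nil
	}
}

func main() {
	attempts, err := reconnectLoop(failNTimes(2), 10*time.Millisecond, 5)
	fmt.Println(attempts, err)
}
```

Keeping the connect call behind a function value means the same loop can be driven by a fake during testing and by the real client in production.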

Looking at the panic, I have an idea what the issue could be. However, I don't know where the original event is coming from (internal to girc, the server, or the caller of the library). Do you have anything that reproduces it? Even if it's with matterbridge, an example matterbridge config file (containing just IRC, if possible) would let me hook up a debugger.

geekosaur commented 1 year ago

I don't have a reproducer because it's happening when my somewhat unstable network drops, not when a proper disconnect occurs. This could probably be simulated with a suitable kernel network driver, but I think only FreeBSD has the appropriate kernel support.

Jille commented 8 months ago

I'm seeing the same stacktrace.

debug:00:03:28 conn.go:435: closing readLoop
debug:00:03:28 client.go:420: received signal to close, flushing 0 events and executing
debug:00:03:28 client.go:431: closing execLoop
debug:00:03:28 conn.go:586: closing sendLoop
debug:00:03:28 conn.go:369: received error, beginning cleanup: read tcp 10.4.3.39:60794->185.100.59.59:6667: i/o timeout
debug:00:03:28 handler.go:29: < CLIENT_DISCONNECTED irc.efnet.nl:6667
debug:00:03:28 handler.go:211: [1/1] exec rmoSuelNipGLXawNpUCR => *
debug:00:03:28 handler.go:232: [1/1] done rmoSuelNipGLXawNpUCR == 4.621µs
2024/01/24 00:03:28 Failed to connect to irc.efnet.nl:6667: read tcp 10.4.3.39:60794->185.100.59.59:6667: i/o timeout <== this is my log.Print with the error from .Connect()
debug:00:03:38 conn.go:307: connecting to irc.efnet.nl:6667... (sts: false, config-ssl: false)
debug:00:03:38 conn.go:530: starting sendLoop
debug:00:03:38 client.go:841: > CAP LS 302
debug:00:03:38 client.go:841: > NICK SnoozeThis
debug:00:03:38 client.go:841: > USER SnoozeThis * * www.snoozethis.com
[...]
debug:00:03:39 handler.go:29: < :irc.efnet.nl 004 Sn00zeThs irc.efnet.nl ircd-ratbox-3.0.10 oiwszcrkfydnxbauglZCD biklmnopstveIrS bkloveI
debug:00:03:39 handler.go:211: [1/1] exec rmoSuelNipGLXawNpUCR => *
debug:00:03:39 handler.go:29: < CLIENT_GENERAL_UPDATED
debug:00:03:39 handler.go:211: [1/1] exec rmoSuelNipGLXawNpUCR => *
debug:00:03:39 handler.go:232: [1/1] done rmoSuelNipGLXawNpUCR == 1.448µs
debug:00:03:39 handler.go:232: [1/1] done rmoSuelNipGLXawNpUCR == 862ns
debug:00:03:39 handler.go:211: [1/1] exec UoZYLEdqkoZCuTuRpLKz => 004
debug:00:03:39 handler.go:29: < CLIENT_GENERAL_UPDATED
debug:00:03:39 panic.go:884: closing readLoop
debug:00:03:39 handler.go:211: [1/1] exec rmoSuelNipGLXawNpUCR => *
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x948946]

goroutine 5681 [running]:
github.com/lrstanley/girc.(*Client).readLoop(0xc0003fc380, {0xe329a8, 0xc000534080})
    /home/quis/go/pkg/mod/github.com/lrstanley/girc@v0.0.0-20230911164840-f47717952bf9/conn.go:440 +0x266
github.com/lrstanley/girc/internal/ctxgroup.(*Group).Go.func1()
    /home/quis/go/pkg/mod/github.com/lrstanley/girc@v0.0.0-20230911164840-f47717952bf9/internal/ctxgroup/ctxgroup.go:58 +0x6e
created by github.com/lrstanley/girc/internal/ctxgroup.(*Group).Go
    /home/quis/go/pkg/mod/github.com/lrstanley/girc@v0.0.0-20230911164840-f47717952bf9/internal/ctxgroup/ctxgroup.go:55 +0x8d

The networking on my server is unstable as of yesterday, so that seems to match as well.

You could try reproducing by installing an iptables rule that drops all traffic, or by using lsof(1) + gdb(1) to find the file descriptor and close(2) it.
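A rougher alternative that needs no root access is a local "blackhole" server: it accepts the TCP connection and then never sends anything, so a client with a read deadline ends up with an i/o timeout much like the one in the log above. The sketch below is stdlib-only; `startBlackhole` and `readWithTimeout` are hypothetical helpers, and this only imitates a silent peer. It is not a confirmed reproducer for the panic.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"time"
)

// startBlackhole listens on a random localhost port, accepts connections,
// and then never reads or writes, imitating a peer that silently vanished.
// Hypothetical helper for this sketch only.
func startBlackhole() (net.Listener, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, err
	}
	go func() {
		for {
			conn, err := ln.Accept()
			if err != nil {
				return // listener was closed
			}
			// Hold the connection open without any traffic.
			go func(c net.Conn) {
				time.Sleep(time.Minute)
				c.Close()
			}(conn)
		}
	}()
	return ln, nil
}

// readWithTimeout dials addr and waits up to timeout for data. It reports
// whether the read ended with a deadline (i/o timeout) error, plus the error.
func readWithTimeout(addr string, timeout time.Duration) (bool, error) {
	conn, err := net.Dial("tcp", addr)
	if err != nil {
		return false, err
	}
	defer conn.Close()
	if err := conn.SetReadDeadline(time.Now().Add(timeout)); err != nil {
		return false, err
	}
	buf := make([]byte, 512)
	_, err = conn.Read(buf)
	return errors.Is(err, os.ErrDeadlineExceeded), err
}

func main() {
	ln, err := startBlackhole()
	if err != nil {
		fmt.Println("listen:", err)
		return
	}
	defer ln.Close()

	timedOut, err := readWithTimeout(ln.Addr().String(), 200*time.Millisecond)
	fmt.Println(timedOut, err)
}
```

To try this against an IRC client, one would point the client's server and port settings at `ln.Addr()`. The client would never see a server greeting, so this exercises the timeout path during registration rather than a drop on an established session.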