zeromq / goczmq

goczmq is a golang wrapper for CZMQ.
Mozilla Public License 2.0
591 stars 94 forks source link

SIGABRT and core dump when trying to check pollin/pollout on a destroyed sock #291

Open aHugues opened 3 years ago

aHugues commented 3 years ago

Hi there, I've ran into an issue when working with this project.

Due to an error on my part, I've tried to access the method Sock.Pollin on a sock that was previously destroyed. Doing so resulted in a SIGABRT and a core dump from the underlying czmq lib.

Steps to reproduce

This simple example would result in a similar crash

package main

import (
    "fmt"
    "log"
    "time"

    goczmq "gopkg.in/zeromq/goczmq.v4"
)

func checkPollin(sock *goczmq.Sock) {
    for {
        fmt.Printf("pollin: %t\n", sock.Pollin())
        time.Sleep(100 * time.Millisecond)
    }
}

func main() {
    sock, err := goczmq.NewSub("inproc://test", "test")
    if err != nil {
        log.Fatal("Impossible to create sock")
    }

    go checkPollin(sock)
    time.Sleep(150 * time.Millisecond)
    sock.Destroy()
    time.Sleep(500 * time.Millisecond)
}

Running this example would result in the following output:

pollin: false
pollin: false
panic_zmq: src/zsock_option.inc:3760: zsock_events: Assertion `self' failed.
SIGABRT: abort
PC=0x7f34f5bac18b m=0 sigcode=18446744073709551610

goroutine 0 [idle]:
runtime: unknown pc 0x7f34f5bac18b
stack: frame={sp:0x7ffd70922c60, fp:0x0} stack=[0x7ffd70123fb8,0x7ffd70922ff0)

... (omitted core dump)

Reason

The czmq lib calls assert (self); in the beginning of each method, which would fail when the sock has been destroyed because resources have been freed. When the assert fails, a SIGABRT is sent, which triggers the core dump.

The goczmq lib does not perform additional verification before calling the C method

// Events returns the current value of the socket's events option
func Events(s *Sock) int {
    val := C.zsock_events(unsafe.Pointer(s.zsockT))
    return int(val)
}

// Pollin returns true if there is a Pollin
// event on the socket
func (s *Sock) Pollin() bool {
    return Events(s)&Pollin == Pollin
}

Solutions

My fix was to make sure that the case was not possible (so joining all goroutines before destroying the sock), but two solutions would make this approach more robust, as a SIGABRT is not recoverable:

The second solution might modify the Pollin signature with something like func (s *Sock) Pollin() (bool, error) which would make it easy to process errors during the Pollin call

Additional remarks

For now, I'm pretty sure this affects the Pollin and Pollout methods. I suspect that this affects all methods calling directly the C library as well, for instance RecvFrame, but I have not checked that.

Elvis339 commented 5 months ago

Hey @aHugues have you implemented and tested your solution?