libp2p / go-libp2p-pubsub

The PubSub implementation for go-libp2p
https://github.com/libp2p/specs/tree/master/pubsub
Other
317 stars 182 forks source link

panic: close of closed channel #425

Closed prestonvanloon closed 3 years ago

prestonvanloon commented 3 years ago

Reported https://github.com/prysmaticlabs/prysm/issues/8958.

May 29 13:45:06 prysm prysm.sh[2728527]: panic: close of closed channel
May 29 13:45:06 prysm prysm.sh[2728527]: goroutine 210 [running]:
May 29 13:45:06 prysm prysm.sh[2728527]: github.com/libp2p/go-libp2p-pubsub.(*Subscription).close(...)
May 29 13:45:06 prysm prysm.sh[2728527]:         external/com_github_libp2p_go_libp2p_pubsub/subscription.go:46
May 29 13:45:06 prysm prysm.sh[2728527]: github.com/libp2p/go-libp2p-pubsub.(*PubSub).handleRemoveSubscription(0xc0002c2c00, 0xc061f75700)
May 29 13:45:06 prysm prysm.sh[2728527]:         external/com_github_libp2p_go_libp2p_pubsub/pubsub.go:674 +0xa5
May 29 13:45:06 prysm prysm.sh[2728527]: github.com/libp2p/go-libp2p-pubsub.(*PubSub).processLoop(0xc0002c2c00, 0x1e31910, 0xc00037f2c0)
May 29 13:45:06 prysm prysm.sh[2728527]:         external/com_github_libp2p_go_libp2p_pubsub/pubsub.go:558 +0x1505
May 29 13:45:06 prysm prysm.sh[2728527]: created by github.com/libp2p/go-libp2p-pubsub.NewPubSub
May 29 13:45:06 prysm prysm.sh[2728527]:         external/com_github_libp2p_go_libp2p_pubsub/pubsub.go:293 +0xb7e

I'm not sure how to reproduce this issue, but this routine seems to close a chan that is already closed.

https://github.com/libp2p/go-libp2p-pubsub/blob/v0.4.0/subscription.go#L45-L47

github.com/libp2p/go-libp2p-pubsub v0.4.0
aschmahmann commented 3 years ago

What's going on is that the application is calling sub.Cancel() on a subscription that has already been cancelled. The awkward thing here is that if the application didn't have a second active subscription for the topic then this wouldn't be triggered due to https://github.com/libp2p/go-libp2p-pubsub/blob/e6ad80cf4782fca31f46e3a8ba8d1a450d562f49/pubsub.go#L666-L669

It seems like we should probably have a boolean in the subscription object that can track if the object has been closed or not. Once we have that we can either:

  1. Panic if the subscription has already been closed (IMO bad UX, but it does help detect bugs like this)
  2. Just return if the subscription has already been closed

@vyzo any thoughts/reason not to do option 2, it seems like that was the original intent here since we return if the subscription is missing.


This test reproduces the panic:

func TestProducePanic(t *testing.T) {
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()

    const numHosts = 5
    topicID := "foobar"
    hosts := getNetHosts(t, ctx, numHosts)
    ps := getPubsub(ctx, hosts[0])

    // Create topic
    topic, err := ps.Join(topicID)
    if err != nil {
        t.Fatal(err)
    }

       // Create subscription we're going to cancel
    s, err := topic.Subscribe()
    if err != nil {
        t.Fatal(err)
    }
    // Create second subscription to keep us alive the the subscription map after the first one is cancelled
    s2, err := topic.Subscribe()
    if err != nil {
        t.Fatal(err)
    }
    _ = s2

    s.Cancel()
    time.Sleep(time.Second)
    s.Cancel()
    time.Sleep(time.Second)
}
vyzo commented 3 years ago

option 2 sounds best.

On Thu, Jun 10, 2021, 20:28 Adin Schmahmann @.***> wrote:

What's going on is that the application is calling sub.Cancel() on a subscription that has already been cancelled. The awkward thing here is that if the application didn't have a second active subscription for the topic then this wouldn't be triggered due to https://github.com/libp2p/go-libp2p-pubsub/blob/e6ad80cf4782fca31f46e3a8ba8d1a450d562f49/pubsub.go#L666-L669

It seems like we should probably have a boolean in the subscription object that can track if the object has been closed or not. Once we have that we can either:

  1. Panic if the subscription has already been closed (IMO bad UX, but it does help detect bugs like this)
  2. Just return if the subscription has already been closed

@vyzo https://github.com/vyzo any thoughts/reason not to do option 2?

This test reproduces the panic:

func TestProducePanic(t *testing.T) { ctx, cancel := context.WithCancel(context.Background()) defer cancel()

const numHosts = 5 topicID := "foobar" hosts := getNetHosts(t, ctx, numHosts) ps := getPubsub(ctx, hosts[0])

// Create topic topic, err := ps.Join(topicID) if err != nil { t.Fatal(err) }

   // Create subscription we're going to cancel

s, err := topic.Subscribe() if err != nil { t.Fatal(err) } // Create second subscription to keep us alive the the subscription map after the first one is cancelled s2, err := topic.Subscribe() if err != nil { t.Fatal(err) } _ = s2

s.Cancel() time.Sleep(time.Second) s.Cancel() time.Sleep(time.Second) }

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/libp2p/go-libp2p-pubsub/issues/425#issuecomment-858814319, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAI4ST3LHTDUK5JETPHRMLTSDY4NANCNFSM46O5M2ZQ .