gopherjs / websocket

websocket provides high- and low-level bindings for the browser's WebSocket API.
BSD 3-Clause "New" or "Revised" License
110 stars 25 forks source link

sporadic runtime error: send on closed channel #13

Open gordonklaus opened 9 years ago

gordonklaus commented 9 years ago

I am sporadically seeing this error at: https://github.com/gopherjs/websocket/blob/master/conn.go#L130 Presumably it could also happen at: https://github.com/gopherjs/websocket/blob/master/conn.go#L122

Because the sends are done in separate goroutines, they aren't guaranteed to execute anytime soon – in particular not before c.ch is eventually closed.

Might I recommend, as JS runs single-threaded, simply to queue messages in a slice rather than a chan? No synchronization necessary, no need to spin up goroutines -> consistent results! Hopefully ;-)

gordonklaus commented 9 years ago

Ah, silly me. Of course you need synchronization in order to do blocking Read.

gordonklaus commented 9 years ago

The problem seems to go away if I increase the buffer size of Conn.ch. Although I suppose it would crop up again with a slow Reader under heavy messaging load.

I should correct my initial reasoning about this bug: I only expect the error to occur at: https://github.com/gopherjs/websocket/blob/master/conn.go#L130 and never at: https://github.com/gopherjs/websocket/blob/master/conn.go#L122 because c.ch <- nil will never come after the channel is closed. But the order of these two sends is indeterminate, and thus the non-nil send may execute long after the nil send, and even after the channel is closed.

gordonklaus commented 9 years ago

Despite my ironclad reasoning above, I am observing "send on closed channel" in onClose. I have verified that onClose is the only instance sending nil, and that it only tries to do it once, but that it somehow tries to do so after the channel has already been closed. Baffled.

theclapp commented 8 years ago

I am seeing this too, in (as far as I can tell) up to date versions of gopherjs and this websocket package.

My server sends two messages and closes the websocket. The websocket package receives the first and puts it on the conn channel, where another waiting goroutine immediately reads it. The websocket receives the second message and queues it (filling the channel's one-item capacity). The onClose() then runs, which tries to send a nil, which blocks, except that it seems to queue the nil and then block, don't ask me how.

At that point, the goroutine that's reading the channel services the second message, and then gets the nil that the onClose sent, and calls handleFrame() in conn.go, which closes the channel. Then the onClose() goroutine is finally scheduled again, just after the Javascript $send() call, but before the check to see if the channel is closed. Which it now is closed, so it throws a "send on closed channel" error.

In the Chrome console, if I click on the $b function in the backtrace from the error, I land in the compiled JS of conn.go's onClose() function, with the cursor in the indicated spot (linebreaks added):

$r = $send(c.ch, ptrType$3.nil);
$s = 1;
case 1: if($c) { $c = false; $r = $r.$blk(); }
                                  // ^cursor here, right before $blk()

$r.$blk() is what actually checks for a closed channel and throws the error.

I too made the error go away by increasing the channel capacity from 1 to 2, but agree that that seems like hiding the problem, not fixing it.

I also got the error to go away, even with a capacity of zero on that channel, by calling close(c.ch) in handleFrame() in its own goroutine.

Before (broken):

} else if message == nil {
    // See onClose for the explanation about sending a nil item.
    close(c.ch)
    return nil, io.EOF
}

After (seems to work in light testing):

} else if message == nil {
    // See onClose for the explanation about sending a nil item.
    go func() {
        close(c.ch)
    }()
    return nil, io.EOF
}

This probably isn't the best solution, but it may point the way to the problem or a better solution. Putting the close in a goroutine of its own seems to let the check-for-close after the nil is sent complete successfully.

This looks like it might possibly be a gopherjs error, so I'm going to tag @shurcooL. I don't have any more time tonight or I'd try my hand at creating a minimal test case in the playground.

dmitshur commented 8 years ago

Hey, thanks for the detailed analysis and the mention @theclapp. I am absolutely interested in looking at this issue and finding an optimal solution. But realistically, I won't be able to get to it for some time (maybe this weekend, but more likely a few weekends away). I'll add it to my backlog of tasks to get to so I don't forget. Thanks! :)

theclapp commented 8 years ago

Minimal (or at least, really small) test case in the playground, http://www.gopherjs.org/playground/#/RyMy65T8Kr

package main

func main() {
    ch := make(chan bool)
    go func() {
        ch <- false
    }()
    go func() {
        select {
        case <-ch:
            close(ch)
        }
    }()
}

Result: JS Console: Uncaught Error: runtime error: send on closed channel

Actually this kind of clinches that this isn't a websocket bug. Will file a bug on gopherjs.

dmitshur commented 8 years ago

Awesome find @theclapp! We'll definitely need to resolve this bug in GopherJS first, then it's likely this WebSocket issue will either go away, or be easier to fix.