gorilla / websocket

Package gorilla/websocket is a fast, well-tested and widely used WebSocket implementation for Go.
https://gorilla.github.io
BSD 2-Clause "Simplified" License
22.06k stars 3.47k forks source link

[question] concurrent write to websocket connection with mutex.Lock() #828

Open Jrenk opened 1 year ago

Jrenk commented 1 year ago

Describe the problem you're having

Using the gorilla/websocket package I sometimes get the following error, despite using mutex.Lock: concurrent write to websocket connection

Versions

"Show me the code!"

type WebsocketClient struct {
    websocket   *websocket.Conn
    mutex            sync.Mutex
}

func (websocketClient *WebsocketClient) sendMessage(msg interface{}) error {
    websocketClient.mutex.Lock()
    defer websocketClient.mutex.Unlock()

    return websocketClient.websocket.WriteJSON(msg)
}

In an http handler the websocket clients get created:

ws, err := upgrader.Upgrade(w, r, nil)
...
clients[id] = &WebsocketClient{
    websocket: ws,
}

The sendMessage function gets called from multiple go-routines and the WriteJSON never gets called directly anywhere else. From my understanding the mutex.Lock() should prevent concurrent websocket writes.

Jrenk commented 1 year ago

The stack trace points to the sendMessage function. As I already mentioned this is the only place where the WriteJSON or any other Write of the gorilla package gets called. So there is no possibility of the application calling WriteJSON without calling mutex.Lock() beforehand. I don't think this issue should be closed.

FZambia commented 1 year ago

@Jrenk hello, could you provide a reproducing example/test/benchmark then to make it clear that it's an issue of this package?

NomNomCameron commented 1 year ago

@Jrenk I'm running into a similar situation. I have a loop in a goroutine that writes to an incoming socket connection for the heartbeat response. Before each write, I lock a common mutex and unlock after the write. The issue seems like it may be tied to a race condition as the issue doesn't happen consistently. Did you figure out what was causing your issue?

Jrenk commented 1 year ago

@NomNomCameron No I didn't figure it out. The problem happened very rarely and it didn't occur for a very long time. I'm still not sure what causes this sadly.

ayjayt commented 1 year ago

@Jrenk

Could it be someone pinged your websocket server and the default ping handler tried to write a pong while it was locked?

edit:I might test this later today, we'll see.

ayjayt commented 1 year ago

I'm imagining a pull request where configuring the websockets server would ask if you want to stick a lock or mutex or something a couple layers deep. Any thoughts on that?

Jrenk commented 1 year ago

@ayjayt that could actually be the problem. Let me know if you tested this. A lock in the package itself could resolve this as you said.

ayjayt commented 1 year ago

Yeah, I mean it has to be it. I might test it, but you should use

Conn.SetPingHandler(func(string) error) anyway to protect against race conditions.

Here is what the default handler looks like:

h = func(message string) error {
    err := c.WriteControl(PongMessage, []byte(message), time.Now().Add(writeWait))
    if err == ErrCloseSent {
        return nil
    } else if e, ok := err.(net.Error); ok && e.Temporary() {
        return nil
    }
    return err
}

from: https://github.com/gorilla/websocket/blob/9111bb834a68b893cebbbaed5060bdbc1d9ab7d2/conn.go#L1158

ayjayt commented 1 year ago

@coreydaley is this not a bug? If I'm correct, anyone who uses any *Write*() function at anytime might collide with the WriteControl() called by the default handler and produce a concurrent write error. I am not confident in this!

edit: especially since most people are going to run their read loops and write loops in separate go routines, and the default handler is actually called automatically by NextReaders()'s advanceFrame() call

FZambia commented 1 year ago

From Gorilla WebSocket docs:

Connections support one concurrent reader and one concurrent writer.

Applications are responsible for ensuring that no more than one goroutine calls the write methods (NextWriter, SetWriteDeadline, WriteMessage, WriteJSON, EnableWriteCompression, SetCompressionLevel) concurrently and that no more than one goroutine calls the read methods (NextReader, SetReadDeadline, ReadMessage, ReadJSON, SetPongHandler, SetPingHandler) concurrently.

The Close and WriteControl methods can be called concurrently with all other methods.

As I mentioned above - if you think it's a bug in a package you better provide a reproducer instead of guessing. You can use https://pkg.go.dev/golang.org/x/tools/cmd/stress tool to increase the chance of finding the reliable reproducer.

ayjayt commented 1 year ago

if you think it's a bug in a package you better provide a reproducer instead of guessing.

If you want me to spend my time working on this project, and building something to reproduce this bug, you better be nice to me!

We're having a discussion before spending time writing code.

[contributors note: redaction was not needed]

ghost commented 1 year ago

anyone who uses any Write() function at anytime might collide with the WriteControl() called by the default handler and produce a concurrent write error.

The websocket connection uses a mutex (implemented using a channel) to prevent WriteControl from colliding with the *Write* methods. The websocket connection write methods acquire the mutex before executing any write related methods on the network connection (SetWriteDeadline and Write).

(Why the websocket connection uses a channel instead of a sync.Mutex is a puzzle for another day.)

coreydaley commented 1 year ago

Please keep all comments and conversations civil, please keep in mind that Github users are from all over the world and speak hundred of different languages, and their native languages are not english, but they are being gracious enough to use english here. So do not assume that anything is being said with a specific tone or meaning. Disrespectful language will not be tolerated in this community.

We are almost done with the project transition and then we will begin triaging and working on issues, but we are not quite to that point yet. If you think that you have found a bug and would like to submit a pull request, please do so and we will review it as part of triaging/addressing this issue.

ayjayt commented 1 year ago

If I'm going to be chastised for asking someone to speak respectfully to me, I will simply not participate here further.

coreydaley commented 1 year ago

If I'm going to be chastised for asking someone to speak respectfully to me, I will simply not participate here further.

That is your prerogative. This is not the first comment within your short time of contributing to this project that could be deemed disrespectful to someone who has disagreed with you or questioned you: https://github.com/gorilla/websocket/issues/841#issuecomment-1644901325

As far as I am concerned that is two strikes. One more and we will have to discuss blocking you from interacting with the organziation.

jaitaiwan commented 4 months ago

@e9x That sort of comment will not be tolerated. I sincerely hope that you are able to find help that allows you to be a more civil member of society.

coreydaley commented 4 months ago

Not that it will do any good but I reported the comment from @e9x, not sure what I did to elicit such a comment.

jaitaiwan commented 1 week ago

Alright, I'm very tired, but I think I've reviewed this ticket enough to come to the conclusion that this should technically be considered a bug. While the documentation does specify that it supports only one reader/writer doing so concurrently at a time, it doesn't make it clear that if you're doing your own concurrent writes you'd also have to overwrite the ping handler.

In this case the appropriate solution is likely to document the need for the ping handler being specified if performing your own concurrent writing and to have this resolved in what ever future version of this library.

hulkingshtick commented 1 week ago

if you're doing your own concurrent writes you'd also have to overwrite the ping handler.

That's not true. WriteControl, the method used by the ping handler, can be called concurrently with the other write methods. The connection is implemented to support the concurrent call. The documentation states that the concurrent call is allowed.

The concurrent write to websocket connection panic cannot be triggered by the WriteControl method because the WriteControl method does not directly or indirectly call the flushFrame method where the concurrency detector is implemented. The concurrent write to websocket connection panic is caused by concurrent application calls to WriteMessage, WriteJSON and NextWriter, not the ping handler.

Edit: Previous comments (1, 2) also explain why the ping handler is not a problem.