[bug] NextReader() panic

gorilla / websocket

Package gorilla/websocket is a fast, well-tested and widely used WebSocket implementation for Go.

https://gorilla.github.io

BSD 2-Clause "Simplified" License

22.06k stars 3.47k forks source link

[bug] NextReader() panic #657

Closed calbot closed 3 years ago

calbot commented 3 years ago

in conn.go NextReader...

    c.readErrCount++
    if c.readErrCount >= 1000 {
        panic("repeated read on failed websocket connection")
    }

This shouldn't panic if a timeout on read happens because I expect this. I would think readErrCount would reset if a subsequent successful read was completed. But that doesn't fix the problem in general since there could still be a string of timeouts over 1000.

I don't think you should panic or increment the readErrCount if the error was a timeout. I don't know if I like the panic here in general. Perhaps this panic could be optionally disabled on by setting a flag on the connection.

I'm trying to use gorrilla for a load test and using SetReadDeadline to test multiple connections in a loop (rather than go routines per connection which isn't scaleable)

ghost commented 3 years ago

Timeout errors fail the connection (see doc here).

The application called a read method 1000 times after the connection failed. That's a very strong signal that the application is not checking and handling errors. A panic is more useful than letting the application spin in a tight loop banging its little head against a big wall of errors.

calbot commented 3 years ago

@TaylorRex that may be the common case but for me I would like at least the option of having it disabled. Especially since net.Error has a Temporary()method. For temporary issues the connection should be recoverable for further reading. And as I said the count is never reset so this could accumulate over hours of normal use and just kill your app randomly.

This code doesn't know enough to determine whether it's in a tight loop or not so it doesn't really achieve what it's stating. There should be a time recorded the first error to see if it's actually in a tight loop if that's what this is for.

calbot commented 3 years ago

Also, readErr is never reset as far as I can tell so there isn't really any point in waiting for 1000 calls to panic since you're never going to read anything anyways since ReadNext only reads if readErr == nil

ghost commented 3 years ago

The package does not return net.Errors with Temporay() == true. Timeout errors fail the connection.

The code detects an application that repeatedly calls read on a failed connection. That's almost certainly a bug in the application, tight loop or not.

there isn't really any point in waiting for 1000 calls

A panic on the first bad call breaks applications that handle some errors. Here's an example:

for {
    _, meta, _ := c.ReadMessage()
    _, payload, err := c.ReadMesage()
    if err != nil {
        return err
    }
    process(meta, payload)
    }

The application expects message in pairs from the client. Checking the error on the second message is sufficient to handle errors. This application will break if the first bad call to ReadMessage panics.

ghost commented 3 years ago

Timeouts are not in recoverable at all points in the code. The code can be modified so recovery is possible, but that will take some work.

Example: The reads following this line are an example of where the timeout is not recoverable. A couple of header bytes were consumed from the stream earlier in the function. If the function blows out with an error and is called again, the function will be out of sync by those two bytes.

Edit: There is an issue on recovery from timeout. See #474.

calbot commented 3 years ago

Ok, thanks for explaining this. I've modified it in my vendored code to continue after timeout and haven't run into the corruption issue yet so hopefully it will be fine for my testing purposes. I'll close this since #474 is duplicate