philippseith / signalr

SignalR server and client in go
MIT License

Client maintains HTTP connection when negotiate fails with a 404 #161

Closed by brandon-hc 12 months ago

brandon-hc commented 1 year ago

I am using the SignalR client with the WithConnector option to maintain a connection to the SignalR server. I have found that if the route the client tries to connect to returns a 404, the client leaves the HTTP connection to the server open. Using TCPView, I can see established connections accumulating, one for each connection attempt.

After doing some investigating, it appears the issue has to do with DisableKeepAlives not being set to true when creating the http.Client in httpconnection.go NewHTTPConnection.
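For context, this is the kind of change I mean, as a minimal standard-library sketch (not the library's actual code):

// Sketch only (imports: net/http): a client with keep-alives disabled,
// so each request's TCP connection is closed instead of being pooled.
func newNoKeepAliveClient() *http.Client {
    return &http.Client{
        Transport: &http.Transport{
            DisableKeepAlives: true, // adds "Connection: close" and skips the idle pool
        },
    }
}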

I've also been experiencing an issue where, after about 5 minutes of attempts, NewHTTPConnection hangs on httpConn.client.Do; I'm still trying to investigate this further.

For reference, below is a sample application I've been using to reproduce this issue.

package main

import (
    "context"
    "github.com/philippseith/signalr"
    "time"
)

var connectionCtx context.Context
var connectionCancel context.CancelFunc
var client signalr.Client

func main() {
    connectionLoopCtx, _ := context.WithCancel(context.Background())

    var err error
    client, err = signalr.NewClient(connectionLoopCtx, signalr.WithConnector(getConnection))
    if err != nil {
        panic(err)
    }

    client.Start()

    for {
    }
}

func getConnection() (signalr.Connection, error) {
    if connectionCtx != nil {
        connectionCancel()
    }
    connectionCtx, connectionCancel = context.WithTimeout(context.Background(), 2*time.Second)

    return signalr.NewHTTPConnection(connectionCtx, "https://localhost:11001/signalr/badroute")
}
philippseith commented 1 year ago

I don't know if this is really an issue. IMHO the open connections are just a result of the connection pooling in the net/http package. The behavior could easily be changed by setting Close on the negotiate request in NewHTTPConnection, but that would lead to a slower connection process when the server response is ok. @brandon-hc what is your use case? Why are open TCP connections a problem there?
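Roughly like this, as a standard-library sketch (not the actual NewHTTPConnection code; negotiateOnce and negotiateURL are made-up names):

// Sketch only (imports: context, net/http): marking the negotiate request so
// its TCP connection is closed afterwards instead of being pooled.
func negotiateOnce(ctx context.Context, client *http.Client, negotiateURL string) (*http.Response, error) {
    req, err := http.NewRequestWithContext(ctx, http.MethodPost, negotiateURL, nil)
    if err != nil {
        return nil, err
    }
    req.Close = true // close the connection after this request
    return client.Do(req)
}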

brandon-hc commented 1 year ago

That's fair. The more I think about the issue I'm trying to solve, the more I realize this is definitely the wrong approach.

My use case is trying to make sure my service handles accessing an older version of the server, which won't have the new hub, causing a 404. I want it to keep trying in the event the server gets updated, but the real issue I'm facing is that when my service repeatedly hits a 404 while trying to negotiate, eventually the backoff gives up and it starts trying to connect rapidly. With each connection being kept alive, this quickly exhausts all available ports on the system and takes down networking.

When trying to isolate it with my test code above, I encountered a different behavior where it eventually hung on connecting rather than the issue I've been encountering with the backoff, so I tried to isolate the port-exhaustion part instead.

The issue isn't the keep-alive itself; you are right that it is definitely beneficial to maintain the connection if it succeeds. But if the negotiate fails, it appears the connection isn't reused and is left open. With the backoff, and with the connections eventually timing out (I'm assuming the server is dropping them, although I'm not certain), it usually isn't an issue. I guess it could still be an issue if someone wasn't using WithConnector and was calling Start and failing very quickly, although I haven't tested that scenario.

Truly, I need to investigate my backoff issue further and replicate it in a test case rather than go after a symptom of the issue. If you would like, I can close this issue and create a new one if I find something with the backoff. Either way, I will follow up with my findings.

I should add that my initial theory for my backoff issue is that the library has a DefaultMaxElapsedTime of 15 minutes, after which NextBackOff() returns Stop, which is a -1 duration from what I can tell. Although when testing this theory with my test case above, it continued beyond 15 minutes until it eventually hung during the HTTP request.

From what I can tell, it's still possible that this is my issue, or related to it. My current understanding of how the code should work is that the default exponential backoff settings make NextBackOff() return Stop (time.Duration = -1) after MaxElapsedTime (15 minutes), and that value gets passed to time.After(boff.NextBackOff()). When a negative value is passed to time.After, the returned channel fires immediately, so the retry loop would spin with no delay.
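A quick standalone check of the time.After part of that theory (plain standard library; the -1 duration stands in for the backoff package's Stop value):

package main

import (
    "fmt"
    "time"
)

func main() {
    stop := time.Duration(-1) // backoff's Stop is a -1 duration
    start := time.Now()
    <-time.After(stop) // a non-positive duration fires immediately
    fmt.Printf("waited %v before the next retry\n", time.Since(start)) // effectively zero
}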

brandon-hc commented 1 year ago

Alright, so I think I've figured out my issues:

1. My test code randomly hanging on the HTTP request. This was a noob mistake on my part, using for{} instead of select{} and causing a deadlock. 🤦‍♂️ Once I figured this out, my sample code was able to replicate issue 2.

2. After 15 minutes of failed attempts, the client begins retrying the connection immediately. This definitely follows from the default backoff settings; I see three options to handle it:

3. When the client connection fails with a 404, the HTTP connection remains open and is not reused. My real issue here was that I wasn't using WithHTTPClient, so NewHTTPConnection creates a new client on line 57, which doesn't allow for reuse of the connection. If I changed it to use http.DefaultClient or used the WithHTTPClient option, it only ever had one open connection, as expected (see the sketch below).
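For reference, this is the shape of the fix on my side, adapting getConnection from the sample above (a sketch only, assuming WithHTTPClient is a NewHTTPConnection option that accepts a shared client such as http.DefaultClient; extra import: net/http):

func getConnection() (signalr.Connection, error) {
    if connectionCtx != nil {
        connectionCancel()
    }
    connectionCtx, connectionCancel = context.WithTimeout(context.Background(), 2*time.Second)

    // Sharing one client across attempts lets net/http pool and reuse the
    // underlying TCP connection instead of leaving a new one open per attempt.
    return signalr.NewHTTPConnection(connectionCtx, "https://localhost:11001/signalr/badroute",
        signalr.WithHTTPClient(http.DefaultClient))
}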

If you make the for-to-select change in the sample code above, it should replicate issues 2 and 3. Each connection attempt will open a new HTTP connection; eventually some start to get cleaned up as they time out, I presume. After 15 minutes, the backoff elapses and it starts connecting repeatedly until it hits port exhaustion.

This screenshot shows multiple connections after the first 10 or so attempts, before the backoff reaches its MaxInterval.

In the interim, I'm just going to manually modify client.go to set MaxElapsedTime to 0 and use WithHTTPClient to resolve my original issue.
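For anyone else hitting this, that change amounts to the following (a sketch, assuming the backoff package in use is github.com/cenkalti/backoff/v4):

// Sketch: with MaxElapsedTime set to 0, NextBackOff() never returns backoff.Stop,
// so the retry delay stays capped at MaxInterval instead of collapsing to
// immediate retries after 15 minutes.
func newEndlessBackOff() *backoff.ExponentialBackOff {
    bo := backoff.NewExponentialBackOff()
    bo.MaxElapsedTime = 0 // disables the DefaultMaxElapsedTime (15 minute) cutoff
    return bo
}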

If you would like, let me know how you want to handle it and I can make a pull request for it. It is an awesome library and I would like to be able to contribute back where I can!

philippseith commented 1 year ago

@brandon-hc, I did some googling on leaking connections in the Go http package and found something: https://blog.cubieserver.de/2022/http-connection-reuse-in-go-client/ In #162, all responses are read to the end before they are closed. Furthermore, I changed the client fallback to http.DefaultClient. Can you try this PR with your setup?
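The pattern from that article, for reference (a sketch, not the exact code in #162; imports: io, net/http):

// Drain the body before closing so net/http can return the TCP connection
// to the idle pool for reuse.
func drainAndClose(resp *http.Response) {
    _, _ = io.Copy(io.Discard, resp.Body) // read to EOF, even for error responses like 404
    _ = resp.Body.Close()
}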

brandon-hc commented 1 year ago

That definitely explains the behavior I was seeing with .Close(); I guess it makes sense to ensure the body is read and closed if you want to reuse the connection.

I have tested #162 with my test case from above and can confirm that I see it reusing the HTTP connection as expected.

The new WithBackoff option fixes my rapid-connection-after-15-minutes issue, although I feel like the loop in client.Start should check whether the backoff has exceeded its maximum and back out or return an error (something like the sketch below).
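This is the kind of check I mean, as a generic sketch rather than the library's actual loop (retryLoop is a made-up name; imports: errors, time, github.com/cenkalti/backoff/v4):

func retryLoop(connect func() error, bo backoff.BackOff) error {
    for {
        if err := connect(); err == nil {
            return nil
        }
        next := bo.NextBackOff()
        if next == backoff.Stop {
            // back out instead of retrying with no delay
            return errors.New("backoff elapsed, giving up on reconnecting")
        }
        <-time.After(next)
    }
}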

Thank you for your assistance with the issue and your hard work on the library!

GrimmiMeloni commented 1 year ago

The new WithBackoff option fixes my rapid-connection-after-15-minutes issue, although I feel like the loop in client.Start should check whether the backoff has exceeded its maximum and back out or return an error.

@philippseith, for what it's worth, we just stumbled into this exact problem. While it is nice that we can fix it with the WithBackoff option, it is an odd default behavior.

Our scenario was that a cloud-provided SignalR server was rejecting auth tokens, and after 15 minutes the client started to hammer the cloud endpoint, trying to connect about 10 times a second. I think you should consider changing the default to MaxElapsedTime = 0. It would have saved us some headaches.

philippseith commented 12 months ago

@GrimmiMeloni, your discussion with @andig helped me a lot to understand the problem. I will add a check for backoff.Stop which ends the client loop.

philippseith commented 12 months ago

@GrimmiMeloni, @brandon-hc See #177