hashicorp / consul

Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.
https://www.consul.io
Other
28.05k stars 4.4k forks source link

http.Response.Body of api response is sometimes not read completely and connections are not reused. #3316

Open oliverbestmann opened 6 years ago

oliverbestmann commented 6 years ago

consul version for both Client and Server

Client: 0.8.5 Server: 0.8.5

Operating system and Environment details

Go 1.8.3

Description of the Issue (and unexpected/desired result)

The consul api client in golang does not consume the complete body of the response received from the consul server. Because of that, the http (keep alive) connection can not be re-used and will be closed.

Reproduction steps

Use a connection that prints a message, each time a new connection is created, then call some endpoints, like client.Health().Service("my-service", "", true, nil):

api.NewClient(&api.Config{
    Address: consulAddress,
    HttpClient: &http.Client{
        Transport: &http.Transport{
            Proxy: nil,
            DialContext: func(ctx context.Context, network, address string) (net.Conn, error) {
                log.Infof("Dialing %s %s", network, address)
                return dial(ctx, network, address)
            },
            ResponseHeaderTimeout: 90 * time.Second,
        },
    },
})
findkim commented 3 years ago

Hi @oliverbestmann! I happened to be looking into this while debugging a seemingly related bug on another project using the Consul API client (https://github.com/hashicorp/consul-terraform-sync/issues/146). I wanted to add my findings from following the reproduction steps you provided hopefully to help out anyone else.

I wrote up the test below to capture concurrent and serial requests using the Consul API client. It reveals that a new connection is established for concurrent requests. Once a pool of connections are established, subsequent serial requests reuse those existing connections.

client_test.go ```go package client_test import ( "context" "log" "net" "net/http" "sync" "testing" "time" "github.com/hashicorp/consul/api" "github.com/stretchr/testify/assert" "github.com/stretchr/testify/require" ) func TestAPIClient_reusableConns(t *testing.T) { var dialCount int dialer := net.Dialer{ Timeout: 30 * time.Second, KeepAlive: 30 * time.Second, } transport := http.Transport{ Proxy: nil, Dial: dialer.Dial, DialContext: func(ctx context.Context, network, address string) (net.Conn, error) { dialCount++ log.Printf("[INFO] Dialing %s %s", network, address) return dialer.DialContext(ctx, network, address) }, MaxIdleConns: 5, MaxIdleConnsPerHost: 5, IdleConnTimeout: 90 * time.Second, TLSHandshakeTimeout: 10 * time.Second, } config := api.DefaultConfig() config.HttpClient = &http.Client{Transport: &transport} client, err := api.NewClient(config) require.NoError(t, err) // New connections: concurrent requests var wg sync.WaitGroup wg.Add(5) for i := 0; i < 5; i++ { go func() { services, qm, err := client.Health().Service("app", "", true, nil) require.NoError(t, err) t.Log("Nubmer of services:", len(services)) t.Log("Request time:", qm.RequestTime) wg.Done() }() } wg.Wait() // Reuses connections: serial requests waits for complete // Implies that the API client is reading all of the body for i := 0; i < 5; i++ { services, qm, err := client.Health().Service("app", "", true, nil) require.NoError(t, err) t.Log("Number of services:", len(services)) t.Log("Request time:", qm.RequestTime) require.NoError(t, err) } assert.Equal(t, dialCount, 5, "unexpected # of connections reused") } ```

Output consul/api client v1.4.0 (go1.14.6) with server 1.9.0

$ go test ./client_test/ -v
=== RUN   TestAPIClient_reusableConns
2020/12/08 11:22:24 [INFO] Dialing tcp 127.0.0.1:8500
2020/12/08 11:22:24 [INFO] Dialing tcp 127.0.0.1:8500
2020/12/08 11:22:24 [INFO] Dialing tcp 127.0.0.1:8500
2020/12/08 11:22:24 [INFO] Dialing tcp 127.0.0.1:8500
2020/12/08 11:22:24 [INFO] Dialing tcp 127.0.0.1:8500
    client_test.go:49: Number of services: 100
    client_test.go:50: Request time: 9.65514ms
    client_test.go:49: Number of services: 100
    client_test.go:50: Request time: 9.994027ms
    client_test.go:49: Number of services: 100
    client_test.go:50: Request time: 9.494824ms
    client_test.go:49: Number of services: 100
    client_test.go:50: Request time: 9.994659ms
    client_test.go:49: Number of services: 100
    client_test.go:50: Request time: 10.192487ms
    client_test.go:61: Number of services: 100
    client_test.go:62: Request time: 4.494578ms
    client_test.go:61: Number of services: 100
    client_test.go:62: Request time: 3.266552ms
    client_test.go:61: Number of services: 100
    client_test.go:62: Request time: 3.105789ms
    client_test.go:61: Number of services: 100
    client_test.go:62: Request time: 3.33691ms
    client_test.go:61: Number of services: 100
    client_test.go:62: Request time: 3.260026ms
--- PASS: TestAPIClient_reusableConns (0.05s)
PASS
ok      github.com/hashicorp/consul/client_test 0.384s

Output consul/api from v0.8.5 with server v0.8.5

$ go test ./client_test/ -v -count 1
=== RUN   TestAPIClient_reusableConns
2020/12/08 11:35:11 [INFO] Dialing tcp 127.0.0.1:8500
2020/12/08 11:35:11 [INFO] Dialing tcp 127.0.0.1:8500
2020/12/08 11:35:11 [INFO] Dialing tcp 127.0.0.1:8500
2020/12/08 11:35:11 [INFO] Dialing tcp 127.0.0.1:8500
2020/12/08 11:35:11 [INFO] Dialing tcp 127.0.0.1:8500
    client_test.go:49: Nubmer of services: 100
    client_test.go:50: Request time: 4.261913ms
    client_test.go:49: Nubmer of services: 100
    client_test.go:50: Request time: 4.370877ms
    client_test.go:49: Nubmer of services: 100
    client_test.go:50: Request time: 5.019223ms
    client_test.go:49: Nubmer of services: 100
    client_test.go:50: Request time: 4.301784ms
    client_test.go:49: Nubmer of services: 100
    client_test.go:50: Request time: 4.88288ms
2020/12/08 11:35:11 [INFO] Dialing tcp 127.0.0.1:8500
    client_test.go:61: Number of services: 100
    client_test.go:62: Request time: 2.380345ms
2020/12/08 11:35:11 [INFO] Dialing tcp 127.0.0.1:8500
    client_test.go:61: Number of services: 100
    client_test.go:62: Request time: 2.111134ms
2020/12/08 11:35:11 [INFO] Dialing tcp 127.0.0.1:8500
    client_test.go:61: Number of services: 100
    client_test.go:62: Request time: 2.150876ms
2020/12/08 11:35:11 [INFO] Dialing tcp 127.0.0.1:8500
    client_test.go:61: Number of services: 100
    client_test.go:62: Request time: 2.486228ms
2020/12/08 11:35:11 [INFO] Dialing tcp 127.0.0.1:8500
    client_test.go:61: Number of services: 100
    client_test.go:62: Request time: 1.968497ms
    client_test.go:66: 
            Error Trace:    client_test.go:66
            Error:          Not equal: 
                            expected: 10
                            actual  : 5
            Test:           TestAPIClient_reusableConns
            Messages:       unexpected # of connections reused
--- FAIL: TestAPIClient_reusableConns (0.03s)
FAIL
FAIL    github.com/hashicorp/consul/client_test 0.159s
FAIL

I tested this against consul/api v1.4.0 running go v1.14.6, and compared the relevant changes to Consul version v0.8.5. There were no major changes between v0.8.5 and the module v1.4.0 API client that were related to the reading of the body.

A difference however are the versions of Go to build from which includes the standard encoding/json library. A trailing \n may be a culprit where the JSON decoding of the response stops processing the reader and leaves some bytes causing the TLS connection to close and not be reused. (sourced from bradfitz https://github.com/golang/go/issues/20528#issuecomment-309170928 and Consul source code)

The JSON decoding library may have changed since then, I didn't dive deep into this, but it might account for the \n now and reading the whole response body.

Please do share if you're observing that TLS connections are not being reused with newer versions of Consul.