googleapis / google-cloud-go

Google Cloud Client Libraries for Go.
https://cloud.google.com/go/docs/reference
Apache License 2.0
3.73k stars 1.28k forks source link

storage: Not retrying "http2: client connection lost" error #7905

Open c-nuro opened 1 year ago

c-nuro commented 1 year ago

Client

storage

Issue Error of "http2: client connection lost" is not retried by default, but this is a transient network error. https://github.com/googleapis/google-cloud-go/blob/storage/v1.30.1/storage/invoke.go#L101

cojenco commented 1 year ago

Hi @c-nuro, thanks for your report.

Could you share a little bit more information on what you were trying to do and how you encountered this error? It would be helpful to understand and record what is the use case.

tritone commented 1 year ago

The error seems to come from here: https://github.com/golang/go/blob/master/src/net/http/h2_bundle.go#L8154 . There is no exported error type so we can't handle it programmatically aside from string matching.

@neild could you clarify if it is expected for this error to be surfaced outside of net/http, and if so how we should handle it?

@c-nuro please let us know what library method you were calling when you encountered this error, and any details about under what circumstances or how often it arises. Also, please include details of the Go language version, storage library version, and environment in which you are running.

c-nuro commented 1 year ago

I'm calling get blob content API. go version is 1.18.4 storage version 1.24.0 (we upgraded to later version together with our own shouldRetry handler, so not observed in later version) Execute in GKE environment.

It happens occasionally in our workload. Seems there's some correlation when the CPU of the node is saturated but not sure. But I think the error can be expected from network perspective and should be retried.

neild commented 1 year ago

My recommendation would be to treat any error as potentially retriable. There are a number of things that can go wrong on the network when sending an HTTP request, and trying to divide them into transient and permanent faults is somewhere between difficult and impossible.

This specific error is caused by the client sending a PING frame to the server and not receiving a response within the expected amount of time. I don't see a reason to treat this differently than any other network fault.

I would be interested to know examples of errors that can be returned by Transport.RoundTrip that should not be retried; I can't think of any, but perhaps I'm missing something.

zsdyx commented 10 months ago

Why does this happen, even in the case of retries.Is there a solution now

import (
    "context"
    "sync"
    "time"

    "cloud.google.com/go/storage"
    "github.com/googleapis/gax-go/v2"
)

var (
    once          sync.Once
    storageClient *storage.Client
    err           error
)

// GetStorageClient get gcs storage client
func GetStorageClient() (*storage.Client, error) {
    once.Do(func() {
        storageClient, err = storage.NewClient(context.Background())
        storageClient.SetRetry(storage.WithBackoff(gax.Backoff{
            Initial:    1 * time.Second,
            Max:        15 * time.Second,
            Multiplier: 3,
        }))
    })
    return storageClient, err
}
edhaight commented 10 months ago

@zsdyx, I believe you need to configure the retry handler to include handling this additional error. This can be configured in a few ways through the client, bucket, or object level. Paraphrasing here, but I think this is the idea:

shouldRetry := func(err error) bool {
  switch {
  case err == nil:
    return false
  case err.Error() == "http2: client connection lost":
    return true
  default:
    return storage.ShouldRetry(err)
  }
}

ctx := context.Background()
client, err := storage.NewClient(ctx)
if err != nil {
    return fmt.Errorf("storage.NewClient: %w", err)
}
defer client.Close()
// Configure what is retried for all operations using this Client.
client.SetRetry(storage.WithErrorFunc(shouldRetry))
tritone commented 10 months ago

+1 to the answer from @edhaight ; use WithErrorFunc to customize retryable errors.

shivraj001 commented 8 months ago

@tritone We are facing the same in big query with continuous logs of http2: client connection lost while using golang client. In our scenario we are using a scheduler service which queries big query to fetch data from the dataset.