googleapis / google-cloud-go

Google Cloud Client Libraries for Go.
https://cloud.google.com/go/docs/reference
Apache License 2.0

[storage] "stream error: stream ID x; INTERNAL_ERROR" #784

Closed JeanMertz closed 6 years ago

JeanMertz commented 6 years ago

We are reading tens of millions of objects from GCS and seem to be hitting an issue where an error "stream error: stream ID 4163; INTERNAL_ERROR" is returned after processing files for a while.

It's pretty hard to debug, as it takes several hours before the issue occurs, but we've now hit it on two consecutive runs.

We are using version eaddaf6dd7ee35fd3c2420c8d27478db176b0485 of the storage package.

Here's the pseudo code of what we are doing:

cs, err := cloudstorage.NewClient(ctx)
// err...
defer cs.Close()
b := cs.Bucket(...)
q := &storage.Query{Prefix: ...}
it := b.Objects(ctx, q)
for {
    a, err := it.Next()
    if err == iterator.Done {
        break
    }
    if err != nil {
        // err... (non-Done iterator errors handled here)
    }

    handleObject(...)
}

We have retry logic built into the handleObject function, but even retrying doesn't help. Also, once the error shows up, it doesn't go away: reads of all subsequent lines and files return the same error.

We're thinking of building retry logic around the client itself (closing it and opening a new one) to see if that works. We're still digging deeper, but I wanted to report this nonetheless, in case anyone else has run into it.
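
For illustration, here is a minimal, compilable sketch of that kind of client-level retry, assuming the whole listing pass is safe to restart from the beginning. The function names, bucket/prefix placeholders, and retry count are hypothetical and not part of the actual service.

// Hypothetical sketch: close the storage client and start over with a fresh one
// when a pass fails mid-way. processObjects mirrors the pseudo-code loop above.
package main

import (
	"context"
	"log"

	"cloud.google.com/go/storage"
	"google.golang.org/api/iterator"
)

func processObjects(ctx context.Context, client *storage.Client, bucket, prefix string) error {
	it := client.Bucket(bucket).Objects(ctx, &storage.Query{Prefix: prefix})
	for {
		attrs, err := it.Next()
		if err == iterator.Done {
			return nil
		}
		if err != nil {
			return err
		}
		_ = attrs // handleObject(attrs) in the real code
	}
}

func processWithClientRetry(ctx context.Context, bucket, prefix string) error {
	const maxAttempts = 3 // illustrative
	var lastErr error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		client, err := storage.NewClient(ctx)
		if err != nil {
			return err
		}
		lastErr = processObjects(ctx, client, bucket, prefix)
		client.Close()
		if lastErr == nil {
			return nil
		}
	}
	return lastErr
}

func main() {
	if err := processWithClientRetry(context.Background(), "my-bucket", "some/prefix/"); err != nil {
		log.Fatal(err)
	}
}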

jba commented 6 years ago

It's unlikely to be an error in this client. I've reported it to other teams internally.

jba commented 6 years ago

What version of Go are you using?

Also, what commit is your google.golang.org/api/storage/v1 package at?

JeanMertz commented 6 years ago

google.golang.org/api/storage/v1 at 0aaeb37e5bf8ee5d75973b162ffd0f9f6cfdd91d

$ go version
go version go1.9 darwin/amd64

We've seen it happen another couple of times. No real insights yet, other than that it does keep happening.

jba commented 6 years ago

I filed a bug against Go. The link should be above this comment.

JeanMertz commented 6 years ago

Thanks. I've since rebuilt the service that gave this issue, but am still seeing it occur.

One thing I noticed: the files we are fetching are gzipped, so we pass the reader from ObjectHandle.NewReader through gzip.NewReader. Not sure if this is relevant, but as explained before the code works 99% of the time (and it doesn't fail on the same file every time), so I can't see anything we're doing wrong here.
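
For context, a minimal sketch of that read path: open the object with ObjectHandle.NewReader and wrap the result in gzip.NewReader. The bucket and object names below are placeholders, and the error handling is simplified.

// Hypothetical sketch of reading a gzipped GCS object.
package main

import (
	"compress/gzip"
	"context"
	"io"
	"log"
	"os"

	"cloud.google.com/go/storage"
)

func main() {
	ctx := context.Background()
	client, err := storage.NewClient(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// The stream error shows up somewhere in the middle of reads like this one.
	r, err := client.Bucket("my-bucket").Object("path/to/object.gz").NewReader(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer r.Close()

	gz, err := gzip.NewReader(r)
	if err != nil {
		log.Fatal(err)
	}
	defer gz.Close()

	if _, err := io.Copy(os.Stdout, gz); err != nil {
		log.Fatal(err) // this is where "stream error: ... INTERNAL_ERROR" surfaces
	}
}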

anthmgoogle commented 6 years ago

This issue seems to have been clarified as a golang/go issue. Suggest continue any required discussion there.

Bankq commented 6 years ago

@JeanMertz have you been able to work around this? I've seen the same issue quite often. Thanks!

dansiemon commented 6 years ago

I also see this frequently. I have a job that pulls a bunch of 250MB files from cloud storage for processing, and about 1 in 4 runs dies with this error.

Latest gcloud lib rev.

Bankq commented 6 years ago

@dansiemon FWIW, I don't see this error anymore when I retry the operation. By retry I mean creating a new storage.Object.

maddyblue commented 6 years ago

Also seeing this when pulling lots of files from GCS (https://github.com/cockroachdb/cockroach/issues/20917). It's moderately difficult to retry because the error occurs somewhere in the middle of reading a file, which means any function that needs to retry now has to be idempotent. Our code was built to stream data to various processing stages, which makes retrying hard. Furthermore, the files I'm reading can sometimes be > 1GB, which makes retrying even harder, since that's a lot of work or data to buffer just in case we have to reread due to this bug.

@anthmgoogle As bradfitz said (https://github.com/golang/go/issues/22235#issuecomment-346725087), the http2 package is just passing on an error it saw; it is not the thing producing the error. So I don't think it's correct to close this bug as being an issue in another project. Could you reopen it and look into this again?

jseeley78 commented 6 years ago

I too am hitting this error when pulling down files from gcloud storage, and I have 3 auto retries (with an exponential backoff). Was anyone able to find anything that helped?

bradfitz commented 6 years ago

@jba, related to a discussion I had with @mikeyu83 the other day: the Go Cloud Storage library should probably cut large/failed (or just failed) transfers up into multiple HTTP requests, stitching together an io.Reader for the user from the responses to multiple HTTP Range requests.
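
A rough sketch of that approach (not the library's actual implementation): track how many bytes have been delivered and, when the stream dies mid-transfer, reopen the download with a Range request starting at the current offset, so the caller sees one continuous io.Reader. The URL and retry limit below are illustrative assumptions.

// Hypothetical resuming reader built on plain net/http Range requests.
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

type resumingReader struct {
	client   *http.Client
	url      string // assumed to support Range requests
	offset   int64  // bytes successfully delivered to the caller so far
	body     io.ReadCloser
	attempts int
}

func (r *resumingReader) reopen() error {
	req, err := http.NewRequest("GET", r.url, nil)
	if err != nil {
		return err
	}
	if r.offset > 0 {
		// Resume from where the previous attempt stopped.
		req.Header.Set("Range", fmt.Sprintf("bytes=%d-", r.offset))
	}
	resp, err := r.client.Do(req)
	if err != nil {
		return err
	}
	if resp.StatusCode != http.StatusOK && resp.StatusCode != http.StatusPartialContent {
		resp.Body.Close()
		return fmt.Errorf("unexpected status: %s", resp.Status)
	}
	r.body = resp.Body
	return nil
}

func (r *resumingReader) Read(p []byte) (int, error) {
	for {
		if r.body == nil {
			if err := r.reopen(); err != nil {
				return 0, err
			}
		}
		n, err := r.body.Read(p)
		r.offset += int64(n)
		if err == nil || err == io.EOF {
			return n, err
		}
		// Stream died mid-transfer (e.g. INTERNAL_ERROR): drop the connection
		// and retry with a Range request, up to an illustrative limit.
		r.body.Close()
		r.body = nil
		r.attempts++
		if r.attempts > 3 {
			return n, err
		}
		if n > 0 {
			return n, nil // hand back what we got; retry on the next Read call
		}
	}
}

func (r *resumingReader) Close() error {
	if r.body != nil {
		return r.body.Close()
	}
	return nil
}

func main() {
	rr := &resumingReader{client: http.DefaultClient, url: "https://example.com/large-object"}
	defer rr.Close()
	if _, err := io.Copy(io.Discard, rr); err != nil {
		log.Fatal(err)
	}
}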

joe94 commented 6 years ago

We're also seeing this in our project (https://github.com/Comcast/tr1d1um/pull/58/files#diff-111de0df2ea714c4ceaac0c8defba0cfR86, running that PR locally). We are building with Go 1.9.3.

Specifically, client.Do() returns an error like the following: Post https://[server-path]:443/api/v2: stream error: stream ID 1; INTERNAL_ERROR

Interestingly, we are not sending large chunks of data (as in some previous comments), but we're still seeing this issue. We are simply sending a short, well-formed JSON payload.

Disabling http/2 on the server side seems to "solve" the issue but I am not sure if that is optimal: https://github.com/Comcast/webpa-common/commit/41dd674d55a87364c6f0693f61f079fb934b6a63

We thought http/2 would work transparently with the standard net/http package, but that does not seem to be the case. However, we could be misusing it, in which case we would appreciate some help.
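
For reference, a minimal sketch of what the server-side workaround above amounts to: disabling HTTP/2 so clients fall back to HTTP/1.1. Per the net/http documentation, setting Server.TLSNextProto to a non-nil empty map turns off HTTP/2 support (which is only negotiated over TLS). The address, certificate paths, and handler are placeholders.

// Hypothetical sketch of disabling HTTP/2 on a Go server.
package main

import (
	"crypto/tls"
	"log"
	"net/http"
)

func main() {
	srv := &http.Server{
		Addr:    ":8443",
		Handler: http.DefaultServeMux,
		// A non-nil, empty map disables HTTP/2 negotiation on this server.
		TLSNextProto: make(map[string]func(*http.Server, *tls.Conn, http.Handler)),
	}
	log.Fatal(srv.ListenAndServeTLS("cert.pem", "key.pem"))
}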

joe94 commented 6 years ago

Just an update to my post above: after further investigation, we found it is very likely we were seeing these issues due to overly aggressive http.Server timeouts such as WriteTimeout.

jba commented 6 years ago

I've had no luck reproducing this. I'm using Go 1.10 and the storage client at head (34015874fda3aebe9b4b1dc9d9f3e794ecb6a005). I've got 10,000 goroutines reading fifty 250M files (each file read by many goroutines) and I don't see any errors.

I still plan to do what Brad suggested above, but I sure would like to be able to reproduce this so I'm not flying blind.

maddyblue commented 6 years ago

You may need larger files? This reproduced for us again last night. We have a test that reads some large (12G) files from GCS (not in parallel with any other reads, only one goroutine reading) and we got this error again.

jba commented 6 years ago

The commit mentioned above will try to read the remaining content when it sees INTERNAL_ERROR. Since I can't test it IRL, I'm hoping this thread's participants will try it out.

maddyblue commented 6 years ago

We've pulled in that commit to cockroach. We run some large nightly tests that were failing a few times per week from this bug. I'll report back new results in a while.

apimation commented 6 years ago

Hopefully this helps. We solved this on the server side by changing the ReadTimeout property on http.Server{}.
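
A minimal sketch of that server-side change, which this comment and an earlier one report resolved the stream resets. The durations below are illustrative assumptions, not recommendations; the right values depend on how long the slowest legitimate transfers take.

// Hypothetical sketch of relaxing http.Server timeouts.
package main

import (
	"log"
	"net/http"
	"time"
)

func main() {
	srv := &http.Server{
		Addr:    ":8080",
		Handler: http.DefaultServeMux,
		// Overly aggressive values here can cut off slow transfers mid-stream,
		// which clients may see as "stream error: ... INTERNAL_ERROR" over http2.
		ReadTimeout:  5 * time.Minute,
		WriteTimeout: 5 * time.Minute,
	}
	log.Fatal(srv.ListenAndServe())
}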

rayrutjes commented 6 years ago

👋 @mjibson, did the change fix the issue?

rayrutjes commented 6 years ago

BTW, the fix only addresses the reading side. In our case we see similar issues from time to time on the writing side: Post https://www.googleapis.com/upload/storage/v1/b/***/o?alt=json&projection=full&uploadType=multipart: stream error: stream ID 17; INTERNAL_ERROR

Is this somehow related or should I open a new issue?

maddyblue commented 6 years ago

We have not seen this error again since the patch (2 weeks). But it occurred rarely enough that I'm still going to give it another 2 weeks before being convinced.

maddyblue commented 6 years ago

We haven't seen this error anymore when using the referenced commit. But we continue to see it on code that doesn't have this commit. I think it's safe to mark this issue as closed.

jba commented 6 years ago

Thanks for the update, @mjibson. Closing this (woot).

@rayrutjes, if you're still having problems writing, please open another issue.

ghost commented 2 years ago

I see this issue when downloading files from the IPFS web client (ipfs.io) with the Go HTTP client. IPFS is p2p software written in Go. The download speed changes frequently because it's a p2p network; I think that may be a cause of this problem.

garan82 commented 1 year ago

We get a lot of these errors with Go 1.20/1.20.1, and did not have this with Go 1.19.5. Is anyone experiencing the same issue?

Festlandtommy commented 1 year ago

We are occasionally getting this error with Go 1.20, reading binary audio (<10MB/req) through an http.Server; it has happened a couple of times.

Edit: Downgrading to Go 1.19 did not fix the issue, and bumping http.Server.ReadTimeout had no effect either.