Continuing from https://github.com/GoogleCloudPlatform/google-cloud-node/pull/2255#issuecomment-299329661 (cc @s2young)
I'm not able to get the process mentioned in this issue to fail. Does anyone have a foolproof way I can get this script or another to break? I've been uploading 24 images at a time, and the thumbnails are all generated successfully.
My app is doing quite a bit more than the example, and I'm stripping some of this out to see if I can avoid the problem. But it's all just standard Storage API stuff:
The application is a mobile app, with a content management piece that allows users to upload photos when building a layout. Images are resized for display on Tablets, Large/Small phones, etc. @stephenplusplus
One more caveat @stephenplusplus: If you follow the pattern I've given, make sure you ignore the upload of the copies. In my case, when I upload a resized copy I am adding some metadata to the image so that the imageListener function can ignore those storage events. I got myself into an infinite loop early on!
Haha, thanks for the tip! I was able to reproduce the error (ECONNRESET and ESOCKETTIMEDOUT) but it was only when I was deleting the file before it was done being used. I'm not sure why that triggered those errors, but that's the only thing I can pin-point as the cause. I'll continue to play around.
Here's my script: https://gist.github.com/stephenplusplus/1ac55732adadc1562737cf903307a98d. It takes an upload and creates 10 thumbnails. I'm testing with 25 at a time, and I'm consistently ending up with the correct total of images in the bucket, 250.
My theory is that this happens during periods of inactivity, when the Cloud Function instance is throttled. I think there's a cached socket that dies, possibly from being too CPU-starved to acknowledge any keep-alive fast enough. Try the same script now (hopefully your instance is still alive), then again after a 20m break. Once you see it, the error should be pretty fatal until a redeploy (which forces a reallocation of your Cloud Function instance).
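To make the failure mode concrete, here is a minimal sketch (my own illustration, not code from google-cloud-node) of what retrying on a dead cached socket could look like. The set of retryable codes and the retry count are assumptions, and `simulatedStorageCall` is a stand-in for a real Storage request:

```javascript
// Illustration only: retry an async operation when the error code suggests
// the underlying socket died (ECONNRESET / ESOCKETTIMEDOUT), as theorized above.
const RETRYABLE_CODES = new Set(['ECONNRESET', 'ESOCKETTIMEDOUT']);

function withSocketRetries(operation, retries = 3) {
  return operation().catch((err) => {
    if (retries > 0 && RETRYABLE_CODES.has(err.code)) {
      return withSocketRetries(operation, retries - 1);
    }
    throw err;
  });
}

// Simulate a call whose cached socket is dead on the first attempt only.
let attempts = 0;
function simulatedStorageCall() {
  attempts += 1;
  if (attempts === 1) {
    const err = new Error('socket hang up');
    err.code = 'ECONNRESET';
    return Promise.reject(err);
  }
  return Promise.resolve('ok');
}

withSocketRetries(simulatedStorageCall).then((result) => {
  console.log(result, 'after', attempts, 'attempts'); // ok after 2 attempts
});
```

The point is that the first invocation after a throttled idle period fails with a socket-level error, and a transparent retry on a fresh connection succeeds.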
Another update, in case it helps. I have another Firebase Function that is listening to the RealTime DB for changes and then deleting the related image from Google Cloud Storage. Basically, in my CMS when a user deletes an image he's deleting the data representation of it in Firebase DB. The listener then deletes the actual image from Storage.
I am experiencing the ECONNRESET error in this db listener function. The result is I have orphaned images that are attached to object ids that no longer exist in my db.
@inlined I ran the script successfully, waited 20 minutes, and it ran successfully again without errors.
Would anyone be able to give me a repo I can clone, deploy, and then steps I can follow to reproduce the error?
Hey all, I forked your gist and added a couple of things to hopefully help, though I didn't test this, so be warned!
1. Added a step to the thumbnail sequence that writes a node to the real-time DB referencing the image's path.
2. Added a Firebase DB listener that is triggered when you delete the node created in step 1. It then removes the file from Cloud Storage (or at least is supposed to!).
My recommendation is to run the script, wait a few minutes, and then manually delete some of the nodes created in the real-time DB to see if the ECONNRESET happens. I hope this helps @stephenplusplus!
https://gist.github.com/s2young/068f082fd30d17f0a1f2146d953106de
Hey @stephenplusplus I've added you as a collaborator to my 'app-server' project. You should be able to deploy this to your Firebase project, and then locally run the `jasmine` command to trigger the /spec/dev/mage.spec.js test. In this test's current state, I'm seeing tons of ECONNRESETs in my Firebase Functions logs. I hope it helps.
Also, if it helps, I noticed a big jump in ECONNRESETs when I used a singleton instance of a google-cloud/storage object rather than one instance per function.
And yet another thing: I just updated my firebase-tools, and now the firebase.config().firebase call no longer works, so the code that names the bucket is broken. You'll need to tweak that.
Do you know how to programmatically grab the current-context projectId from either firebase or google-cloud objects? It seems to me that if I'm authed on my local machine, my SDK should be able to know the context it's running in. Or do I have to have a config file? Thanks @stephenplusplus
I did some digging and found a post on StackOverflow from someone who ran into this (cc @DrPaulBrewer). He opened an issue on the official issue tracker, "cloud storage frequently throws ECONNRESET, read or write, from cloud functions environment." It seems the community has found a couple of solutions, one of which is setting:

`forever: false`
@inlined @DrPaulBrewer @hayanmind can you check your apps using the newest official @google-cloud/storage module? Theoretically, these issues should be fixed, since we released the logic that retries on ECONNRESET errors yesterday. Also, any other information that could help us fix this once and for all would be very welcome.

As a side note, you should be able to set `forever: false` without forking:

```js
var gcs = require('@google-cloud/storage')(...)

gcs.interceptors.push({
  request: function (reqOpts) {
    reqOpts.forever = false
    return reqOpts
  }
})
```
@s2young thank you for that gist -- I set it up and ran it. I've been doing this for the last several hours, waiting 3, 5, 10, 25, and 60 minutes between events and monitoring the logs. Still, everything is working consistently for me.
> Do you know how to programmatically grab the current-context projectId from either firebase or google-cloud objects?

The libraries that I help maintain are the @google-cloud/* ones, so in this case @google-cloud/storage. With that library, if we're not given a `projectId` argument explicitly, we will use the `getProjectId()` function from google-auto-auth internally:

```js
var authClient = require('google-auto-auth')()

authClient.getProjectId(function (err, projectId) {})
```

That will pick it up from the `GCLOUD_PROJECT` environment variable, the gcloud SDK, a service account JSON key file, and Google Cloud environments (App Engine, Compute Engine, Cloud Functions).
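As a quick illustration of the environment-variable path (my sketch, not library code): inside a Cloud Function of this era, `GCLOUD_PROJECT` was populated by the runtime, so you can read it directly without any auth machinery. The local fallback name below is a placeholder, not a real project:

```javascript
// Sketch: GCLOUD_PROJECT is set automatically in the Cloud Functions runtime
// of this era. 'my-placeholder-project' is an illustrative local fallback only.
const projectId = process.env.GCLOUD_PROJECT || 'my-placeholder-project';

console.log('running against project:', projectId);
```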
@s2young I'm doing the exact same steps (except augmenting metadata) in my app, and it ate my brain for a week wondering why using async/await was causing this problem. After changing my code back to promises I kept getting the error, and then I figured out it wasn't my code but Cloud Functions'.
@stephenplusplus as I stated above, my code is doing exactly the same things as @s2young's. After seeing your reply, I deleted `node_modules`, ran `npm install`, recompiled (typescript/webpack), and deployed my functions. Not getting errors so far.

I did get the error again (after a few tries within 5-7 minutes), so I'll try the `forever: false` solution now.
However, I didn't see any version number change in google-cloud-node since the last time I was getting errors (yesterday), so is the change you made internal?
Update
After applying `forever: false`, I didn't get any errors. I will update here if anything happens.
> However I didn't see any version number change in google-cloud-node installed from last time I was getting errors (yesterday), so is the change you made internal?
Yes, I added the retry solution in a dependency, retry-request, so new and re-installations of @google-cloud/storage will pick up the change. The all-in-one package, google-cloud, will not have it until our next release.
Thanks for trying these solutions out. Please keep letting us know the results you get.
If everyone can confirm `forever: false` is the best solution, then we can automatically set `forever` to `false` when we see that we're in the Cloud Functions environment.
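For what it's worth, here is a minimal sketch of how such auto-detection could work. This is my illustration, not the library's actual logic: `FUNCTION_NAME` was an environment variable set by the Cloud Functions runtime of this era, but the exact check the library ships may differ:

```javascript
// Sketch: detect the Cloud Functions environment via an env var the runtime
// sets (FUNCTION_NAME in this era). The real library's detection may differ.
function isCloudFunctionEnv(env) {
  return typeof env.FUNCTION_NAME === 'string' && env.FUNCTION_NAME.length > 0;
}

// Keep the forever (keep-alive) agent everywhere except Cloud Functions.
const forever = !isCloudFunctionEnv(process.env);
console.log('forever agent enabled:', forever);
```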
So far, that is true for me. I haven't experienced any further ECONNRESETs since making the `forever: false` change. Thanks so much!
We have disabled the forever agent by default in Cloud Function environments. If you un- and re-install @google-cloud/storage, you will pick up the new behavior automatically. Thanks for all of the helpful debugging, everyone!
@stephenplusplus Thanks for your efforts. Other tasks here took priority. I updated my SO posting to reflect the above changes but have not had time to test.
I am seeing ECONNRESET in my Google Cloud Functions now. I am using @google-cloud/bigquery 1.3.0 and @google-cloud/storage 1.7.0.

Unfortunately, even explicitly setting `reqOpts.forever = false` does not solve the issue for me. Has anything changed this behavior recently?
Could this be due to the fact that network and CPU are throttled down to 0 between function invocations, so the connection is lost on the new invocation?
I get this error 100% of the time when trying to upload files in parallel. Adding

```ts
storage.interceptors.push({
  request(reqOpts: DecorateRequestOptions) {
    reqOpts.forever = false;
    return reqOpts;
  },
});
```

did nothing to resolve the issue. Our use case is using the client library to upload files in parallel, which is fairly straightforward.
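For context, the parallel-upload pattern can be sketched without touching GCS at all. The helper below runs async tasks with a concurrency cap, which is one common way to keep many simultaneous uploads from piling onto a small socket pool; the file names, the cap, and the helper itself are illustrative, not part of the client library (in the real use case, each task would wrap something like a `bucket.upload(path)` call):

```javascript
// Illustration: run async tasks with at most `limit` in flight at once.
// The tasks here are stand-ins so the helper can run anywhere.
function runWithConcurrency(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;

  function worker() {
    if (next >= tasks.length) return Promise.resolve();
    const index = next;
    next += 1;
    return tasks[index]().then((value) => {
      results[index] = value;
      return worker(); // pick up the next task when this one finishes
    });
  }

  const workers = [];
  for (let i = 0; i < Math.min(limit, tasks.length); i += 1) {
    workers.push(worker());
  }
  return Promise.all(workers).then(() => results);
}

// Stand-in "uploads" that resolve with their file name.
const files = ['a.jpg', 'b.jpg', 'c.jpg', 'd.jpg'];
const tasks = files.map((name) => () => Promise.resolve(name));

runWithConcurrency(tasks, 2).then((uploaded) => {
  console.log(uploaded); // [ 'a.jpg', 'b.jpg', 'c.jpg', 'd.jpg' ]
});
```

Whether limiting concurrency actually avoids the ECONNRESETs here is an open question in this thread; the sketch just makes the use case reproducible.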
edit by @stephenplusplus
If you're just joining the conversation...
Here is where we stand: We have disabled the forever agent by default in Cloud Function environments. If you un- and re-install @google-cloud/storage, you will pick up the new behavior automatically.
Original Post
Environment details
Steps to reproduce
Theory
I've noticed this in my own Cloud Functions and only in those that use google-cloud-node. I think the socket is dying between requests when the Cloud Functions environment is throttled. This should be OK, but I suspect the google-cloud-node module needs to add an extra error case here. I'll write a pull request to that effect.