googleapis / nodejs-pubsub

Node.js client for Google Cloud Pub/Sub: Ingest event streams from anywhere, at any scale, for simple, reliable, real-time stream analytics.
https://cloud.google.com/pubsub/
Apache License 2.0

DEADLINE_EXCEEDED makes application not receiving messages at all #770

Closed mahaben closed 4 years ago

mahaben commented 4 years ago

Environment details

Node.js version: v12.7.0
npm version: 6.10.0
@google-cloud/pubsub version: "^1.0.0"

Error:

insertId: "gnr3q1fz7eerd"
jsonPayload: {
  level: "error"
  message: "unhandledRejection"
  originalError: {
    ackIds: [1]
    code: 4
    details: "Deadline exceeded"
  }
}

After receiving this error, the app does not receive messages anymore, and we have to exit the application so that the Kubernetes pod is recreated.

Any help would be appreciated!

bcoe commented 4 years ago

Hey @mahaben did this issue recently start happening?

mahaben commented 4 years ago

Hey @bcoe, it happened for the first time after upgrading @google-cloud/pubsub to ^1.0.0. Is there any workaround to recreate the subscription after this error?

hx-markterry commented 4 years ago

We used to see these error messages; we now see the following errors in all our projects that use the library:

Error: Failed to connect to channel. Reason: Failed to connect before the deadline
    at MessageStream._waitForClientReady (/src/node_modules/@google-cloud/pubsub/build/src/message-stream.js:318:19)

WaldoJeffers commented 4 years ago

I can confirm this. After upgrading to PubSub ^1.0.0, all our services stop sending Pub/Sub messages once the error occurs.

The full stacktrace is

Retry total timeout exceeded before any response was received Error: Retry total timeout exceeded before any response was received
    at repeat (/app/node_modules/@google-cloud/pubsub/node_modules/google-gax/build/src/normalCalls/retries.js:80:31)
    at Timeout.setTimeout [as _onTimeout] (/app/node_modules/@google-cloud/pubsub/node_modules/google-gax/build/src/normalCalls/retries.js:113:25)
    at ontimeout (timers.js:436:11)
    at tryOnTimeout (timers.js:300:5)
    at listOnTimeout (timers.js:263:5)
    at Timer.processTimers (timers.js:223:10) 

Can I suggest raising the priority on this issue?

maxmoeschinger commented 4 years ago

None of our services using pubsub are working anymore either. We are using version 1.1.0 and getting this:

Error: Retry total timeout exceeded before any response was received
    at repeat (/var/www/app/node_modules/google-gax/build/src/normalCalls/retries.js:80:31)
    at Timeout.setTimeout [as _onTimeout] (/var/www/app/node_modules/google-gax/build/src/normalCalls/retries.js:113:25)
    at ontimeout (timers.js:436:11)
    at tryOnTimeout (timers.js:300:5)
    at listOnTimeout (timers.js:263:5)
    at Timer.processTimers (timers.js:223:10)

And this:

Error: Failed to connect to channel. Reason: Failed to connect before the deadline
  File "/var/www/app/node_modules/@google-cloud/pubsub/build/src/message-stream.js", line 318, col 19, in MessageStream._waitForClientReady
    throw new ChannelError(e);

We have to restart our services every 10 minutes because of that.

It also seems like it is storing more and more to disk, as disk usage goes up over time.

ddehghan commented 4 years ago

We also are hitting this. It happens after an hour or two and our publishing stops completely.

My only suspicion was that, since we created and cached the topic in our constructor, the topic was timing out. We changed our implementation to call publish like this:

pubsub.topic('xx').publish()

Now I am running some tests to see whether that was it. If not, I am out of ideas, since our code matches the examples in this repo.

Our platform is node 12 on alpine on GKE.

pworkpop commented 4 years ago

Seeing the same error, "Error: Failed to connect to channel. Reason: Failed to connect before the deadline at MessageStream._waitForClientReady", with @google-cloud/pubsub 0.31.1. Same outcome: the application cannot receive messages. Does subscription.close().then(() => subscription.open()); help in this case?
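
For reference, the close-then-open idea from the question above could be wired up roughly as follows. This is only a sketch: it assumes the error reaches the subscription's `error` event, that the object exposes `close()` (returning a Promise) and `open()` the way a @google-cloud/pubsub Subscription does, and the helper name `restartOnDeadline` and its options are made up for illustration. Nothing in this thread confirms that restarting the stream actually clears the problem.

```javascript
// Sketch: restart a streaming pull when a DEADLINE_EXCEEDED error surfaces.
// `subscription` is assumed to behave like a @google-cloud/pubsub
// Subscription: an EventEmitter exposing close() (a Promise) and open().
const DEADLINE_EXCEEDED = 4; // gRPC status code seen in the logs in this thread

// restartOnDeadline and its options are hypothetical names, not library API.
function restartOnDeadline(subscription, {delayMs = 1000, log = console.error} = {}) {
  let restarting = false;

  const onError = async err => {
    log('subscription error:', err);
    // Ignore other error codes, and errors that arrive while a restart is running.
    if (!err || err.code !== DEADLINE_EXCEEDED || restarting) return false;

    restarting = true;
    try {
      await subscription.close();
      // Short pause before reopening the stream.
      await new Promise(resolve => setTimeout(resolve, delayMs));
      subscription.open();
      return true;
    } catch (restartErr) {
      log('restart failed:', restartErr);
      return false;
    } finally {
      restarting = false;
    }
  };

  subscription.on('error', onError);
  return onError; // returned so the listener can be removed later
}
```

Usage would be `restartOnDeadline(subscription)` right after attaching the `message` listener. The listener resolves to `true` only when a restart was actually attempted and completed.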

Tolgor commented 4 years ago

Same error here with "@google-cloud/storage": "^3.3.1".

Having

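// Import added for completeness; `config` is the application's own settings object.
const {Storage} = require('@google-cloud/storage');

const storage = new Storage({
  projectId: config.googleCloud.projectId
});
const bucket = storage.bucket(config.googleCloud.storage.publicBucketName);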

and leaving the nodejs process running, raises the error from time to time.

{ Error: Retry total timeout exceeded before any response was received
    at repeat (/home/deploy/app/node_modules/google-gax/build/src/normalCalls/retries.js:80:31)
    at Timeout.setTimeout [as _onTimeout] (/home/deploy/app/node_modules/google-gax/build/src/normalCalls/retries.js:113:25)
    at ontimeout (timers.js:436:11)
    at tryOnTimeout (timers.js:300:5)
    at listOnTimeout (timers.js:263:5)
    at Timer.processTimers (timers.js:223:10) code: 4 }

ddehghan commented 4 years ago

Nope. That didn't work. ;-( still getting this.

GoogleError: Retry total timeout exceeded before any response was received
    at repeat (/var/www/app/node_modules/google-gax/src/normalCalls/retries.ts:98:23)
    at Timeout._onTimeout (/var/www/app/node_modules/google-gax/src/normalCalls/retries.ts:140:13)
    at listOnTimeout (internal/timers.js:531:17)
    at processTimers (internal/timers.js:475:7) {
  code: 4
}

pworkpop commented 4 years ago

Getting similar "total timeout exceeded before any response was received" errors with subscription.close().then(() => subscription.get()). What is the best approach: should we retry the operation ourselves until it goes through, or is it better to tweak the default GAX retry options? (https://googleapis.github.io/gax-nodejs/interfaces/BackoffSettings.html)

To me it seems the Google Pub/Sub servers have a bug or have degraded to the point that they do not respond within the expected deadlines.
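
The "retry it ourselves" option could look something like the sketch below. It is an application-level wrapper with exponential backoff, not a change to the GAX settings linked above; the function name `retryWithBackoff` and all of its option names are invented for illustration, and treating only code 4 as retryable is an assumption.

```javascript
// Sketch: application-level retry with exponential backoff.
// retryWithBackoff and its option names are hypothetical; they are not part
// of @google-cloud/pubsub or google-gax.
const DEADLINE_EXCEEDED = 4; // gRPC status code seen in the errors above

async function retryWithBackoff(operation, {
  retries = 5,             // retries after the first attempt
  initialDelayMs = 100,
  multiplier = 2,
  maxDelayMs = 10000,
  isRetryable = err => Boolean(err) && err.code === DEADLINE_EXCEEDED,
} = {}) {
  let delayMs = initialDelayMs;
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      // Give up on non-retryable errors or when the retry budget is spent.
      if (attempt >= retries || !isRetryable(err)) throw err;
      await new Promise(resolve => setTimeout(resolve, delayMs));
      delayMs = Math.min(delayMs * multiplier, maxDelayMs);
    }
  }
}
```

A publish call would then be wrapped as `retryWithBackoff(() => pubsub.topic('xx').publish(Buffer.from('payload')))`. Retrying a publish can produce duplicate messages if an earlier attempt actually went through, so subscribers need to tolerate that.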

maxmoeschinger commented 4 years ago

I have now downgraded to @google-cloud/pubsub version 0.31.0 and added this to my package.json:

"resolutions": {
        "google-gax": "1.3.0"
}

Seems like things are working for longer than 10 minutes now.

FabianHutin commented 4 years ago

Hello, we have been hitting the same problem since 10/02. We tried upgrading to 0.32.1, and even to 1.1.0. It didn't solve a thing. We are running in App Engine, so when one of the instances starts hitting the error, it snowballs and errors flow like crazy until the instance gets killed and another instance starts. Then, errors stop flowing for a bit.

Tolgor commented 4 years ago

Since https://github.com/grpc/grpc-node/issues/1064#issuecomment-539432287

Using

"resolutions": {
    "@grpc/grpc-js": "^0.6.6"
}

as a temporary fix, works for me.

callmehiphop commented 4 years ago

I'm putting this issue at the top of my list. Would anyone be able to re-test with the latest version of gRPC? A release (v0.6.6) went out yesterday and it may or may not have a fix for this. All that should be needed is to delete any lock files you might have and re-install the PubSub client with the same version you currently have pinned.

gae123 commented 4 years ago

I believe we are hitting this one as well. After the application runs fine for several hours, we get the following logged from the subscription error handler. New messages that usually arrive once a minute have stopped arriving 5 minutes earlier.

image

Wondering if this issue is related to this. @bcoe, are you thinking the same?

Here are some environment details:

GKE: 1.14.3-gke.11
nodejs: FROM node:10.14-alpine

# yarn list | grep google
├─ @axelspringer/graphql-google-pubsub@1.2.1
│  ├─ @google-cloud/projectify@0.3.3
│  ├─ @google-cloud/pubsub@^0.28.1
│  ├─ @google-cloud/pubsub@0.28.1
│  │  ├─ @google-cloud/paginator@^0.2.0
│  │  ├─ @google-cloud/precise-date@^0.1.0
│  │  ├─ @google-cloud/projectify@^0.3.0
│  │  ├─ @google-cloud/promisify@^0.4.0
│  │  ├─ google-auth-library@^3.0.0
│  │  ├─ google-gax@^0.25.0
│  ├─ google-auth-library@3.1.2
│  ├─ google-gax@0.25.6
│  │  ├─ google-auth-library@^3.0.0
│  │  ├─ google-proto-files@^0.20.0
│  ├─ google-proto-files@0.20.0
│  │  ├─ @google-cloud/promisify@^0.4.0
├─ @google-cloud/common-grpc@1.0.5
│  ├─ @google-cloud/common@^2.0.0
│  ├─ @google-cloud/common@2.2.2
│  │  ├─ @google-cloud/projectify@^1.0.0
│  │  ├─ @google-cloud/promisify@^1.0.0
│  │  ├─ google-auth-library@^5.0.0
│  ├─ @google-cloud/projectify@^1.0.0
│  ├─ @google-cloud/promisify@^1.0.0
│  ├─ @google-cloud/promisify@1.0.2
├─ @google-cloud/common@0.32.1
│  ├─ @google-cloud/projectify@^0.3.3
│  ├─ @google-cloud/projectify@0.3.3
│  ├─ @google-cloud/promisify@^0.4.0
│  ├─ google-auth-library@^3.1.1
│  ├─ google-auth-library@3.1.2
├─ @google-cloud/iot@1.1.3
│  └─ google-gax@^1.0.0
├─ @google-cloud/kms@0.1.0
│  ├─ google-auth-library@1.6.1
│  ├─ google-gax@^0.17.1
│  ├─ google-gax@0.17.1
│  │  ├─ google-auth-library@^1.6.1
│  │  ├─ google-proto-files@^0.16.0
├─ @google-cloud/logging-winston@2.1.0
│  ├─ @google-cloud/logging@^5.3.1
│  ├─ google-auth-library@^5.2.2
├─ @google-cloud/logging@5.3.1
│  ├─ @google-cloud/common-grpc@^1.0.5
│  ├─ @google-cloud/paginator@^2.0.0
│  ├─ @google-cloud/paginator@2.0.1
│  ├─ @google-cloud/projectify@^1.0.0
│  ├─ @google-cloud/promisify@^1.0.0
│  ├─ @google-cloud/promisify@1.0.2
│  ├─ google-auth-library@^5.2.2
│  ├─ google-gax@^1.0.0
├─ @google-cloud/paginator@0.2.0
├─ @google-cloud/precise-date@0.1.0
├─ @google-cloud/projectify@1.0.1
├─ @google-cloud/promisify@0.4.0
├─ @google-cloud/pubsub@0.31.0
│  ├─ @google-cloud/paginator@^2.0.0
│  ├─ @google-cloud/paginator@2.0.1
│  ├─ @google-cloud/precise-date@^1.0.0
│  ├─ @google-cloud/precise-date@1.0.1
│  ├─ @google-cloud/projectify@^1.0.0
│  ├─ @google-cloud/promisify@^1.0.0
│  ├─ @google-cloud/promisify@1.0.2
│  ├─ google-auth-library@^5.0.0
│  ├─ google-gax@^1.0.0
├─ @google-cloud/storage@2.5.0
│  ├─ @google-cloud/common@^0.32.0
│  ├─ @google-cloud/paginator@^0.2.0
│  ├─ @google-cloud/promisify@^0.4.0
├─ @google/maps@0.5.5
│  ├─ @google-cloud/logging-winston@2.1.0
│  ├─ @google-cloud/logging@5.3.1
│  ├─ @google-cloud/iot@1.1.3
│  ├─ @google-cloud/kms@0.1.0
│  ├─ @google-cloud/pubsub@0.31.0
│  ├─ @google-cloud/storage@2.5.0
│  ├─ @google/maps@0.5.5
│  ├─ @types/google__maps@0.5.2
├─ @types/google__maps@0.5.2
│  ├─ google-libphonenumber@^3.1.6
│  ├─ google-auth-library@^3.0.0
│  ├─ google-auth-library@3.1.2
├─ google-auth-library@5.3.0
│  ├─ google-p12-pem@2.0.2
│  │  ├─ google-p12-pem@^2.0.0
├─ google-gax@1.6.4
│  ├─ google-auth-library@^5.0.0
├─ google-libphonenumber@3.2.5
├─ google-p12-pem@1.0.4
├─ google-proto-files@0.16.1
│  ├─ google-p12-pem@^1.0.0
├─ passport-google-oauth@1.0.0
│  ├─ passport-google-oauth1@1.x.x
│  └─ passport-google-oauth20@1.x.x
├─ passport-google-oauth1@1.0.0
├─ passport-google-oauth20@1.0.0

bcoe commented 4 years ago

@gae123 mind adding grpc to that grep? The specific dependency having issues is a sub-dependency of pubsub.

One thing that is jumping out at me immediately though, is that you're not running pubsub@1.0.0? So it would appear you're actually having issues with the < 1.0.0 version of PubSub?

bcoe commented 4 years ago

@gae123 I would, if possible, suggest trying out PubSub@1.0.0 as (outside of the rough week of hot fixes we've had) we've been working hard to improve stability.

@mahaben are you able to try out 0.6.6 of grpc-js as well? It sounds like this fix might be on the right track.

gae123 commented 4 years ago

@gae123 mind adding grpc to that grep? The specific dependency having issues is a sub-dependency of pubsub.

@bcoe I have modified the original post to add the information you asked for

bcoe commented 4 years ago

@mahaben closing this for now, as we believe it is fixed with the latest version of PubSub we've released.

@gae123 could I bother you to open a new issue? The dependency graph you're using pulls in PubSub in a variety of places, as a deep dependency, but none of the versions listed are up-to-date. I believe you are running into different issues related to older versions of the grpc library.

mahaben commented 4 years ago

@bcoe @callmehiphop I don't think this issue should be closed. It still doesn't work after upgrading to "@google-cloud/pubsub": "^1.1.1"

callmehiphop commented 4 years ago

@mahaben have you tried running against the latest version (v1.1.1)?

mahaben commented 4 years ago

@callmehiphop yes, I tried with v1.1.1 after deleting the lock file.

callmehiphop commented 4 years ago

@mahaben well that's no good, could you run npm ls and give me a print out of your dependency tree?
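
When the tree is large, it can be easier to pull out just the sub-dependency in question. The sketch below walks the parsed output of `npm ls --json` and lists every installed copy of one package; the helper name `collectVersions` is made up, and the sample data only reuses version numbers that appear elsewhere in this thread.

```javascript
// Sketch: list every installed version of one package from `npm ls --json`.
// collectVersions is a hypothetical helper; `tree` is the parsed JSON output,
// where each node may carry a `version` and a nested `dependencies` object.
function collectVersions(tree, packageName, path = [], found = []) {
  for (const [name, info] of Object.entries(tree.dependencies || {})) {
    const here = [...path, name];
    if (name === packageName) {
      found.push({version: info.version, path: here.join(' > ')});
    }
    // Recurse so that nested (non-deduped) copies are reported too.
    collectVersions(info, packageName, here, found);
  }
  return found;
}

// Illustrative input shaped like `npm ls --json` output.
const sampleTree = {
  dependencies: {
    '@google-cloud/pubsub': {
      version: '1.1.1',
      dependencies: {
        '@grpc/grpc-js': {version: '0.6.7'},
        'google-gax': {
          version: '1.7.0',
          dependencies: {'@grpc/grpc-js': {version: '0.6.6'}},
        },
      },
    },
  },
};

console.log(collectVersions(sampleTree, '@grpc/grpc-js'));
```

In practice the input would come from something like `JSON.parse(fs.readFileSync('tree.json', 'utf8'))` after running `npm ls --json > tree.json`. More than one entry in the result means more than one copy of the package is installed.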

mahaben commented 4 years ago

├─┬ @google-cloud/pubsub@1.1.1
│ ├─┬ @google-cloud/paginator@2.0.1
│ │ ├── arrify@2.0.1 deduped
│ │ └── extend@3.0.2 deduped
│ ├── @google-cloud/precise-date@1.0.1
│ ├── @google-cloud/projectify@1.0.1
│ ├── @google-cloud/promisify@1.0.2
│ ├─┬ @grpc/grpc-js@0.6.7
│ │ └── semver@6.3.0
│ ├── @sindresorhus/is@1.2.0
│ ├─┬ @types/duplexify@3.6.0
│ │ └── @types/node@12.7.12
│ ├── @types/long@4.0.0
│ ├── arrify@2.0.1
│ ├── async-each@1.0.3
│ ├── extend@3.0.2
│ ├─┬ google-auth-library@5.4.0
│ │ ├── arrify@2.0.1 deduped
│ │ ├── base64-js@1.3.1
│ │ ├── fast-text-encoding@1.0.0
│ │ ├─┬ gaxios@2.0.1
│ │ │ ├── abort-controller@3.0.0 deduped
│ │ │ ├── extend@3.0.2 deduped
│ │ │ ├─┬ https-proxy-agent@2.2.2
│ │ │ │ ├─┬ agent-base@4.3.0
│ │ │ │ │ └─┬ es6-promisify@5.0.0
│ │ │ │ │   └── es6-promise@4.2.8
│ │ │ │ └─┬ debug@3.2.6
│ │ │ │   └── ms@2.1.2 deduped
│ │ │ └── node-fetch@2.6.0 deduped
│ │ ├─┬ gcp-metadata@3.1.0
│ │ │ ├── gaxios@2.0.1 deduped
│ │ │ └─┬ json-bigint@0.3.0
│ │ │   └── bignumber.js@7.2.1
│ │ ├─┬ gtoken@4.1.0
│ │ │ ├── gaxios@2.0.1 deduped
│ │ │ ├─┬ google-p12-pem@2.0.2
│ │ │ │ └── node-forge@0.9.1
│ │ │ ├── jws@3.2.2 deduped
│ │ │ └── mime@2.4.4
│ │ ├─┬ jws@3.2.2
│ │ │ ├─┬ jwa@1.4.1
│ │ │ │ ├── buffer-equal-constant-time@1.0.1
│ │ │ │ ├─┬ ecdsa-sig-formatter@1.0.11
│ │ │ │ │ └── safe-buffer@5.2.0 deduped
│ │ │ │ └── safe-buffer@5.2.0 deduped
│ │ │ └── safe-buffer@5.2.0
│ │ └─┬ lru-cache@5.1.1
│ │   └── yallist@3.1.1
│ ├─┬ google-gax@1.7.0
│ │ ├─┬ @grpc/grpc-js@0.6.6
│ │ │ └── semver@6.3.0 deduped
│ │ ├─┬ @grpc/proto-loader@0.5.2
│ │ │ ├── lodash.camelcase@4.3.0
│ │ │ └── protobufjs@6.8.8 deduped
│ │ ├─┬ abort-controller@3.0.0
│ │ │ └── event-target-shim@5.0.1
│ │ ├─┬ duplexify@3.7.1
│ │ │ ├─┬ end-of-stream@1.4.4
│ │ │ │ └─┬ once@1.4.0
│ │ │ │   └── wrappy@1.0.2
│ │ │ ├── inherits@2.0.4
│ │ │ ├─┬ readable-stream@2.3.6
│ │ │ │ ├── core-util-is@1.0.2
│ │ │ │ ├── inherits@2.0.4 deduped
│ │ │ │ ├── isarray@1.0.0
│ │ │ │ ├── process-nextick-args@2.0.1
│ │ │ │ ├── safe-buffer@5.1.2
│ │ │ │ ├─┬ string_decoder@1.1.1
│ │ │ │ │ └── safe-buffer@5.1.2
│ │ │ │ └── util-deprecate@1.0.2
│ │ │ └── stream-shift@1.0.0
│ │ ├── google-auth-library@5.4.0 deduped
│ │ ├── is-stream-ended@0.1.4 deduped
│ │ ├── lodash.at@4.6.0
│ │ ├── lodash.has@4.5.2
│ │ ├── node-fetch@2.6.0
│ │ ├── protobufjs@6.8.8 deduped
│ │ ├─┬ retry-request@4.1.1
│ │ │ ├─┬ debug@4.1.1
│ │ │ │ └── ms@2.1.2
│ │ │ └─┬ through2@3.0.1
│ │ │   └── readable-stream@2.3.6 deduped
│ │ ├── semver@6.3.0 deduped
│ │ └── walkdir@0.4.1
│ ├── is-stream-ended@0.1.4
│ ├── lodash.snakecase@4.1.1
│ ├── p-defer@3.0.0
│ └─┬ protobufjs@6.8.8
│   ├── @protobufjs/aspromise@1.1.2
│   ├── @protobufjs/base64@1.1.2
│   ├── @protobufjs/codegen@2.0.4
│   ├── @protobufjs/eventemitter@1.1.0
│   ├─┬ @protobufjs/fetch@1.1.0
│   │ ├── @protobufjs/aspromise@1.1.2 deduped
│   │ └── @protobufjs/inquire@1.1.0 deduped
│   ├── @protobufjs/float@1.0.2
│   ├── @protobufjs/inquire@1.1.0
│   ├── @protobufjs/path@1.1.2
│   ├── @protobufjs/pool@1.1.0
│   ├── @protobufjs/utf8@1.1.0
│   ├── @types/long@4.0.0 deduped
│   ├── @types/node@10.14.21
│   └── long@4.0.0
├── delay@4.3.0
└─┬ shortid@2.2.15
  └── nanoid@2.1.2

callmehiphop commented 4 years ago

@mahaben thanks! Are you still experiencing the same exact issue you mentioned in the issue overview? After re-reading the thread, I have reason to believe that the issue you are experiencing might be different than what the other users here are seeing.

anguillanneuf commented 4 years ago

@callmehiphop An earlier suggestion in this thread to use the latest @grpc/grpc-js (v0.6.6) didn't quite work for some customers. They reported the same error with high volume of incoming messages unless they restarted the affected instances.

Error: Retry total timeout exceeded before any response was received
    at repeat (/app/node_modules/google-gax/build/src/normalCalls/retries.js:80)
    at Timeout._onTimeout (/app/node_modules/google-gax/build/src/normalCalls/retries.js:113)

I see a new release of @grpc/grpc-js (v0.6.7) but is that where the issue is?

callmehiphop commented 4 years ago

@anguillanneuf that is a good question, I'm still trying to play catch up from last week. @alexander-fenster @bcoe do either of you have any insight here? I didn't look very deeply into this yet but it seems like either grpc isn't the problem or 0.6.6 did not contain a fix for this particular issue.

MichaelMarkieta commented 4 years ago

I am able to replicate this issue by deploying a GAE Standard service that pulls messages from PubSub at a rate of about 40/s. Instances need to be redeployed.

gae123 commented 4 years ago

Using the following did not work for us. As before, the issue came back several hours after starting a deployment.

"resolutions": {
    "@grpc/grpc-js": "^0.6.6"
}

callmehiphop commented 4 years ago

I'm attempting to work on a reproduction case right now, but if anyone could enable gRPC logging and supply us with the output that might prove helpful.

Just need to enable the following environment variables

GRPC_TRACE=all 
GRPC_VERBOSITY=DEBUG

pworkpop commented 4 years ago

Someone suggested in https://github.com/grpc/grpc-node/issues/1064#issuecomment-538512609 downgrading @grpc/grpc-js to 0.5.4 so we are pinning google-gax to 1.6.2 until a working solution is provided

"resolutions": {
  "google-gax": "1.6.2"
}

MichaelMarkieta commented 4 years ago

I'm attempting to work on a reproduction case right now, but if anyone could enable gRPC logging and supply us with the output that might prove helpful.

Just need to enable the following environment variables

GRPC_TRACE=all 
GRPC_VERBOSITY=DEBUG

@callmehiphop can these be simply added to the GAE Standard environmental variables? I've tried that but don't see any additional logs in stackdriver.

callmehiphop commented 4 years ago

@MichaelMarkieta I believe so. Can you confirm if you have @grpc/grpc-js >= 0.6.5 installed? I think these are a pretty new addition.
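
If the platform's environment settings turn out to be awkward to change, the same two variables can be set from inside the process. This is a sketch under one assumption: that @grpc/grpc-js reads them when the module is first loaded, so they have to be set before the Pub/Sub client is required. The helper name `enableGrpcDebug` is made up, and the values are the ones given earlier in this thread.

```javascript
// Sketch: set the gRPC debug variables in-process, before the client loads.
// enableGrpcDebug is a hypothetical helper, not part of any library.
function enableGrpcDebug(env = process.env) {
  // Keep any values the deployment already supplies.
  if (!env.GRPC_TRACE) env.GRPC_TRACE = 'all';
  if (!env.GRPC_VERBOSITY) env.GRPC_VERBOSITY = 'DEBUG';
  return {GRPC_TRACE: env.GRPC_TRACE, GRPC_VERBOSITY: env.GRPC_VERBOSITY};
}

enableGrpcDebug();

// Load the client only after the variables are set. The line is commented
// out so the snippet runs without the dependency installed:
// const {PubSub} = require('@google-cloud/pubsub');
```

The helper never overwrites a value that is already present, so a setting supplied by the deployment still wins.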

MichaelMarkieta commented 4 years ago

$ npm ls --depth=1
...
├─┬ @google-cloud/pubsub@1.1.1
│ ├── @google-cloud/paginator@2.0.1
│ ├── @google-cloud/precise-date@1.0.1
│ ├── @google-cloud/projectify@1.0.1
│ ├── @google-cloud/promisify@1.0.2
│ ├── @grpc/grpc-js@0.6.7
│ ├── @sindresorhus/is@1.2.0
│ ├── @types/duplexify@3.6.0
│ ├── @types/long@4.0.0
│ ├── arrify@2.0.1
│ ├── async-each@1.0.3
│ ├── extend@3.0.2
│ ├── google-auth-library@5.4.0
│ ├── google-gax@1.7.1
│ ├── is-stream-ended@0.1.4
│ ├── lodash.snakecase@4.1.1
│ ├── p-defer@3.0.0
│ └── protobufjs@6.8.8
...

MichaelMarkieta commented 4 years ago

$ cat app.yaml
runtime: nodejs10
instance_class: F4
env_variables:
  GRPC_TRACE: ALL
  GRPC_VERBOSITY: DEBUG

callmehiphop commented 4 years ago

@MichaelMarkieta I think it works. Although it didn't show up in the Stackdriver UI for me, I did have success using the gcloud tool with gcloud app logs tail -s default

FabianHutin commented 4 years ago

Hello, just a follow-up. We upgraded to pubsub 1.1.1 and we don't have errors anymore. However, latency to push messages to Pub/Sub is incredibly high (between 5s and 50s).

npomfret commented 4 years ago

@google-cloud/pubsub@1.1.1 definitely not working for me. I regularly see {"code":4,"details":"Failed to connect before the deadline","metadata":{"internalRepr":{},"options":{}}} and then my entire app stops working. The subscribers stop processing messages.

Is there an agreed on workaround in any of the above?

[EDIT]. I'm also seeing this error from time to time

{"ackIds":["KlgRTgQhIT4wPkVTRFAGFixdRkhRNxkIaFEOT14jPzUgKEUSASBuFSFCXhliaFxcdQdQC00geTQnYltFVQhCUnRfcysvV1tbdAVRDR56e2Z0aF8XCSr75KDd7KSXWUZgTbTgwcVHXbKv4JoiZh49WxJLLD5-MDxFQV5AEkw7CURJUytDCw"],"code":4,"details":"Deadline exceeded","metadata":{"internalRepr":{},"options":{}}}

Don't know if it's related though.

Errors started happening at 2019-10-09 17:35:28.252 BST (Wed Oct 09 16:35:28 UTC)

paksu commented 4 years ago

I can confirm that we still get "Failed to connect before the deadline" with version 1.1.1 and latest version of google-gax and grpc-js

npomfret commented 4 years ago

Downgrading to 0.31.0 hasn't worked for me.

Also, downgrading to 0.31.0 AND adding "resolutions": {"google-gax": "1.6.2"} hasn't worked for me.

...and downgrading to 0.31.0 AND adding "resolutions": {"google-gax": "1.6.2", "@grpc/grpc-js": "0.5.2"} hasn't worked for me.

[edit] anyone got any sort of fix? I've tried all I can think of including reverting back to before I got the error for the first time but I cannot get my app working.

MatthieuLemoine commented 4 years ago

Downgrading to 0.29.1 should do the trick @npomfret

This is the last working version

npomfret commented 4 years ago

@MatthieuLemoine what can I say, I tried and it also doesn't work. I still get the errors and no messages are being sent or delivered.

I've downgraded, deleted my lock files and reinstalled, deleted node_modules and reinstalled. I've even deleted and recreated my subscriptions. Nothing works. My app has been broken for almost a day and I can't get it working.

MichaelMarkieta commented 4 years ago

I haven't changed anything since last night (eastern time zone), and the last time I received an error in GAE was 10/10/19 2:32 AM EST. Not sure what's changed... Instance counts across the services that use pubsub are steady, and messages are ack'd at a steady rate. I would say things became a lot more stable around 4:00 AM EST.

image

Major blips in memory usage are from rolling out changes to the GAE services. Nothing's been touched since ~2:00 AM.

MichaelMarkieta commented 4 years ago

Combing through the logs and gRPC debugging output, I see some errors that line up with the unacked count blips (bottom left graph in the previous message's attachment):

image

MichaelMarkieta commented 4 years ago

One more interesting log bunch that looks quite different than the rest, from earlier in the night around 3:45 AM EST

image

npomfret commented 4 years ago

Observed a new error just now:

{"code":16,"metadata":{"_internal_repr":{"www-authenticate":["Bearer realm=\"https://accounts.google.com/\", error=\"invalid_token\""]},"flags":0}}
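
For anyone comparing the numeric `code` fields across these logs: they are gRPC status codes, where 4 is DEADLINE_EXCEEDED and 16 is UNAUTHENTICATED. A small sketch for turning them into readable log lines follows; the helper name `describeGrpcError` is made up, and the table lists only the two codes that appear in this thread.

```javascript
// Sketch: translate the numeric `code` on these errors into gRPC status names.
// Only the codes seen in this thread are listed; describeGrpcError is a
// hypothetical helper, not part of any library.
const GRPC_STATUS_NAMES = {
  4: 'DEADLINE_EXCEEDED',
  16: 'UNAUTHENTICATED',
};

function describeGrpcError(err) {
  const name = GRPC_STATUS_NAMES[err.code] || `UNLISTED(${err.code})`;
  // `details` is present on some of the logged errors but not all of them.
  const details = err.details ? `: ${err.details}` : '';
  return `${name}${details}`;
}

console.log(describeGrpcError({code: 4, details: 'Deadline exceeded'}));
console.log(describeGrpcError({code: 16}));
```

Separating the two matters when deciding what to do next: a code 16 points at credentials or tokens rather than at timeouts.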

MichaelMarkieta commented 4 years ago

Looks like there are 3 errors in my playground that come up over and over:

Error: Failed to "acknowledge" for XX message(s). Reason: Deadline exceeded

Error: Failed to "modifyAckDeadline" for XX message(s). Reason: Retry total timeout exceeded before any response was received

Error: Failed to connect to channel. Reason: Failed to connect before the deadline

callmehiphop commented 4 years ago

@murgatroid99 does anything in the provided logs stick out to you?

npomfret commented 4 years ago

This problem is getting worse and worse for me. I've downgraded to "@google-cloud/pubsub": "0.29.1" and deleted all my subscriptions before recreating them, but I'm still getting thousands of errors. Is anyone at Google making any progress on this?

callmehiphop commented 4 years ago

@npomfret we're definitely making efforts here to resolve this as quickly as we can. Any additional info you can provide about your environment and application would be very helpful.