googleapis / nodejs-firestore

Node.js client for Google Cloud Firestore: a NoSQL document database built for automatic scaling, high performance, and ease of application development.
https://cloud.google.com/firestore/
Apache License 2.0

RST_STREAM error keeps showing up #1023

Open jakeleventhal opened 4 years ago

jakeleventhal commented 4 years ago

Environment details

Steps to reproduce

This error keeps appearing over and over in my logs (not regularly reproducible):

Error: 13 INTERNAL: Received RST_STREAM with code 2
at Object.callErrorFromStatus (/api/node_modules/@grpc/grpc-js/src/call.ts:81)
at Object.onReceiveStatus (/api/node_modules/@grpc/grpc-js/src/client.ts:324)
at Object.onReceiveStatus (/api/node_modules/@grpc/grpc-js/src/client-interceptors.ts:439)
at Object.onReceiveStatus (/api/node_modules/@grpc/grpc-js/src/client-interceptors.ts:402)
at Http2CallStream.outputStatus (/api/node_modules/@grpc/grpc-js/src/call-stream.ts:228)
at Http2CallStream.maybeOutputStatus (/api/node_modules/@grpc/grpc-js/src/call-stream.ts:278)
at Http2CallStream.endCall (/api/node_modules/@grpc/grpc-js/src/call-stream.ts:262)
at ClientHttp2Stream.<anonymous> (/api/node_modules/@grpc/grpc-js/src/call-stream.ts:532)
at ClientHttp2Stream.emit (events.js:315)
at ClientHttp2Stream.EventEmitter.emit (domain.js:485)
at emitErrorCloseNT (internal/streams/destroy.js:76)
at processTicksAndRejections (internal/process/task_queues.js:84)
samborambo305 commented 3 years ago

@bitcoinbullbullbull Probably... This and other issues have been open for several months. In my case, I had to proxy all of these Firebase operations to cloud functions...

That's interesting. I think that is a temporary solution for my problem. Would you be willing to hop on a brief video call to answer some specifics on how you do this? If so, can you email me at urirahimi@gmail.com?

hyst3ric41 commented 3 years ago

@bitcoinbullbullbull Sure, I'd be glad to help you. I didn't do anything special, it's really a simple idea 😅 I'll email you.

jakeleventhal commented 3 years ago

I have still been getting RST_STREAM or DEADLINE_EXCEEDED errors roughly every 20 minutes, and this has been going on for several months now.

jakeleventhal commented 3 years ago

I don't mean to be pushy about this, but I don't understand how something this severe has not gotten more attention. @josegpulido and @bitcoinbullbullbull having to essentially re-implement core Firestore functionality seems ridiculous to me.

AlejandroBaldwin commented 3 years ago

Does anybody know how to make a request to the Firestore DB as an HTTP request? According to this: https://github.com/googleapis/nodejs-vision/issues/785#issuecomment-762938989 it would probably work if we made REST API calls instead of using the client libraries.
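For context, what I mean is roughly the sketch below (just an illustration, assuming Node 18+ for the global fetch and application default credentials via google-auth-library; the project/collection/document names are placeholders):

// Hypothetical sketch: read one document over the Firestore REST API instead of gRPC.
const { GoogleAuth } = require('google-auth-library');

async function getDocumentViaRest(projectId, collection, docId) {
  // Obtain an OAuth2 access token from application default credentials.
  const auth = new GoogleAuth({ scopes: ['https://www.googleapis.com/auth/datastore'] });
  const token = await auth.getAccessToken();

  const url = `https://firestore.googleapis.com/v1/projects/${projectId}/databases/(default)/documents/${collection}/${docId}`;
  const response = await fetch(url, { headers: { Authorization: `Bearer ${token}` } });
  if (!response.ok) {
    throw new Error(`Firestore REST call failed with status ${response.status}`);
  }
  // Returns Firestore's REST document representation, not a DocumentSnapshot.
  return response.json();
}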

schmidt-sebastian commented 3 years ago

FWIW, our next big project will be to transition most API calls in this client to HTTP/REST. The primary goal is to reduce startup time, but I personally think it will help with this issue too. If we get lucky, we may have something by the end of the quarter.

IchordeDionysos commented 3 years ago

@schmidt-sebastian would this have a big impact on the latency of the function calls when using HTTP vs. gRPC?

schmidt-sebastian commented 3 years ago

The goal is to reduce latency.

gitmoto commented 3 years ago

We are experiencing this error intermittently (every couple of weeks).

due to error: Error: 13 INTERNAL: Received RST_STREAM with code 2 (Internal server error)

Interestingly, when it does occur, it happens at the beginning of one of our most commonly invoked cloud functions. It seems suspiciously like a cold-start-related issue.

"firebase": "^8.6.1",
"firebase-admin": "^9.8.0",
"firebase-functions": "^3.14.1",
ir-fuel commented 3 years ago

Same problem here

{"severity":"INFO","message":"Error: 13 INTERNAL: Received RST_STREAM with code 2 triggered by internal client error: Protocol error

This happens when doing a collection.where(...).get() call.

pROFESOR11 commented 3 years ago

We have been facing the same issue quite often, and the severity is very high, as it ends up causing data loss or unacceptable latency most of the time. We are getting the same error on both our web app and Firebase Functions.

Here I'd like to share Firebase Functions logs captured after setting GRPC_TRACE=call_stream and GRPC_VERBOSITY=DEBUG. I hope it will help to find a solution.
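(For anyone who wants to capture the same trace: the variables only need to be set before grpc-js is loaded. A rough sketch of how we do it, assuming the environment variables are read when the gRPC logging module initializes:)

// Enable gRPC call_stream tracing; this must run before firebase-admin (and thus
// @grpc/grpc-js) is imported, otherwise the settings are not picked up.
process.env.GRPC_VERBOSITY = 'DEBUG';
process.env.GRPC_TRACE = 'call_stream';

const admin = require('firebase-admin');
admin.initializeApp();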

In this example, it waits for about 3 minutes because of Received RST_STREAM with code 2 triggered by internal client error: read ETIMEDOUT, and then tries again. This specific case means about 3 minutes of latency for our users.

We get the same error for writes too, which is worse, because we lose the data and it's impossible to put it back later. I hope there will be a solution soon.

// related code block

console.log("Getting doc data");

const workDoc = await admin.firestore().collection("work").doc(workId).get();
const workData = workDoc.data();

console.log("Success");
// dependencies

"firebase-admin": "^9.11.0",
"firebase-functions": "^3.14.1",
>  '2021-08-08T12:35:54.377Z': Getting doc data
>  2021-08-08T12:35:54.383Z | call_stream | [13] Sending metadata
>  2021-08-08T12:35:54.383Z | call_stream | [13] write() called with message of length 138
>  2021-08-08T12:35:54.384Z | call_stream | [13] end() called
>  2021-08-08T12:35:54.384Z | call_stream | [13] deferring writing data chunk of length 143
>  2021-08-08T12:35:54.384Z | call_stream | Starting stream on subchannel 142.250.187.170:443 with headers
>       x-goog-api-client: gax/2.22.1 gapic/4.14.1 gl-node/14.17.4 grpc/1.3.6 gccl/4.14.1 fire/9.11.0
>       google-cloud-resource-prefix: projects/beyondmars-staging/databases/(default)
>       x-goog-request-params: database=projects%2Fbeyondmars-staging%2Fdatabases%2F(default)
>       authorization: Bearer ey...(masked out intentionally)
>       grpc-timeout: 299998m
>       grpc-accept-encoding: identity,deflate,gzip
>       accept-encoding: identity
>       :authority: firestore.googleapis.com:443
>       user-agent: grpc-node-js/1.3.6
>       content-type: application/grpc
>       :method: POST
>       :path: /google.firestore.v1.Firestore/BatchGetDocuments
>       te: trailers
>
>  2021-08-08T12:35:54.384Z | call_stream | [13] attachHttp2Stream from subchannel 142.250.187.170:443
>  2021-08-08T12:35:54.385Z | call_stream | [13] sending data chunk of length 143 (deferred)
>  2021-08-08T12:35:54.385Z | call_stream | [13] calling end() on HTTP/2 stream
>  2021-08-08T12:38:17.034Z | call_stream | [13] Node error event: message=read ETIMEDOUT code=ETIMEDOUT errno=Unknown system error -60 syscall=read
>  2021-08-08T12:38:17.035Z | call_stream | [13] HTTP/2 stream closed with code 2
>  2021-08-08T12:38:17.036Z | call_stream | [13] ended with status: code=13 details="Received RST_STREAM with code 2 triggered by internal client error: read ETIMEDOUT"
>  2021-08-08T12:38:17.037Z | call_stream | [13] cancelWithStatus code: 1 details: "Cancelled on client"
>  2021-08-08T12:38:19.240Z | call_stream | [14] Sending metadata
>  2021-08-08T12:38:19.240Z | call_stream | [14] write() called with message of length 138
>  2021-08-08T12:38:19.240Z | call_stream | [14] end() called
>  2021-08-08T12:38:19.241Z | call_stream | [14] deferring writing data chunk of length 143
>  2021-08-08T12:38:19.345Z | call_stream | Starting stream on subchannel 142.250.187.170:443 with headers
>       x-goog-api-client: gax/2.22.1 gapic/4.14.1 gl-node/14.17.4 grpc/1.3.6 gccl/4.14.1 fire/9.11.0
>       google-cloud-resource-prefix: projects/beyondmars-staging/databases/(default)
>       x-goog-request-params: database=projects%2Fbeyondmars-staging%2Fdatabases%2F(default)
>       authorization: Bearer ey...(masked out intentionally)
>       grpc-timeout: 299895m
>       grpc-accept-encoding: identity,deflate,gzip
>       accept-encoding: identity
>       :authority: firestore.googleapis.com:443
>       user-agent: grpc-node-js/1.3.6
>       content-type: application/grpc
>       :method: POST
>       :path: /google.firestore.v1.Firestore/BatchGetDocuments
>       te: trailers
>
>  2021-08-08T12:38:19.345Z | call_stream | [14] attachHttp2Stream from subchannel 142.250.187.170:443
>  2021-08-08T12:38:19.345Z | call_stream | [14] sending data chunk of length 143 (deferred)
>  2021-08-08T12:38:19.345Z | call_stream | [14] calling end() on HTTP/2 stream
>  2021-08-08T12:38:20.256Z | call_stream | [14] Received server headers:
>       :status: 200
>       content-disposition: attachment
>       content-type: application/grpc
>       date: Sun, 08 Aug 2021 12:38:20 GMT
>       alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000,h3-T051=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"
>
>  2021-08-08T12:38:20.257Z | call_stream | [14] receive HTTP/2 data frame of length 1369
>  2021-08-08T12:38:20.257Z | call_stream | [14] receive HTTP/2 data frame of length 1378
>  2021-08-08T12:38:20.257Z | call_stream | [14] receive HTTP/2 data frame of length 1378
>  2021-08-08T12:38:20.257Z | call_stream | [14] receive HTTP/2 data frame of length 1378
>  2021-08-08T12:38:20.257Z | call_stream | [14] receive HTTP/2 data frame of length 1378
>  2021-08-08T12:38:20.259Z | call_stream | [14] receive HTTP/2 data frame of length 1378
>  2021-08-08T12:38:20.259Z | call_stream | [14] receive HTTP/2 data frame of length 1378
>  2021-08-08T12:38:20.261Z | call_stream | [14] receive HTTP/2 data frame of length 1378
>  2021-08-08T12:38:20.261Z | call_stream | [14] receive HTTP/2 data frame of length 1378
>  2021-08-08T12:38:20.262Z | call_stream | [14] receive HTTP/2 data frame of length 1378
>  2021-08-08T12:38:20.263Z | call_stream | [14] receive HTTP/2 data frame of length 1378
>  2021-08-08T12:38:20.264Z | call_stream | [14] receive HTTP/2 data frame of length 987
>  2021-08-08T12:38:20.350Z | call_stream | [14] receive HTTP/2 data frame of length 1369
>  2021-08-08T12:38:20.350Z | call_stream | [14] receive HTTP/2 data frame of length 1378
>  2021-08-08T12:38:20.350Z | call_stream | [14] receive HTTP/2 data frame of length 1378
>  2021-08-08T12:38:20.351Z | call_stream | [14] receive HTTP/2 data frame of length 1378
>  2021-08-08T12:38:20.351Z | call_stream | [14] receive HTTP/2 data frame of length 1378
>  2021-08-08T12:38:20.353Z | call_stream | [14] receive HTTP/2 data frame of length 1378
>  2021-08-08T12:38:20.353Z | call_stream | [14] receive HTTP/2 data frame of length 38
>  2021-08-08T12:38:20.353Z | call_stream | [14] parsed message of length 24433
>  2021-08-08T12:38:20.353Z | call_stream | [14] filterReceivedMessage of length 24433
>  2021-08-08T12:38:20.354Z | call_stream | [14] pushing to reader message of length 24428
>  2021-08-08T12:38:20.366Z | call_stream | [14] Received server trailers:
>       grpc-status: 0
>       content-disposition: attachment
>
>  2021-08-08T12:38:20.367Z | call_stream | [14] received status code 0 from server
>  2021-08-08T12:38:20.367Z | call_stream | [14] ended with status: code=0 details=""
>  2021-08-08T12:38:20.367Z | call_stream | [14] close http2 stream with code 0
>  '2021-08-08T12:38:20.368Z': Success
bhr commented 3 years ago

Hitting the issue too when writing a large number of new documents in batches. It only happens when the number of documents written exceeds 10k.

Exception:

error: Error: 13 INTERNAL: Received RST_STREAM with code 2 triggered by internal client error: read ECONNRESET
    at Object.callErrorFromStatus (/Users/benedikt/preact-apps/node_modules/@grpc/grpc-js/src/call.ts:81:24)
    at Object.onReceiveStatus (/Users/benedikt/preact-apps/node_modules/@grpc/grpc-js/src/client.ts:334:36)
    at Object.onReceiveStatus (/Users/benedikt/preact-apps/node_modules/@grpc/grpc-js/src/client-interceptors.ts:426:34)
    at Object.onReceiveStatus (/Users/benedikt/preact-apps/node_modules/@grpc/grpc-js/src/client-interceptors.ts:389:48)
    at /Users/benedikt/preact-apps/node_modules/@grpc/grpc-js/src/call-stream.ts:249:24
    at processTicksAndRejections (internal/process/task_queues.js:79:11)
Caused by: Error: 
    at WriteBatch.commit (/Users/benedikt/preact-apps/node_modules/@google-cloud/firestore/build/src/write-batch.js:413:23)
    at DocumentReference.set (/Users/benedikt/preact-apps/node_modules/@google-cloud/firestore/build/src/reference.js:343:14)
    at FirestoreBatchWriter.<anonymous> (/Users/benedikt/preact-apps/packages/firebase-admin/src/batchWriter.ts:19:21)
    at Generator.next (<anonymous>)
    at /Users/benedikt/preact-apps/packages/firebase-admin/dist/batchWriter.js:8:71
    at new Promise (<anonymous>)
    at __awaiter (/Users/benedikt/preact-apps/packages/firebase-admin/dist/batchWriter.js:4:12)
    at FirestoreBatchWriter.set (/Users/benedikt/preact-apps/packages/firebase-admin/src/batchWriter.ts:18:88)
    at /Users/benedikt/preact-apps/packages/firebase-admin/src/api/analytics.ts:142:24
    at Array.map (<anonymous>)

Package.json

"firebase-admin": "^9.11.1",

Here's the code of FirestoreBatchWriter for reference:

import { firestore } from 'firebase-admin';

const FIRESTORE_BATCH_LIMIT = 500;

export class FirestoreBatchWriter {
  private readonly db: firestore.Firestore;

  private batch: firestore.WriteBatch | undefined;

  private operationsInBatchCount: number;

  constructor(db: firestore.Firestore) {
    this.db = db;
    this.batch = undefined;
    this.operationsInBatchCount = 0;
  }

  set = async (reference: firestore.DocumentReference, changes: Record<string, any>) => {
    await reference.set(changes);
    await this.addBatchOperation();
  };

  update = async (reference: firestore.DocumentReference, changes: Record<string, any>) => {
    await reference.update(changes);
    await this.addBatchOperation();
  };

  delete = async (reference: firestore.DocumentReference) => {
    await reference.delete();
    await this.addBatchOperation();
  };

  private addBatchOperation = async () => {
    await this.processBatch(true, false, true);
  };

  closeBatch = async (reopen: boolean = true) => {
    await this.processBatch(false, true, reopen);
  };

  private processBatch = async (increaseCounter: boolean, force: boolean, reopen: boolean) => {
    if (!this.batch) {
      this.batch = this.db.batch();
      this.operationsInBatchCount = 0;
    }

    if (increaseCounter) {
      this.operationsInBatchCount += 1;
    }

    if (force || this.operationsInBatchCount % FIRESTORE_BATCH_LIMIT === 0) {
      console.debug('Commiting batch');

      await this.batch.commit();
      this.operationsInBatchCount = 0;
      if (reopen) {
        this.batch = this.db.batch();
      } else {
        this.batch = undefined;
      }
    }
  };
}
jeremiahlachica commented 2 years ago

I'm getting this error as well. Any updates, please?

I'm using: "firebase-admin": "^9.12.0"

Runtime: Node 16

opalrose510 commented 2 years ago

Hello, Same issue here, across all our prod servers. The error rate is extremely high, but it's in our snapshot listeners, not the writes. "firebase-admin": "~9.11.0", node 14

riksnelders commented 2 years ago

Still getting these issues daily in production, causing data loss. Any update on this?

adi-rds commented 2 years ago

Hello, I am getting this on Datastore reads, especially get requests that go up to the 1000-key limit. The error started appearing within the first 2-3 reads once we upgraded libraries while continuing to run Node v12.18.3.

"firebase": "^9.6.5", "firebase-admin": "^9.12.0",

It was originally

"firebase": "^7.11.0",
"firebase-admin": "^8.10.0",

The key issue seems to be that it breaks the @google-cloud/datastore library as well, since all these libraries depend on grpc-js, which gets upgraded from 1.1.3 to 1.5.4 (latest).

Though both are minor versions, i.e. no breaking changes, there seems to be quite a bit of sensitivity to the Node version, as grpc-js 1.5.4 works better on Node 16.13.2. The immediate error went away (it would happen after just 1500 or so entity reads), but it's not clear whether it will recur when the read volume goes even higher.

Overall, quite a scary issue.

michaelAtCoalesce commented 2 years ago

Hi, we hit this yesterday and it caused 3 customer outages for us in production.

I have a reproducible test case that causes this, and I can get it to happen in seconds. If someone from either the grpc or firestore projects wants to hop on and do some debugging to close this issue that has been open for 2 years, let me know. I'm happy to hop on a Zoom and do some pair debugging on this one.

@murgatroid99 @schmidt-sebastian

michaelAtCoalesce commented 2 years ago

I also want to add something that looks related and hasn't been mentioned: the Error: 13 seems to coincide with Error: 8 RESOURCE_EXHAUSTED.

node version 14.15.3


adi-rds commented 2 years ago

Michael,

I'm guessing you upgraded the Firebase libraries but didn't upgrade the Node version. In our experience, the issue went away when we went to Node 16.

Firestore, Firebase-admin, and the @google/datastore libraries all have a shared dependency on grpc-js, and it in turn is sensitive to which Node version is running. When we upgraded these libraries, we got very similar errors which went away when we went from Node 12 to 16.

Cheers, Adi


michaelAtCoalesce commented 2 years ago

Hey Adi, I blew away my yarn.lock and I'm at the following versions:

grpc: 1.5.0, @google-cloud/firestore: 5.0.2, firebase-admin: 10.0.2

These are the latest versions of these packages. I've tried Node 12, 14, 16, and 17, and I can still reproduce the issue within seconds.

BTW, your issue was with reads. All of my reads complete, but writes do not.

michaelAtCoalesce commented 2 years ago

It looks like the issue has something to do with concurrency. When I limit my concurrency to 1, the writes complete successfully, but even when the write concurrency is 2, I start getting these errors.

I can only limit concurrency in my repro case; in my real app we must have parallel writes. This ticket has been open for 2 YEARS and causes CUSTOMER OUTAGES FOR us. Can someone at Google responsible for Firestore give us an update?
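(For illustration, the kind of concurrency cap I mean is roughly the sketch below; the helper is made up for this comment, not the actual repro code:)

// Hypothetical helper: run async write jobs with a fixed concurrency cap.
async function runWithConcurrency(jobs, limit) {
  const results = [];
  let next = 0;
  const worker = async () => {
    while (next < jobs.length) {
      const index = next++;
      results[index] = await jobs[index]();
    }
  };
  await Promise.all(Array.from({ length: Math.min(limit, jobs.length) }, worker));
  return results;
}

// With limit = 1 the writes complete; with limit >= 2 the RST_STREAM errors appear.
// await runWithConcurrency(docs.map((d) => () => collection.doc(d.id).set(d)), 1);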

webbertakken commented 2 years ago

looks like the issue has something to do with concurrency.

We're getting these errors without concurrency: one-off transactions, a few per hour (about 3 failures per 1500 transactions).

causes CUSTOMER OUTAGES FOR us.

I recommend wrapping your call in a small retry function to mitigate the problem.
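Something along these lines (just a sketch; the retryable codes and backoff values are examples, adjust them to your own needs):

// Hypothetical retry wrapper: retries a Firestore call on the gRPC codes seen in
// this thread (4 DEADLINE_EXCEEDED, 13 INTERNAL, 14 UNAVAILABLE) with exponential backoff.
async function withRetry(operation, maxRetries = 3, baseDelayMs = 200) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      const retryable = [4, 13, 14].includes(err.code);
      if (!retryable || attempt >= maxRetries) throw err;
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
}

// Usage: await withRetry(() => docRef.set(data, { merge: true }));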

schmidt-sebastian commented 2 years ago

I promise it's not a belated April Fool's, but we actually have a fix: https://github.com/grpc/grpc-node/pull/2084 This will be part of the next @grpc/grpc-js release.

murgatroid99 commented 2 years ago

I want to temper expectations there. That change fixes "RST_STREAM with error code 2" errors in some cases, but we don't know if there are other causes of the same error.

murgatroid99 commented 2 years ago

That change has been released, so if you update your dependencies, you should get it. Please try it out to see if it helps.

I would like to note that @michaelAtCoalesce was able to share a consistent reproduction of the error, and that was a great help with tracking down the bug we found. So if anyone else encounters a similar error, a similar reproduction would probably be helpful for that too.

michaelAtCoalesce commented 2 years ago

Hey all, on @grpc/grpc-js version 1.6.0 I got errors like this:

{"code":4,"details":"The datastore operation timed out, or the data was temporarily unavailable.","metadata":{},"note":"Exception occurred in retry method that was not classified as transient"} Error: 4 DEADLINE_EXCEEDED: The datastore operation timed out, or the data was temporarily unavailable

GoogleError; {"code":4} Error: Total timeout of API google.firestore.v1.Firestore exceeded 600000 milliseconds before any response was received.

With the NEWER version of @grpc/grpc-js, 1.6.7, I got the following errors for the same case:

{"code":1,"details":"Call cancelled","metadata":{},"note":"Exception occurred in retry method that was not classified as transient"} Error: 1 CANCELLED: Call cancelled

So, basically I'm still getting errors in a highly parallel write scenario, but now I'm getting "Call cancelled" with grpc-js v1.6.7, versus "DEADLINE_EXCEEDED" with v1.6.0.

What's the meaning behind "Call cancelled"? Googling this, it looks like it is OK to retry these if the write is idempotent?

MiguelNiblock commented 2 years ago

I'm getting this error when using Firebase from within Cypress and connecting it to the emulator for e2e testing. I can't make any requests to populate the database because of this error.

michaelAtCoalesce commented 2 years ago

An update from my end: we had to rearchitect our data layout to avoid more than one write to a document per second (this is described in the best practices for Firestore): https://firebase.google.com/docs/firestore/best-practices#updates_to_a_single_document

I've never heard of a data store that limits you to 1 write per second... but whatever.

The thing is, you can get away with a few writes to a document per second, but if you start doing it at scale and highly parallel, you are 1) more likely to hit the issue and 2) less likely to be able to just retry and have the write succeed.

I put in a retry of 3 and was still unable to get a successful write when dealing with hundreds of concurrent writes.

So, although I do think that Firestore should really work on these issues and not have odd best practices like 1 write per second, we were able to rewrite our app to get things to succeed (with minimal retrying). The kind of restructuring I mean is sketched below.
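Roughly, it is what the Firestore best-practices page suggests for hot documents: spread the writes across several documents and aggregate on read. A rough sketch (the collection names and shard count are made up, and this assumes firebase-admin is already initialized and imported as admin):

// Hypothetical sketch: instead of hammering one hot document, write to one of
// N shard documents and sum them when reading (distributed-counter style).
const NUM_SHARDS = 10;

async function incrementDistributed(db, counterId) {
  const shardId = Math.floor(Math.random() * NUM_SHARDS);
  await db
    .collection('counters').doc(counterId)
    .collection('shards').doc(String(shardId))
    .set({ count: admin.firestore.FieldValue.increment(1) }, { merge: true });
}

async function readDistributed(db, counterId) {
  const shards = await db.collection('counters').doc(counterId).collection('shards').get();
  return shards.docs.reduce((sum, doc) => sum + (doc.data().count || 0), 0);
}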

untdecocs commented 1 year ago

Trying to write to a collection. Protocol error. No info anywhere. What should be done?

"firebase-admin": "^11.5.0"


ziedHamdi commented 1 year ago

This happens to me with the emulator on every attempt, so it may be an easy case to reproduce.

Error: 13 INTERNAL: Received RST_STREAM with code 2 triggered by internal client error: Protocol error
    at callErrorFromStatus (/home/zied/Work/WS/linkedinCv-back/node_modules/.pnpm/@grpc+grpc-js@1.8.13/node_modules/@grpc/grpc-js/build/src/call.js:31:19)
    at Object.onReceiveStatus (/home/zied/Work/WS/linkedinCv-back/node_modules/.pnpm/@grpc+grpc-js@1.8.13/node_modules/@grpc/grpc-js/build/src/client.js:192:76)
    at Object.onReceiveStatus (/home/zied/Work/WS/linkedinCv-back/node_modules/.pnpm/@grpc+grpc-js@1.8.13/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:360:141)
    at Object.onReceiveStatus (/home/zied/Work/WS/linkedinCv-back/node_modules/.pnpm/@grpc+grpc-js@1.8.13/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:323:181)
    at /home/zied/Work/WS/linkedinCv-back/node_modules/.pnpm/@grpc+grpc-js@1.8.13/node_modules/@grpc/grpc-js/build/src/resolving-call.js:94:78
    at process.processTicksAndRejections (node:internal/process/task_queues:77:11)
for call at
    at ServiceClientImpl.makeUnaryRequest (/home/zied/Work/WS/linkedinCv-back/node_modules/.pnpm/@grpc+grpc-js@1.8.13/node_modules/@grpc/grpc-js/build/src/client.js:160:34)
    at ServiceClientImpl.<anonymous> (/home/zied/Work/WS/linkedinCv-back/node_modules/.pnpm/@grpc+grpc-js@1.8.13/node_modules/@grpc/grpc-js/build/src/make-client.js:105:19)
    at /home/zied/Work/WS/linkedinCv-back/node_modules/.pnpm/@google-cloud+firestore@6.5.0/node_modules/@google-cloud/firestore/build/src/v1/firestore_client.js:227:29
    at /home/zied/Work/WS/linkedinCv-back/node_modules/.pnpm/google-gax@3.6.0/node_modules/google-gax/build/src/normalCalls/timeout.js:44:16
    at repeat (/home/zied/Work/WS/linkedinCv-back/node_modules/.pnpm/google-gax@3.6.0/node_modules/google-gax/build/src/normalCalls/retries.js:80:25)
    at /home/zied/Work/WS/linkedinCv-back/node_modules/.pnpm/google-gax@3.6.0/node_modules/google-gax/build/src/normalCalls/retries.js:118:13
    at OngoingCallPromise.call (/home/zied/Work/WS/linkedinCv-back/node_modules/.pnpm/google-gax@3.6.0/node_modules/google-gax/build/src/call.js:67:27)
    at NormalApiCaller.call (/home/zied/Work/WS/linkedinCv-back/node_modules/.pnpm/google-gax@3.6.0/node_modules/google-gax/build/src/normalCalls/normalApiCaller.js:34:19)
    at /home/zied/Work/WS/linkedinCv-back/node_modules/.pnpm/google-gax@3.6.0/node_modules/google-gax/build/src/createApiCall.js:84:30
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
Caused by: Error
    at WriteBatch.commit (/home/zied/Work/WS/linkedinCv-back/node_modules/.pnpm/@google-cloud+firestore@6.5.0/node_modules/@google-cloud/firestore/build/src/write-batch.js:433:23)
    at DocumentReference.set (/home/zied/Work/WS/linkedinCv-back/node_modules/.pnpm/@google-cloud+firestore@6.5.0/node_modules/@google-cloud/firestore/build/src/reference.js:393:27)
    at POST (/home/zied/Work/WS/linkedinCv-back/src/routes/cv/submit/+server.js:20:83)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async Module.render_endpoint (/node_modules/.pnpm/@sveltejs+kit@1.15.2_svelte@3.58.0_vite@4.2.1/node_modules/@sveltejs/kit/src/runtime/server/endpoint.js:47:20)
    at async resolve (/node_modules/.pnpm/@sveltejs+kit@1.15.2_svelte@3.58.0_vite@4.2.1/node_modules/@sveltejs/kit/src/runtime/server/respond.js:378:17)
    at async Object.handle (/src/hooks.server.js:7:19)
    at async Module.respond (/node_modules/.pnpm/@sveltejs+kit@1.15.2_svelte@3.58.0_vite@4.2.1/node_modules/@sveltejs/kit/src/runtime/server/respond.js:251:20)
    at async file:///home/zied/Work/WS/linkedinCv-back/node_modules/.pnpm/@sveltejs+kit@1.15.2_svelte@3.58.0_vite@4.2.1/node_modules/@sveltejs/kit/src/exports/vite/dev/index.js:514:22 {
  code: 13,
  details: 'Received RST_STREAM with code 2 triggered by internal client error: Protocol error',
  metadata: Metadata { internalRepr: Map(0) {}, options: {} },
  note: 'Exception occurred in retry method that was not classified as transient'
}

package.json

{
    "name": "linkedincv-back",
    "version": "0.0.1",
    "private": true,
    "scripts": {
        "dev": "vite dev",
        "build": "vite build",
        "preview": "vite preview",
        "check": "svelte-kit sync && svelte-check --tsconfig ./jsconfig.json",
        "check:watch": "svelte-kit sync && svelte-check --tsconfig ./jsconfig.json --watch",
        "test": "playwright test",
        "test:unit": "vitest",
        "lint": "prettier --plugin-search-dir . --check .",
        "format": "prettier --plugin-search-dir . --write ."
    },
    "devDependencies": {
        "@playwright/test": "^1.28.1",
        "@sveltejs/adapter-auto": "^2.0.0",
        "@sveltejs/kit": "^1.5.0",
        "acorn": "^8.8.2",
        "estree-walker": "^3.0.3",
        "prettier": "^2.8.0",
        "prettier-plugin-svelte": "^2.8.1",
        "svelte": "^3.54.0",
        "svelte-check": "^3.0.1",
        "typescript": "^5.0.0",
        "vite": "^4.2.0",
        "vitest": "^0.25.3"
    },
    "type": "module",
    "dependencies": {
        "firebase-admin": "^11.6.0",
        "mongoose": "^7.0.3",
        "svelte-routing": "^1.6.0"
    }
}

This is under a Node.js / SvelteKit server endpoint. Here are my lib versions:


pnpm view firebase-admin dependencies

{
  '@fastify/busboy': '^1.1.0',
  '@firebase/database-compat': '^0.3.0',
  '@firebase/database-types': '^0.10.0',
  '@types/node': '>=12.12.47',
  jsonwebtoken: '^9.0.0',
  'jwks-rsa': '^3.0.1',
  'node-forge': '^1.3.1',
  uuid: '^9.0.0',
  '@google-cloud/firestore': '^6.4.0',
  '@google-cloud/storage': '^6.5.2'
}

My use case is pretty simple: firebaseInit.js:

import admin from 'firebase-admin'
const useEmulator = true;

if (useEmulator){
    process.env['FIRESTORE_EMULATOR_HOST'] = 'localhost:4000';
}

// TODO: Add SDKs for Firebase products that you want to use
// https://firebase.google.com/docs/web/setup#available-libraries

// Your web app's Firebase configuration
// For Firebase JS SDK v7.20.0 and later, measurementId is optional
const firebaseConfig = {
    apiKey: "XXXXX",
    authDomain: "linkedincv-bdbec.firebaseapp.com",
    projectId: "linkedincv-bdbec",
    storageBucket: "linkedincv-bdbec.appspot.com",
    messagingSenderId: "XXXXX",
    appId: "XXXXX",
    measurementId: "XXXXX",
};

// Initialize Firebase
const app = admin.initializeApp(firebaseConfig);
// const db = getFirestore(app);

const db = admin.firestore();

export {app, db}

My code using the initialized db variable:

+server.js

import { json } from '@sveltejs/kit';
import {db} from '../../../lib/middleware/firebaseInit'

export async function POST({request}) {
    try {
        const payload = await request.json();
        const uniqueUrl = payload.uniqueUrl;

        const docRef = await db.collection("json_cv").doc("zied").set( payload)
        // Return the document ID of the stored object
        return json({
            body: JSON.stringify({ documentId: docRef.id }),
            headers: { 'Content-Type': 'application/json' },
            status: 200
        });
    } catch (error) {
        // Handle any errors that occur during processing
        console.error(error);
        return json({
            body: JSON.stringify({ error }),
            headers: { 'Content-Type': 'application/json' },
            status: 500
        });
    }
}

Just to be complete, if you want to reproduce, here's a file specific to SvelteKit allowing CORS, hooks.server.js:

import { Handle } from '@sveltejs/kit';

export const handle = async ({ resolve, event }) => {
    const response = await resolve(event);

    // Apply CORS header for API routes
    if (event.url.pathname.startsWith('/cv/submit')) {
        // Required for CORS to work
        if(event.request.method === 'OPTIONS') {
            return new Response(null, {
                headers: {
                    'Access-Control-Allow-Methods': 'POST, GET, OPTIONS, DELETE',
                    'Access-Control-Allow-Origin': '*',
                }
            });
        }

        response.headers.append('Access-Control-Allow-Origin', `*`);
    }

    return response;
}; 
ziedHamdi commented 1 year ago

That change has been released, so if you update your dependencies, you should get it. Please try it out to see if it helps.

I would like to note that @michaelAtCoalesce was able to share a consistent reproduction of the error, and that was a great help with tracking down the bug we found. So if anyone else encounters a similar error, a similar reproduction would probably be helpful for that too.

Do you want the reproduction to be in a Git repo, or is the code I shared here sufficient?

ziedHamdi commented 1 year ago

I'm getting this error when using firebase from within cypress and connecting it to emulator for e2e testing. I can't make any requests to populate the database because of this error.

Same for me

ziedHamdi commented 1 year ago

Running firebase init again solved things. I recommend doing it in case you forgot to correctly initialize something (I surely did, as I was still discovering Firebase).

marcianosr commented 1 year ago

@ziedHamdi What was your solution? I ran firebase init again as well, but to no avail.

jakeleventhal commented 1 year ago

Update: I ended up moving off of Firestore to a Postgres DB, and ultimately moved everything from GCP to AWS, because of this issue.

MorenoMdz commented 1 year ago

Hello, Same issue here, across all our prod servers. The error rate is extremely high, but it's in our snapshot listeners, not the writes. "firebase-admin": "~9.11.0", node 14

Same here; we have millions of error spans with microsecond durations on our server because of this. Our system initializes a snapshot listener to get changes on a specific document we use to control our feature flags. It should only be instantiated once when a new instance of our NestJS server boots up, and then only be invoked when changes happen, but we get millions of those tiny 10-microsecond errors due to DNS errors, connection errors, and now RST_STREAM.

Our server is hosted on Render.com. ALL the normal Firestore calls work as expected; no requests have ever failed because of this. The snapshot listener is just erroring out like this, all the time.

I tried contacting GCP support about this; they have no clue what the problem is or how to help, as they don't see any errors on their side.

FilledStacks commented 1 year ago

I just finished my backend local development and everything worked well. Then I deployed, and none of my functions work. I'm getting the same error.

Error: 13 INTERNAL: Received RST_STREAM with code 2 triggered by internal client error: Protocol error
    at callErrorFromStatus (/workspace/node_modules/@grpc/grpc-js/build/src/call.js:31:19)
    at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client.js:192:76)
    at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:360:141)
    at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:323:181)
    at /workspace/node_modules/@grpc/grpc-js/build/src/resolving-call.js:94:78
    at processTicksAndRejections (node:internal/process/task_queues:78:11)

My code is pretty simple.

this.#log.i(`create license`, JSON.stringify({license: license, id: id}));

const licenseDoc = this.ref.doc(id);

await licenseDoc.set(license); // <===== This is where the error originates from 
return licenseDoc.id;

Using Node.js 16

I don't get it just once or twice; it literally happens every time. My endpoint doesn't work at all. This is a brand-new Firestore project and my first deployment.

Package.json dependencies

 "dependencies": {
    "firebase-admin": "^11.10.1",
    "firebase-backend": "^0.2.5",
    "firebase-functions": "^4.4.1",
    "uuid": "^9.0.1"
  },

Is there anything I can try to get past this? We need to launch our product and I'd hate to need another week to rewrite all these endpoints on a different platform.

maylorsan commented 11 months ago

👋 @schmidt-sebastian,

We've been facing this error in our production environment for the past 3 days and it's occurred roughly 10,600 times:

Error: 13 INTERNAL: Received RST_STREAM with code 1


The error is triggered when executing the following code:

const writeResult = await admin
      .firestore()
      .collection(FirestoreCollections.Users)
      .doc(userID)
      .update(fieldsToUpdate); // This line throws the error

Do we have any updates or workarounds for this? It's affecting our users and we'd appreciate your guidance.

Note: Our Users collection has a very large number of documents. Could the volume of documents be a contributing factor to this issue?

CollinsVizion35 commented 11 months ago

@maylorsan

13 INTERNAL: Received RST_STREAM with code 2

Have you found a solution to this? Because I am facing the exact same error.

maylorsan commented 11 months ago

Hello @CollinsVizion35,

We haven't found a solution to this issue yet.

We've attempted several methods, but none have resolved the problem:

Interestingly, everything operates flawlessly in our development project. The only difference is that the development project has a smaller User collection.

I'm starting to suspect that this might be related to some undocumented limitation in Firestore...

I will stay in touch about updates!

CollinsVizion35 commented 11 months ago

@maylorsan

Okay, thank you. I tried using batch commit and it still didn't work.

edmilsonss commented 11 months ago

I've got a workaround/solution for my situation.

See here: https://github.com/firebase/firebase-admin-node/issues/2345#issuecomment-1776090309

CollinsVizion35 commented 11 months ago

Hey @maylorsan, I think I have found a solution from @edmilsonss.

I think it works with these changes.

Former code:

admin.initializeApp({
  credential: admin.credential.cert(serviceAccount),
  databaseURL: "https://project name.firestore.googleapis.com",
});

// Create a Firestore instance
const db = admin.firestore();

New code:

admin.initializeApp({
  credential: admin.credential.cert(serviceAccount),
  databaseURL: "https://project name.firestore.googleapis.com",
});

// Create a Firestore instance
const db = admin.firestore();
const settings = { preferRest: true, timestampsInSnapshots: true };
db.settings(settings);

maylorsan commented 11 months ago

@CollinsVizion35 Indeed, we haven't experimented with that solution just yet. As I mentioned in this comment, our primary approach was to optimize our algorithm logic between Firebase calls. Thankfully, this seems to have resolved the issue for now.

It's certainly an unusual behavior 😄

udnes99 commented 11 months ago

Any update on this issue? It happens sporadically for us in production, using "@google-cloud/datastore": "8.2.2". As I understand it, googleapis/nodejs-datastore#679 has been closed in favor of this issue, as it is likely the same root cause. This has been happening for a long time...

This seems to occur when instantiating too many transactions simultaneously; perhaps it initiates too many gRPC connections to the Google API?

sammyKhan commented 10 months ago

@maylorsan Setting preferRest: true fixes it for one of our simpler services, but not for others. We are not using Firestore listeners in any of them, so I'm surprised it's switching from REST to HTTP streaming at all. Could you give a list of situations in which preferRest will fall back to streaming so that we can try to avoid them?

maylorsan commented 10 months ago

@sammyKhan, my apologies for the delayed reply!

I wanted to clarify that we don't employ preferRest: true in our services, as previously mentioned. Our main strategy has been to refine the logic of our algorithm, especially in the intervals between Firebase calls.

our primary approach was to optimize our algorithm logic between Firebase calls

In our case, it appears that the issue arises because of a significant delay between the get and update calls in Firestore.

cherylEnkidu commented 10 months ago

Hi @maylorsan

Since your case is different from what has been reported in this issue, could you please open a new ticket and describe your problem in detail?

maylorsan commented 10 months ago

Hi @cherylEnkidu,

Thanks for the advice. I'll open a new ticket with all relevant details to address our specific Firestore issue.

adamkoch commented 7 months ago

We still see this intermittently, no real pattern that I can see. The Firestore code that triggers it is simple and runs successfully 99% of the time:

await ref.set(updateData, {
  merge: true,
});

But every so often we'll see the error. I've been progressively adding more debug logging to the function to see if I can work out what may be causing it, but there is nothing of note that I can see.

Using up-to-date dependencies and Node version.

~/ $ node --version
v18.18.2

package.json dependencies:

  "dependencies": {
    "@google-cloud/firestore": "^7.3.0",
    "@google-cloud/logging": "^11.0.0",
    "firebase-admin": "^12.0.0",
    "firebase-functions": "^4.7.0",
    "googleapis": "^132.0.0",
    ...
  },

Stack trace:

Error: 13 INTERNAL: Received RST_STREAM with code 2 (Internal server error) Error: 13 INTERNAL: Received RST_STREAM with code 2 (Internal server error)
    at callErrorFromStatus (/workspace/node_modules/@grpc/grpc-js/build/src/call.js:31:19)
    at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client.js:192:76)
    at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:360:141)
    at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:323:181)
    at /workspace/node_modules/@grpc/grpc-js/build/src/resolving-call.js:99:78
    at process.processTicksAndRejections (node:internal/process/task_queues:77:11)
for call at
    at ServiceClientImpl.makeUnaryRequest (/workspace/node_modules/@grpc/grpc-js/build/src/client.js:160:32)
    at ServiceClientImpl.<anonymous> (/workspace/node_modules/@grpc/grpc-js/build/src/make-client.js:105:19)
    at /workspace/node_modules/@google-cloud/firestore/build/src/v1/firestore_client.js:231:29
    at /workspace/node_modules/google-gax/build/src/normalCalls/timeout.js:44:16
    at repeat (/workspace/node_modules/google-gax/build/src/normalCalls/retries.js:80:25)
    at /workspace/node_modules/google-gax/build/src/normalCalls/retries.js:118:13
    at OngoingCallPromise.call (/workspace/node_modules/google-gax/build/src/call.js:67:27)
    at NormalApiCaller.call (/workspace/node_modules/google-gax/build/src/normalCalls/normalApiCaller.js:34:19)
    at /workspace/node_modules/google-gax/build/src/createApiCall.js:84:30
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
Caused by: Error
    at WriteBatch.commit (/workspace/node_modules/@google-cloud/firestore/build/src/write-batch.js:432:23)
    at DocumentReference.set (/workspace/node_modules/@google-cloud/firestore/build/src/reference.js:398:27)
    at /workspace/lib/auth.js:201:19
    at Generator.next (<anonymous>)
    at fulfilled (/workspace/lib/auth.js:5:58)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
  code: 13,
  details: 'Received RST_STREAM with code 2 (Internal server error)',
  metadata: Metadata { internalRepr: Map(0) {}, options: {} },
  note: 'Exception occurred in retry method that was not classified as transient'
}
    at console.error (/workspace/node_modules/firebase-functions/lib/logger/compat.js:31:23)
    at /workspace/lib/auth.js:207:17
    at Generator.throw (<anonymous>)
    at rejected (/workspace/lib/auth.js:6:65)