googleapis / nodejs-storage

Node.js client for Google Cloud Storage: unified object storage for developers and enterprises, from live data serving to data analytics/ML to data archiving.
https://cloud.google.com/storage/
Apache License 2.0

Methods that perform a fetch should throw any socket errors directly (ECONNRESET) #2482

Closed ryanwilsonperkin closed 2 months ago

ryanwilsonperkin commented 5 months ago

Related: https://github.com/googleapis/google-cloud-node/issues/2254

Environment details

Steps to reproduce

When running multiple uploads in parallel, we frequently encounter ECONNRESET socket errors such as

FetchError: request to https://storage.googleapis.com/storage/v1/b/path-redacted failed, reason: write ECONNRESET
--
  | at ClientRequest.<anonymous> (/app/node_modules/.pnpm/node-fetch@2.7.0/node_modules/node-fetch/lib/index.js:1501:11)
  | at ClientRequest.emit (node:events:518:28)
  | at ClientRequest.emit (node:domain:488:12)
  | at TLSSocket.socketErrorListener (node:_http_client:495:9)
  | at TLSSocket.emit (node:events:518:28)
  | at TLSSocket.emit (node:domain:488:12)
  | at emitErrorNT (node:internal/streams/destroy:169:8)
  | at emitErrorCloseNT (node:internal/streams/destroy:128:3)
  | at processTicksAndRejections (node:internal/process/task_queues:82:21) {
  | type: 'system',
  | errno: 'ECONNRESET',
  | code: 'ECONNRESET'
  | }

These happen for us when using the Bucket#upload method, and we would like to be able to catch the error and handle it appropriately, but the error is thrown in such a way that it can't be caught by a try/catch around Bucket#upload and instead needs to be captured at the process level by an event handler.

When GCS is instantiating the node-fetch client, the client should capture any FetchError and reject the promise with it.

danielbankhead commented 3 months ago

Fascinating, do you mind sharing a reproducible snippet? This should be captured via try { await … } catch (e) {}

ryanwilsonperkin commented 3 months ago
const crypto = require("node:crypto");
const fs = require("node:fs/promises");
const { Storage } = require("@google-cloud/storage");

// Replace with organization-specific values
const PROJECT_ID = "";
const BUCKET_NAME = "";
const NUM_TEST_FILES = 10_000;
const SOURCE_DIRECTORY = "/tmp/gcs-example";

async function createTestFiles() {
  const files = [];
  await fs.mkdir(SOURCE_DIRECTORY, { recursive: true });
  for (let i = 0; i < NUM_TEST_FILES; i++) {
    const uuid = crypto.randomUUID();
    const file = `${SOURCE_DIRECTORY}/${uuid}`;
    await fs.writeFile(file, uuid);
    files.push(file);
  }
  return files;
}

(async function main() {
  const files = await createTestFiles();
  const gcs = new Storage({ projectId: PROJECT_ID });
  const bucket = gcs.bucket(BUCKET_NAME);
  try {
    await Promise.all(files.map((file) => bucket.upload(file)));
  } catch (err) {
    // This should be able to catch the socket error but it does not
  }
})();

Here's a snippet that should reproduce it. It attempts to perform 10k concurrent uploads, which is likely sufficient to encounter the socket error, though you may find you need to increase the size of the generated files to trigger the issue.
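For anyone tuning the reproduction (or working around the resets in the meantime), capping the number of in-flight uploads makes the concurrency explicit. `mapWithConcurrency` below is a hypothetical helper, not part of @google-cloud/storage:

```javascript
// Run fn over items with at most `limit` invocations in flight at once.
async function mapWithConcurrency(items, limit, fn) {
  const results = new Array(items.length);
  let next = 0;
  async function worker() {
    while (next < items.length) {
      const i = next++; // claimed synchronously, so no two workers share an index
      results[i] = await fn(items[i]);
    }
  }
  const workers = Array.from(
    { length: Math.min(limit, items.length) },
    () => worker()
  );
  await Promise.all(workers);
  return results;
}

// Example: square 1..5 with at most 2 tasks in flight at a time.
mapWithConcurrency([1, 2, 3, 4, 5], 2, async (n) => n * n)
  .then((out) => console.log(out)); // [ 1, 4, 9, 16, 25 ]
```

In the repro script, `files.map((file) => bucket.upload(file))` would become `mapWithConcurrency(files, limit, (file) => bucket.upload(file))`, letting you find the concurrency threshold where ECONNRESET starts to appear.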

danielbankhead commented 3 months ago

That's strange - I'm able to capture the error just fine locally with the same code provided, with a few minor mods:

index.mjs

import crypto from "node:crypto";
import fs from "node:fs/promises";

import { Storage } from "@google-cloud/storage";

// Replace with organization-specific values
const NUM_TEST_FILES = 10_000;
const SOURCE_DIRECTORY = "/tmp/gcs-example";

async function createTestFiles() {
  const files = [];
  await fs.mkdir(SOURCE_DIRECTORY, { recursive: true });
  for (let i = 0; i < NUM_TEST_FILES; i++) {
    const uuid = crypto.randomUUID();
    const file = `${SOURCE_DIRECTORY}/${uuid}`;
    await fs.writeFile(file, uuid);
    files.push(file);
  }
  return files;
}

async function main() {
  console.log(`Creating ${NUM_TEST_FILES} test file(s)...`);
  const files = await createTestFiles();
  console.log("...created.");

  const gcs = new Storage();
  const bucket = gcs.bucket(process.env.BUCKET_NAME);
  try {
    await Promise.all(files.map((file) => bucket.upload(file)));
  } catch (err) {
    // This should be able to catch the socket error but it does not
    console.dir({ err });
  }

  console.log("done.");
}

await main();

Which version of storage are you using?

danielbankhead commented 3 months ago

*ah, I'm assuming google-cloud-node version => @google-cloud/storage. This seems to work fine in @google-cloud/storage@7.11.2

ryanwilsonperkin commented 3 months ago

We're currently on @google-cloud/storage@7.10.0. Was anything meaningfully changed between those versions that would explain the difference?

Is it the ECONNRESET error that you're catching in the catch block?

danielbankhead commented 3 months ago

There were a number of changes upstream that may have resolved this; after digging I wasn't able to point to a particular change that would have fixed this.

Is it the ECONNRESET error that you're catching in the catch block?

Yep, I see that error being caught.

ddelgrosso1 commented 3 months ago

@ryanwilsonperkin @danielbankhead it sounds like this has been resolved?

ryanwilsonperkin commented 3 months ago

I intend to come back to this when work slows down a bit, but the issue still appears to be present in our environment. Going to see if I can get a more representative reproduction case.

smcgivern commented 2 months ago

I can also reproduce this (inconsistently), but only in our application. Using the scripts above, I never get ECONNRESET - but I do get other errors, which are handled correctly.

Here's an example of the error we see:

node:internal/process/promises:289
            triggerUncaughtException(err, true /* fromPromise */);
            ^
GaxiosError: request to https://storage.googleapis.com/upload/storage/v1/b/bucket/o?name=filename&uploadType=resumable failed, reason: read ECONNRESET
    at Gaxios._request (/path/to/our/app/node_modules/.pnpm/gaxios@6.1.1_encoding@0.1.13/node_modules/gaxios/src/gaxios.ts:183:13)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async Impersonated.requestAsync (/path/to/our/app/node_modules/.pnpm/google-auth-library@9.7.0_encoding@0.1.13/node_modules/google-auth-library/build/src/auth/oauth2client.js:405:18)
    at async Upload.makeRequest (/path/to/our/app/node_modules/.pnpm/@google-cloud+storage@7.9.0_encoding@0.1.13/node_modules/@google-cloud/storage/build/cjs/src/resumable-upload.js:743:21)
    at async uri.retries (/path/to/our/app/node_modules/.pnpm/@google-cloud+storage@7.9.0_encoding@0.1.13/node_modules/@google-cloud/storage/build/cjs/src/resumable-upload.js:410:29)
    at async Upload.createURIAsync (/path/to/our/app/node_modules/.pnpm/@google-cloud+storage@7.9.0_encoding@0.1.13/node_modules/@google-cloud/storage/build/cjs/src/resumable-upload.js:407:21) {
  config: {
    method: 'POST',
    url: 'https://storage.googleapis.com/upload/storage/v1/b/bucket/o?name=filename&uploadType=resumable',
    params: {
      name: 'filename',
      uploadType: 'resumable'
    },
    data: { contentEncoding: 'gzip' },
    headers: {
      'User-Agent': 'gcloud-node-storage/7.9.0 google-api-nodejs-client/9.7.0',
      'x-goog-api-client': 'gl-node/20.11.1 gccl/7.9.0-CJS gccl-invocation-id/a46ab3ae-18da-4e7e-b105-ebc1794756dd',
      'X-Upload-Content-Type': 'application/x-ndjson',
      Authorization: 'Bearer snip',
      'Content-Type': 'application/json'
    },
    validateStatus: [Function (anonymous)],
    paramsSerializer: [Function: paramsSerializer],
    body: '{"contentEncoding":"gzip"}',
    responseType: 'unknown',
    errorRedactor: [Function: defaultErrorRedactor]
  },
  response: undefined,
  error: FetchError: request to https://storage.googleapis.com/upload/storage/v1/b/bucket/o?name=filename&uploadType=resumable failed, reason: read ECONNRESET
      at ClientRequest.<anonymous> (/path/to/our/app/node_modules/.pnpm/node-fetch@2.6.11_encoding@0.1.13/node_modules/node-fetch/lib/index.js:1505:11)
      at ClientRequest.emit (node:events:518:28)
      at ClientRequest.emit (node:domain:488:12)
      at TLSSocket.socketErrorListener (node:_http_client:495:9)
      at TLSSocket.emit (node:events:518:28)
      at TLSSocket.emit (node:domain:488:12)
      at emitErrorNT (node:internal/streams/destroy:169:8)
      at emitErrorCloseNT (node:internal/streams/destroy:128:3)
      at process.processTicksAndRejections (node:internal/process/task_queues:82:21) {
    type: 'system',
    errno: 'ECONNRESET',
    code: 'ECONNRESET'
  },
  code: 'ECONNRESET'
}

danielbankhead commented 2 months ago

@smcgivern I’m not too familiar with pnpm, however I’m seeing gaxios@6.1.1 in the log. Do you mind checking the gaxios version and potentially updating?
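For reference, the resolved gaxios version can be inspected directly from the dependency tree (pnpm here, judging by the node_modules/.pnpm paths in the stack trace):

```shell
# Show which gaxios version is resolved in the dependency tree:
pnpm why gaxios   # with pnpm
npm ls gaxios     # with npm
```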

smcgivern commented 2 months ago

pnpm's just an npm replacement, but the library version point is a fair one - having upgraded, I haven't been able to reproduce yet (although I wasn't able to reproduce consistently in the first place). I will report back if I see the same problem on the latest versions of this library + gaxios.

danielbankhead commented 2 months ago

Thanks, closing for now.