nodeshift / opossum

Node.js circuit breaker - fails fast ⚡️
https://nodeshift.dev/opossum/
Apache License 2.0
1.27k stars 106 forks source link

AbortController break fetch after first timeout because aborted signal can't be re-used #861

Open lbeschastny opened 2 months ago

lbeschastny commented 2 months ago

Node.js Version:

↪ node --version
v20.12.2

Operating System:

↪ uname -a
Darwin Mac-WP9QH543YX 23.5.0 Darwin Kernel Version 23.5.0: Wed May  1 20:12:58 PDT 2024; root:xnu-10063.121.3~5/RELEASE_ARM64_T6000 arm64

The problem:

AbortSignal both stores it's state and provides an abort event to signal when abort is happening.

And AbortController newer updates it's signal, leaving it in aborted state forever:

> abortController = new AbortController()
AbortController { signal: AbortSignal { aborted: false } }
> abortController.abort()
undefined
> abortController
AbortController { signal: AbortSignal { aborted: true } }

It looks like undici fetch uses both aborted property and abort event to skip sending the request if it is already aborted.

So, once the action times out for the first time, it switches AbortController into aborted state and all subsequent fetch calls just abort immediately.

Minimal reproducible example:

const { describe, it, before, after, afterEach, mock } = require('node:test');
const assert = require('node:assert/strict');
const http = require('node:http');

const CircuitBreaker = require('opossum');

describe('Circuit breaker with abort controller', () => {
    const abortController = new AbortController();
    const get = mock.fn((timeout) => {
        const { port } = server.address();
        const { signal } = abortController;
        const headers = { 'x-timeout': timeout };
        return fetch(`http://localhost:${port}`, { signal, headers });
    });
    const breaker = new CircuitBreaker(get, {
        abortController,
        timeout: 1000,
        volumeThreshold: 3
    });

    const server = http.createServer((req, res) => {
        const timeout = Number(req.headers['x-timeout']);
        setTimeout(() => res.end(`delayed for ${timeout}ms`), timeout);
    });

    before(() => server.listen());
    after(() => server.close());
    afterEach(() => get.mock.resetCalls());

    it('Should work', async () => {
        const response = await breaker.fire(100);
        const text = await response.text();
        assert.equal(text, 'delayed for 100ms');
    });

    it('Should timeout and abort request', async () => {
        await assert.rejects(() => breaker.fire(2000), {
            message: 'Timed out after 1000ms'
        });
        await assert.rejects(() => get.mock.calls[0].result, {
            message: 'This operation was aborted'
        });
    });

    it('Should work again', async () => {
        try {
            await breaker.fire(100);
        } catch ({ name, message }) {
            throw Object.assign(new Error(message), { name });
        }
    });
});
↪ node --test
▶ Circuit breaker with abort controller
  ✔ Should work (153.779083ms)
  ✔ Should timeout (1008.148333ms)
  ✖ Should work again (1.391459ms)
    Error [AbortError]: This operation was aborted
        at TestContext.<anonymous> (/Users/leonbesc/Documents/viaplay/opossum-abort-controller-issue/test.js:42:33)
        at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
        at async Test.run (node:internal/test_runner/test:640:9)
        at async Suite.processPendingSubtests (node:internal/test_runner/test:382:7)

Possible ways to fix this:

It seems we cannot re-use the same AbortController and AbortSignal pair for all requests since AbortSignal preserves its aborted state.

I can only see one way to fix this problem - use new AbortSignal for every requests (see #780).

For example, opossum could pass the signal object as the first argument to the action handler.

cedrick-ah commented 1 month ago

Hello @lbeschastny, I would to work on this with you but it seem like it is sait that "The intent is that when a circuit times out, an assumption is made that all other ongoing requests to the same endpoint will time out as well". That is why the subsequent request failed. I think what we need is a reset timeout for the abort controller.

lbeschastny commented 1 month ago

@cedrick-ah the problem here is not about other ongoing requests since this limitation is clearly stated in the readme.

The problem is that all new requests are immediately aborted by fetch after the first timeout due to the AbortController state which is changed to aborted the first time the abort signal is sent.

cedrick-ah commented 1 month ago

Okay, maybe you can try this approach to change abort for each request.

const { describe, it, before, after } = require('node:test');
const assert = require('node:assert/strict');
const http = require('node:http');

const CircuitBreaker = require('opossum');

describe('Circuit breaker with abort controller', () => {
  let abortController;
  const get = timeout => {
    const { port } = server.address();
    abortController = new AbortController();
    const { signal } = abortController;
    const headers = { 'x-timeout': timeout };
    return fetch(`http://localhost:${port}`, { signal, headers });
  };
  const breaker = new CircuitBreaker(get, {
    abortController,
    timeout: 1000,
    volumeThreshold: 3
  });

  const server = http.createServer((req, res) => {
    const timeout = Number(req.headers['x-timeout']);
    setTimeout(() => res.end(`delayed for ${timeout}ms`), timeout);
  });

  before(() => server.listen());
  after(() => server.close());

  it('Should work', async () => {
    const response = await breaker.fire(100);
    const text = await response.text();
    assert.equal(text, 'delayed for 100ms');
  });

  it('Should timeout', async () => {
    await assert.rejects(() => breaker.fire(2000), {
      message: 'Timed out after 1000ms'
    });
  });

  it('Should work again', async () => {
    try {
      await breaker.fire(100);
    } catch ({ name, message }) {
      throw Object.assign(new Error(message), { name });
    }
  });
});
node --test
▶ Circuit breaker with abort controller
  ✔ Should work (286.944232ms)
  ✔ Should timeout (1006.156862ms)
  ✔ Should work again (117.404018ms)
▶ Circuit breaker with abort controller (1430.090547ms)

ℹ tests 3
ℹ suites 1
ℹ pass 3
ℹ fail 0
ℹ cancelled 0
ℹ skipped 0
ℹ todo 0
ℹ duration_ms 7338.956067
lbeschastny commented 1 month ago

@cedrick-ah you create new AbortController for every request here, but for opossum the abortController option in your example is undefined.

So, requests just newer get aborted.

cedrick-ah commented 1 month ago

It will be defined after breaker.fire() is called. Also, abort controller and signal only come in pairs as you said earlier. I tested and the new requests was not aborted.

lbeschastny commented 1 month ago

@cedrick-ah I updated my Minimal reproducible example with an additional assertion to verify that the operation was actually aborted.

cedrick-ah commented 1 month ago

Okay, I understand.

github-actions[bot] commented 2 weeks ago

This issue is stale because it has been open 30 days with no activity.