temporalio / sdk-typescript

Temporal TypeScript SDK
Other
520 stars 104 forks source link

[Bug] Jest detects open handles #928

Open bergundy opened 1 year ago

bergundy commented 1 year ago

Getting this warning when running a simple test with the test environment:

 $ ./node_modules/.bin/jest --detectOpenHandles

...
Jest has detected the following 1 open handle potentially keeping Jest from exiting:

  ●  neon threadsafe function
...

Code for reproduction:

import { TestWorkflowEnvironment } from '@temporalio/testing';
import { Worker, Runtime, DefaultLogger, LogEntry } from '@temporalio/worker';
import { someWorkflow } from './workflows';

let testEnv: TestWorkflowEnvironment;

beforeAll(async () => {
  // Use console.log instead of console.error to avoid red output
  // Filter INFO log messages for clearer test output
  Runtime.install({
    logger: new DefaultLogger('WARN', (entry: LogEntry) => console.log(`[${entry.level}]`, entry.message)),
  });

  testEnv = await TestWorkflowEnvironment.createTimeSkipping();
});

afterAll(async () => {
  await testEnv?.teardown();
});

test('some random test', async () => {
  const { client, nativeConnection } = testEnv;
  const worker = await Worker.create({
    connection: nativeConnection,
    taskQueue: 'test',
    workflowsPath: require.resolve('./workflows'),
    activities: {
      // ...
    },
  });

  await worker.runUntil(async () => {
    const result = await client.workflow.execute(someWorkflow, {
      args: [],
      workflowId: 'money-transfer-test-workflow',
      taskQueue: 'test',
    });

    expect(result).toEqual('test');
  });
});
lorensr commented 1 year ago

Val did some investigating in https://github.com/temporalio/samples-typescript/issues/189

mjameswh commented 1 year ago

Test case can be reduced to this:

test(`core-bridge registerErrors() doesn't leave open handles`, async () => {
    const native = require("@temporalio/core-bridge");

    native.registerErrors({
        IllegalStateError: () => ({}),
        ShutdownError: () => ({}),
        TransportError: () => ({}),
        UnexpectedError: () => ({}),
        GracefulShutdownPeriodExpiredError: () => ({}),
    });
});

The issue is specifically with registerErrors(). For example, that one doesn't trigger the error message:

test(`core-bridge getTimeOfDay() doesn't leave open handles`, async () => {
    const native = require("@temporalio/core-bridge");
    native.getTimeOfDay();
}
mjameswh commented 1 year ago

The issue relates to usage of the .root() function to keep references to error constructors from Rust land.

pub fn register_errors(mut cx: FunctionContext) -> JsResult<JsUndefined> {
    // ...

    let mapping = cx.argument::<JsObject>(0)?;
    let shutdown_error = mapping
        .get::<JsFunction, _, _>(&mut cx, "ShutdownError")?
        .root(&mut cx);   // <----

Neon's Root object internally register a ThreadSafeFunction, which is a special type of resource provided by Node's napi allowing native code to call a JavaScript function from any thread. It is that ThreadSafeFunction resource that Jest detects as being left open.

mjameswh commented 1 year ago

Opened a ticket in Neon: https://github.com/neon-bindings/neon/issues/948

igorsechyn commented 1 year ago

We hit the same issue in our test setup. Weirdly it causes issues in CircleCI as jest just won't exit despite using --forceExit flag. We upgraded to the latest version of sdk and the issue still persists. It is intermittent, so it is hard to pinpoint it to temporal tests, but this behaviour started around the same time as we added temporal tests

mandriv commented 1 year ago

Is there any known workaround?

Irvenae commented 1 year ago

Since the new version we are hitting an issue with Jest on CircleCI as well. Running into Too long with no output (exceeded 10m0s): context deadline exceeded

mjameswh commented 1 year ago

Is there any known workaround?

I'm not aware of any workaround regarding Jest's warning about open handle. This warning message can simply be ignored. This may also cause a 10 seconds delay on process termination, which should generally not be that much of an issue.

mjameswh commented 1 year ago

Since the new version we are hitting an issue with Jest on CircleCI as well. Running into Too long with no output (exceeded 10m0s): context deadline exceeded

We hit the same issue in our test setup. Weirdly it causes issues in CircleCI as jest just won't exit despite using --forceExit flag. We upgraded to the latest version of sdk and the issue still persists. It is intermittent, so it is hard to pinpoint it to temporal tests, but this behaviour started around the same time as we added temporal tests

I don't think this is related to the "detected open handle" bug. Wondering if this might be related, though there's unfortunately very little context to work with.

@mandriv @Irvenae Let assume this is a different issue... Could you please provide more details on these situations? Here are some ideas info that might help figure this out, but feel free to add whatever that you think could be useful:

Irvenae commented 1 year ago

This is the log we get in circleCI:

2023-02-28T09:49:27.110Z [WARN] This program is running inside a containerized environment with a memory constraint (eg. docker --memory 70226m). Node itself does not consider this memory constraint in how it manages its heap memory. There is consequently a high probability that the process will crash due to running out of memory. To increase reliability, we recommend adding '--max-old-space-size=52669' to your node arguments. Refer to https://docs.temporal.io/application-development/worker-performance for more advice on tuning your workers.
2023-02-28T09:49:27.145998Z  INFO temporal_sdk_core::ephemeral_server: Downloading https://temporal.download/assets/temporalio/sdk-java/releases/download/v1.17.0/temporal-test-server_1.17.0_linux_amd64.tar.gz to /tmp/temporal-test-server-sdk-typescript-1.6.0
2023-02-28T09:49:27.871Z [INFO] Creating worker {
  options: {
    namespace: 'default',
    identity: '454@f5f8c41c3a04',
    shutdownGraceTime: '10s',
    maxConcurrentActivityTaskExecutions: 100,
    maxConcurrentLocalActivityExecutions: 100,
    enableNonLocalActivities: true,
    maxConcurrentWorkflowTaskExecutions: 100,
    stickyQueueScheduleToStartTimeout: '5m',
    maxHeartbeatThrottleInterval: '60s',
    defaultHeartbeatThrottleInterval: '30s',
    isolateExecutionTimeout: '4294967295ms',
    workflowThreadPoolSize: 1,
    maxCachedWorkflows: 1761,
    enableSDKTracing: false,
    showStackTraceSources: true,
    reuseV8Context: true,
    debugMode: true,
    interceptors: { activityInbound: [Array], workflowModules: [Array] },
    sinks: { logger: [Object] },
    workflowsPath: '/root/repo/packages/temporal-features/src/tests/allWorkflows.ts',
    bundlerOptions: { webpackConfigHook: [Function: webpackConfigHook] },
    activities: {
      cancelJob: [Getter],
      createItem: [Getter],
      createItemWithData: [Getter],
      deleteItem: [Getter],
      findItems: [Getter],
      findItemsInAllVersions: [Getter],
      getFileJson: [Getter],
      getItem: [Getter],
      persistData: [Getter],
      setItem: [Getter],
      createOrMemoizeJob: [Getter],
      getJobResult: [Getter],
      getResult: [Getter],
      queryWorkflow: [Getter],
      signalWithStartWorkflowForItem: [Getter],
      signalWorkflow: [Getter],
      startAndQueryWorkflowForItem: [Getter]
    },
    taskQueue: 'test',
    connection: NativeConnection {
      nativeClient: [External: 7fe4b80045b0],
      referenceHolders: Set(0) {}
    },
    shutdownGraceTimeMs: 10000,
    stickyQueueScheduleToStartTimeoutMs: 300000,
    isolateExecutionTimeoutMs: 4294967295,
    maxHeartbeatThrottleIntervalMs: 60000,
    defaultHeartbeatThrottleIntervalMs: 30000,
    loadedDataConverter: {
      payloadConverter: [DefaultPayloadConverter],
      failureConverter: [DefaultFailureConverter],
      payloadCodecs: []
    }
  }
}
2023-02-28T09:49:36.018Z [INFO] asset workflow-bundle-a2ca801512c61451083d.js 22.2 MiB [emitted] [immutable] (name: main)
2023-02-28T09:49:36.018Z [INFO] runtime modules 1.25 KiB 6 modules
2023-02-28T09:49:36.018Z [INFO] modules by path ./packages/ 4.44 MiB 726 modules
2023-02-28T09:49:36.019Z [INFO] modules by path ./node_modules/ 3.54 MiB 293 modules
2023-02-28T09:49:36.019Z [INFO] modules by path ./tooling/tool-version-sync/lib/*.js 4.99 KiB
2023-02-28T09:49:36.019Z [INFO]   ./tooling/tool-version-sync/lib/index.js 1.2 KiB [built] [code generated]
2023-02-28T09:49:36.019Z [INFO]   ./tooling/tool-version-sync/lib/toolVersions.js 1.54 KiB [built] [code generated]
2023-02-28T09:49:36.019Z [INFO]   ./tooling/tool-version-sync/lib/toolMemoryLimits.js 2.25 KiB [built] [code generated]
2023-02-28T09:49:36.019Z [INFO] __temporal_custom_payload_converter (ignored) 15 bytes [built] [code generated]
2023-02-28T09:49:36.019Z [INFO] __temporal_custom_failure_converter (ignored) 15 bytes [built] [code generated]
2023-02-28T09:49:36.019Z [INFO] webpack 5.75.0 compiled successfully in 7586 ms
2023-02-28T09:49:36.116Z [INFO] Workflow bundle created { size: '22.12MB' }
2023-02-28T09:49:37.170023Z  INFO temporal_sdk_core::worker: Initializing worker task_queue=test namespace=default
2023-02-28T09:49:37.172Z [INFO] Worker state changed { state: 'RUNNING' }
(node:454) MaxListenersExceededWarning: Possible EventTarget memory leak detected. 11 abort listeners added to [AbortSignal]. Use events.setMaxListeners() to increase limit
(Use `node --trace-warnings ...` to show where the warning was created)

Too long with no output (exceeded 10m0s): context deadline exceeded

In this run it only runs 1 test file. In this test file we only create 1 timeSkipping test env, which we reuse for 12 tests. I checked the memory and we only use at a max 25% Mem of the instance so OOM is out of the question. The MaxListenersExceededWarning can be ignored, sth in our code we need to resolve.

A successful run would have 3 min later the Worker state changed { state: 'STOPPING' } and etc. We only print info logs I will try to reproduce with debug logs as well.

Irvenae commented 1 year ago

I might have been on the wrong track, I today reran this 10 times on CircleCI but no failure. I know for sure this happens when we have multiple tests running, but there it could be OOM as well (so setting node options and/or going to a single Jest worker to see if it is resolved). Anyway, I am also further investigating to potentially pinpoint it better.

igorsechyn commented 1 year ago

FWIW, we found that we were either running out CPU or Memory, when executing in CircleCI. The only thing that helped us is to reduce number of jest workers. we had not had a single failure since then.

mjameswh commented 1 year ago

I checked the memory and we only use at a max 25% Mem of the instance so OOM is out of the question.

Not sure if this is the issue here, but I personally recommend to always explicitly set Node's --max-old-space-size (see this page for some details).

Based on your log, it appears you are executing inside a containerized environment with a memory constraint. Node itself doesn't play great in such cases, because it doesn't know about those constraints and instead configure its heap allocation and garbage collection to based on the machine's total memory.

The opposite is also possible: by default, Node will configure its heap allocation limit and garbage collection algorithms to 25% of available memory, up to a limit of 4GB (assuming node 14 to 18, with 64 bits cpu). That means that your Node process may not be taking advantage of all the resources available to it, which could explain that it is not performing as it should.

Both cases can be resolved by explicitly setting and properly tuning Node's --max-old-space-size.

NivOclear commented 1 year ago

so any progress regarding this issue ? this not just a warning in our cicd its break the jest test and not passing them

Irvenae commented 1 year ago

We have resolved it by using a bigger machine and reducing the number of jest workers and setting --max-old-space-size (taking into account the nr of workers).

I am now evaluating AVA tbh, because attaching the debugger is spotty with Jest.

Kannndev commented 10 months ago

I am not sure if this is relevant, but i will add here just to get others opinion as well,

When I run the test with createLocal it works but when i use createTimeSkipping test times out even after 60s. The example test is

import { DefaultLogger, LogEntry, Runtime, Worker } from '@temporalio/worker';

import { TestWorkflowEnvironment } from '@temporalio/testing';
import { WorkflowCoverage } from '@temporalio/nyc-test-coverage';
import { example } from './workflow';

describe('Example workflow with mocks', () => {
  let testEnv: TestWorkflowEnvironment;
  const workflowCoverage = new WorkflowCoverage();

  beforeAll(async () => {
    // Use console.log instead of console.error to avoid red output
    // Filter INFO log messages for clearer test output
    Runtime.install({
      logger: new DefaultLogger('WARN', (entry: LogEntry) => console.log(`[${entry.level}]`, entry.message)),
    });

    testEnv = await TestWorkflowEnvironment.createLocal();
  });

  afterAll(async () => {
    await testEnv?.teardown();
  });

  afterAll(() => {
    workflowCoverage.mergeIntoGlobalCoverage();
  });

  it('successfully completes the Workflow with a mocked Activity', async () => {
    const { client, nativeConnection } = testEnv;
    const taskQueue = 'test';

    const worker = await Worker.create(
      workflowCoverage.augmentWorkerOptions({
        connection: nativeConnection,
        taskQueue: 'test',
        workflowsPath: require.resolve('../workflows'),
        activities: {
          greet: async () => 'Hello, Temporal!',
        },
      })
    );

    await worker.runUntil(async () => {
      const result = await client.workflow.execute(example, {
        args: ['Temporal'],
        workflowId: 'test',
        taskQueue,
      });
      expect(result).toEqual('Hello, Temporal!');
    });
  });
});

Why is this behaviour happening ?,

bergundy commented 9 months ago

When I run the test with createLocal it works but when i use createTimeSkipping test times out even after 60s. The example test is

Thanks for the feedback.

We haven't had the capacity to investigate this further for now, we'll need more work to understand what's going on here.

jbsil commented 8 months ago

We've found a workaround for this when running Workers using the TestWorkflowEnvironment in jest with --detectOpenHandles. This does not appear to work for activities run using the MockActivityEnvironment, where the open handles still persist.

the summary of the worker workaround is:

  1. when you create your TestWorkflowEnvironment, store it on global, e.g. global.TemporalEnvironment: TestWorkflowEnvironment
  2. as you Worker.create each of your workers, store the Worker (await the call to create) into something on global, e.g. global.workers: Worker[]
  3. as you worker.run() each of your workers, store the run promise as well into something on global, e.g. global.workerPromises: Promise[]
  4. setup a jest globalTeardown to clean up all of this stuff
  5. on each of the global.workers, call .shutdown(), which will cause the workers to (eventually) shut down. do not try to await this
  6. instead, wait for all of the worker run promises, e.g. await Promise.all(global.workerPromises)
  7. finally, teardown the TWE, e.g. await global.TemporalEnvironment.teardown() e.g. teardown.ts
    export default async function teardown() {
    // ask each worker to start shutting down
    global.workers.forEach((worker) => {
    // do not wait for this, it is non-blocking
    worker.shutdown();
    });
    // the promises created by `Worker.run` will resolve once the worker has actually shutdown
    await Promise.all(global.workerPromises);
    // fyi - teardown the environment won't work until after the workers disconnect
    await global.TemporalEnvironment?.teardown();
    }

To get your typing correct, you can declare global, but note that you have to var your variables, let and const wont work. e.g.

declare global {
  var TemporalEnvironment: TestWorkflowEnvironment;
  var workers: Worker[];
  var workerPromises: Promise[];
}

We have found that the most "sane" configuration of this is to use a jest globalSetup to start the TestWorkflowEnvironment and all of the workers involved, because it more closely mirrors what will be happening in production (the server and workers will be running before a workflow payload arrives). But, you don't have to do that part.

In our worker repos, using Node 18 and temporalio/* @ 1.9.0, this closes all of our handles and lets jest exit gracefully.

We're still unable to work around the open handles from the MockActivityEnvironment, though, which is disappointing.

jbsil commented 8 months ago

It turns out that the handles opened by MockActivityEnvironment are in the Runtime. That means that you can jest config

{
  ...,
  "globalTeardown": "tests/teardown.ts"
}

teardown.ts

import { Runtime } from '@temporalio/worker';
export default async function() {
  await Runtime.getInstance().shutdown();
}

and your open handles will get cleaned up. There is some delay (a few seconds) after the tests finish while the Runtime shuts down, but it stops jest from complaining! Hurray!

mjameswh commented 8 months ago

Thanks a lot @jbsil for sharing your findings on this issue. It will certainly help a few of our users.

I have to admit that I'm very intrigued by why this solution actually resolves the symptom, as I'm pretty sure that this will not force unloading of Neon's global ThreadSafeFunction, but glad to know that it works. We're planning to dig into more into Jest related issues soon; I will experiment more with your workaround at that moment.

ilijaNL commented 4 months ago

This is still ongoing issue, sometimes it also takes sometime before everything is terminated according to jest runner.

jamespsterling commented 5 days ago

We are seeing this for MockActivityEnvironment,

Jest has detected the following 1 open handle potentially keeping Jest from exiting:

  ●  neon threadsafe function

      34 |
      35 |   beforeEach(async () => {
    > 36 |     env = new MockActivityEnvironment({ attempt: 2 });
         |           ^
      37 |
      38 |     const compiledSubject = await compileTemplate({
      39 |       mergeFields: allMergeFields,

      at ../../../../node_modules/@temporalio/core-bridge/index.js:16:14
      at Function.create (../../../../node_modules/@temporalio/worker/src/runtime.ts:202:31)
      at Function.instance (../../../../node_modules/@temporalio/worker/src/runtime.ts:194:17)
      at Activity.makeActivityLogger (../../../../node_modules/@temporalio/worker/src/activity.ts:82:34)
      at new Activity (../../../../node_modules/@temporalio/worker/src/activity.ts:61:12)
      at new MockActivityEnvironment (../../../../node_modules/@temporalio/testing/src/index.ts:416:21)
      at Object.<anonymous> (src/lib/actions/emails/prepareEmailAndMessageData.spec.ts:36:11)