launchdarkly / js-core

LaunchDarkly monorepo for JavaScript SDKs
Other
14 stars 16 forks source link

Debugging a server client holding process open #460

Open chadxz opened 4 months ago

chadxz commented 4 months ago

Looking for guidance and/or best-practices for trying to understand how best to make sure the persistent connection to Launchdarkly closes cleanly, as we're having an issue with our process being held open.

Describe the bug

When using node-server-sdk in a cron job, the persistent connection is holding the process open even after we call .close()

To reproduce

I don't know what specifically triggers this, as it doesn't appear to be consistent. When it happens, our logging shows that the close call has been made, but the process doesn't exit as it should.

Expected behavior

After calling close, the process exits instead of hanging open.

SDK version node-server-sdk@9.4.1

Language version, developer tools Docker image node:20.12.2-bookworm

Additional context I was looking at the code and noticed the types that my app has installed indicate the close method is synchronous, but the code appears to be async. Not sure what is going on there.

chadxz commented 4 months ago

Also been seeing this in our test suite. Maybe related?:

Jest has detected the following 1 open handle potentially keeping Jest from exiting:

  ●  TLSWRAP

      24 |     this.client =
      25 |       LDClient ??
    > 26 |       LaunchDarkly.init(process.env.LAUNCH_DARKLY_SDK_KEY as string, {
         |                    ^
      27 |         timeout: TIMEOUT_IN_SECONDS,
      28 |         logger: TraceLogger,
      29 |         updateProcessor: process.env.NODE_ENV === "test" ? launchDarklyTestData.getFactory() : undefined,

      at ../node_modules/@opentelemetry/instrumentation-http/src/http.ts:690:26
      at safeExecuteInTheMiddle (../node_modules/@opentelemetry/instrumentation/src/utils.ts:32:14)
      at ../node_modules/@opentelemetry/instrumentation-http/src/http.ts:689:67
      at AsyncHooksContextManager.with (../node_modules/@opentelemetry/context-async-hooks/src/AsyncHooksContextManager.ts:49:17)
      at ContextAPI.with (../node_modules/@opentelemetry/api/src/api/context.ts:77:42)
      at outgoingRequest (../node_modules/@opentelemetry/instrumentation-http/src/http.ts:679:26)
      at Object.httpsOutgoingRequest [as request] (../node_modules/@opentelemetry/instrumentation-http/src/http.ts:263:10)
      at ../node_modules/@launchdarkly/node-server-sdk/src/platform/NodeRequests.ts:124:24
      at NodeRequests.fetch (../node_modules/@launchdarkly/node-server-sdk/src/platform/NodeRequests.ts:123:12)
      at EventSender.tryPostingEvents (../node_modules/@launchdarkly/js-sdk-common/src/internal/events/EventSender.ts:67:67)
      at EventSender.sendEventData (../node_modules/@launchdarkly/js-sdk-common/src/internal/events/EventSender.ts:120:17)
      at EventProcessor.postDiagnosticEvent (../node_modules/@launchdarkly/js-sdk-common/src/internal/events/EventProcessor.ts:165:22)
      at EventProcessor.start (../node_modules/@launchdarkly/js-sdk-common/src/internal/events/EventProcessor.ts:145:12)
      at new EventProcessor (../node_modules/@launchdarkly/js-sdk-common/src/internal/events/EventProcessor.ts:123:12)
      at new LDClientImpl (../node_modules/@launchdarkly/js-server-sdk-common/src/LDClientImpl.ts:177:29)
      at new LDClientNode (../node_modules/@launchdarkly/node-server-sdk/src/LDClientNode.ts:38:5)
      at new WithEvents (../node_modules/@launchdarkly/node-server-sdk/src/Emits.ts:13:10)
      at Object.init (../node_modules/@launchdarkly/node-server-sdk/src/index.ts:49:10)
      at new LaunchDarklyClient (src/integrations/launchDarkly/client.ts:26:20)
      at Object.<anonymous> (src/integrations/launchDarkly/index.ts:3:23)
      at Object.<anonymous> (test/setupAfterEnv.ts:1:1)
kinyoklion commented 4 months ago

Hello @chadxz,

Another user was encountering a similar situation recently, but as of yet we have been unable to reproduce. Could you provide details on the platform that you are experiencing the problem on?

Closing a client is synchronous, but any outstanding asynchronous tasks will keep the application running. There are a few such tasks, but all their handles are retained and should be cancelled when the client is closed. For instance there is a timer on the event processor and that timer gets cleared when closing the client. If it was not, then that timer could keep the process running.

So, in some condition, some tasks are potentially unaccounted for and then node doesn't shutdown cleanly.

We will look into it.

Filed internally as 245016

Thank you, Ryan

kinyoklion commented 4 months ago

In regards to the handle open in the unit tests I would make sure you are closing the client. Otherwise it will have recurring timers. Generally though we would suggest using an offline client, or a client configured using TestData, for testing purposes.

Thanks, Ryan

kinyoklion commented 4 months ago

@chadxz Is this happening frequently enough that you would be able to capture some additional logging. Node has a function that can show the resources that are keeping it open: https://nodejs.org/api/process.html#processgetactiveresourcesinfo

Logging the result of that function would at least provide the type of resource. So it could be determined if it was a timer or a network request, for example.

For much more specific information there are libraries like: https://www.npmjs.com/package/why-is-node-running

Which would provide callstacks of the specific resources.

Thanks, Ryan

chadxz commented 4 months ago

Another user was encountering a similar situation recently, but as of yet we have been unable to reproduce. Could you provide details on the platform that you are experiencing the problem on?

Debian docker container on Linux. (base image is node:20.12.2-bookworm)

In regards to the handle open in the unit tests I would make sure you are closing the client.

Yeah we are closing it in our afterAll() callback in Jest like this:

await client.flush();
client.close();

Is this happening frequently enough that you would be able to capture some additional logging?

It was, but one of our devs stuck a process.exit(0) call at the end of the cronjob that was hanging and as far as I know the hang stopped. Obviously this is a hack, but the fire is out at least. If it happens again i'll instrument it with the why-is-node-running call and see if we can get some better leads.

Thanks for your thoughts!

kinyoklion commented 4 months ago

Thank you @chadxz.

We did find a couple issues that could have left some handles open with streaming error conditions and have released fixes for those.

We will continue investigation. We currently do not cancel pending network requests when the client is closed, so it is possible that those requests prevent graceful shutdown in some situations. (For instance posting a batch of analytic events.)

As you mention process.exit will exit regardless of pending work.

Thank you, Ryan

hysmio commented 2 months ago

Just ran into this when using https://bun.sh instead of Node. More than likely this is on the Bun side of things, but Bun also exits normally with other libraries.

Tested using: Bun: 1.0.30 Node: v20.12.2

This is the following code I was using to test the LD Relay Proxy:

const ld = require("@launchdarkly/node-server-sdk");

const relayUri = "http://localhost:8030";
const sdkKey = ``;
let launchDarklyCleanedUp = false;

(async () => {
  const client = await ld
    .init(sdkKey, {
      baseUri: relayUri,
      streamUri: relayUri,
      eventsUri: relayUri,
    })
    .waitForInitialization({
      timeout: 5,
    });

  console.log("Initialised LD client");

  const testFlag = await client.boolVariation(
    "test-flag",
    { key: "user1" },
    true,
  );
  console.log(testFlag);

  const cleanup = async () => {
    if (!launchDarklyCleanedUp) {
      launchDarklyCleanedUp = true;
      await client.flush();
      await client.close();
      console.log("Closed LD client");
    }
  };

  await cleanup();

  process.on("SIGINT", async () => {
    await cleanup();
    process.exit(1);
  });
})();
image
kinyoklion commented 2 months ago

Hello @hysmio,

Thank you for the report.

In node you can use process.__getActiveRequests and process._getActiveHandles to determine what is holding the event loop open. (Which I can no longer find any open handles in normal usage in node.)

I do not see any comments or docs about it in bun, but it appears that those functions are implemented as stubs that always return an empty array. So unfortunately I do not see a good starting point for debugging this issue in bun. Unless there is some way to inspect the event loop and I am having a hard time discovering it.

Thank you, Ryan