launchdarkly / js-sdk-common

Code shared between all LaunchDarkly client-side JS-based SDKs
Other
4 stars 27 forks source link

Node process hangs before exiting #99

Closed bvaughn closed 5 months ago

bvaughn commented 7 months ago

Describe the bug In the event of no Internet connection, the js-sdk-common module seems to prevent process.exit() from completing for several seconds (up to ~15s in my own testing). This seems to be related to pending async calls (and removing this module from the NodeJS app I have been developing fixes this problem completely).

To reproduce Run the following code in NodeJS while disconnected from the Internet:

import { initialize } from "launchdarkly-node-client-sdk";

async function repro() {
  let client;
  try {
    console.log("Initializing LD client ...");
    client = initialize("...key...", {
      anonymous: true,
    });
  } catch (error) {
    console.log("ERROR: LD client initialization failed");
  }

  try {
    console.log("Calling LD client.waitForInitialization() ...");

    client.waitForInitialization();
  } catch (error) {
    console.log("ERROR: client.waitForInitialization failed");
  }

  console.log("Calling process.exit()");
  process.exit(0);
}

repro();

Expected behavior The process would be terminated immediately.

Logs The app prints the following, and then hangs for several seconds before exiting:

Initializing LD client ...
Calling LD client.waitForInitialization() ...
Calling process.exit()

SDK version The latest version as of the current time:

Language version, developer tools Node v20.2

OS/platform MacOS 14.3

Additional context I believe the problematic setTimeout calls are in these two locations:

https://github.com/launchdarkly/js-sdk-common/blob/0c4a8c0f10cfd22c089b875f62fc0a2e0ba6c504/src/diagnosticEvents.js#L170

https://github.com/launchdarkly/js-sdk-common/blob/0c4a8c0f10cfd22c089b875f62fc0a2e0ba6c504/src/EventProcessor.js#L163-L169

kinyoklion commented 7 months ago

@bvaughn Have you tried closing the SDK before exiting?

bvaughn commented 7 months ago

@kinyoklion Yes. I didn't include that in my repro because I was trying to trim it down as much as possible, but in our larger app– yes, we call close.

In this smaller project, it seems like calling close() is sufficient to fix the hang though, which is interesting.

kinyoklion commented 7 months ago

Thank you @bvaughn,

Interesting. I would expect there could be a delay without closing the SDK. Closing should cancel most outstanding tasks. There could be tasks that are still running, like finishing up an in-progress http request, but I wouldn't expect it to consistently stall. I've filed a task to look into this.

Thank you, Ryan

Filed internally as 240109

kinyoklion commented 7 months ago

@bvaughn Are you only seeing this when the SDK was unable to initialize (no internet connection)? Or are you seeing it in longer running applications that have initialized as well?

Also have you only seen this on macOS?

bvaughn commented 7 months ago

@kinyoklion Only when there was no Internet to initialize in the first place. (Truthfully I haven’t tested the second scenario. Our use case is a CLI that is not typically a long-running process.)

kinyoklion commented 7 months ago

Thank you @bvaughn

Unfortunately I have not been able to reproduce anything similar under any conditions so far. With or without networking or with or without closing the client. The above example always exits without delay.

Is this something you are seeing widely? Is this running on fleet machines where they all have uniform software? I am curious if a VPN or some other software could be needed in combination to exhibit the problem.

Apple Silicon or Intel mac?

Thank you, Ryan

bvaughn commented 7 months ago

@kinyoklion No, it's not wide spread. We're only running this on a few machines so far, and I think we've seen it on two of them (both Apple M3 Pros).

I guess maybe the best thing for us to do for now is for us (me) to keep an eye on it to see if it's more widespread. I appreciate you taking a look though.

kinyoklion commented 7 months ago

Thank you @bvaughn

I don't have an M3 to readily test, but I can at least try on an M1 (I was testing on an intel mac).

I have some suspicions around DNS resolution on macOS, but I don't know of a way to illicit the bad behavior. Theoretically it should fail nearly immediately in the no network case, but I have seen some oddities when multiple networks are available, for instance wifi is off, but there is a virtual network with a virtual machine or docker container.

Thanks, Ryan

kinyoklion commented 6 months ago

Hello @bvaughn,

I have released a new version of launchdarkly-eventsource which could impact this. https://github.com/launchdarkly/js-eventsource

A release of the node-client-sdk will be coming which incorporates this updated version. The issue was that in some situations a callback would get executed twice, which could schedule two reconnect timers and potentially additional network requests that were not closed when closing the client.

I am not 100% sure that this will impact your issue, but it at least can result in similar behavior, especially in cases with no network connection available.

Thank you, Ryan

bvaughn commented 6 months ago

Thank you for the update, @kinyoklion!

kinyoklion commented 6 months ago

Version 3.2.1 has been released which includes the event source changes.

Thank you, Ryan

kinyoklion commented 5 months ago

I am closing this for now. If the problem resurfaces, then it may be re-opened.