launchdarkly / node-server-sdk

LaunchDarkly Server-side SDK for Node

The timeout option for the client won't work because there's no listener for the event #282

Open homer0 opened 1 year ago

homer0 commented 1 year ago

Describe the bug The timeout in the requests is being ignored because there's no listener for the timeout event. The problem is here. The code is listening for error, but when a request times out, it emits a timeout event, which is being ignored, so the request keeps going until the server fails to respond.

I copied the httpRequest function into an isolated scenario, tried adding the following, and it seems to work:

+  let finished = false;
   const req = http.request(allOptions, resp => {
     let body = '';
     resp.on('data', chunk => {
       body += chunk;
     });
     resp.on('end', () => {
+      if (finished) {
+        return;
+      }
       callback(null, resp, body);
     });
   });
   req.on('error', err => {
+    if (finished) {
+      // The timeout handler already invoked the callback, and destroying the
+      // request below emits 'error', so guard against calling back twice.
+      return;
+    }
     callback(err);
   });
+  req.on('timeout', () => {
+    finished = true;
+    callback(new Error('timeout'));
+    // Node only notifies on 'timeout'; the request has to be destroyed manually
+    // or it stays open (req.end() would not terminate it).
+    req.destroy();
+  });

Yeah, it could probably use a better exception :P

To reproduce Set the timeout to a very low value and throttle your network connection, for example:
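A minimal repro sketch (assuming the SDK's timeout config option is the connection timeout in seconds, with a placeholder SDK key, flag key, and user):

const LaunchDarkly = require('launchdarkly-node-server-sdk');

async function main() {
  // The timeout option (assumed here to be the connection timeout, in seconds)
  // is set very low; the SDK key, flag key, and user are placeholders.
  const client = LaunchDarkly.init(process.env.LD_SDK_KEY, { timeout: 1 });
  try {
    await client.waitForInitialization();
    // With the network throttled, this should give up after roughly a second,
    // but instead it keeps waiting until the request fails on its own.
    const value = await client.variation('example-flag', { key: 'example-user' }, false);
    console.log('flag value:', value);
  } finally {
    client.close();
  }
}

main();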

Expected behavior The timeout event will be handled, the connection terminated, and the promise rejected.

SDK version Latest

Language version, developer tools Node 16 and 18

OS/platform MacOS 12

kinyoklion commented 1 year ago

Hello @homer0,

This says SDK version latest, but the latest SDK is not located in this repository.

The latest SDK is here: https://github.com/launchdarkly/js-core/tree/main/packages/sdk/server-node

And is published under @launchdarkly/node-server-sdk.

That said, it does appear the latest version would operate the same way, and will need to be fixed.

The problem now resides in this method: https://github.com/launchdarkly/js-core/blob/63697e15395dfb861e67528818799e90375ce80e/packages/sdk/server-node/src/platform/NodeRequests.ts#L109
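The shape of the fix is the same there: the promise-based wrapper needs a 'timeout' listener that destroys the request so the promise can settle. A rough sketch of that pattern (illustrative only, not the actual NodeRequests.ts code):

const http = require('http');

function requestWithTimeout(options) {
  return new Promise((resolve, reject) => {
    const req = http.request(options, resp => {
      let body = '';
      resp.on('data', chunk => {
        body += chunk;
      });
      resp.on('end', () => resolve({ resp, body }));
    });
    req.on('error', reject);
    req.on('timeout', () => {
      // Node only emits 'timeout'; it neither aborts the request nor emits
      // 'error', so destroy it here and let the 'error' path reject the promise.
      req.destroy(new Error('request timed out'));
    });
    req.end();
  });
}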

What is the external behavior that is causing you a problem? While requests not timing out is a problem, for the most part those requests will not block operation; instead they may cause some missed events or, for example, potentially missed logging of error conditions.

Thank you, Ryan

homer0 commented 1 year ago

Hi @kinyoklion !

Sorry, I didn't know I should've switched to the scoped version; I saw that the last release of this one was only a month ago and thought it was the official one.

Now, I don't know if the example applies to the other package, but the scenario I had was a call to .variation inside a Lambda, so I set the timeout to just a couple of seconds (as the Lambda should respond fast). If, for some reason, your API is down, the request doesn't connect, or my local connection is slow while testing, then since the timeout is never handled, the request keeps going until it actually fails, without emitting error, and my code is still awaiting the call.
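In the meantime, the only workaround I can think of is racing the call against a timer on my side. Just a sketch; withTimeout is a hypothetical helper, not part of the SDK, and it doesn't terminate the underlying request, it only unblocks my code:

// Hypothetical application-side guard, not part of the SDK.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error('timed out after ' + ms + 'ms')), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage inside the Lambda handler (example flag key and user):
async function evaluateFlag(client) {
  return withTimeout(
    client.variation('example-flag', { key: 'example-user' }, false),
    2000
  );
}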

Let me know if you want me to create the issue in the other repo.

Thanks!

kinyoklion commented 1 year ago

Hey @homer0,

Thanks for the info. We can go ahead and work from the issue here.

Thank you, Ryan

Filed internally as 211416

aballman commented 6 months ago

Any updates on this issue? I think it may be related to some truly ridiculous request times I'm seeing from my applications. This was mostly stable for me until the recent LaunchDarkly outage on 2/22. Since then I have seen a lot of requests take an excessive amount of time.

(Screenshot: outbound request-time trace, 2024-03-05 at 11:50 AM)

kinyoklion commented 6 months ago

@aballman These are streams, and it is expected that they remain open forever; incremental updates are delivered over the stream. These requests are not intended to time out unless they are not receiving data on a regular interval, which is very different from a REST request.

kinyoklion commented 6 months ago

I am not sure what this trace is demonstrating, but if all of these streams are from a single SDK instance, then that would be representative of some kind of problem, although one unrelated to this issue.

aballman commented 6 months ago

> I am not sure what this trace is demonstrating, but if all of these streams are from a single SDK instance, then that would be representative of some kind of problem, although one unrelated to this issue.

OK, fair enough. Appreciate the quick response. It just seemed odd that this was new-ish behavior. I have some alerts on p90 request time of outbound connections that have been quiet for ages, and then this recently came up. If it's unrelated then I'll figure something else out. Thanks!

kinyoklion commented 6 months ago

@aballman That is interesting. I am unsure why that would be, unless something was previously excluding these requests because it understood they were incremental transfers, and now that isn't being correctly identified.

An individual SDK should have one of these streams open at all times, and it basically reconnects when our back-end deploys occur.

If an individual SDK has multiple concurrent streams, then something has gone wrong.

Thank you, Ryan