launchdarkly / node-server-sdk

LaunchDarkly Server-side SDK for Node
Other
79 stars 65 forks source link

Retriable HTTP errors are treated as fatal errors #155

Closed jkehres closed 5 years ago

jkehres commented 5 years ago

I'm using version 5.7.4 of the SDK and I have code like this inside an async function:

const ld = require('ldclient-node');
const ldClient = ld.init(sdkKey);
await ldClient.waitForInitialization();
const value = await ldClient.variation(...);

Every once in a great while I get a rejected promise from the function with an error like this:

{
  "message": "Received error 502 for streaming connection - will retry",
  "name": "LaunchDarklyStreamingError",
  "stack": [
    "LaunchDarklyStreamingError: Received error 502 for streaming connection - will retry",
    "    at EventSource.es.onerror (\/var\/task\/node_modules\/ldclient-node\/streaming.js:26:10)",
    "    at emitOne (events.js:116:13)",
    "    at EventSource.emit (events.js:211:7)",
    "    at _emit (\/var\/task\/node_modules\/ldclient-node\/eventsource.js:191:17)",
    "    at ClientRequest.<anonymous> (\/var\/task\/node_modules\/ldclient-node\/eventsource.js:110:9)",
    "    at Object.onceWrapper (events.js:315:30)",
    "    at emitOne (events.js:116:13)",
    "    at ClientRequest.emit (events.js:211:7)",
    "    at HTTPParser.parserOnIncomingClient (_http_client.js:519:21)",
    "    at HTTPParser.parserOnHeadersComplete (_http_common.js:115:23)"
  ]
}

If the client is going to retry, why does it return a fatal error? I would expect it to silently retry with exponential backoff instead and only emit a fatal error after exceeding a maximum number of retries.

The error appears to originate here:

https://github.com/launchdarkly/node-server-sdk/blob/a770805e138285de9941d1fcc38e0c954e9ef0ff/eventsource.js#L114

and a few lines later there appears to be some code that will do a retry with backoff but it's not working for my use case - presumably because the error event is immediately turned into a fatal error that terminates the client.

eli-darkly commented 5 years ago

When you say "a fatal error that terminates the client", could you be more specific about a few things:

  1. Which one of the two functions you're awaiting in your example is throwing the error? I'm guessing it is waitForInitialization()?
  2. You said "I have code like this inside an async function" which implies that all of the example code (except presumably the require) is inside a function. Does that mean you are frequently calling init to create a new client instance, rather than keeping a single instance active?
  3. Besides the fact that an error was thrown, what does "terminates the client" specifically look like? Do you mean that the client instance is unusable after that point?

I do think this is unintended behavior, but I just want to understand the circumstances better.

jkehres commented 5 years ago
  1. I don't know. This is a very transient error that I cannot reproduce as it's the LD service returning a 502 response. All I have are some logs which contain the error I referenced and unfortunately it's stack trace does not indicate which method of the client is returning a rejected promise.
  2. My code runs in an AWS lambda function so it is not a long running process. I'm aware that separately initializing the client and keeping it around would be ideal. I did try this early on as lambda has a way to preserve some state between invocations but I seem to remember I had some problems - although, I don't remember the specifics now. I should probably revisit this.
  3. I just meant that the client throws an error. I don't attempt to continue using the client after it throws as I have no idea if the client is still in a usable state.
eli-darkly commented 5 years ago

Thanks, we'll look into this.

HistoricallyCorrect commented 5 years ago

I think I'm experiencing similar issue.

const ldClient = LaunchDarkly.init(sdkKey);
const ldPromise = ldClient.waitForInitialization();
ldPromise.then(() => { /*our code blah*/ });

I could reproduce by disconnecting wifi locally to generate error (node:1) UnhandledPromiseRejectionWarning: LaunchDarklyStreamingError: Connection closed, reconnecting.

I tried catching the promise error by

ldPromise
  .then(() => { /*our code blah*/ })
  .catch(() => { /*blah*/ }; <--- catch NEVER gets called

but code still blows up unexpected.

After hours, I realizsed that in order to fail / catch gracefully I must place the catch() before my custom code then(), which got me confused.

const ldPromise = ldClient.waitForInitialization()
  .catch(() => { /*blah*/ }) <--- catch NOW does get CALLED
  .then(() => { /*our code blah*/ });

So a simple example would be able to demonstrate, copy / paste below snippets into the demo window on https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/then

var promise1 = new Promise(function(resolve, reject) {
  reject('manually rejected')
});

promise1
  .then(function(value) {
    console.log('from then', value);
  }).catch(function(value) {
    console.log('from catch', value);
  });

//> "from catch" "manually rejected"

And

var promise1 = new Promise(function(resolve, reject) {
  return Promise.reject('manually rejected');
});

promise1
  .then(function(value) {
    console.log('from then', value);
  }).catch(function(value) {
    console.log('from catch', value);
  });

//> NOTHING GETS PRINTED OUT IN CONSOLE

And after browsing node-server-sdk source code, I think this line is probably causing the the unexpected catch behavior.

https://github.com/launchdarkly/node-server-sdk/blob/master/index.js#L161

  client.waitForInitialization = function() {
    if (initComplete) {
      return Promise.resolve(client);
    }
    if (failure) {
      return Promise.reject(failure); <--- this line!
    }

    return new Promise(function(resolve, reject) {
...
...

Would love some feedback from the author and any potential updates to fix this weird behavior.

eli-darkly commented 5 years ago

@HistoricallyCorrect I'm not sure I completely understand your point. I have two thoughts:

  1. What you're reporting is different from what jkehres was reporting, in that this isn't a case of receiving a 502 error from the server—in which case it would make sense for the SDK to keep trying, rather than throwing an error right away, so jkehres has a good point. Instead, in your case the network is unavailable; you turned off the wi-fi. So it is appropriate for the SDK to say "there's an error, we can't initialize the SDK" rather than continuing to retry indefinitely.

  2. What you're saying about the behavior of Promise.catch() is somewhat confusing and is not, I think, an issue specific to this SDK:

I tried catching the promise error by

ldPromise
.then(() => { /*our code blah*/ })
.catch(() => { /*blah*/ }; <--- catch NEVER gets called

but code still blows up unexpected. After hours, I realized that in order to fail / catch gracefully I must place the catch() before my custom code then(), which got me confused.

In my experience, that is not how Promise works. If p is a Promise that has not yet been rejected, then p.then(f).catch(g) and p.catch(g).then(f) will both do exactly the same thing when it is rejected: it will run g, and not run f, and you will not get an "unhandled rejection" error. The only case I can think of where that would not be true is if you did not call then() and catch() in the same pass through the event loop, but instead called then() and then yielded up execution so that another piece of async code could run that rejected the promise before the catch() handler had been attached. But that's not the case in your code sample: nothing gets a chance to execute in between the time you called then() and the time you called catch().

Assuming the catch() handler did get attached to the Promise before the error happened, I don't see how you could get an unhandled rejection error unless the code inside of catch() threw another exception of its own.

But, again, this is not anything specific to the SDK; it's just about how Promise works. I'm not sure what is going on in your code, but there is no way for the SDK to throw some kind of special error that would not be caught by catch().

HistoricallyCorrect commented 5 years ago

@eli-darkly Thanks for the reply. I can start a new issue if you think the issue I'm reporting is totally unrelated to the original reported issue. But the error I'm getting is also LaunchDarklyStreamingError: Connection closed, reconnecting.

I think the issue I'm reporting is due to the use of streaming from SDK, when connection is unstable or reconnecting, from within the SDK which will cause exception from ldClient.waitForInitialization() and the user has to manually handle the exception. https://github.com/launchdarkly/node-server-sdk/blob/master/index.js#L161

There's nothing funny going on in my code. I've pasted all the code I have, except the inner bit of then(). But my .then() never gets hit in the case of unstable connection / disconnection / re-connection. And above, my 2 slightly different versions of promise demonstrates exactly the behavior I'm describing.

If you say that I can only put catch right after waitForInitialization, fine. But I think (I could be wrong) usually, when I use SDK, I wouldn't expect to put 2 catches like waitForInitialization().catch().then().catch(). The first catch to catch the potential SDK connection exception and the 2nd catch that catches error coming out of my code. And the reason for the first catch is so that even when there's connection issue to LD.com I don't want my own code to fail and I would return some default value, but if I have to first catch waitForInitialization error, then that seems bit strange. Because in streaming, it's likely that the connection is retried. I think the assumption the SDK is making is when attempting to connect or re-connect fails, the user would catch the error and do something about it, but I think the SDK should internally keep re-trying without causing any exception to the end user of the SDK.

eli-darkly commented 5 years ago

@HistoricallyCorrect I understand that the general error message is the same, but you are definitely reporting a different issue, since the original reporter did not say anything about the catch() handler not working.

Again, I'm not sure how a Promise can behave the way you're saying it is behaving— that is, calling your catch handler only if you did .catch().then(), and not if you did .then().catch(). I have not been able to reproduce that behavior with a basic piece of Promise-based Node code, and I can't think how the specific way the Promise is used in the SDK would make any difference to that.

eli-darkly commented 5 years ago

Fixed in the 5.9.2 release (fixed the original issue, that is - not the other discussion threads here).