launchdarkly / node-server-sdk

LaunchDarkly Server-side SDK for Node

LaunchDarkly no longer reconnects to service after connection is terminated #178

Closed mwksl closed 4 years ago

mwksl commented 4 years ago

Is this a support request? This issue tracker is maintained by LaunchDarkly SDK developers and is intended for feedback on the SDK code. If you're not sure whether the problem you are having is specifically related to the SDK, or to the LaunchDarkly service overall, it may be more appropriate to contact the LaunchDarkly support team; they can help to investigate the problem and will consult the SDK team if necessary. You can submit a support request by going here and clicking "submit a request", or by emailing support@launchdarkly.com.

Note that issues filed on this issue tracker are publicly accessible. Do not provide any private account information on your issues. If your problem is specific to your account, you should submit a support request as described above.

Describe the bug: A clear and concise description of what the bug is.

To reproduce: Steps to reproduce the behavior.

Expected behavior: LaunchDarkly NodeSDK should attempt to retry connection after disconnecting.

Logs: If applicable, add any log output related to your problem.

SDK version: Anything after 5.11.0

Language version, developer tools: Node.js 10.18.0 with TypeScript

OS/platform: Dockerized Alpine Linux and OSX

Additional context: Add any other context about the problem here.

mwksl commented 4 years ago

This issue actually appears to be rooted in the etag request, and not the callback in the polling function.

Tadwork commented 4 years ago

After further investigation, it looks like this fails when a request to refresh the feature store gets a 304. That causes https://github.com/Belema/request-etag/blob/master/lib/request-etag.js#L69 to look up the response in its cache, but the cache entry is empty, so the request fails and the server does not continue polling.

debug: Elapsed: 4 ms, sleeping for 29996 ms
warn: Received error getaddrinfo ENOTFOUND app.launchdarkly.com app.launchdarkly.com:443 for polling request - will retry
debug: Polling LaunchDarkly for feature flag updates
error: [UNCAUGHT EXCEPTION]
{ message: 'Cannot read property \'data\' of undefined',
  stack: 'TypeError: Cannot read property \'data\' of undefined\n    at Request._callback (/Users/xxxxx/git/app-name/node_modules/request-etag/lib/request-etag.js:69:37)\n    at Request.self.callback (/Users/xxxxx/git/app-name/node_modules/request/request.js:185:22)\n    at emitTwo (events.js:126:13)\n    at Request.emit (events.js:214:7)\n    at Request.<anonymous> (/Users/xxxxx/git/app-name/node_modules/request/request.js:1161:10)\n    at emitOne (events.js:116:13)\n    at Request.emit (events.js:211:7)\n    at IncomingMessage.<anonymous> (/Users/xxxxx/git/app-name/node_modules/request/request.js:1083:12)\n    at Object.onceWrapper (events.js:313:30)\n    at emitNone (events.js:111:20)' }

The relevant line of code in request-etag is https://github.com/Belema/request-etag/blob/master/lib/request-etag.js#L69
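
To illustrate the failure mode described above, here is a minimal sketch of a 304 handler that checks for a missing cache entry before reading it. It is not the request-etag source: `CachedResponse`, `Lookup`, `resolveBody`, and the example URL are all hypothetical names chosen for illustration. The only detail taken from the log is that something read `.data` from an undefined value.

```typescript
// Hypothetical sketch: these names are illustrative, not request-etag's API.
interface CachedResponse {
  etag: string;
  data: string;
}

type Lookup = (url: string) => CachedResponse | undefined;

interface Resolved {
  error?: Error;
  data?: string;
}

// Pick the body to hand back to the caller. On a 304 the body comes from the
// cache, so a missing entry is reported as an error instead of dereferenced.
function resolveBody(url: string, statusCode: number, body: string, lookup: Lookup): Resolved {
  if (statusCode !== 304) {
    return { data: body };
  }
  const hit = lookup(url);
  if (hit === undefined) {
    // An unguarded `lookup(url).data` here would throw a TypeError like the
    // one in the log above.
    return { error: new Error("304 received for " + url + " but the cache has no entry") };
  }
  return { data: hit.data };
}

// Usage with an empty in-memory cache (assumed shape, for illustration only).
const cache: { [url: string]: CachedResponse | undefined } = {};
const lookup: Lookup = (url) => cache[url];

const flagsUrl = "https://example.invalid/flags";
const miss = resolveBody(flagsUrl, 304, "", lookup);
console.log(miss.error ? miss.error.message : miss.data);
```

With a guard like this, the caller receives an ordinary error it can log and retry on, rather than an uncaught exception.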

eli-darkly commented 4 years ago

@mwksl Thanks for the problem report, and we'd like to figure out what's going on, but we're going to need a bit more information than you've provided. The reason we specifically ask in the issue template for a clear description of the bug, log output, and especially steps to reproduce, is that otherwise we have to guess what you mean by a general term like "connection is terminated" (and guess what kind of error messages you might be seeing), and our guesses might not be accurate.

I assume from your follow-up comment that you've configured the SDK to use polling rather than streaming, so that is a slight clue, but that's an example of the kind of thing that really needs to be in the original problem report. The log output in @Tadwork's comment is another example (assuming that the two of you are working together so that that is really output from the same failure). But this is still very hard to interpret because I don't know exactly what "connection is terminated" means in this context: polling requests are not long-lived as in streaming mode; they are individual requests. If what you mean is that the initial poll was successful, but then the network became unavailable (as the ENOTFOUND error suggests), then I don't understand the statement that the request "gets a 304", since it can't get any HTTP response status if the network is unavailable. Again, including steps to reproduce would clarify this.
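
The distinction drawn here can be sketched as a small model in which each poll ends in exactly one of two ways: a network-level failure with no HTTP status at all, or an HTTP response that carries a status code. The type and function names below are hypothetical and are not taken from the SDK. The `sleepFor` helper assumes a 30000 ms interval, which is only inferred from the "Elapsed: 4 ms, sleeping for 29996 ms" line in the log above.

```typescript
// Hypothetical model; not the SDK's actual types or log messages.
type PollResult =
  | { kind: "network-error"; code: string } // e.g. ENOTFOUND: no HTTP exchange took place
  | { kind: "response"; statusCode: number }; // the server answered with a status

// Time to wait before the next poll so that polls start about every
// `intervalMs`; never negative, even if a poll overran the interval.
function sleepFor(intervalMs: number, elapsedMs: number): number {
  return Math.max(intervalMs - elapsedMs, 0);
}

// A network error and a 304 are mutually exclusive outcomes of one request.
function summarize(result: PollResult): string {
  if (result.kind === "network-error") {
    return "no HTTP response (" + result.code + ")";
  }
  return result.statusCode === 304 ? "304: not modified" : "HTTP " + result.statusCode;
}

console.log(sleepFor(30000, 4)); // 29996
console.log(summarize({ kind: "network-error", code: "ENOTFOUND" })); // no HTTP response (ENOTFOUND)
console.log(summarize({ kind: "response", statusCode: 304 })); // 304: not modified
```

Under this model, a single request cannot report both ENOTFOUND and a 304, so the two observations would have to come from different polls.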

I'm told by the support team that you also have an active support request, so it's probably simplest to continue follow-up via that channel rather than here, but again I'd just like to emphasize that it's much easier for us to investigate these things with clear information. Filing a minimal issue isn't really going to save you any time because we'll end up having to ask all these things anyway.

Tadwork commented 4 years ago

Thanks @eli-darkly, you are correct in assuming that @mwksl and I work together and are investigating the same issue. I will follow up via the support request.