launchdarkly / js-core

LaunchDarkly monorepo for JavaScript SDKs
Other
12 stars 12 forks source link

Multiplying reconnection attempts leading to container being OOMKilled when there is no internet connectivity #458

Closed raverin7 closed 1 month ago

raverin7 commented 1 month ago

Describe the bug Containarized application keeps consuming memory until OOMKilled, when there is no internet connectivity. It is a result of two event handlers (one for timeout and one for error) being invoked on a single reconnection attempt, and each of them produces another reconnection, and so on.

To reproduce Environment:

Windows 10 with WSL: WSL version: 2.1.5.0 Kernel version: 5.15.146.1-2 WSLg version: 1.0.60 MSRDC version: 1.2.5105 Direct3D version: 1.611.1-81528511 DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp Windows version: 10.0.19045.4291

Ubuntu installed in WSL:

wsl -l -v
  NAME            STATE           VERSION
* Ubuntu-24.04    Running         2

Docker Engine installed on Ubuntu Docker version 26.1.1, build ac2de55

and Contaiarized Application is run on Docker. Its base image is node:20.9.0-alpine3.17

So we have: Windows 10 (WSL) -> Ubuntu 24.04 (Docker) -> Containarized Application (Base image: node:20.9.0-alpine3.17)

For reproduction purposes I have prepared simple test project with application that creates LDClient and then waits 5 minutes.

index.js

const LaunchDarkly = require('@launchdarkly/node-server-sdk');
const { memoryUsage } = require('node:process');

const sleep = (miliseconds) => {
    return new Promise(resolve => setTimeout(resolve, miliseconds))
}

async function main() {
    try {
        console.log(`Service starting...`)

        const ldSdkKey = process.env.LD_SDK_KEY || 'wrongLdSdkKey';
        const ldClient = LaunchDarkly.init(ldSdkKey)

        const initialMemoryUsage = memoryUsage();

        const waitTimeInMin = 5;
        for (let i = 0; i < waitTimeInMin; i++) {
            await sleep(60000)
            console.log(`Waited ${i + 1} out of ${waitTimeInMin} minutes`)
        }

        const finalMemoryUsage = memoryUsage();

        console.log(`Initial memory usage: ${JSON.stringify(initialMemoryUsage)}`)
        console.log(`Final memory usage:   ${JSON.stringify(finalMemoryUsage)}`)

    } catch (error) {
        console.log(error.message)
    } finally {
        console.log(`Service stopping...`)
        process.exit(0)
    }
}

main()

Dockerfile

FROM node:20.9.0-alpine3.17

COPY package.json /
COPY index.js /

RUN npm install --verbose

CMD ["node", "index.js"]

package.json

{
  "name": "testproject",
  "version": "1.0.0",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "",
  "license": "ISC",
  "description": "",
  "dependencies": {
    "@launchdarkly/node-server-sdk": "9.4.1"
  }
}
  1. Build image: sudo docker build . -t testproject

  2. Disable internet acces either from Windows: Go to the taskbar and clik disconnect on the interface that provides internet or from Ubuntu 24.04: sudo ip link set <internet-access-interface> down and run containerized application: sudo docker run testproject

  3. or just run containerized application without network: sudo docker run --network none testproject

One can monitor memory usage with: docker stats. Please observe that memory usage keeps increasing. Containarized application completes after 5 minutes and prints initial and final memory usage.

This is the simplest, fastest way to show retry mechanism that keeps consuming more and more memory.

The other scenarios would be to run the application to successfully connect to LaunchDarkly and then disconnect internet, but in that case one would have to wait 300 000 ms (config.readTimeoutMillis) for the issue to start happening.

In case of internet disconnect for more than 5 minutes and then reconnection, LD Client might store number of streams at a time.

Expected behavior Reconnect mechanism in case of broken internet access does not spawn two connection attempts at a time, but only one.

Logs Please find here snippet of logs from begging and end of aforementioned application being ran.

2024-05-06T14:28:40.586969346Z  Service starting...
2024-05-06T14:28:45.673648108Z  warn: [LaunchDarkly] Received I/O error (Read timeout, received no data in 300000ms, assuming connection is dead) for streaming request - will retry
2024-05-06T14:28:45.674880540Z  info: [LaunchDarkly] Will retry stream connection in 1000 milliseconds
2024-05-06T14:28:45.678820695Z  warn: [LaunchDarkly] Received I/O error (getaddrinfo EAI_AGAIN stream.launchdarkly.com) for streaming request - will retry
2024-05-06T14:28:45.678838049Z  info: [LaunchDarkly] Will retry stream connection in 1000 milliseconds
2024-05-06T14:28:52.574698885Z  warn: [LaunchDarkly] Received I/O error (Read timeout, received no data in 300000ms, assuming connection is dead) for streaming request - will retry
2024-05-06T14:28:52.575039922Z  info: [LaunchDarkly] Will retry stream connection in 1000 milliseconds
2024-05-06T14:28:52.577759052Z  warn: [LaunchDarkly] Received I/O error (Read timeout, received no data in 300000ms, assuming connection is dead) for streaming request - will retry
2024-05-06T14:28:52.577811212Z  info: [LaunchDarkly] Will retry stream connection in 1000 milliseconds
2024-05-06T14:28:52.580667030Z  warn: [LaunchDarkly] Received I/O error (getaddrinfo EAI_AGAIN stream.launchdarkly.com) for streaming request - will retry
2024-05-06T14:28:52.580732352Z  info: [LaunchDarkly] Will retry stream connection in 1000 milliseconds
2024-05-06T14:28:57.585263265Z  warn: [LaunchDarkly] Received I/O error (getaddrinfo EAI_AGAIN stream.launchdarkly.com) for streaming request - will retry
2024-05-06T14:28:57.585346038Z  info: [LaunchDarkly] Will retry stream connection in 1000 milliseconds
2024-05-06T14:28:58.581861360Z  warn: [LaunchDarkly] Received I/O error (Read timeout, received no data in 300000ms, assuming connection is dead) for streaming request - will retry
2024-05-06T14:28:58.582371887Z  info: [LaunchDarkly] Will retry stream connection in 1000 milliseconds
2024-05-06T14:28:58.582840642Z  warn: [LaunchDarkly] Received I/O error (Read timeout, received no data in 300000ms, assuming connection is dead) for streaming request - will retry
2024-05-06T14:28:58.582876722Z  info: [LaunchDarkly] Will retry stream connection in 1000 milliseconds
2024-05-06T14:28:58.583730398Z  warn: [LaunchDarkly] Received I/O error (Read timeout, received no data in 300000ms, assuming connection is dead) for streaming request - will retry
2024-05-06T14:28:58.583936364Z  info: [LaunchDarkly] Will retry stream connection in 1000 milliseconds
2024-05-06T14:28:58.585626544Z  warn: [LaunchDarkly] Received I/O error (getaddrinfo EAI_AGAIN stream.launchdarkly.com) for streaming request - will retry
2024-05-06T14:28:58.585638313Z  info: [LaunchDarkly] Will retry stream connection in 1000 milliseconds
2024-05-06T14:29:02.590233812Z  warn: [LaunchDarkly] Received I/O error (getaddrinfo EAI_AGAIN stream.launchdarkly.com) for streaming request - will retry
2024-05-06T14:29:02.590290656Z  info: [LaunchDarkly] Will retry stream connection in 1000 milliseconds
2024-05-06T14:29:03.589050885Z  warn: [LaunchDarkly] Received I/O error (Read timeout, received no data in 300000ms, assuming connection is dead) for streaming request - will retry
2024-05-06T14:29:03.592849829Z  info: [LaunchDarkly] Will retry stream connection in 1000 milliseconds
2024-05-06T14:29:03.592870746Z  warn: [LaunchDarkly] Received I/O error (getaddrinfo EAI_AGAIN stream.launchdarkly.com) for streaming request - will retry
2024-05-06T14:29:03.592873845Z  info: [LaunchDarkly] Will retry stream connection in 1000 milliseconds
(...)
2024-05-06T14:33:48.511116307Z  warn: [LaunchDarkly] Received I/O error (Read timeout, received no data in 300000ms, assuming connection is dead) for streaming request - will retry
2024-05-06T14:33:48.511140927Z  info: [LaunchDarkly] Will retry stream connection in 1000 milliseconds
2024-05-06T14:33:48.511145208Z  warn: [LaunchDarkly] Received I/O error (Read timeout, received no data in 300000ms, assuming connection is dead) for streaming request - will retry
2024-05-06T14:33:48.511148712Z  info: [LaunchDarkly] Will retry stream connection in 1000 milliseconds
2024-05-06T14:33:48.512325319Z  warn: [LaunchDarkly] Received I/O error (Read timeout, received no data in 300000ms, assuming connection is dead) for streaming request - will retry
2024-05-06T14:33:48.512342154Z  info: [LaunchDarkly] Will retry stream connection in 1000 milliseconds
2024-05-06T14:33:48.512346436Z  warn: [LaunchDarkly] Received I/O error (Read timeout, received no data in 300000ms, assuming connection is dead) for streaming request - will retry
2024-05-06T14:33:48.512363174Z  info: [LaunchDarkly] Will retry stream connection in 1000 milliseconds
2024-05-06T14:33:48.512366774Z  warn: [LaunchDarkly] Received I/O error (Read timeout, received no data in 300000ms, assuming connection is dead) for streaming request - will retry
2024-05-06T14:33:48.512370375Z  info: [LaunchDarkly] Will retry stream connection in 1000 milliseconds
2024-05-06T14:33:48.512373489Z  warn: [LaunchDarkly] Received I/O error (Read timeout, received no data in 300000ms, assuming connection is dead) for streaming request - will retry
2024-05-06T14:33:48.512376798Z  info: [LaunchDarkly] Will retry stream connection in 1000 milliseconds
2024-05-06T14:33:48.512380009Z  warn: [LaunchDarkly] Received I/O error (Read timeout, received no data in 300000ms, assuming connection is dead) for streaming request - will retry
2024-05-06T14:33:48.512383220Z  info: [LaunchDarkly] Will retry stream connection in 1000 milliseconds
2024-05-06T14:33:48.513720199Z  warn: [LaunchDarkly] Received I/O error (Read timeout, received no data in 300000ms, assuming connection is dead) for streaming request - will retry
2024-05-06T14:33:48.513737229Z  info: [LaunchDarkly] Will retry stream connection in 1000 milliseconds
2024-05-06T14:33:48.513741608Z  warn: [LaunchDarkly] Received I/O error (Read timeout, received no data in 300000ms, assuming connection is dead) for streaming request - will retry
2024-05-06T14:33:48.513745208Z  info: [LaunchDarkly] Will retry stream connection in 1000 milliseconds
2024-05-06T14:33:48.516128688Z  warn: [LaunchDarkly] Received I/O error (Read timeout, received no data in 300000ms, assuming connection is dead) for streaming request - will retry
2024-05-06T14:33:48.516151362Z  info: [LaunchDarkly] Will retry stream connection in 1000 milliseconds
2024-05-06T14:33:48.520404412Z  warn: [LaunchDarkly] Received I/O error (Read timeout, received no data in 300000ms, assuming connection is dead) for streaming request - will retry
2024-05-06T14:33:48.520426599Z  info: [LaunchDarkly] Will retry stream connection in 1000 milliseconds
2024-05-06T14:33:48.521619556Z  warn: [LaunchDarkly] Received I/O error (Read timeout, received no data in 300000ms, assuming connection is dead) for streaming request - will retry
2024-05-06T14:33:48.521634444Z  info: [LaunchDarkly] Will retry stream connection in 1000 milliseconds
2024-05-06T14:33:48.521637850Z  warn: [LaunchDarkly] Received I/O error (Read timeout, received no data in 300000ms, assuming connection is dead) for streaming request - will retry
2024-05-06T14:33:48.521640672Z  info: [LaunchDarkly] Will retry stream connection in 1000 milliseconds
2024-05-06T14:33:48.524050523Z  warn: [LaunchDarkly] Received I/O error (Read timeout, received no data in 300000ms, assuming connection is dead) for streaming request - will retry
2024-05-06T14:33:48.524071348Z  info: [LaunchDarkly] Will retry stream connection in 1000 milliseconds
2024-05-06T14:33:48.525230148Z  warn: [LaunchDarkly] Received I/O error (Read timeout, received no data in 300000ms, assuming connection is dead) for streaming request - will retry
2024-05-06T14:33:48.525254573Z  info: [LaunchDarkly] Will retry stream connection in 1000 milliseconds
2024-05-06T14:33:48.526479740Z  warn: [LaunchDarkly] Received I/O error (Read timeout, received no data in 300000ms, assuming connection is dead) for streaming request - will retry
2024-05-06T14:33:48.526490542Z  info: [LaunchDarkly] Will retry stream connection in 1000 milliseconds
2024-05-06T14:33:48.526496380Z  warn: [LaunchDarkly] Received I/O error (Read timeout, received no data in 300000ms, assuming connection is dead) for streaming request - will retry
2024-05-06T14:33:48.526499300Z  info: [LaunchDarkly] Will retry stream connection in 1000 milliseconds
2024-05-06T14:33:49.234883322Z  Waited 5 out of 5 minutes
2024-05-06T14:33:49.235133999Z  Initial memory usage: {"rss":57806848,"heapTotal":11313152,"heapUsed":7534024,"external":2132046,"arrayBuffers":24663}
2024-05-06T14:33:49.235185088Z  Final memory usage:   {"rss":159645696,"heapTotal":38051840,"heapUsed":36372152,"external":2131158,"arrayBuffers":24611}
2024-05-06T14:33:49.235201340Z  Service stopping...

My observations:

2024-05-06T14:28:40.586969346Z  Service starting...
2024-05-06T14:28:45.673648108Z  warn: [LaunchDarkly] Received I/O error (Read timeout, received no data in 300000ms, assuming connection is dead) for streaming request - will retry
2024-05-06T14:28:45.674880540Z  info: [LaunchDarkly] Will retry stream connection in 1000 milliseconds
2024-05-06T14:28:45.678820695Z  warn: [LaunchDarkly] Received I/O error (getaddrinfo EAI_AGAIN stream.launchdarkly.com) for streaming request - will retry
2024-05-06T14:28:45.678838049Z  info: [LaunchDarkly] Will retry stream connection in 1000 milliseconds
2024-05-06T14:28:52.574698885Z  warn: [LaunchDarkly] Received I/O error (Read timeout, received no data in 300000ms, assuming connection is dead) for streaming request - will retry
2024-05-06T14:28:52.575039922Z  info: [LaunchDarkly] Will retry stream connection in 1000 milliseconds
2024-05-06T14:28:52.577759052Z  warn: [LaunchDarkly] Received I/O error (Read timeout, received no data in 300000ms, assuming connection is dead) for streaming request - will retry
2024-05-06T14:28:52.577811212Z  info: [LaunchDarkly] Will retry stream connection in 1000 milliseconds
2024-05-06T14:28:52.580667030Z  warn: [LaunchDarkly] Received I/O error (getaddrinfo EAI_AGAIN stream.launchdarkly.com) for streaming request - will retry
2024-05-06T14:28:52.580732352Z  info: [LaunchDarkly] Will retry stream connection in 1000 milliseconds
2024-05-06T14:28:57.585263265Z  warn: [LaunchDarkly] Received I/O error (getaddrinfo EAI_AGAIN stream.launchdarkly.com) for streaming request - will retry
2024-05-06T14:28:57.585346038Z  info: [LaunchDarkly] Will retry stream connection in 1000 milliseconds
  1. There is a connection attempt.
  2. Event handler for timeout is being invoked: timeout is ~5 seconds after service started, but in log message is that the stream did not received data for 300 seconds. Might be incorrect log message. I think node request timeouts here. This event handler would schedule another retry attempt in 1 second.
  3. At nearly the same time event handler for error EAI_AGAIN is invoked which would schedule another retry attempt in 1 second. So now we have two concurrent reconnection attempts.
  4. Next there are two log messages indicating timeout of those two attempts
  5. They are followed by another two errors. Meaning here from a single connection attempt we will have four concurrent retry attempts by now, and they keep multiplying. If the connection with internet would be reestablished now, LD Client would have 4 streams.

SDK version

"dependencies": {
    "@launchdarkly/node-server-sdk": "9.4.1"
  }

Language version, developer tools Base image: node:20.9.0-alpine3.17

OS/platform Please read reproduction paragraph: Windows 10 (WSL) -> Ubuntu 24.04 (Docker) -> Containarized Application (Base image: node:20.9.0-alpine3.17)

Additional context One can add a log printing instance of req in launchdarkly-eventsource eventsource.js:connect() in handlers for timeout and error events. In such logs it should be visible that both of them are invoked from single connection attempt leading to snowballing:

  1. Timeout comes from req timeout
  2. Error comes from DNS error and with time they keep multiplying causing more and more memory to be consumed.
kinyoklion commented 1 month ago

Hello @raverin7,

Thank you for reporting the issue. We will investigate.

Filed internally as 245014

kinyoklion commented 1 month ago

Hello @raverin7,

It is unclear why the request triggers the timeout in these conditions, but it is clear that it does, so I am changing our underlying event source to ensure that each connection can only 'fail' a single time.

Additionally I did a release today that ensures that our exponential backoff logic is being applied, which helps to mitigate this until the event source changes are released.

Thank you, Ryan

kinyoklion commented 1 month ago

@raverin7 This should be fixed in version 9.4.3.

Thank you for the report and detailed observations!