launchdarkly / node-server-sdk

LaunchDarkly Server-side SDK for Node
Other
79 stars 65 forks source link

Redis Feature Store + LDD Not able to connect to redis instance #193

Closed fernyettheplant closed 4 years ago

fernyettheplant commented 4 years ago

Describe the bug We are trying to configure the Redis Feature Store + LDD in the latest version of the NodeJS SDK. However, seems that is not possible to make the connection with the Redis Store even when the connection is possible using the redis-cli

The configuration of the LD client is:

const redisConfig: ClientOpts = {
  host: 'host',
  port: 6379
};

const store: LaunchDarkly.LDFeatureStore = LaunchDarkly.RedisFeatureStore(redisConfig, 30);
const ldConfig: LaunchDarkly.LDOptions = {
  featureStore: store,
  useLdd: true
};

const ldUser: LaunchDarkly.LDUser = {
  key: 'requestID',
};

const client = LaunchDarkly.init(
  'sdkKey',
  ldConfig
);

await client.waitForInitialization();

const flag = await client.variationDetail('myFlag', ldUser, 'false');

console.log(flag.value) // false

Doing some investigation, it seems that there's a race condition in the redis_feature_store.js at the moment of trying to get the key in redis.

When we do a Variation Detail, LD uses this method from the redis feature store to get the key

  function doGet(kind, key, maybeCallback) {
    const cb = maybeCallback || noop;

    if (!connected) {
      logger.warn('Attempted to fetch key ' + key + ' while Redis connection is down');
      cb(null);
      return;
    }

    client.hget(itemsKey(kind), key, (err, obj) => {
      if (err) {
        logger.error('Error fetching key ' + key + " from Redis in '" + kind.namespace + "'", err); // eslint-disable-line quotes
        cb(null);
      } else {
        const item = JSON.parse(obj);
        cb(item);
      }
    });
  }

the problem is that the validation:

if (!connected) {}

will always evaluate to true because after this doGet method is executed, will execute this callback/event

  client.on('connect', () => {
    if (!initialConnect) {
      logger.warn('Reconnected to Redis');
    }
    initialConnect = false;
    connected = true;
  });

in this case, seems that is trying to connect to Redis AFTER the validation of the redis connection has failed making it impossible to connect to the redis store.

After some experimentation, I bypassed the connection checking in doGet like this:

  function doGet(kind, key, maybeCallback) {
    const cb = maybeCallback || noop;

    if (false) { // bypassed on purpose
      logger.warn('Attempted to fetch key ' + key + ' while Redis connection is down');
      cb(null);
      return;
    }

and that bypass worked and i was able to communicate to the Redis server:

{ value: true, variationIndex: 0, reason: { kind: 'FALLTHROUGH' } }

If you need more details on the bug, i'll be happy to expand more on this

To reproduce Configure the NodeJS SDK using the Redis Feature Store and use LDD for it:

const redisConfig: ClientOpts = {
  host: 'host',
  port: 6379
};

const store: LaunchDarkly.LDFeatureStore = LaunchDarkly.RedisFeatureStore(redisConfig, 30);
const ldConfig: LaunchDarkly.LDOptions = {
  featureStore: store,
  useLdd: true
};

const ldUser: LaunchDarkly.LDUser = {
  key: 'requestID',
};

const client = LaunchDarkly.init(
  'sdkKey',
  ldConfig
);

await client.waitForInitialization();

const flag = await client.variationDetail('myFlag', ldUser, 'false');

and try to get a variation detail

Expected behavior Being able to connect to the redis client and get the feature flag from redis using the LDD connection

Logs

{"message":"Attempted to fetch key myFlag while Redis connection is down","level":"warn"}
error: [LaunchDarkly] Unknown feature flag "myFlag"; returning default value

SDK version NodeJS SDK 5.13.3

Language version, developer tools NodeJS v12

OS/platform macOS Catalina

Additional context Add any other context about the problem here.

eli-darkly commented 4 years ago

Sorry for the delay in responding.

I appreciate the detailed problem description, but I'm still having a bit of trouble following your explanation. For one thing, I can't tell if you are saying that this always fails, or that it fails only under some circumstances. If it always fails then there must be some factor besides the code path you're pointing to, because that logic is not new and this integration has been used quite a bit without any such problems.

I also don't understand your summary of the sequence of events related to the connected state. You said that !connected will always be true (i.e. connected will always be false) because "after this doGet method is executed, will execute this callback/event" where "this event" is the 'connect' event. But those two things aren't related. The 'connect' event fires as soon as the Redis client connects to the Redis server, and that connection is initiated by redis.createClient, not by doGet.

And I don't know what you mean by "seems that is trying to connect to Redis AFTER the validation of the redis connection has failed." That sentence just does not make sense to me; perhaps you could rephrase it as a series of steps identified by specifically what you think it is trying to do in each step?

eli-darkly commented 4 years ago

OK, having looked over it again, I think I have a clearer picture.

I think this is probably a timing issue. It's not that it "is not possible to make the connection with the Redis Store", it's that it is trying to do a query before that connection has been made.

When using Redis without LDD mode— that is, if the SDK is connecting to LaunchDarkly— it does not report that the SDK client is ready to use until it has 1. obtained the flags from LaunchDarkly and 2. stored them in Redis. Because step 1 takes a little time, it's likely that the Redis client has already finished connecting before 2 happens. But in LDD mode, the startup of the SDK client does not depend on any Redis interactions, so if you make a variationDetail call immediately after starting the SDK client, it's possible that the asynchronous operation of making the Redis connection is still in progress. So the fact that we haven't seen this sort of problem reported before is probably just a timing issue, that is, no one else who's been using it has been doing flag evaluations immediately on startup.

That's worth changing, but I'll have to think about the best way to change it. There is a good reason why we have the !connected check: if the Redis connection has failed, trying to do a flag query could cause it to hang for an unknown amount of time, and we don't want to risk blocking application code like that. So if we know that it has failed and has not yet started up again, we do want to short-circuit the query. But if it just hasn't finished starting up in the first place yet, that's another matter and we would want to wait... which would require more sophisticated logic than what we have now.

eli-darkly commented 4 years ago

I think the solution might be fairly simple and if so, we should be able to put a patch out soon. We don't want to completely bypass that connected check as you did in your workaround, but we can bypass it in the case where initialConnect is true— that is, if connected is false specifically because we have only just now started up. Based on my testing and what I know about how this Redis client is implemented, if we go ahead and do a query in that scenario it will be queued so that it executes after the connection has been made; I don't think we would need to implement any queueing logic of our own to make that happen.

As I said, we would not want to have it unexpectedly block like that due to a connection failure after the app has already started, but if this is the very first query that's being done it's reasonable for it to wait for a connection. And I think I'm right that the reason we haven't seen this before is simply the timing: in an app that's not just a very simple piece of test code like this, it'd be likely for startup to take long enough that the Redis client would've already finished connecting before the first flag query.

eli-darkly commented 4 years ago

OK, we've released version 5.13.4 which should fix this issue.