launchdarkly / ld-relay

LaunchDarkly Relay Proxy
Other
113 stars 81 forks source link

Timeout errors to LaunchDarkly with Redis Sentinel enabled #114

Open andrewwyee opened 3 years ago

andrewwyee commented 3 years ago

Describe the bug I am running an ld-relay service from the ld-relay prebuilt docker image. When I start up the service with my SDK key configured via env, the service works fine, but when I point it towards a redis sentinel host, I get a the Timeout encountered waiting for LaunchDarkly client initialization warning in the docker logs.

I am able to set key values into sentinel via the Predis php client. I am unsure why the redis sentinel affects the connection to launch darkly

To reproduce

  1. Start the ld-relay docker image (v6) with environment variables defined for:
    LD_ENV_*={SDK_KEY}
    USE_REDIS=1
    REDIS_TLS=true
    REDIS_HOST={SENTINEL_HOST}
    REDIS_PORT={SENTINEL_PORT}
    CACHE_TTL=30s
    LOG_LEVEL=debug
  2. Tail the docker logs

Expected behavior LD-relay should successfully connect to sentinel and set the environment's feature flag keys.

Logs

2020/11/30 21:27:18.424679 INFO: Starting LaunchDarkly relay version 6.1.0 with configuration from environment variables
2020/11/30 21:27:18.424962 INFO: Using Redis feature store: rediss://REDACTED_HOST:REDACTED_PORT with prefix:
2020/11/30 21:27:18.426017 INFO: [env: ...1387] Starting LaunchDarkly client 5.0.2
2020/11/30 21:27:18.426028 INFO: RedisDataStore: Using URL: rediss://REDACTED_HOST:REDACTED_PORT
2020/11/30 21:27:18.426283 INFO: [env: ...1387] Starting LaunchDarkly streaming connection
2020/11/30 21:27:18.426296 INFO: [env: ...1387] Waiting up to 10000 milliseconds for LaunchDarkly client to start...
2020/11/30 21:27:18.426314 INFO: [env: ...1387] Connecting to LaunchDarkly stream
2020/11/30 21:27:18.427054 DEBUG: [env: ...1387] Sending diagnostic event: {"kind":"diagnostic-init","id":{"diagnosticId":"67c9d367-e916-409d-a7b8-ada0928a2fed","sdkKeySuffix":"**"},"creationDate":1606771638426,"sdk":{"name":"go-server-sdk","version":"5.0.2"},"configuration":{"startWaitMillis":10000,"connectTimeoutMillis":3000,"customBaseURI":false,"diagnosticRecordingIntervalMillis":900000,"usingRelayDaemon":false,"userKeysFlushIntervalMillis":300000,"customEventsURI":false,"socketTimeoutMillis":3000,"eventsCapacity":10000,"eventsFlushIntervalMillis":5000,"allAttributesPrivate":false,"usingProxy":false,"streamingDisabled":false,"customStreamURI":false,"reconnectTimeMillis":1000,"dataStoreType":"custom","inlineUsersInEvents":false,"userKeysCapacity":1000},"platform":{"name":"Go","goVersion":"go1.15.2","osName":"Linux","osArch":"amd64"}}
2020/11/30 21:27:18.427154 INFO: Starting server listening on port 8030
2020/11/30 21:27:19.154240 DEBUG: [env: ...1387] Received all feature flags
2020/11/30 21:27:28.465941 WARN: [env: ...1387] Timeout encountered waiting for LaunchDarkly client initialization
2020/11/30 21:27:28.469060 ERROR: Error initializing LaunchDarkly client for "REDACTED": timeout encountered waiting for LaunchDarkly client initialization

Relay version https://hub.docker.com/layers/launchdarkly/ld-relay/v6/images/sha256-2a3bdc8d8adf774a1631d7f13bacac5a8e9e89cdb2ce84f662104a3bb4be11fa?context=explore

andrewwyee commented 3 years ago

I was able to get it working by connecting to a redis host directly. Is it possible that ld-relay just doesn't support sentinel redis connections?

hroederld commented 3 years ago

Hi @andrewwyee

Relay does not support Redis Sentinel. We use https://github.com/gomodule/redigo for Redis which would require an additional library to specifically handle Sentinel.

andrewwyee commented 3 years ago

@hroederld Thanks.

Do you think it would be possible to build in sentinel support in ld-relay? I was under the impression that it was supported as the php-sdk allows you to provide an existing predis instance, which I had configured to point to sentinel

eli-darkly commented 3 years ago

@andrewwyee We're looking into switching to https://github.com/go-redis/redis which does support Sentinel, but we haven't tested this yet.

The PHP SDK isn't relevant to this. Each of our SDKs that has some type of Redis support is relying on third-party libraries to handle the Redis connection; we can't guarantee that all of those libraries support all of the same Redis configurations.

andrewwyee commented 3 years ago

Thanks @eli-darkly.

Is there an estimation as to when you guys would make the decision to utilize go-redis/redis or not?

Also do you have recommendations for redis HA support in the meantime with ld-relay? I'm aware that the ld-relay docs recommend tinkering with the CACHE_TTL option, however the LD PHP sdk will solely read through redis, so if redis is down we essentially cannot evaluate any keys.

joe-udwin-lrn commented 3 years ago

This is also a blocker for us purchasing LD. @eli-darkly Any update when this might be implemented?

hroederld commented 3 years ago

Hi @Joe-U-Questionmark I don't think we have any updates about sentinel support but as to HA reads:

You can configure Redis replication without Relay being aware (1). Predis supports configuring a leader and multiple followers for reads without sentinel (2). You can pass a configured Predis client to the SDK bypassing the standard options(3).

  1. https://redis.io/topics/replication
  2. https://github.com/predis/predis#replication
  3. https://github.com/launchdarkly/php-server-sdk/blob/master/src/LaunchDarkly/Integrations/PHPRedis.php#L30.

I have not tested this configuration but I think this would provide HA reads if the leader node goes down, without any modification to the SDK or Relay.

joe-udwin-lrn commented 3 years ago

@hroederld

Thanks for the reply and confirming that you think that the PHP SDK can work with a Sentinel. However, this thread is discussing the lack of sentinel support in the LD Relay which is a Go application. Do you have an update on when Sentinel support in the Relay proxy might be implemented?

eli-darkly commented 3 years ago

@Joe-U-Questionmark I believe @hroederld's reply was actually meant for @andrewwyee, who had specifically mentioned the PHP SDK.

eli-darkly commented 3 years ago

@Joe-U-Questionmark And I realize that that didn't answer your question, but on the question of time frame I have to defer to management who determines priorities; I've brought this to their attention. I just wanted to avoid confusion on the PHP point.

bwoskow-ld commented 3 years ago

Hi all,

I work with @eli-darkly and @hroederld and am partially responsible for prioritizing the SDK roadmap. While we can't commit to any particular timelines at this time, we are aware of the feature request for adding Redis Sentinel support to Relay and will keep you posted on any relevant updates we have.

Cheers, @bwoskow-ld

joe-udwin-lrn commented 3 years ago

@bwoskow-ld are we talking about this year? If I found a Go developer would you accept a PR?

eli-darkly commented 3 years ago

@Joe-U-Questionmark This can't (or shouldn't) be done with a single PR to the Relay Proxy code itself. The Redis integration is not in this repo, it's an add-on to the Go SDK, and as we described in other comments above, what was needed was for us to write a new one that uses a different Redis client. That's not hard to do and we had already mostly done it, but a large number of other SDK team priorities intervened. The obstacle hasn't been the lack of a Go developer but just the development pipeline in general, which wouldn't be speeded up by having to review an external PR for work we had already started. I'm not sure if "are we talking about this year" was meant as a serious question, but you can see that we haven't abandoned development on Relay Proxy or the Go SDK as we fairly recently released major version rewrites of both.

eli-darkly commented 3 years ago

@Joe-U-Questionmark Another thing that was delaying the release of a new Go SDK Redis integration is that the point of it was to support both clustered Redis and Redis Sentinel, but those things are not as straightforward to test against as a simple Redis instance. Constructing automated tests for this functionality is a somewhat larger task than actually writing the integration, and we try to avoid relying on manual tests.

bwoskow-ld commented 3 years ago

To follow up on Eli's response: we don't need any external contributions at this time to make this request a reality.

As for the resolution timeframe: we generally don't commit to timeframes publicly as we seek to maintain flexibility across maintaining our SDKs and adding new functionality to them. That said, the new Go SDK Redis integration has been requested a few times now, and so I do expect us to circle back to it sooner than later -- yes, I expect it'll be this year.

joe-udwin-lrn commented 3 years ago

@bwoskow-ld @eli-darkly Hopefully you can see I am a bit stuck: We need Redis Sentinel support in the relay to be able to purchase and use LD (we are stuck in the purchasing process at the moment with Jason Tran@LD). I understand you don't want external contributions which make sense if you have made internal progress, but equally you wont give me an idea of how long I might need to wait. It sounds like QA/automated testing is your main issue, but is there a way you can publish the code "un-supported" in the meantime so we can make progress? Or perhaps we can be your private preview customer? Perhaps we can talk privately about how to make progress that benefits both parties? You can reach out to me via Jason.

eli-darkly commented 3 years ago

Here's an update on the revised Redis integration. Cluster support using the go-redis library seems to be pretty straightforward, and we've been able to do some testing with that. However, when it comes to Redis Sentinel support we have some open questions.

Some of the comments here seem to imply that connecting via Sentinel ought to work exactly the same as connecting to a regular Redis instance, from the point of view of the application. But that's not what the design of the go-redis library seems to imply. It has four different methods for creating a Redis client instance: NewClient (for connecting to a plain Redis server), NewClusterClient (for connecting to a cluster), NewUniversalClient (for connecting to either a plain server or a cluster, providing a common API that works with both), and NewFailoverClient (for connecting via Sentinel). This implies that you do need to know ahead of time whether you're using Sentinel or not; also, the configuration options are fairly different for NewFailoverClient, and it's not totally clear whether all of these options would need to be surfaced in the Relay Proxy configuration.

That last part is an issue that doesn't affect the PHP/Predis use case mentioned at the top of this thread, because there the application is configuring the Predis library itself and has full access to whatever options it supports, so the LaunchDarkly SDK code doesn't need to know about those options in detail— whereas, with the Relay Proxy, the LaunchDarkly code is responsible for setting all the options and if we want them to be configurable by whoever is deploying the Relay Proxy, then we'll need to provide corresponding Relay Proxy settings. That is, unless most of those options are irrelevant and can be left at default values except for the host address.

alan-mccabe-lrn commented 3 years ago

@eli-darkly :

it's not totally clear whether all of these options would need to be surfaced in the Relay Proxy configuration.

They all have value of course, but to me the critical ones are all the ones before retry settings.

That last part is an issue that doesn't affect the PHP/Predis use case mentioned at the top of this thread, because there the application is configuring the Predis library itself and has full access to whatever options it supports, so the LaunchDarkly SDK code doesn't need to know about those options in detail— whereas, with the Relay Proxy, the LaunchDarkly code is responsible for setting all the options and if we want them to be configurable by whoever is deploying the Relay Proxy, then we'll need to provide corresponding Relay Proxy settings. That is, unless most of those options are irrelevant and can be left at default values except for the host address.

Anyone who needs this (like us) has indeed already configured Redis for the application(s) consuming the SDK and should have a good understanding of the values that need to be set. If I can suggest solving this by adding a single setting at the "Proxy levell" for something like 'RedisFailoverConfig', put all the settings FailOverOptions requires with no defaults and absolve the LDRelay of having to make recommendations to end users?

eli-darkly commented 3 years ago

If I can suggest solving this by adding a single setting at the "Proxy levell" for something like 'RedisFailoverConfig', put all the settings FailOverOptions requires with no defaults and absolve the LDRelay of having to make recommendations to end users?

I'm not sure I understand the suggestion. How would "a single setting" represent everything in FailoverOptions? The options have to be representable in a configuration file.

My point wasn't that users would not have a good understanding of the options, but that there has to actually be code in Relay to say "parse such-and-such options from the config file, which have such-and-such data types, and then copy those values into the programmatic Redis configuration."

joe-udwin-lrn commented 3 years ago

@eli-darkly cc @qm-alan-mccabe A number of libraries I have seen require a "Redis Connection String" which removes the need to try and model the full configuration capabilities down stream. Some also have a separate "Sentinel Configuration String" (and some combine it). Our recommendation would be to do something similar. Perhaps a serialized JSON string that represents the object you are trying to populate? Or any format of your choosing as long as its documented...

eli-darkly commented 3 years ago

@Joe-U-Questionmark - I'll try to restate my comment more clearly. The issue isn't that it is impossible in principle to have a string that contains multiple properties. It's that the go-redis library we are using does not itself define such a format— and its FailoverOptions struct does not use only JSON-serializable field types. Therefore, the Relay Proxy code would need to include some code along the lines of "parse the string into such-and-such a struct, copy field value 1, copy field value 2, convert field value 3 to the desired type like so, etc.", which means it would need to have specific knowledge of each field. We can't just blindly hand a string over to go-redis and say "use whatever values are in this", because they did not write the library that way. So that's why I was asking whether you would need support for all the options or not.

joe-udwin-lrn commented 3 years ago

@eli-darkly cc @qm-alan-mccabe

Understood, the most important ones are strings, ints and bools (and durations but perhaps make them durationInSeconds and make them an int.)

Here is my MoSCoW analysis: IMO I would probably write the code to marshal and unmarshal all bool, string, string[], and ints (after covering time.duration to durationInSeconds) to avoid someone moaning about one you missed...

MasterName string Must
SentinelAddrs []string Must
SentinelPassword string Must
RouteByLatency bool Should
RouteRandomly bool Should
SlaveOnly bool Should
UseDisconnectedSlaves bool Could
QuerySentinelRandomly bool Could

Dialer    func(ctx context.Context, network, addr string) (net.Conn, error) Wont
OnConnect func(ctx context.Context, cn *Conn) error Wont

Username string Must
Password string Must
DB       int Must

MaxRetries      int  Should
MinRetryBackoff time.Duration Should
MaxRetryBackoff time.Duration Should

DialTimeout  time.Duration Could (assuming a sensible default)
ReadTimeout  time.Duration Could  (assuming a sensible default)
WriteTimeout time.Duration Could (assuming a sensible default)

PoolSize           int Could (assuming a sensible default)
MinIdleConns       int Could (assuming a sensible default)
MaxConnAge         time.Duration Could (assuming a sensible default)
PoolTimeout        time.Duration Could (assuming a sensible default)
IdleTimeout        time.Duration Could (assuming a sensible default)
IdleCheckFrequency time.Duration Could (assuming a sensible default)

TLSConfig *tls.Config Wont
eli-darkly commented 3 years ago

@Joe-U-Questionmark Again, the issue isn't whether it is technically possible to come up with parsing logic for specific field types (btw, if we did, we wouldn't represent durations in seconds— the Relay Proxy already has a different convention for durations, as described here). It's just that if we are writing any logic to parse any fields in particular, then that means the Relay code has to say specifically which fields it is supporting, which contradicts your original statement that "a Redis connection string ... removes the need to try and model the full configuration capabilities down stream." That is, unless what you're suggesting is that we use reflection to inspect the names and types of the config struct fields dynamically, which would be a fairly elaborate solution. Also, this would mean the Relay Proxy configuration is very closely tied to this one specific Redis library, go-redis, which may not always be desirable.

I would also question the assumption that TLSConfig would not be a desired part of the configuration; we specifically added TLS support for Redis because it was a frequently requested feature.

joe-udwin-lrn commented 3 years ago

@eli-darkly We were attempting to help you keep the LD relay proxy's config file uncluttered of the details of the redis setup (just like most applications use a DB connection string instead of 200 config settings). However we don't mind how you implement it as long as we can configure it. We look forward to see what you come up with!

joe-udwin-lrn commented 3 years ago

@eli-darkly Have you made any progress on the relay's support of redis sentinel? Thanks in advance!

eli-darkly commented 3 years ago

@Joe-U-Questionmark No, this has been on the back burner due to other projects.

joe-udwin-lrn commented 3 years ago

@eli-darkly do you plan to pick this up again soon? I hope so as we are now using LD for some components but want to use it for all components.

eli-darkly commented 3 years ago

@Joe-U-Questionmark I'm sorry I can't be more specific about our project planning, but I just can't. If I could have told you whether we'll be working on this soon, I would have. We always have a large number of feature requests and fixes in progress as well as longer-term projects; repeatedly asking about one is understandable but doesn't make it happen sooner.

joe-udwin-lrn commented 2 years ago

@eli-darkly I managed to not bug you for a whole year and we have avoided this part of the project but are coming back round to tackle this. Seeing as there has been no progress in this direction is a DIY approach our best bet? Thank you

duerra commented 2 years ago

👍 request for supporting this.

cwaldren-ld commented 1 year ago

Hi folks, I know this has been a painfully long wait. We are hoping to put out a beta version of the go-redis integration and obtain your feedback on it. This will be the first step towards adding support in Relay Proxy.

I can't commit to a timeline, but know that this request has been heard loud and clear.

cwaldren-ld commented 1 year ago

Just an update, we've put out a beta of the go-redis persistent-store integration for the Go Server SDK.

There are still design questions as to how configuration should be exposed, but this is an initial step towards supporting Sentinel in Relay Proxy.

louis-launchdarkly commented 1 year ago

Marking this as an enhancement as there is no official support for Sentinel yet. We recently looked into SDK persistent store design on how to ensure SDKs and Relay work correctly with Sentinel.