launchdarkly / ld-relay

LaunchDarkly Relay Proxy

[Documentation] Behavior when instance dies #44

Closed: sgringwe closed this issue 5 years ago

sgringwe commented 6 years ago

Hey all! We are looking into using ld-relay in a Kubernetes cluster with a standard Service and Deployment setup. We expect the deployment's pods to be SIGTERMed and KILLed from time to time.

What behavior can we expect when an instance is SIGTERMed? In particular, what happens to the streaming connections from our Rails app servers?

More information on k8s pod termination: https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods

ashanbrown commented 6 years ago

@sgringwe Glad to hear about your interest in ld-relay. Relay has several roles that may be relevant here:

1) It serves streaming connections to your servers and mobile/js clients.
2) It serves flag evaluation responses for your mobile/js clients.
3) It aggregates events and sends them back to the LaunchDarkly service.

With regard to "streaming" (1), clients should be able to recover and will retry connecting to the service until they succeed.
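To make that retry behavior concrete, here is a minimal Go sketch of a reconnect loop with roughly the shape the SDKs implement. It is not the SDKs' actual code; the relay URL and the 3-second retry delay are assumptions for illustration.

```go
package main

import (
	"bufio"
	"log"
	"net/http"
	"time"
)

// streamFlags holds a streaming connection open and reconnects after a short
// delay whenever the connection drops, e.g. because the relay pod serving it
// was terminated.
func streamFlags(url string) {
	for {
		resp, err := http.Get(url)
		if err != nil {
			log.Printf("stream connect failed: %v; retrying", err)
			time.Sleep(3 * time.Second)
			continue
		}
		scanner := bufio.NewScanner(resp.Body)
		for scanner.Scan() {
			log.Printf("received: %s", scanner.Text())
		}
		resp.Body.Close()
		log.Print("stream closed; reconnecting")
		time.Sleep(3 * time.Second)
	}
}

func main() {
	// Placeholder address; a real client would point at the relay's
	// streaming endpoint and send its SDK key in an Authorization header.
	streamFlags("http://ld-relay.internal:8030/stream")
}
```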

With regard to item (2), serving flag evaluations (relevant if you are using mobile clients), you might want to make sure that you remove any ld-relay instances you intend to tear down from the load balancer so that you don't get failed requests. Even if requests do fail, the clients will serve data cached from the last successful connection, so that shouldn't really be a problem. For PHP this is a bigger problem, as the client falls back to the default value in code unless you are using Memcached or Redis to store previously read values.
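As an illustration of that in-code default fallback, here is a small Go sketch; FlagSource and EvalBool are hypothetical names for this example, not the LaunchDarkly SDK's actual API.

```go
package flags

// FlagSource is a stand-in for whatever client or store serves flag values
// (hypothetical; not the real SDK interface).
type FlagSource interface {
	Bool(key string) (value bool, err error)
}

// EvalBool returns the stored value when the source is reachable and the
// caller's in-code default otherwise -- the same degradation described above
// for clients that cannot reach the relay and have no Redis/Memcached cache.
func EvalBool(src FlagSource, key string, defaultVal bool) bool {
	v, err := src.Bool(key)
	if err != nil {
		return defaultVal
	}
	return v
}
```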

For item (3), events, the relay flushes its event queue back to the LaunchDarkly service every 5 seconds by default. This means that if you don't want to lose event data, you'll want to give the relay at least 5 seconds (ideally more) between removing it from the load balancer and tearing it down. Things are a little more complicated when the relay is collecting events from PHP clients, because we aggregate these and only flush events every 30 seconds. (Edit: I was wrong about this; PHP also has a 5-second flush frequency by default.) All that said, dropping a few events may not matter, particularly if you are not doing A/B testing or using our firehose and expecting to see all events.
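A minimal Go sketch of that timing, assuming the 5-second default flush interval and an arbitrary extra margin: after SIGTERM (and after the instance is out of the load balancer), the process waits out at least one flush interval before exiting so buffered events have a chance to reach LaunchDarkly.

```go
package main

import (
	"log"
	"os"
	"os/signal"
	"syscall"
	"time"
)

const (
	flushInterval = 5 * time.Second // relay's default flush period mentioned above
	margin        = 5 * time.Second // extra headroom; an assumption, not a documented value
)

func main() {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGTERM)

	<-sigs
	// The instance should already be out of the load balancer at this point
	// (e.g. via a readiness failure or preStop hook). Waiting at least one
	// flush interval lets the buffered events go out before the process dies.
	log.Printf("SIGTERM received; draining for %s before exit", flushInterval+margin)
	time.Sleep(flushInterval + margin)
	os.Exit(0)
}
```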

My guess is that you know more about Kubernetes than we do, so if you have any suggestions for how we can make this work better, please let us know. For example, are there hooks we should have in ld-relay so that teardown flushes the event queue? We can describe this behavior in the README, but first please let us know whether it answers your question. Thanks.
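For comparison with the passive drain delay sketched above, a teardown hook of the kind floated here could flush explicitly on SIGTERM instead of just waiting; flushEvents below is a hypothetical function used only to show the shape of such a hook, not something ld-relay is known to expose.

```go
package main

import (
	"log"
	"os"
	"os/signal"
	"syscall"
)

// waitAndFlush blocks until the orchestrator asks the process to stop, then
// pushes any buffered events before returning, so the caller can exit cleanly.
func waitAndFlush(flushEvents func() error) {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGTERM, syscall.SIGINT)
	<-sigs
	if err := flushEvents(); err != nil {
		log.Printf("final event flush failed: %v", err)
	}
}

func main() {
	waitAndFlush(func() error {
		log.Print("flushing buffered events before exit")
		return nil
	})
}
```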

sgringwe commented 6 years ago

Thanks @ashanbrown - would it be possible to organize a call? I think it'd be a productive way to learn how to properly use this tool, and I can share info on proper k8s handling. I have a support ticket open that we can share contact info over.

bwoskow-ld commented 5 years ago

I'm going to close this issue as the conversation moved over to your support request and hopefully to a subsequent call.

Feel free to add a comment here or open another issue if you have further feature requests or bug reports, or to reach out to our support team on support.launchdarkly.com if you need assistance using LD Relay.