launchdarkly / go-server-sdk

LaunchDarkly Server-side SDK for Go
Other
41 stars 18 forks source link

Flush doesn't seems to flush everything #56

Closed jpatters closed 2 years ago

jpatters commented 3 years ago

Is this a support request? No

Describe the bug Calling flush doesn't seem to flush 'things'. Running the code below results in the user not being added to the launch darkly dashboard.

To reproduce

package main

import (
    "time"

    "gopkg.in/launchdarkly/go-sdk-common.v2/lduser"
    ld "gopkg.in/launchdarkly/go-server-sdk.v5"
)

func main() {
    ldClient, err := ld.MakeClient("sdk-my-key", 5*time.Second)
    if err != nil {
        panic(err)
    }

    user := lduser.NewUser("test@example.com")

    ldClient.BoolVariation("my-flag", user, true)
    ldClient.Flush()
}

Expected behavior I expected the user to be added to launch darkly

Logs

[LaunchDarkly] 2021/05/18 12:18:10 INFO: Starting LaunchDarkly client 5.3.0
[LaunchDarkly] 2021/05/18 12:18:10 INFO: Starting LaunchDarkly streaming connection
[LaunchDarkly] 2021/05/18 12:18:10 INFO: Waiting up to 5000 milliseconds for LaunchDarkly client to start...
[LaunchDarkly] 2021/05/18 12:18:10 INFO: Connecting to LaunchDarkly stream
[LaunchDarkly] 2021/05/18 12:18:10 INFO: LaunchDarkly streaming is active
[LaunchDarkly] 2021/05/18 12:18:10 INFO: Initialized LaunchDarkly client

SDK version v5

Language version, developer tools go version go1.16.3 darwin/amd64

OS/platform macOS 11.3.1

Additional context This is an example that I pulled from our lambda function. In a typical environment, I would call ldClient.Close() on exit (as that seems to work as expected) but don't really have a clean way of doing that in lambda since sometimes functions receive multiple requests and sometimes they don't. Any help would be greatly appreciated.

eli-darkly commented 3 years ago

If you're going to run a one-shot process that immediately exits, the only way to ensure that the events are delivered is either to call Close, or to add a delay. Flush, in the current version of the SDK, is not a synchronous method— it returns when it has queued an instruction to flush the events as soon as possible, it does not wait until that's been completed. I'm sorry that there isn't a better workaround at present but that is how it works. It was not designed with this use case in mind.

As more people are using the SDK in lambdas these days, we have heard of this issue before, and we'll consider ways to support this use case better. However I will say that this behavior is exactly as described in the documentation: "Flush tells the client that all pending analytics events (if any) should be delivered as soon as possible. Flushing is asynchronous, so this method will return before it is complete."

jpatters commented 3 years ago

@eli-darkly Thanks for your response. This is disappointing. It would be great if there was some documentation (probably on this page: https://docs.launchdarkly.com/guides/best-practices/serverless) that said this is how it has to work currently. It would have saved me a lot of time.

jpatters commented 3 years ago

@eli-darkly given that Flush works asynchronously, I can't figure out how to use this properly on lambda. Your documentation says not to initialize a new client on every request but that feels like the only thing that will work correctly in this situation. What are the drawbacks to that? Just the overhead of the connection time?

eli-darkly commented 3 years ago

Your documentation says not to initialize a new client on every request but that feels like the only thing that will work correctly in this situation

The most common use case for the LD SDKs is in long-running webapps. In that context, where a server will be sticking around long-term and handling thousands of requests a second, it's definitely undesirable to incur the overhead of a new connection to LD every time.

In a one-shot process that is going to just start up, evaluate some feature flags, do something based on those flags, and exit, then creating a new SDK instance every time makes sense.

The gray area in between is if a serverless environment like Lambda will be creating instances with an indeterminate lifespan and possibly giving more than one request to each of them. In that case you can keep an SDK instance around for multiple requests, as long as there are hooks for the beginning and end of the service's lifecycle— that is, if it is possible to do one thing (creating the SDK instance) at startup that won't be repeated if there are multiple requests, and do another thing (close the SDK) if you are about to be shut down. I was under the impression that Lambda does have such a mechanism but we have not tried this ourselves.

eli-darkly commented 3 years ago

One possible complication there is that the Lambda docs say it will "freeze" the execution environment immediately after each request has been handled. That might mean that any pending analytics events that haven't finished flushing yet will remain in the queue until either another request comes in, or you get a shutdown signal— but, if I'm reading this correctly, they would still eventually get delivered as long as you are able to close the SDK client on shutdown. It would just make it harder to predict how soon they would be delivered.

jpatters commented 3 years ago

The shutdown signal is sent to the runtime and to the extensions. It doesn't make it to the function itself. Those are essentially support docs for building custom runtimes. I'm not interested in building out a custom runtime just to support launch darkly working properly on lambda. I think you need to implement a FlushSync() method. I don't really understand the point of explicitly calling Flush() if its just going to flush in the background. I assume that is happening in the background automatically anyway. I would have expected that Flush() was a blocking operation.

eli-darkly commented 3 years ago

Again, it is clearly documented as being a non-blocking operation; the point of calling it, even if that isn't useful in your case, is to cause an earlier flush if the background flush interval is not short enough to ensure it'll happen soon otherwise. As I said, we will definitely consider providing better support in the SDKs for this kind of use case.

eli-darkly commented 2 years ago

Closing this because the original question was answered and it is now a feature request (adding a synchronous Flush method) which we are already planning to do in the next major version release.