launchdarkly / rust-server-sdk

LaunchDarkly Server-Side SDK for Rust
https://docs.launchdarkly.com/sdk/server-side/rust
Other

High volume of "Failed to send events" errors #73

Closed: samscott89 closed this issue 2 weeks ago

samscott89 commented 4 months ago

Describe the bug

We're seeing a high volume of:

Failed to send events. Some events were dropped: hyper::Error(Http2, Error { kind: GoAway(b"", NO_ERROR, Remote) })

errors logged in production

To reproduce

Not sure beyond "run the SDK for a while"?

Expected behavior

If these errors are benign, I wouldn't expect them to be logged at ERROR level. If they keep being logged at that level, we'll need to silence all logging from the LaunchDarkly SDK, which would be a shame.

If these errors aren't benign, then I guess I would expect a retry?
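A minimal sketch of the behaviour asked for here, under stated assumptions: `SendFailure`, `Outcome`, and `send_with_retry` are hypothetical names and are not part of launchdarkly-server-sdk, which reports a `hyper::Error`. The sketch treats an HTTP/2 GOAWAY carrying NO_ERROR (which signals a graceful connection shutdown by the peer) as retryable and worth at most a warning, and reserves ERROR for failures that actually drop the batch.

```rust
use std::fmt;

/// Hypothetical, simplified reason an event flush failed (the real SDK surfaces a `hyper::Error`).
#[derive(Debug, Clone, PartialEq)]
enum SendFailure {
    /// The peer sent an HTTP/2 GOAWAY with NO_ERROR, i.e. a graceful connection shutdown.
    GracefulGoAway,
    /// Any other transport or HTTP failure.
    Other(String),
}

impl fmt::Display for SendFailure {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SendFailure::GracefulGoAway => write!(f, "GOAWAY(NO_ERROR) from remote"),
            SendFailure::Other(msg) => write!(f, "{msg}"),
        }
    }
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Delivered { attempts: u32 },
    Dropped { attempts: u32, last: SendFailure },
}

/// Call `send` until it succeeds, retrying only graceful GOAWAYs, for at most `max_attempts` calls.
fn send_with_retry<F>(mut send: F, max_attempts: u32) -> Outcome
where
    F: FnMut() -> Result<(), SendFailure>,
{
    let mut attempts = 0;
    loop {
        attempts += 1;
        match send() {
            Ok(()) => return Outcome::Delivered { attempts },
            Err(SendFailure::GracefulGoAway) if attempts < max_attempts => {
                // Benign and retryable: worth a warning at most, not an ERROR.
                eprintln!("WARN: connection closed gracefully; retrying event flush");
            }
            Err(e) => {
                // Only a failure that actually drops the batch is logged as an error.
                eprintln!("ERROR: failed to send events, batch dropped: {e}");
                return Outcome::Dropped { attempts, last: e };
            }
        }
    }
}

fn main() {
    // Simulated responses, consumed from the end: a graceful GOAWAY, then success.
    let mut responses = vec![Ok(()), Err(SendFailure::GracefulGoAway)];
    let outcome = send_with_retry(|| responses.pop().unwrap(), 3);
    println!("{outcome:?}"); // Delivered { attempts: 2 }
}
```

Retrying is a plausible default because a graceful GOAWAY usually means the server is rotating connections rather than rejecting the payload; whether the SDK's event payloads are always safe to resend is an assumption this sketch does not verify.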

Logs

That's all I have, I'm afraid.

log.file          /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/launchdarkly-server-sdk-2.1.0/src/events/sender.rs
log.line          140
log.module_path   launchdarkly_server_sdk::events::sender
log.target        launchdarkly_server_sdk::events::sender

SDK version

First seen on 1.1; we upgraded and it is still happening on 2.1.

Language version, developer tools

Rust 1.78.0

OS/platform

Amazon Linux 6.1.87-99.174.amzn2023.x86_64

keelerm84 commented 4 months ago

Thank you for bringing this to our attention. We will investigate and let you know once we have a resolution!

samscott89 commented 2 months ago

Hey @keelerm84, any update on this? We're on 2.1.0 but seeing a similar problem:

error on event stream: Eof

[Graph: error occurrences over time]

Rust Tracing Fields
log.file          /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/launchdarkly-server-sdk-2.1.0/src/data_source.rs
log.line          151
log.module_path   launchdarkly_server_sdk::data_source
log.target        launchdarkly_server_sdk::data_source

keelerm84 commented 2 months ago

@samscott89, a few questions for you.

  1. The initial error was about Failed to send events. Some events were dropped [...]. Are you still experiencing that problem?
  2. For the error on event stream: Eof error, does the included graph represent ONLY occurrences of that specific error?
  3. Can you describe your setup? Are you connecting directly to LaunchDarkly APIs or are you using the relay proxy?

samscott89 commented 2 months ago

The initial error was about Failed to send events. Some events were dropped [...]. Are you still experiencing that problem?

Ah, good catch: it seems like those stopped around May 7th. Happy to open a separate issue or change the title if that would be helpful.

And to reiterate a point from my first message: we're not observing any incorrect behaviour, but because these errors seem non-actionable, we're currently silencing all errors from LaunchDarkly.
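Rather than silencing the whole SDK, a narrower option is to filter by the `log.target` values shown in the reports. Logging backends such as `env_logger` and `tracing_subscriber::EnvFilter` accept per-target directives like `info,launchdarkly_server_sdk::data_source=off`, where the more specific target takes precedence. The `Filter` and `Level` types below are a hypothetical, std-only model of that matching rule, not either crate's implementation.

```rust
/// Verbosity levels, ordered so that `Off` < `Error` < ... < `Trace`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Level {
    Off,
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

/// Hypothetical per-target filter: a default level plus `(target prefix, max level)` directives.
struct Filter {
    default: Level,
    directives: Vec<(&'static str, Level)>,
}

impl Filter {
    /// A record is emitted if its level is within the max level of the longest matching directive.
    fn enabled(&self, target: &str, level: Level) -> bool {
        let max = self
            .directives
            .iter()
            // Match the exact target or a child module (`prefix::...`), not `prefix_other`.
            .filter(|(prefix, _)| target == *prefix || target.starts_with(&format!("{prefix}::")))
            .max_by_key(|(prefix, _)| prefix.len())
            .map(|(_, lvl)| *lvl)
            .unwrap_or(self.default);
        level <= max
    }
}

fn main() {
    // Default to INFO everywhere, but turn off only the SDK's streaming data-source target.
    let filter = Filter {
        default: Level::Info,
        directives: vec![("launchdarkly_server_sdk::data_source", Level::Off)],
    };
    for target in ["launchdarkly_server_sdk::data_source", "launchdarkly_server_sdk::events::sender"] {
        println!("{target}: ERROR enabled = {}", filter.enabled(target, Level::Error));
    }
}
```

With a rule like this, other errors from the SDK (for example from `launchdarkly_server_sdk::events::sender`) stay visible while the noisy target is muted.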

For the error on event stream: Eof error, does the included graph represent ONLY occurrences of that specific error?

Yes, that's correct. These errors started on April 30th, and we've seen 14k events since then. They seem to be paired with this log event, in case that's helpful:

[Screenshot of the accompanying log event]

Can you describe your setup? Are you connecting directly to LaunchDarkly APIs or are you using the relay proxy?

Pretty vanilla setup, I think; we connect directly:

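        use launchdarkly_server_sdk::{Client, ConfigBuilder};
        use tracing::Instrument; // provides `.instrument(span)` on futures

        // By fiat, tests will not be allowed to hit LaunchDarkly.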
        let offline_mode = cfg!(test) || matches!(mode, FlagClientMode::Offline);

        let client = {
            let config = ConfigBuilder::new(sdk_key).offline(offline_mode);

            tracing::info!("Starting the LaunchDarkly client in {mode:?} mode");
            Client::build(config.build().expect("valid config")).expect("build launchdarkly client")
        };

        client.start_with_default_executor();

        let start = std::time::Instant::now();

        // Wait to ensure the client has fully initialized.
        // Offline mode clients will always be immediately initialized, so this
        // will be a no-op for them.
        let init_ld_span = tracing::info_span!("init_ld");
        let initialized = client
            // NOTE(Sam): max observed time in production for the last two months
            // is 25s
            //
            // If this is failing and we're unable to connect to launchdarkly in 2 minutes
            // consider deploying with flags in offline mode
            .wait_for_initialization(std::time::Duration::from_secs(120))
            .instrument(init_ld_span)
            .await;
        if initialized != Some(true) {
            panic!("Couldn't start the LaunchDarkly client");
        }
        tracing::info!("LaunchDarkly client startup took {:?}", start.elapsed());

keelerm84 commented 1 month ago

It looks like we are simply printing that log message when we don't need to. I am making a change to suppress that message when it's an EOF response since that's an expected condition and we handle it fine, as you noted.
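The change described here amounts to classifying the stream error before deciding whether to log it. A sketch of that idea follows; `StreamError`, `Action`, and `classify` are hypothetical names, and this is not the actual change shipped in the SDK. An EOF still leads to a reconnect but produces no log line, while other failures are still reported.

```rust
/// Hypothetical stand-in for errors observed on the streaming connection.
/// The real SDK's error types differ; only the shape of the decision matters here.
#[derive(Debug, PartialEq)]
enum StreamError {
    /// The server closed the stream: expected, and handled by reconnecting.
    Eof,
    /// Any other failure, which is still worth surfacing.
    Transport(String),
}

/// What to do next. In this sketch every error leads to a reconnect; only logging differs.
#[derive(Debug, PartialEq)]
enum Action {
    ReconnectQuietly,
    ReconnectAndLog(String),
}

fn classify(err: &StreamError) -> Action {
    match err {
        StreamError::Eof => Action::ReconnectQuietly,
        StreamError::Transport(msg) => Action::ReconnectAndLog(format!("error on event stream: {msg}")),
    }
}

fn main() {
    let errors = [
        StreamError::Eof,
        StreamError::Transport("connection reset".to_string()),
        StreamError::Eof,
    ];
    for err in &errors {
        match classify(err) {
            Action::ReconnectQuietly => {} // nothing logged for an expected EOF
            Action::ReconnectAndLog(line) => eprintln!("ERROR: {line}"),
        }
        // ...reconnect with backoff here (omitted from this sketch)...
    }
}
```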

I will let you know once a release with the fix has been cut.

Thank you for your help and patience with this.

keelerm84 commented 1 month ago

v2.2.1 has been released, which I believe should quiet that error down for you. Please let us know!

samscott89 commented 1 month ago

Perfect, thank you! Will give it a try

github-actions[bot] commented 3 weeks ago

This issue is marked as stale because it has been open for 30 days without activity. Remove the stale label or comment, or this will be closed in 7 days.