launchdarkly / go-server-sdk

LaunchDarkly Server-side SDK for Go

Streaming connection error causes endless console logs (WARN: Error in stream connection (will retry): EOF) #74

Closed Chennoy closed 11 months ago

Chennoy commented 2 years ago

Description Hi, we're seeing around 700,000 log entries per day, and it's the same message from this SDK:

[LaunchDarkly] 2022/04/07 13:23:43 WARN: Error in stream connection (will retry): EOF

I searched online and found that a similar issue was reported for the React SDK, where the fix was described as follows:

A quick update on this issue -- I've implemented a change so that the "Error on stream connection..." message 1) mentions that the SDK is going to retry establishing the connection and 2) is only logged on the first failed attempt. More specifically, this is the first failed attempt in each series of failures; if the SDK fails to connect one or more times, then successfully connects, then disconnects and again fails to connect one or more times, the message will be logged twice-- at the start of each failure series.

This change affects the behavior in each of our client-side JavaScript-related SDKs: JavaScript, React, Node.js (client-side), and Electron. It'll be included in an upcoming release of each.

I think this solution is also needed in the Go SDK. It would be super helpful if you could add it.

To reproduce I'm not sure how to reproduce the issue, but it doesn't make sense that it's being printed out so many times per day.

Expected behavior More descriptive logs, and less of this log in general.

Logs [LaunchDarkly] 2022/04/07 13:23:43 WARN: Error in stream connection (will retry): EOF

SDK version V5

Language version, developer tools Go 1.18.3

OS/platform EKS

Additional context I'm referring to this issue: https://github.com/launchdarkly/react-client-sdk/issues/2

eli-darkly commented 2 years ago

It is very abnormal for the SDK to have this error condition so frequently, and my instinct is that suppressing the log message is not necessarily the best way to address this, because it's likely that the message represents a real underlying problem. If you are seeing this error 700,000 times per day (it's unclear from your report whether this is still consistently happening, since the date in your example log message is April 7), it's possible that something in your network infrastructure is interfering with streaming connections between your application and LaunchDarkly, and you may not be receiving feature flag updates. I would recommend following up with our support team at support.launchdarkly.com, since a public GitHub issue isn't a good forum for trying to diagnose such issues that may require us to ask more specific questions about your operating environment.

More generally, on the suggestion of suppressing the logging: I think that the React SDK issue that you linked to was in a somewhat different context and isn't exactly applicable here. In browser-based JS code, our SDKs normally do less logging than they would in a server-side environment, because it's unlikely that the log messages will be useful to an end user; they're only likely to be reported via a monitoring product such as Sentry, at which point it would be hard to diagnose why the end user had a connection problem, since the developer does not control that environment.

But in a server-side environment, it's generally desirable to find out about abnormal conditions as soon as they occur. We use a log level system to provide a rough idea of severity, where conditions like this that are abnormal but might resolve on their own are logged at WARN level, whereas things that might require more intervention are logged at ERROR level. If you are not interested in WARN messages, you can use the SDK's logging configuration to disable them (SetMinLevel(ldlog.Error)). The SDK will log an additional message at ERROR level if it finds that the network connection is still not working after 1 minute (that interval is also configurable, with LogDataSourceOutageAsErrorAfter).
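As a rough illustration of the two settings mentioned above, the following sketch assumes Go SDK v5 and its `ldcomponents.Logging()` builder, using the builder's `MinLevel` method to set the same threshold. The environment variable name `LAUNCHDARKLY_SDK_KEY`, the 5-minute outage interval, and the 5-second startup wait are arbitrary choices for this example, not recommendations from the thread.

```go
package main

import (
	"log"
	"os"
	"time"

	"gopkg.in/launchdarkly/go-sdk-common.v2/ldlog"
	ld "gopkg.in/launchdarkly/go-server-sdk.v5"
	"gopkg.in/launchdarkly/go-server-sdk.v5/ldcomponents"
)

func main() {
	var config ld.Config

	// Only ERROR-level messages are written, so the WARN-level
	// "Error in stream connection (will retry)" message is suppressed.
	// The outage interval (an assumed value here) controls how long the
	// data source may stay disconnected before the SDK logs at ERROR level.
	config.Logging = ldcomponents.Logging().
		MinLevel(ldlog.Error).
		LogDataSourceOutageAsErrorAfter(5 * time.Minute)

	// LAUNCHDARKLY_SDK_KEY is an assumed environment variable name.
	client, err := ld.MakeCustomClient(os.Getenv("LAUNCHDARKLY_SDK_KEY"), config, 5*time.Second)
	if err != nil {
		log.Printf("LaunchDarkly client did not initialize cleanly: %v", err)
	}
	if client != nil {
		defer client.Close()
	}
}
```

Raising the minimum level hides the symptom only; if the stream keeps dropping, the underlying connectivity problem still needs to be investigated as described above.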

github-actions[bot] commented 11 months ago

This issue is marked as stale because it has been open for 30 days without activity. Remove the stale label or comment, or this will be closed in 7 days.