splitio / go-client

Go SDK client for Split Software
https://split.io
Other
18 stars 7 forks source link

Synchronization task fails #130

Closed farshadahmadi closed 1 year ago

farshadahmadi commented 3 years ago

After a while, GO SDK logs the following:


ERROR - 2020/12/11 13:01:43 EOF
ERROR - 2020/12/11 13:01:43 Error reading streaming buffer
ERROR - 2020/12/11 13:01:43 Error received: {id:<nil> timestamp:<nil> encoding:<nil> data:<nil> name:<nil> clientID:<nil> event:error channel:<nil> message:0xc0007d41d0 code:0xc0007e07f8 statusCode:0xc0007e0800 href:0xc0007d41e0}

and after that synchronization task will not work anymore; meaning that If splits' setting change via split dashboard, split sdk will not consider those changes anymore. I tried both v6 and legacy v5 and both has the same problem. It does not even fallback to use polling mechanism.

chillaq commented 3 years ago

Hi @farshadahmadi , looks like a streaming feature error, is it possible to restart the app and enable debug logging for GO SDK?

Thanks Bilal

farshadahmadi commented 3 years ago

The debug log:

ERROR - 2020/12/17 00:48:19 EOF
INFO - 2020/12/17 00:48:19 SSE streaming exiting
ERROR - 2020/12/17 00:48:19 Error reading streaming buffer
INFO - 2020/12/17 00:48:19 Stopping Push Services
INFO - 2020/12/17 00:48:19 Stopped streaming
DEBUG - 2020/12/17 00:48:19 [GET]  https://auth.split.io/api/auth
DEBUG - 2020/12/17 00:48:19 Authorization [ApiKey]:  a72t...sbml
DEBUG - 2020/12/17 00:48:19 Headers: map[Accept-Encoding:[gzip] Content-Type:[application/json]]
INFO - 2020/12/17 00:48:19 SSE OK
DEBUG - 2020/12/17 00:48:19 Started SplitUpdateWorker
DEBUG - 2020/12/17 00:48:19 Started SegmentUpdateWorker
INFO - 2020/12/17 00:48:19 SSE Streaming is ready
DEBUG - 2020/12/17 00:48:19 [GET]  https://sdk.split.io/api/splitChanges?since=1608147764506
DEBUG - 2020/12/17 00:48:19 Authorization [ApiKey]:  a72t...sbml
DEBUG - 2020/12/17 00:48:19 Headers: map[Accept-Encoding:[gzip] Content-Type:[application/json]]
DEBUG - 2020/12/17 00:48:19 LINE: data: {\"id\":\"MEJGTem2aE:0:0\",\"timestamp\":1608166099927,\"encoding\":\"json\",\"channel\":\"[?occupancy=metrics.publishers]control_sec\",\"data\":\"{\\\"metrics\\\":{\\\"publishers\\\":0}}\",\"name\":\"[meta]occupancy\"}

So after enabling the debug log, streaming issue still happens, but it somehow manages to survive!

chillaq commented 3 years ago

Hi @farshadahmadi, I think this is expected default error and can be ignored, I will double check with developers.

chillaq commented 3 years ago

Hi @farshadahmadi, I checked SDK version 5.3.0 and don't see the error on my end, as Streaming worked without errors.

Which version are you using?

farshadahmadi commented 3 years ago

I am using the latest legacy 5 version (5.3.0)

    github.com/splitio/go-client v0.0.0-20201006181645-1804d30cc2c2
    github.com/splitio/go-split-commons v0.0.0-20201002201936-3e00dab75688 // indirect
    github.com/splitio/go-toolkit v0.0.0-20200908212815-d49116c52af2

And as I mentioned previously, I used both legacy 5 and v6; Both have the same problem! It is strange that without debug log, synchronization fails, and with it, synchronization works! Moreover, How the following two ERROR lines are expected behaviour when they are clearly logged as errors?

ERROR - 2020/12/17 00:48:19 EOF
INFO - 2020/12/17 00:48:19 SSE streaming exiting
ERROR - 2020/12/17 00:48:19 Error reading streaming buffer
chillaq commented 3 years ago

Thanks @farshadahmadi , this is indeed strange, could you provide the code that uses the SDK?

chillaq commented 3 years ago

Hi @farshadahmadi , we released go sdk 6.0.1 to eliminate this error message, could you please try it out and let us know if it fixes the issue?

Thanks Bilal

chillaq commented 3 years ago

Hi @farshadahmadi , I am closing out this ticket. If there is still an open issue or if we can be of further assistance on this topic, feel free to respond to this email and the ticket will be re-opened.

Thanks Bilal

farshadahmadi commented 3 years ago

Hi. back after Christmas holidays :D

After upgrading to v6.0.1, I still get the error:

ERROR - 2021/01/07 12:22:51 EOF
ERROR - 2021/01/07 12:22:51 Error reading streaming buffer
ERROR - 2021/01/07 12:22:51 Error received: {id:<nil> timestamp:<nil> encoding:<nil> data:<nil> name:<nil> clientID:<nil> event:error channel:<nil> message:0xc00070b2b0 code:0xc00087acf0 statusCode:0xc00087acf8 href:0xc00070b2d0}

Unfortunately I cannot share the code. But I basically use it the same way as official documentation. If you really need it, I can develop a mini application and share the code.

farshadahmadi commented 3 years ago

The mini app is now available. Please follow readme to run the app. After a while you will see sth like:

2021/01/08 10:21:58 Serving miniapp at http://[::]:8050
ERROR - 2021/01/08 11:21:58 Error received: {id:<nil> timestamp:<nil> encoding:<nil> data:<nil> name:<nil> clientID:<nil> event:error channel:<nil> message:0xc001e50190 code:0xc000390178 statusCode:0xc000390180 href:0xc001e501a0}
ERROR - 2021/01/08 11:21:58 EOF
ERROR - 2021/01/08 11:21:58 Error reading streaming buffer

As you can see from the time stamp, most of the times it happens exactly after 1 hour.

farshadahmadi commented 3 years ago

Hi.

I tried the latest version but I had the same problem. I also developed a mini app that has the same problem. Two more comments are already added to the issue ticker. Please re-open the ticket.

BR, Farshad

On Thu, Jan 7, 2021 at 12:46 AM Bilal Al-Shahwany notifications@github.com wrote:

Hi @farshadahmadi https://github.com/farshadahmadi , I am closing out this ticket. If there is still an open issue or if we can be of further assistance on this topic, feel free to respond to this email and the ticket will be re-opened.

Thanks Bilal

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/splitio/go-client/issues/130#issuecomment-755763233, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABJZJHWIO635AMKOIQC2YRTSYTR5ZANCNFSM4U36L2CQ .

-- Farshad Ahmadi Ghohandizi

chillaq commented 3 years ago

Thanks @farshadahmadi will checkout the app.

chillaq commented 3 years ago

Hi @farshadahmadi , I don't see the issue on my end running your mini service, I used my "sdk" type API key and tried calculating a split treatment with default rule set to "off", then changed the rule to "on", both treatments were calculated correctly, see log below.

Can you provide your org name? I can run the same app against your org and see if there is an issue there.

Bilal-A-MacBook-Pro:miniapp-master bilalal-shahwany$ go run cmd/main.go DEBUG - 2021/01/12 10:36:47 [GET] https://sdk.split.io/api/splitChanges?since=-1 DEBUG - 2021/01/12 10:36:47 Authorization [ApiKey]: f9sf...i32t DEBUG - 2021/01/12 10:36:47 Headers: map[Accept-Encoding:[gzip] Content-Type:[application/json]] DEBUG - 2021/01/12 10:36:47 [GET] https://sdk.split.io/api/splitChanges?since=1610476386851 DEBUG - 2021/01/12 10:36:47 Authorization [ApiKey]: f9sf...i32t DEBUG - 2021/01/12 10:36:47 Headers: map[Accept-Encoding:[gzip] Content-Type:[application/json]] DEBUG - 2021/01/12 10:36:47 Segment Sync [] DEBUG - 2021/01/12 10:36:47 SyncAll Ready INFO - 2021/01/12 10:36:47 Start Streaming DEBUG - 2021/01/12 10:36:47 No impressions fetched from queue. Nothing to send DEBUG - 2021/01/12 10:36:47 [GET] https://auth.split.io/api/auth DEBUG - 2021/01/12 10:36:47 Authorization [ApiKey]: f9sf...i32t DEBUG - 2021/01/12 10:36:47 No events fetched from queue. Nothing to send DEBUG - 2021/01/12 10:36:47 [POST] https://events.split.io/api/metrics/times DEBUG - 2021/01/12 10:36:47 Authorization [ApiKey]: f9sf...i32t DEBUG - 2021/01/12 10:36:47 Headers: map[Accept-Encoding:[gzip] Content-Type:[application/json] Splitsdkmachineip:[192.168.1.9] Splitsdkmachinename:[ip-192-168-1-9] Splitsdkversion:[go-6.0.1]] DEBUG - 2021/01/12 10:36:47 Headers: map[Accept-Encoding:[gzip] Content-Type:[application/json]] 2021/01/12 10:36:47 Serving miniapp at http://[::]:8050 DEBUG - 2021/01/12 10:36:48 [POST] https://events.split.io/api/metrics/counters DEBUG - 2021/01/12 10:36:48 Authorization [ApiKey]: f9sf...i32t DEBUG - 2021/01/12 10:36:48 Headers: map[Accept-Encoding:[gzip] Content-Type:[application/json] Splitsdkmachineip:[192.168.1.9] Splitsdkmachinename:[ip-192-168-1-9] Splitsdkversion:[go-6.0.1]] INFO - 2021/01/12 10:36:48 SSE OK DEBUG - 2021/01/12 10:36:48 Started SplitUpdateWorker DEBUG - 2021/01/12 10:36:48 Started SegmentUpdateWorker INFO - 2021/01/12 10:36:48 SSE Streaming is ready DEBUG - 2021/01/12 10:36:48 [GET] https://sdk.split.io/api/splitChanges?since=1610476386851 DEBUG - 2021/01/12 10:36:48 Authorization [ApiKey]: f9sf...i32t DEBUG - 2021/01/12 10:36:48 Headers: map[Accept-Encoding:[gzip] Content-Type:[application/json]] DEBUG - 2021/01/12 10:36:48 Segment Sync [] DEBUG - 2021/01/12 10:36:48 LINE: data: {"id":"-1bc5lwBo7:0:0","timestamp":1610476608492,"encoding":"json","channel":"[?occupancy=metrics.publishers]control_pri","data":"{\"metrics\":{\"publishers\":2}}","name":"[meta]occupancy"}

DEBUG - 2021/01/12 10:36:48 LINE: data: {"id":"Xo7zmvAB02:0:0","timestamp":1610476608492,"encoding":"json","channel":"[?occupancy=metrics.publishers]control_sec","data":"{\"metrics\":{\"publishers\":0}}","name":"[meta]occupancy"}

DEBUG - 2021/01/12 10:36:48 Presence event received DEBUG - 2021/01/12 10:36:48 Presence event received DEBUG - 2021/01/12 10:36:56 Building WhitelistMatcher with negate=false, whitelist=[5af0d23bf5d7df5cc4782575 dave], attributeName= DEBUG - 2021/01/12 10:36:56 Building WhitelistMatcher with negate=false, whitelist=[john], attributeName= DEBUG - 2021/01/12 10:36:56 Building AllKeysMatcher with negate=false DEBUG - 2021/01/12 10:37:16 LINE: data: {"id":"zZoMlGpN4D:0:0","clientId":"pri:MzEyMDkxNzY4","timestamp":1610476636552,"encoding":"json","channel":"MzE2MzAwMjk5Mg==_Mzk3OTY3Mjg5NQ==_splits","data":"{\"type\":\"SPLIT_UPDATE\",\"changeNumber\":1610476636036}"}

DEBUG - 2021/01/12 10:37:16 Update event received DEBUG - 2021/01/12 10:37:16 Incoming Notification: &{MzE2MzAwMjk5Mg==_Mzk3OTY3Mjg5NQ==_splits 0xc001ed6568 SPLIT_UPDATE} DEBUG - 2021/01/12 10:37:16 Received Split update and proceding to perform fetch DEBUG - 2021/01/12 10:37:16 ChangeNumber: 1610476636036 DEBUG - 2021/01/12 10:37:16 [GET] https://sdk.split.io/api/splitChanges?since=1610476386851 DEBUG - 2021/01/12 10:37:16 Authorization [ApiKey]: f9sf...i32t DEBUG - 2021/01/12 10:37:16 Headers: map[Accept-Encoding:[gzip] Content-Type:[application/json]] DEBUG - 2021/01/12 10:37:23 Building WhitelistMatcher with negate=false, whitelist=[5af0d23bf5d7df5cc4782575 dave], attributeName= DEBUG - 2021/01/12 10:37:23 Building WhitelistMatcher with negate=false, whitelist=[john], attributeName= DEBUG - 2021/01/12 10:37:23 Building AllKeysMatcher with negate=false DEBUG - 2021/01/12 10:37:47 No events fetched from queue. Nothing to send DEBUG - 2021/01/12 10:37:48 [POST] https://events.split.io/api/metrics/times DEBUG - 2021/01/12 10:37:48 Authorization [ApiKey]: f9sf...i32t DEBUG - 2021/01/12 10:37:48 Headers: map[Accept-Encoding:[gzip] Content-Type:[application/json] Splitsdkmachineip:[192.168.1.9] Splitsdkmachinename:[ip-192-168-1-9] Splitsdkversion:[go-6.0.1]] DEBUG - 2021/01/12 10:37:48 [POST] https://events.split.io/api/metrics/counters DEBUG - 2021/01/12 10:37:48 Authorization [ApiKey]: f9sf...i32t DEBUG - 2021/01/12 10:37:48 Headers: map[Accept-Encoding:[gzip] Content-Type:[application/json] Splitsdkmachineip:[192.168.1.9] Splitsdkmachinename:[ip-192-168-1-9] Splitsdkversion:[go-6.0.1]]

farshadahmadi commented 3 years ago

Thank you for checking the app. Two notes:

  1. It seems that you enabled debug log level. As mentioned before, with debug log level enabled, split.io go SDK somehow manages to survive the error. So please test it without it being enabled.
  2. Also as mentioned before, it happens most of the times 1 hour after running the application. However from what logs say, you test it exactly after running the application! So please wait at least for 1 hour after running the application.
mredolatti commented 3 years ago

Hi @farshadahmadi, sorry to hear you're having issues, could you please share the contents of your go.mod file after updating to v6.0.1? Thank you very much for your patience.

mredolatti commented 3 years ago

Also, an example of how you import our sdk in your code would be appreciated. Thanks!

chillaq commented 3 years ago

Hi @mredolatti , customer provided a mini app, check it out here: https://github.com/farshadahmadi/miniapp

chillaq commented 3 years ago

Hi @farshadahmadi, I ran the app for 2 hours, with logging set to Info, here is the output: NFO - 2021/01/19 14:21:31 Start Streaming 2021/01/19 14:21:31 Serving miniapp at http://[::]:8050 INFO - 2021/01/19 14:21:32 SSE OK INFO - 2021/01/19 14:21:32 SSE Streaming is ready INFO - 2021/01/19 15:11:32 Token expired INFO - 2021/01/19 15:11:32 Stopping Push Services INFO - 2021/01/19 15:11:32 Stopped streaming INFO - 2021/01/19 15:11:32 Shutting down listener INFO - 2021/01/19 15:11:32 SSE streaming exiting INFO - 2021/01/19 15:11:32 SSE OK INFO - 2021/01/19 15:11:32 SSE Streaming is ready INFO - 2021/01/19 16:01:32 Token expired INFO - 2021/01/19 16:01:32 Stopping Push Services INFO - 2021/01/19 16:01:32 Stopped streaming INFO - 2021/01/19 16:01:32 Shutting down listener INFO - 2021/01/19 16:01:32 SSE streaming exiting INFO - 2021/01/19 16:01:32 SSE OK INFO - 2021/01/19 16:01:32 SSE Streaming is ready

I do see the streaming token getting expired every 50 minutes, and get renewed, at the same time. I also kept running the curl to get the treatment and it always worked.

chillaq commented 3 years ago

Hi @farshadahmadi, which region are you running your app? I just want to confirm if this is not related to network latency.

Thanks Bilal

farshadahmadi commented 3 years ago

It seems that you enabled the INFO log level. I have not yet tested that, but I assume enabling any log level except the default one will somehow make the DSK survive the error; as we already saw that with DEBUG log level enabled it worked.

I am in Finland. But I tested it in both my local computer and also in cloud (running in Europe). And both had the same issue.

chillaq commented 3 years ago

Thanks @farshadahmadi , if I disable logging, I do see the error, will check with the team.

2021/01/27 11:26:34 Serving miniapp at http://[::]:8050 ERROR - 2021/01/27 12:26:34 EOF ERROR - 2021/01/27 12:26:34 Error received: {id: timestamp: encoding: data: name: clientID: event:error channel: message:0xc0001e80c0 code:0xc0001ec090 statusCode:0xc0001ec098 href:0xc0001e80d0} ERROR - 2021/01/27 12:26:34 Error reading streaming buffer

AndriiIlika commented 3 years ago

Really strange bug. Confirm setting logging to any level besides default "error" let pushManager resume sync after token expires. Also, setting a custom logger with the workaround leads to the same buggy behavior. Still no luck with finding a cause. I will glad to participate and fix it asap.

AndriiIlika commented 3 years ago

Good day @chillaq . Any news on the defect? So far I can state that setting the logging level affects the only created logger itself. So I switched back to a custom implementation having the same positive outcomes. Also added a workaround to reinitialize the client factory. It helps but unfortunately, there are more cases SSE does not resume correctly. Seems the problem in runtime latencies.

Here is some trace with mitigation steps succeed:

Info    2021-02-09 00:20:57.863 GMT splitio: SSE Streaming is ready
Info    2021-02-09 00:20:57.868 GMT splitio: Start periodic polling in Streaming
Info    2021-02-09 00:20:57.869 GMT splitio: Stop periodic polling
Info    2021-02-09 01:10:57.521 GMT splitio: Token expired
Error   2021-02-09 01:20:57.049 GMT splitio: EOF
github.com/splitio/go-toolkit/v3/sse.(*SSEClient).Do.func2(0xc001cca790, 0xc0014760e0, 0xc000f93a40, 0xc000f93aa0)
    /usr/go-workspace/pkg/mod/github.com/splitio/go-toolkit/v3@v3.0.1/sse/sse.go:177 +0x1bf
created by github.com/splitio/go-toolkit/v3/sse.(*SSEClient).Do
    /usr/go-workspace/pkg/mod/github.com/splitio/go-toolkit/v3@v3.0.1/sse/sse.go:172 +0x6f1
Warning 2021-02-09 01:20:57.049 GMT Malfunction detected: reinitializing splitio clients
Info    2021-02-09 01:20:57.049 GMT splitio: STOPPING MANAGER TASKS
Info    2021-02-09 01:20:57.049 GMT splitio: Stopping Push Services
Info    2021-02-09 01:20:57.049 GMT splitio: Stopped streaming
Info    2021-02-09 01:20:57.049 GMT splitio: Shutting down listener
Info    2021-02-09 01:20:57.049 GMT splitio: SSE streaming exiting
Error   2021-02-09 01:20:57.073 GMT splitio: Error received: {id:<nil> timestamp:<nil> encoding:<nil> data:<nil> name:<nil> clientID:<nil> event:error channel:<nil> message:0xc0017b4250 code:0xc000581b68 statusCode:0xc000581b80 href:0xc0017b4260}
github.com/splitio/go-split-commons/v2/push.(*EventHandler).HandleIncomingMessage(0xc00146b5c0, 0xc0016b9380)
    /usr/go-workspace/pkg/mod/github.com/splitio/go-split-commons/v2@v2.0.1/push/eventhandler.go:84 +0x3f1
github.com/splitio/go-toolkit/v3/sse.(*SSEClient).Do.func3(0xc001504c80, 0xc001cca770, 0xc0016b9380)
    /usr/go-workspace/pkg/mod/github.com/splitio/go-toolkit/v3@v3.0.1/sse/sse.go:208 +0x58
created by github.com/splitio/go-toolkit/v3/sse.(*SSEClient).Do
    /usr/go-workspace/pkg/mod/github.com/splitio/go-toolkit/v3@v3.0.1/sse/sse.go:206 +0x905
Info    2021-02-09 01:20:58.965 GMT splitio: Start Streaming
Info    2021-02-09 01:20:59.437 GMT splitio: SSE OK
Info    2021-02-09 01:20:59.437 GMT splitio: SSE Streaming is ready

And one that stalled:

Info  2021-02-05 00:30:58.044 GMT splitio: SSE Streaming is ready
Info  2021-02-05 00:30:58.046 GMT splitio: Start periodic polling in Streaming
Info  2021-02-05 00:30:58.046 GMT splitio: Stop periodic polling
Error 2021-02-05 01:12:58.113 GMT splitio: read tcp x.x.x.x:50158->13.249.13.41:443: read: connection reset by peer
github.com/splitio/go-toolkit/v3/sse.(*SSEClient).Do.func2(0xc001199cc0, 0xc000276150, 0xc001c3e6c0, 0xc001c3e720)
    /usr/go-workspace/pkg/mod/github.com/splitio/go-toolkit/v3@v3.0.1/sse/sse.go:177 +0x1bf
created by github.com/splitio/go-toolkit/v3/sse.(*SSEClient).Do
    /usr/go-workspace/pkg/mod/github.com/splitio/go-toolkit/v3@v3.0.1/sse/sse.go:172 +0x6f1
Info  2021-02-05 01:12:58.113 GMT splitio: SSE streaming exiting
Error 2021-02-05 01:12:58.113 GMT splitio: Error reading streaming buffer
github.com/splitio/go-split-commons/v2/service/api/sse.(*StreamingClient).ConnectStreaming.func2(0xc00287a550, 0xc0010a7920)
    /usr/go-workspace/pkg/mod/github.com/splitio/go-split-commons/v2@v2.0.1/service/api/sse/client.go:97 +0x462
created by github.com/splitio/go-split-commons/v2/service/api/sse.(*StreamingClient).ConnectStreaming
    /usr/go-workspace/pkg/mod/github.com/splitio/go-split-commons/v2@v2.0.1/service/api/sse/client.go:70 +0x251
Info  2021-02-05 01:20:57.699 GMT splitio: Token expired

<no further attempts to reconnect>
mredolatti commented 3 years ago

Hello @farshadahmadi @AndriiIlika,

We're currently working on a fix for this. We're expecting it to be released by the end of this week or monday. Thank you very much for your patience, Martin.

mredolatti commented 3 years ago

Hi @farshadahmadi and @AndriiIlika , Sorry for the late check-in. We've been doing quite a revamp of the sse/push subsystem of our SDK trying to minimise all possible issues. While we are not in a position to an official release yet (we're reworking our test suites as well), we would like to let you know that we have an RC that's been working for days (both in our staging environment and QoS applications, and in the miniapp that @farshadahmadi sent us. The SDK has been working proerly, performing syncrhonization as expected regardless of the SDK config.

If you want to test it in your staging environments, (feedback is always more than welcome), you can get it by issuing go get github.com/splitio/go-client/v6@v6.0.2-rc4 in your repos.

Please keep in mind that we've bumped the major versions of our internal libraries (toolkit and commons) for this release. So if you're setting up the logger, you might want to replace "github.com/splitio/go-toolkit/v3/logging" for github.com/splitio/go-toolkit/v4/logging" as well.

In case you're interested in taking a look at the changes, this are the PRs in our libraries:

Thank you very much for all your patience. We are hoping to do an official release by wednesday if everything is ok. Should you find anything that's causing you trouble or not working as expected, please let us know.

Thanks! Martín.

AndriiIlika commented 3 years ago

Hi @mredolatti.

Testing release candidate and it works well so far. Waiting for release!

mredolatti commented 3 years ago

Hi guys, sorry for the delay. I got sick and had to stay in bed for a couple of days. I've resumed work today and just pushed the last pending changes. I've sent this new rc to our staging QoS and if everything is well we will most likely be releasing tomorrow.

Thank you very much for your patience Regards, Martin.

mredolatti commented 3 years ago

Hi @AndriiIlika @farshadahmadi , we've just released v6.0.2 which should fix these problems you've been having as well as some other stuff.

Thank you VERY much for your patience and sorry for all the inconveniences caused. Please let me know if you're still having issues after upgrading.

Regards, Martin.

jwilander commented 3 years ago

Thanks @farshadahmadi for the report and @mredolatti for the fix. Seems we were running into this too.

mredolatti commented 3 years ago

Hi @jwilander , sorry to hear you had issues as well. Please let me know if after updateing something's still not working as expected. Cheers, Martin.

jwilander commented 3 years ago

@mredolatti I'm not sure if it's the exact same issue but we're running into an issue where the Go SDK isn't getting updates to feature flags when the synchronizer is configured with nil for Log.

For example, here https://github.com/mattermost/mattermost-server/blob/master/app/feature_flags.go#L77 when log is nil.

When it's set to nil, the synchronizer doesn't seem to get any updates to the feature flags. As soon as we give the synchronizer a logger that's not nil it seems to work fine.

Is a non-nil value for Log required? The reason we don't always provide one is because we find the logs a bit noisy and we're trying to keep our logs a little quieter.

Let me know if I should open a separate issue for this.

mredolatti commented 3 years ago

Hi @jwilander, sorry to hear that. a non-nil value for log should definitely not required.

The following POC is equivalent to your intialization snippet (logging.LevelNone == math.MinInt32) and it seems to work properly:

package main

import (
    "fmt"
    "math"
    "time"

    "github.com/splitio/go-client/v6/splitio/client"
    "github.com/splitio/go-client/v6/splitio/conf"
)

func main() {

    apikey := ""
    cfg := conf.Default()
    cfg.LoggerConfig.LogLevel = logging.LevelNone
    factory, err := client.NewSplitFactory(apikey, nil)
    if err != nil {
        panic(err.Error())
    }

    factory.BlockUntilReady(10)
    client := factory.Client()
    for {
        fmt.Println(client.Treatment("sarasa", "feature", nil))
        time.Sleep(1 * time.Second)
    }
}

Providing a proper apikey and toggling changes from the UI results in the correct treatment being printed. Does this issue happen immediately? does it happen always (are you fully unable to get flags if you don't provide a logger) ? can you provide more details on where this is being run (OS, Arch, container infrastructure, etc) so we can try to replicate as close as possible? Also if you can provide us with instructions on how to build/run and validate the repo you sent us, we can try to replicate the issue with it.

Thanks, Martin.

jwilander commented 3 years ago

Thanks for the quick response @mredolatti. Much appreciated.

We're running our application on Kubernetes (currently 1.19) using Docker images based off Alpine. You can see the full Dockerfile here. All our MM app servers run in a deployment with multiple replicas where one of the pods is the leader and is responsible for running jobs such as the feature flag synchronizer.

How we set up and use the split SDK is the following:

  1. We configure our MM app server with a Split API key which gets passed through to create a FeatureFlagSynchronizer which in turns creates a factory from the SDK here https://github.com/mattermost/mattermost-server/blob/master/config/feature_flags.go#L45
  2. We have an additional config setting that we call DebugSplit that we can enable or disable for our MM app servers. All this setting does is decide whether we initialize a logger for the Split Factory or whether we set it the LogLevel to off as seen here https://github.com/mattermost/mattermost-server/blob/master/app/feature_flags.go#L60

What I'm observing:

When we have the API key configured but DebugSplit set to false, the MM app server gets the treatment for the feature flag no problem the first time but if I come back a few hours or so later and try to update one of the feature flags in the Split console, the app server isn't detecting those changes in the feature flags. Sometimes making feature flag updates immediately after the app server starts works fine. It's not really until I come back hours or even a day later when the updates to the flags stop getting picked up.

If I then go and set DebugSplit to true, the problem seems to disappear and I can update the feature flags and the app server picks up on it no problem.

It's possible that the issue is on our side and the DebugSplit being set to true or false is a red herring but it has reproduced fairly consistently and I have yet to see the issue when DebugSplit was set to false.

I wonder if you left your POC running for a day or so, and then updated the flag, if the issue would reproduce or not.

jwilander commented 3 years ago

Also we're on 6.0.2 of the SDK https://github.com/mattermost/mattermost-server/blob/master/go.mod#L109

mredolatti commented 3 years ago

@jwilander Fair enough.

We have QoS servers which run our sdk over a long period of time and are continuously validating that the correct flags are being served. I will update those to run with logging set to LevelNone so we can monitor this scenario for a couple of days and let you know if something goes wrong. Also FYI, we are investigating possible delays for changes to reflect on CDN (haven't 100% confirmed yet). There's a possible edge case where changes propagation may take some time for isolated instances, so there's a chance that this issue might be related to that as well.

Still, the though of our sdk not working properly in certain log levels freaks me out, so i'll make sure that scenario gets validated properly.

Thank you very much for all the information, and once again, sorry for all the inconveniences caused! Martin

jwilander commented 3 years ago

Thanks @mredolatti. It'll be very useful to know if you're able to reproduce the issue or not. If not then there's likely something specific to our implementation causing the issue and at that point we can dig into it some more on our side.

jwilander commented 3 years ago

@mredolatti were you able to see any issues after monitoring your QoS servers?

mredolatti commented 3 years ago

Hi @jwilander , we've kept an app running for 5 days now, which gets updates periodically and evaluates treatments to validate everything's working as expected (with logging set to None), and haven't been able to reproduce. Does it happen if instead of logging to stdout you do it to a file for example?

agustinona commented 2 years ago

Hi @jwilander,

This issue has been inactive for a long time now and we haven't been able to reproduce in our testing environment. Could you please provide further information that may help us reproduce this?

There have been multiple releases with several improvements in the meantime. Are you still seeing this issue in the current (6.1.7) version?

If you are no longer seeing this issue or have no way to reproduce it we'll close.

Best, Agus

sanzmauro commented 1 year ago

I'm going to close the issue since it's inactive.

Feel free to re open it if you need further assistance.