Closed farshadahmadi closed 1 year ago
Hi @farshadahmadi , looks like a streaming feature error, is it possible to restart the app and enable debug logging for GO SDK?
Thanks Bilal
The debug log:
ERROR - 2020/12/17 00:48:19 EOF
INFO - 2020/12/17 00:48:19 SSE streaming exiting
ERROR - 2020/12/17 00:48:19 Error reading streaming buffer
INFO - 2020/12/17 00:48:19 Stopping Push Services
INFO - 2020/12/17 00:48:19 Stopped streaming
DEBUG - 2020/12/17 00:48:19 [GET] https://auth.split.io/api/auth
DEBUG - 2020/12/17 00:48:19 Authorization [ApiKey]: a72t...sbml
DEBUG - 2020/12/17 00:48:19 Headers: map[Accept-Encoding:[gzip] Content-Type:[application/json]]
INFO - 2020/12/17 00:48:19 SSE OK
DEBUG - 2020/12/17 00:48:19 Started SplitUpdateWorker
DEBUG - 2020/12/17 00:48:19 Started SegmentUpdateWorker
INFO - 2020/12/17 00:48:19 SSE Streaming is ready
DEBUG - 2020/12/17 00:48:19 [GET] https://sdk.split.io/api/splitChanges?since=1608147764506
DEBUG - 2020/12/17 00:48:19 Authorization [ApiKey]: a72t...sbml
DEBUG - 2020/12/17 00:48:19 Headers: map[Accept-Encoding:[gzip] Content-Type:[application/json]]
DEBUG - 2020/12/17 00:48:19 LINE: data: {\"id\":\"MEJGTem2aE:0:0\",\"timestamp\":1608166099927,\"encoding\":\"json\",\"channel\":\"[?occupancy=metrics.publishers]control_sec\",\"data\":\"{\\\"metrics\\\":{\\\"publishers\\\":0}}\",\"name\":\"[meta]occupancy\"}
So after enabling the debug log, streaming issue still happens, but it somehow manages to survive!
Hi @farshadahmadi, I think this is expected default error and can be ignored, I will double check with developers.
Hi @farshadahmadi, I checked SDK version 5.3.0 and don't see the error on my end, as Streaming worked without errors.
Which version are you using?
I am using the latest legacy 5 version (5.3.0)
github.com/splitio/go-client v0.0.0-20201006181645-1804d30cc2c2
github.com/splitio/go-split-commons v0.0.0-20201002201936-3e00dab75688 // indirect
github.com/splitio/go-toolkit v0.0.0-20200908212815-d49116c52af2
And as I mentioned previously, I used both legacy 5 and v6; Both have the same problem! It is strange that without debug log, synchronization fails, and with it, synchronization works! Moreover, How the following two ERROR lines are expected behaviour when they are clearly logged as errors?
ERROR - 2020/12/17 00:48:19 EOF
INFO - 2020/12/17 00:48:19 SSE streaming exiting
ERROR - 2020/12/17 00:48:19 Error reading streaming buffer
Thanks @farshadahmadi , this is indeed strange, could you provide the code that uses the SDK?
Hi @farshadahmadi , we released go sdk 6.0.1 to eliminate this error message, could you please try it out and let us know if it fixes the issue?
Thanks Bilal
Hi @farshadahmadi , I am closing out this ticket. If there is still an open issue or if we can be of further assistance on this topic, feel free to respond to this email and the ticket will be re-opened.
Thanks Bilal
Hi. back after Christmas holidays :D
After upgrading to v6.0.1, I still get the error:
ERROR - 2021/01/07 12:22:51 EOF
ERROR - 2021/01/07 12:22:51 Error reading streaming buffer
ERROR - 2021/01/07 12:22:51 Error received: {id:<nil> timestamp:<nil> encoding:<nil> data:<nil> name:<nil> clientID:<nil> event:error channel:<nil> message:0xc00070b2b0 code:0xc00087acf0 statusCode:0xc00087acf8 href:0xc00070b2d0}
Unfortunately I cannot share the code. But I basically use it the same way as official documentation. If you really need it, I can develop a mini application and share the code.
The mini app is now available. Please follow readme to run the app. After a while you will see sth like:
2021/01/08 10:21:58 Serving miniapp at http://[::]:8050
ERROR - 2021/01/08 11:21:58 Error received: {id:<nil> timestamp:<nil> encoding:<nil> data:<nil> name:<nil> clientID:<nil> event:error channel:<nil> message:0xc001e50190 code:0xc000390178 statusCode:0xc000390180 href:0xc001e501a0}
ERROR - 2021/01/08 11:21:58 EOF
ERROR - 2021/01/08 11:21:58 Error reading streaming buffer
As you can see from the time stamp, most of the times it happens exactly after 1 hour.
Hi.
I tried the latest version but I had the same problem. I also developed a mini app that has the same problem. Two more comments are already added to the issue ticker. Please re-open the ticket.
BR, Farshad
On Thu, Jan 7, 2021 at 12:46 AM Bilal Al-Shahwany notifications@github.com wrote:
Hi @farshadahmadi https://github.com/farshadahmadi , I am closing out this ticket. If there is still an open issue or if we can be of further assistance on this topic, feel free to respond to this email and the ticket will be re-opened.
Thanks Bilal
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/splitio/go-client/issues/130#issuecomment-755763233, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABJZJHWIO635AMKOIQC2YRTSYTR5ZANCNFSM4U36L2CQ .
-- Farshad Ahmadi Ghohandizi
Thanks @farshadahmadi will checkout the app.
Hi @farshadahmadi , I don't see the issue on my end running your mini service, I used my "sdk" type API key and tried calculating a split treatment with default rule set to "off", then changed the rule to "on", both treatments were calculated correctly, see log below.
Can you provide your org name? I can run the same app against your org and see if there is an issue there.
Bilal-A-MacBook-Pro:miniapp-master bilalal-shahwany$ go run cmd/main.go DEBUG - 2021/01/12 10:36:47 [GET] https://sdk.split.io/api/splitChanges?since=-1 DEBUG - 2021/01/12 10:36:47 Authorization [ApiKey]: f9sf...i32t DEBUG - 2021/01/12 10:36:47 Headers: map[Accept-Encoding:[gzip] Content-Type:[application/json]] DEBUG - 2021/01/12 10:36:47 [GET] https://sdk.split.io/api/splitChanges?since=1610476386851 DEBUG - 2021/01/12 10:36:47 Authorization [ApiKey]: f9sf...i32t DEBUG - 2021/01/12 10:36:47 Headers: map[Accept-Encoding:[gzip] Content-Type:[application/json]] DEBUG - 2021/01/12 10:36:47 Segment Sync [] DEBUG - 2021/01/12 10:36:47 SyncAll Ready INFO - 2021/01/12 10:36:47 Start Streaming DEBUG - 2021/01/12 10:36:47 No impressions fetched from queue. Nothing to send DEBUG - 2021/01/12 10:36:47 [GET] https://auth.split.io/api/auth DEBUG - 2021/01/12 10:36:47 Authorization [ApiKey]: f9sf...i32t DEBUG - 2021/01/12 10:36:47 No events fetched from queue. Nothing to send DEBUG - 2021/01/12 10:36:47 [POST] https://events.split.io/api/metrics/times DEBUG - 2021/01/12 10:36:47 Authorization [ApiKey]: f9sf...i32t DEBUG - 2021/01/12 10:36:47 Headers: map[Accept-Encoding:[gzip] Content-Type:[application/json] Splitsdkmachineip:[192.168.1.9] Splitsdkmachinename:[ip-192-168-1-9] Splitsdkversion:[go-6.0.1]] DEBUG - 2021/01/12 10:36:47 Headers: map[Accept-Encoding:[gzip] Content-Type:[application/json]] 2021/01/12 10:36:47 Serving miniapp at http://[::]:8050 DEBUG - 2021/01/12 10:36:48 [POST] https://events.split.io/api/metrics/counters DEBUG - 2021/01/12 10:36:48 Authorization [ApiKey]: f9sf...i32t DEBUG - 2021/01/12 10:36:48 Headers: map[Accept-Encoding:[gzip] Content-Type:[application/json] Splitsdkmachineip:[192.168.1.9] Splitsdkmachinename:[ip-192-168-1-9] Splitsdkversion:[go-6.0.1]] INFO - 2021/01/12 10:36:48 SSE OK DEBUG - 2021/01/12 10:36:48 Started SplitUpdateWorker DEBUG - 2021/01/12 10:36:48 Started SegmentUpdateWorker INFO - 2021/01/12 10:36:48 SSE Streaming is ready DEBUG - 2021/01/12 10:36:48 [GET] https://sdk.split.io/api/splitChanges?since=1610476386851 DEBUG - 2021/01/12 10:36:48 Authorization [ApiKey]: f9sf...i32t DEBUG - 2021/01/12 10:36:48 Headers: map[Accept-Encoding:[gzip] Content-Type:[application/json]] DEBUG - 2021/01/12 10:36:48 Segment Sync [] DEBUG - 2021/01/12 10:36:48 LINE: data: {"id":"-1bc5lwBo7:0:0","timestamp":1610476608492,"encoding":"json","channel":"[?occupancy=metrics.publishers]control_pri","data":"{\"metrics\":{\"publishers\":2}}","name":"[meta]occupancy"}
DEBUG - 2021/01/12 10:36:48 LINE: data: {"id":"Xo7zmvAB02:0:0","timestamp":1610476608492,"encoding":"json","channel":"[?occupancy=metrics.publishers]control_sec","data":"{\"metrics\":{\"publishers\":0}}","name":"[meta]occupancy"}
DEBUG - 2021/01/12 10:36:48 Presence event received
DEBUG - 2021/01/12 10:36:48 Presence event received
DEBUG - 2021/01/12 10:36:56 Building WhitelistMatcher with negate=false, whitelist=[5af0d23bf5d7df5cc4782575 dave], attributeName=
DEBUG - 2021/01/12 10:37:16 Update event received
DEBUG - 2021/01/12 10:37:16 Incoming Notification: &{MzE2MzAwMjk5Mg==_Mzk3OTY3Mjg5NQ==_splits 0xc001ed6568
Thank you for checking the app. Two notes:
Hi @farshadahmadi, sorry to hear you're having issues, could you please share the contents of your go.mod file after updating to v6.0.1? Thank you very much for your patience.
Also, an example of how you import our sdk in your code would be appreciated. Thanks!
Hi @mredolatti , customer provided a mini app, check it out here: https://github.com/farshadahmadi/miniapp
Hi @farshadahmadi, I ran the app for 2 hours, with logging set to Info, here is the output: NFO - 2021/01/19 14:21:31 Start Streaming 2021/01/19 14:21:31 Serving miniapp at http://[::]:8050 INFO - 2021/01/19 14:21:32 SSE OK INFO - 2021/01/19 14:21:32 SSE Streaming is ready INFO - 2021/01/19 15:11:32 Token expired INFO - 2021/01/19 15:11:32 Stopping Push Services INFO - 2021/01/19 15:11:32 Stopped streaming INFO - 2021/01/19 15:11:32 Shutting down listener INFO - 2021/01/19 15:11:32 SSE streaming exiting INFO - 2021/01/19 15:11:32 SSE OK INFO - 2021/01/19 15:11:32 SSE Streaming is ready INFO - 2021/01/19 16:01:32 Token expired INFO - 2021/01/19 16:01:32 Stopping Push Services INFO - 2021/01/19 16:01:32 Stopped streaming INFO - 2021/01/19 16:01:32 Shutting down listener INFO - 2021/01/19 16:01:32 SSE streaming exiting INFO - 2021/01/19 16:01:32 SSE OK INFO - 2021/01/19 16:01:32 SSE Streaming is ready
I do see the streaming token getting expired every 50 minutes, and get renewed, at the same time. I also kept running the curl to get the treatment and it always worked.
Hi @farshadahmadi, which region are you running your app? I just want to confirm if this is not related to network latency.
Thanks Bilal
It seems that you enabled the INFO log level. I have not yet tested that, but I assume enabling any log level except the default one will somehow make the DSK survive the error; as we already saw that with DEBUG log level enabled it worked.
I am in Finland. But I tested it in both my local computer and also in cloud (running in Europe). And both had the same issue.
Thanks @farshadahmadi , if I disable logging, I do see the error, will check with the team.
2021/01/27 11:26:34 Serving miniapp at http://[::]:8050
ERROR - 2021/01/27 12:26:34 EOF
ERROR - 2021/01/27 12:26:34 Error received: {id:
Really strange bug. Confirm setting logging to any level besides default "error" let pushManager resume sync after token expires. Also, setting a custom logger with the workaround leads to the same buggy behavior. Still no luck with finding a cause. I will glad to participate and fix it asap.
Good day @chillaq . Any news on the defect? So far I can state that setting the logging level affects the only created logger itself. So I switched back to a custom implementation having the same positive outcomes. Also added a workaround to reinitialize the client factory. It helps but unfortunately, there are more cases SSE does not resume correctly. Seems the problem in runtime latencies.
Here is some trace with mitigation steps succeed:
Info 2021-02-09 00:20:57.863 GMT splitio: SSE Streaming is ready
Info 2021-02-09 00:20:57.868 GMT splitio: Start periodic polling in Streaming
Info 2021-02-09 00:20:57.869 GMT splitio: Stop periodic polling
Info 2021-02-09 01:10:57.521 GMT splitio: Token expired
Error 2021-02-09 01:20:57.049 GMT splitio: EOF
github.com/splitio/go-toolkit/v3/sse.(*SSEClient).Do.func2(0xc001cca790, 0xc0014760e0, 0xc000f93a40, 0xc000f93aa0)
/usr/go-workspace/pkg/mod/github.com/splitio/go-toolkit/v3@v3.0.1/sse/sse.go:177 +0x1bf
created by github.com/splitio/go-toolkit/v3/sse.(*SSEClient).Do
/usr/go-workspace/pkg/mod/github.com/splitio/go-toolkit/v3@v3.0.1/sse/sse.go:172 +0x6f1
Warning 2021-02-09 01:20:57.049 GMT Malfunction detected: reinitializing splitio clients
Info 2021-02-09 01:20:57.049 GMT splitio: STOPPING MANAGER TASKS
Info 2021-02-09 01:20:57.049 GMT splitio: Stopping Push Services
Info 2021-02-09 01:20:57.049 GMT splitio: Stopped streaming
Info 2021-02-09 01:20:57.049 GMT splitio: Shutting down listener
Info 2021-02-09 01:20:57.049 GMT splitio: SSE streaming exiting
Error 2021-02-09 01:20:57.073 GMT splitio: Error received: {id:<nil> timestamp:<nil> encoding:<nil> data:<nil> name:<nil> clientID:<nil> event:error channel:<nil> message:0xc0017b4250 code:0xc000581b68 statusCode:0xc000581b80 href:0xc0017b4260}
github.com/splitio/go-split-commons/v2/push.(*EventHandler).HandleIncomingMessage(0xc00146b5c0, 0xc0016b9380)
/usr/go-workspace/pkg/mod/github.com/splitio/go-split-commons/v2@v2.0.1/push/eventhandler.go:84 +0x3f1
github.com/splitio/go-toolkit/v3/sse.(*SSEClient).Do.func3(0xc001504c80, 0xc001cca770, 0xc0016b9380)
/usr/go-workspace/pkg/mod/github.com/splitio/go-toolkit/v3@v3.0.1/sse/sse.go:208 +0x58
created by github.com/splitio/go-toolkit/v3/sse.(*SSEClient).Do
/usr/go-workspace/pkg/mod/github.com/splitio/go-toolkit/v3@v3.0.1/sse/sse.go:206 +0x905
Info 2021-02-09 01:20:58.965 GMT splitio: Start Streaming
Info 2021-02-09 01:20:59.437 GMT splitio: SSE OK
Info 2021-02-09 01:20:59.437 GMT splitio: SSE Streaming is ready
And one that stalled:
Info 2021-02-05 00:30:58.044 GMT splitio: SSE Streaming is ready
Info 2021-02-05 00:30:58.046 GMT splitio: Start periodic polling in Streaming
Info 2021-02-05 00:30:58.046 GMT splitio: Stop periodic polling
Error 2021-02-05 01:12:58.113 GMT splitio: read tcp x.x.x.x:50158->13.249.13.41:443: read: connection reset by peer
github.com/splitio/go-toolkit/v3/sse.(*SSEClient).Do.func2(0xc001199cc0, 0xc000276150, 0xc001c3e6c0, 0xc001c3e720)
/usr/go-workspace/pkg/mod/github.com/splitio/go-toolkit/v3@v3.0.1/sse/sse.go:177 +0x1bf
created by github.com/splitio/go-toolkit/v3/sse.(*SSEClient).Do
/usr/go-workspace/pkg/mod/github.com/splitio/go-toolkit/v3@v3.0.1/sse/sse.go:172 +0x6f1
Info 2021-02-05 01:12:58.113 GMT splitio: SSE streaming exiting
Error 2021-02-05 01:12:58.113 GMT splitio: Error reading streaming buffer
github.com/splitio/go-split-commons/v2/service/api/sse.(*StreamingClient).ConnectStreaming.func2(0xc00287a550, 0xc0010a7920)
/usr/go-workspace/pkg/mod/github.com/splitio/go-split-commons/v2@v2.0.1/service/api/sse/client.go:97 +0x462
created by github.com/splitio/go-split-commons/v2/service/api/sse.(*StreamingClient).ConnectStreaming
/usr/go-workspace/pkg/mod/github.com/splitio/go-split-commons/v2@v2.0.1/service/api/sse/client.go:70 +0x251
Info 2021-02-05 01:20:57.699 GMT splitio: Token expired
<no further attempts to reconnect>
Hello @farshadahmadi @AndriiIlika,
We're currently working on a fix for this. We're expecting it to be released by the end of this week or monday. Thank you very much for your patience, Martin.
Hi @farshadahmadi and @AndriiIlika , Sorry for the late check-in. We've been doing quite a revamp of the sse/push subsystem of our SDK trying to minimise all possible issues. While we are not in a position to an official release yet (we're reworking our test suites as well), we would like to let you know that we have an RC that's been working for days (both in our staging environment and QoS applications, and in the miniapp that @farshadahmadi sent us. The SDK has been working proerly, performing syncrhonization as expected regardless of the SDK config.
If you want to test it in your staging environments, (feedback is always more than welcome), you can get it by issuing go get github.com/splitio/go-client/v6@v6.0.2-rc4
in your repos.
Please keep in mind that we've bumped the major versions of our internal libraries (toolkit and commons) for this release. So if you're setting up the logger, you might want to replace "github.com/splitio/go-toolkit/v3/logging"
for github.com/splitio/go-toolkit/v4/logging"
as well.
In case you're interested in taking a look at the changes, this are the PRs in our libraries:
Thank you very much for all your patience. We are hoping to do an official release by wednesday if everything is ok. Should you find anything that's causing you trouble or not working as expected, please let us know.
Thanks! Martín.
Hi @mredolatti.
Testing release candidate and it works well so far. Waiting for release!
Hi guys, sorry for the delay. I got sick and had to stay in bed for a couple of days. I've resumed work today and just pushed the last pending changes. I've sent this new rc to our staging QoS and if everything is well we will most likely be releasing tomorrow.
Thank you very much for your patience Regards, Martin.
Hi @AndriiIlika @farshadahmadi , we've just released v6.0.2
which should fix these problems you've been having as well as some other stuff.
Thank you VERY much for your patience and sorry for all the inconveniences caused. Please let me know if you're still having issues after upgrading.
Regards, Martin.
Thanks @farshadahmadi for the report and @mredolatti for the fix. Seems we were running into this too.
Hi @jwilander , sorry to hear you had issues as well. Please let me know if after updateing something's still not working as expected. Cheers, Martin.
@mredolatti I'm not sure if it's the exact same issue but we're running into an issue where the Go SDK isn't getting updates to feature flags when the synchronizer is configured with nil
for Log
.
For example, here https://github.com/mattermost/mattermost-server/blob/master/app/feature_flags.go#L77 when log
is nil
.
When it's set to nil, the synchronizer doesn't seem to get any updates to the feature flags. As soon as we give the synchronizer a logger that's not nil it seems to work fine.
Is a non-nil value for Log
required? The reason we don't always provide one is because we find the logs a bit noisy and we're trying to keep our logs a little quieter.
Let me know if I should open a separate issue for this.
Hi @jwilander, sorry to hear that. a non-nil value for log should definitely not required.
The following POC is equivalent to your intialization snippet (logging.LevelNone == math.MinInt32) and it seems to work properly:
package main
import (
"fmt"
"math"
"time"
"github.com/splitio/go-client/v6/splitio/client"
"github.com/splitio/go-client/v6/splitio/conf"
)
func main() {
apikey := ""
cfg := conf.Default()
cfg.LoggerConfig.LogLevel = logging.LevelNone
factory, err := client.NewSplitFactory(apikey, nil)
if err != nil {
panic(err.Error())
}
factory.BlockUntilReady(10)
client := factory.Client()
for {
fmt.Println(client.Treatment("sarasa", "feature", nil))
time.Sleep(1 * time.Second)
}
}
Providing a proper apikey and toggling changes from the UI results in the correct treatment being printed. Does this issue happen immediately? does it happen always (are you fully unable to get flags if you don't provide a logger) ? can you provide more details on where this is being run (OS, Arch, container infrastructure, etc) so we can try to replicate as close as possible? Also if you can provide us with instructions on how to build/run and validate the repo you sent us, we can try to replicate the issue with it.
Thanks, Martin.
Thanks for the quick response @mredolatti. Much appreciated.
We're running our application on Kubernetes (currently 1.19) using Docker images based off Alpine. You can see the full Dockerfile here. All our MM app servers run in a deployment with multiple replicas where one of the pods is the leader and is responsible for running jobs such as the feature flag synchronizer.
How we set up and use the split SDK is the following:
What I'm observing:
When we have the API key configured but DebugSplit set to false, the MM app server gets the treatment for the feature flag no problem the first time but if I come back a few hours or so later and try to update one of the feature flags in the Split console, the app server isn't detecting those changes in the feature flags. Sometimes making feature flag updates immediately after the app server starts works fine. It's not really until I come back hours or even a day later when the updates to the flags stop getting picked up.
If I then go and set DebugSplit to true, the problem seems to disappear and I can update the feature flags and the app server picks up on it no problem.
It's possible that the issue is on our side and the DebugSplit being set to true or false is a red herring but it has reproduced fairly consistently and I have yet to see the issue when DebugSplit was set to false.
I wonder if you left your POC running for a day or so, and then updated the flag, if the issue would reproduce or not.
Also we're on 6.0.2 of the SDK https://github.com/mattermost/mattermost-server/blob/master/go.mod#L109
@jwilander Fair enough.
We have QoS servers which run our sdk over a long period of time and are continuously validating that the correct flags are being served. I will update those to run with logging set to LevelNone
so we can monitor this scenario for a couple of days and let you know if something goes wrong. Also FYI, we are investigating possible delays for changes to reflect on CDN (haven't 100% confirmed yet). There's a possible edge case where changes propagation may take some time for isolated instances, so there's a chance that this issue might be related to that as well.
Still, the though of our sdk not working properly in certain log levels freaks me out, so i'll make sure that scenario gets validated properly.
Thank you very much for all the information, and once again, sorry for all the inconveniences caused! Martin
Thanks @mredolatti. It'll be very useful to know if you're able to reproduce the issue or not. If not then there's likely something specific to our implementation causing the issue and at that point we can dig into it some more on our side.
@mredolatti were you able to see any issues after monitoring your QoS servers?
Hi @jwilander , we've kept an app running for 5 days now, which gets updates periodically and evaluates treatments to validate everything's working as expected (with logging set to None), and haven't been able to reproduce. Does it happen if instead of logging to stdout you do it to a file for example?
Hi @jwilander,
This issue has been inactive for a long time now and we haven't been able to reproduce in our testing environment. Could you please provide further information that may help us reproduce this?
There have been multiple releases with several improvements in the meantime. Are you still seeing this issue in the current (6.1.7) version?
If you are no longer seeing this issue or have no way to reproduce it we'll close.
Best, Agus
I'm going to close the issue since it's inactive.
Feel free to re open it if you need further assistance.
After a while, GO SDK logs the following:
and after that synchronization task will not work anymore; meaning that If splits' setting change via split dashboard, split sdk will not consider those changes anymore. I tried both v6 and legacy v5 and both has the same problem. It does not even fallback to use polling mechanism.