What happens if statsig is down?

twk commented 3 years ago

I didn't see this addressed in the docs at all, but what happens if statsig is down?

For the client libraries, I see https://api.statsig.com/v1/initialize being fetched with what looks like user-specific data. What happens if that doesn't load?

For the server side, I assume that the initialize call doesn't finish until the library has actually downloaded some rules from your servers. Does everything just default to OFF until that happens?

tore-statsig commented 3 years ago

Hi @twk - good question. I've been meaning to add specific callouts for things like this in the docs as this isn't the first time it's been asked. I'll split my answer into two parts, server and client SDKs.

I’ll answer for server SDKs first. Server SDKs download the set of rules and conditions for each gate and dynamic config when you call initialize. If Statsig is offline at that point, the statsig.initialize() call will still resolve, and the SDK will keep trying to connect in the background. Until a connection succeeds, gates will return false and configs will return an empty object.
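One way to design around those defaults is to keep safe fallback values in your own code and overlay whatever the SDK returns on top of them. This is only a minimal sketch: `withDefaults`, `ConfigValue`, and the rate-limit fields are hypothetical names made up for illustration, not part of the Statsig SDK.

```typescript
// Hypothetical shape of a dynamic config value; not a Statsig SDK type.
type ConfigValue = Record<string, unknown>;

// Overlay the fetched config on top of hard-coded safe defaults.
// If the fetched config is {} (for example, rules not downloaded yet),
// every default is kept as-is.
function withDefaults<T extends ConfigValue>(defaults: T, fetched: ConfigValue): T {
  return { ...defaults, ...fetched } as T;
}

// Illustrative defaults that the service can run on without Statsig.
const rateLimitDefaults = { requestsPerMinute: 60, burst: 10 };

// Rules not available yet: the config is empty, so the defaults apply.
const offline = withDefaults(rateLimitDefaults, {});

// Rules available: fetched keys override the defaults one by one.
const online = withDefaults(rateLimitDefaults, { requestsPerMinute: 120 });
```

With this pattern, an empty config degrades to known-good values rather than to missing fields.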

Once a successful initialization happens, if Statsig goes down at any point after that, the SDK will continue to evaluate gates and configs locally. It’s not reliant on that network connection. Event logs will continue to batch and retry sending over time as well. Any updates you make to a gate in that time (e.g. flipping something off, making a new one) obviously won't be reflected until a network connection is restored. But the SDK will handle that automatically, and get all the updates once Statsig is back online.

Client SDKs are similar. If the first initialize call fails, the SDK will retry in the background. The benefit with client SDKs is that we use local storage to cache previous results, so if this is a user we have seen before on this browser/mobile app, we can serve results from a previous successful initialization while we retry the initialize call in the background. Those could be stale, but they are likely a better default than false and {}.
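The cached-fallback behavior described above can be sketched roughly as follows. This is an illustrative model and not the client SDK's actual code: `KeyValueStore`, `MemoryStore`, `resolveGates`, and the `gates:` key prefix are all hypothetical, and an in-memory map stands in for browser local storage so the sketch runs anywhere.

```typescript
type GateValues = Record<string, boolean>;

// Minimal subset of a localStorage-like interface (assumed for illustration).
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// In-memory stand-in for browser local storage.
class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>();
  getItem(key: string): string | null {
    return this.data.get(key) ?? null;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
}

// `fresh` is the result of an initialize request, or null if it failed.
function resolveGates(userID: string, store: KeyValueStore, fresh: GateValues | null): GateValues {
  const cacheKey = `gates:${userID}`;
  if (fresh !== null) {
    // Successful fetch: remember it for the next time the network is down.
    store.setItem(cacheKey, JSON.stringify(fresh));
    return fresh;
  }
  // Failed fetch: serve possibly stale cached values, or nothing at all.
  const cached = store.getItem(cacheKey);
  return cached !== null ? (JSON.parse(cached) as GateValues) : {};
}

// A gate with no known value evaluates to false.
function checkGate(values: GateValues, name: string): boolean {
  return values[name] ?? false;
}
```

A returning user gets their last known values when the request fails, while a never-seen user falls back to false.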

In terms of availability, refer to Statsig's status page: https://status.statsig.com/

twk commented 3 years ago

Great, thank you! Have you thought about providing the ability to maintain a cache of rules in the customer's cluster somehow? I can imagine that our services might not function with just the defaults (particularly if we have complex config stored in statsig), so it worries me that I effectively might not be able to deploy new code if statsig is down.

I'm assuming the rulesets are versioned somehow, so keeping the latest one in a local redis or other database might be pretty straightforward. You could even just expose the ability to register some load/store callbacks that customers could implement.

tore-statsig commented 3 years ago

We have definitely considered a server-side caching solution, but just have not gotten around to implementing one yet. I agree that with a complex dynamic config, it makes it even more important (designing around false for gates is much more straightforward). How important of a feature is this for you?

Also, which server SDK language are you looking at? We will discuss as a team and can prioritize that language first.

twk commented 3 years ago

We're still trying to pick a feature gating service, but this is very important to us. We're using node/typescript. What kind of timeframe do you think this might happen in?

jkw-statsig commented 3 years ago

@twk we can provide a way to load/export rules (in json) this week if needed. Right now we are thinking about two new things:

1. a function to export the rules, which you can just store as a JSON string somewhere
2. a new parameter in our initialize() call, which takes the JSON string from 1) and uses it to initialize the rules. This will allow the SDK to still function in the very rare case that we are down when your server is starting, and also resolve the initialize call sooner. We do still make the HTTP calls to fetch and update the latest rules from our server even if this parameter is provided, but they will happen in the background and not block initialize.

Does this sound good to you?

twk commented 3 years ago

Wow, fast! It isn't essential it happens this week as long as we have a commitment it will happen at some point in the next month or so.

This sounds right as a starting point, assuming I get some kind of version along with the rules that I can use to know what is latest.

The ideal would be something like: give me the ability to register an async callback that takes two parameters, a version (eg, logical timestamp) and a json string of rules. This callback would be triggered whenever your library fetches rules from the server, and we would use the version to decide whether to update our cache or discard the update.

Does that make sense? You could potentially just have a callback that triggers when there are new rules and a separate function to actually fetch them. I just don't want to have to poll the library for updates if we can avoid that.
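To make the version check concrete, here is a rough sketch of what the customer-side callback could do with the two parameters. `RulesCache`, `RulesSnapshot`, and `offer` are hypothetical names, and the cache is kept in memory for brevity; a Redis or database version would do the same comparison asynchronously.

```typescript
// One stored copy of the rules plus the logical timestamp it came with.
interface RulesSnapshot {
  version: number;
  rulesJSON: string;
}

class RulesCache {
  private latest: RulesSnapshot | null = null;

  // Intended to be called whenever the library fetches rules.
  // Returns true if the update was stored, false if it was discarded
  // because it is not newer than what is already cached.
  offer(version: number, rulesJSON: string): boolean {
    if (this.latest !== null && version <= this.latest.version) {
      return false;
    }
    this.latest = { version, rulesJSON };
    return true;
  }

  // Latest stored snapshot, or null if nothing has been stored yet.
  get(): RulesSnapshot | null {
    return this.latest;
  }
}
```

Because the comparison happens on every update, several servers can write to the same cache without an older ruleset overwriting a newer one.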

jkw-statsig commented 3 years ago

Great suggestions, love it!

So the change will be that in the options object in our initialize() call, we will add 2 new options:

- bootstrapValue - a JSON string for rules
- rulesUpdatedCallback - a callback that's called whenever we have an update for the rules, and it provides the two values you mentioned, a logical timestamp and a JSON string
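
For illustration, here is how the two options might fit together across a restart. This is a sketch under assumptions, not the SDK's implementation: `InitOptions` uses only the option names given above, the callback's parameter order is a guess, `ToySdk` is a made-up stand-in so the example runs without network access, and the rules are simplified to a flat gate map rather than the real rules format. Check the SDK's own typings for the actual signatures.

```typescript
// Hypothetical option shape based only on the names mentioned in this thread.
interface InitOptions {
  bootstrapValue?: string;
  rulesUpdatedCallback?: (time: number, rulesJSON: string) => void;
}

// Toy stand-in for the SDK; rules are simplified to gate name -> boolean.
class ToySdk {
  private rules: Record<string, boolean> = {};
  private options: InitOptions = {};

  // Starts from the bootstrap value if one is given; never waits on the network.
  initialize(options: InitOptions): void {
    this.options = options;
    if (options.bootstrapValue !== undefined) {
      this.rules = JSON.parse(options.bootstrapValue) as Record<string, boolean>;
    }
  }

  // Simulates a successful background fetch of new rules.
  receiveRules(time: number, rulesJSON: string): void {
    this.rules = JSON.parse(rulesJSON) as Record<string, boolean>;
    this.options.rulesUpdatedCallback?.(time, rulesJSON);
  }

  checkGate(name: string): boolean {
    return this.rules[name] ?? false;
  }
}

// Customer-side persistence; a plain object stands in for Redis or a database.
const saved: { time: number; rulesJSON?: string } = { time: 0 };

// First process: the service is reachable, so rules arrive and get saved.
const first = new ToySdk();
first.initialize({
  rulesUpdatedCallback: (time, rulesJSON) => {
    if (time > saved.time) {
      saved.time = time;
      saved.rulesJSON = rulesJSON;
    }
  },
});
first.receiveRules(1700000000, JSON.stringify({ new_checkout: true }));

// Second process: the service is unreachable, so it starts from the saved rules.
const second = new ToySdk();
second.initialize({ bootstrapValue: saved.rulesJSON });
const gateOnSecond = second.checkGate("new_checkout");
```

In this model the second process evaluates the gate from the saved rules, even though it never received an update of its own.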

jkw-statsig commented 3 years ago

Hi @twk - we just released v3.2.0 of the Node SDK, and it includes the 2 new options for initialize(). Feel free to give it a try whenever you get a chance and let us know if you have any feedback or encounter any issues!

twk commented 3 years ago

Amazing! We will give this a shot when we start integrating, hopefully very soon.

jkw-statsig commented 3 years ago

Awesome! Closing this issue for now.

statsig-io / statsig-feedback

What happens if statsig is down? #4