open-feature / spec

OpenFeature specification
https://openfeature.dev
Apache License 2.0

feat: add events #182

Closed toddbaert closed 1 year ago

toddbaert commented 1 year ago

Adds events discussed in DRAFT client PR, and OFEPs.

Be sure to review specification.json for the net changes.

Most of this is implemented in the web-sdk already. Some conforming (demo) providers can be found in the playground:

dabeeeenster commented 1 year ago

I wonder if there might be a need for an event type named something like PROVIDER_STALE.

Flagsmith server-side SDKs can run in a mode (we call it local evaluation mode) where the SDK polls the API for the entire environment flag configuration and then runs the flag evaluation engine locally within the SDK runtime. By default this poll occurs every 60 seconds. If the SDK does not receive a response from the API, it falls back to the most recent configuration.

At some point this flag configuration could/should be considered stale, which would trigger the PROVIDER_STALE event. This would probably then be bubbled up to some logging/alerting system to let administrators know that something isn't right and that stale flags are being served.

This event could also be used on application startup if the SDK is not able to make an initial connection to the API. This is generally a worse state to be in, as you would be serving hard-coded defaults at this point.

I think several other providers work on a similar basis; some stream changes over WebSockets or server-sent events.
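
To make the polling scenario above concrete, here is a minimal sketch of how a provider might decide when to raise PROVIDER_STALE. Everything in it is an assumption for illustration: `StalenessTracker`, `recordSuccess`, `recordFailure` and the `staleAfterMs` threshold are hypothetical names, not part of the OpenFeature spec or the Flagsmith SDK. It treats a failed first poll as stale too, following the startup case suggested above.

```typescript
// Hypothetical sketch; not the OpenFeature or Flagsmith API.
class StalenessTracker {
  private lastSuccessMs: number | null = null; // time of the last successful poll
  private reported = false; // whether staleness was already reported
  private staleAfterMs: number; // assumed threshold chosen by the provider

  constructor(staleAfterMs: number) {
    this.staleAfterMs = staleAfterMs;
  }

  // A poll returned a fresh configuration at time nowMs.
  recordSuccess(nowMs: number): void {
    this.lastSuccessMs = nowMs;
    this.reported = false;
  }

  // A poll failed at time nowMs. Returns "PROVIDER_STALE" the first time the
  // cached configuration is older than the threshold (or was never fetched),
  // and null otherwise so the event is not repeated on every failed poll.
  recordFailure(nowMs: number): "PROVIDER_STALE" | null {
    const stale =
      this.lastSuccessMs === null ||
      nowMs - this.lastSuccessMs >= this.staleAfterMs;
    if (stale && !this.reported) {
      this.reported = true;
      return "PROVIDER_STALE";
    }
    return null;
  }
}

// Usage: with a 60 second poll, tolerate up to three minutes without a response.
const tracker = new StalenessTracker(3 * 60_000);
tracker.recordSuccess(0);
const early = tracker.recordFailure(60_000); // still within the threshold
const late = tracker.recordFailure(180_000); // threshold reached
```

The threshold is deliberately separate from the poll interval, so a single missed poll does not immediately mark the configuration as stale.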

toddbaert commented 1 year ago

@dabeeeenster I think that the pattern and mode of operation you describe certainly makes sense. Can you explain how you might expect an application author to react to this event type? What sort of handler would I attach for PROVIDER_STALE events that would be distinct from PROVIDER_ERROR?

dabeeeenster commented 1 year ago

PROVIDER_ERROR would generally be very bad, and would normally mean the application is running on either default flag values if provided, or something like false|0|"" for all flag values. PROVIDER_ERROR would most likely cause immediate "This service is broken" alerts and require immediate operational assistance.

PROVIDER_STALE would communicate that flags are being served, and the engine is running, but the values themselves might well have drifted from the flag/provider dashboard. I would expect these events could be ignored if they were on something like the 99.99th percentile.
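
As a rough illustration of how the two handlers might differ, the sketch below pages on every PROVIDER_ERROR but only warns on PROVIDER_STALE once stale events exceed a tolerated fraction of all health observations. The names `EventTriage`, `recordHealthy`, `handle` and `staleBudget` are hypothetical and chosen for this example; they are not an OpenFeature API.

```typescript
// Hypothetical sketch; not the OpenFeature API.
type ProviderEvent = "PROVIDER_ERROR" | "PROVIDER_STALE";
type Action = "page" | "warn" | "ignore";

class EventTriage {
  private staleCount = 0; // stale events seen so far
  private total = 0; // all observations, healthy or not
  private staleBudget: number; // tolerated fraction of stale observations

  constructor(staleBudget: number) {
    this.staleBudget = staleBudget;
  }

  // Record an observation where the provider was healthy.
  recordHealthy(): void {
    this.total += 1;
  }

  // Decide how to react to a provider event.
  handle(event: ProviderEvent): Action {
    this.total += 1;
    if (event === "PROVIDER_ERROR") {
      return "page"; // flags are defaults only: needs immediate attention
    }
    this.staleCount += 1;
    // Flags are still served, so only escalate once staleness is no longer rare.
    return this.staleCount / this.total > this.staleBudget ? "warn" : "ignore";
  }
}

// Usage: a 10% budget keeps the numbers small; a real budget would be far tighter.
const triage = new EventTriage(0.1);
for (let i = 0; i < 19; i++) triage.recordHealthy();
const first = triage.handle("PROVIDER_STALE"); // 1 of 20 observations
const second = triage.handle("PROVIDER_STALE"); // 2 of 21 observations
const third = triage.handle("PROVIDER_STALE"); // 3 of 22 observations
const onError = triage.handle("PROVIDER_ERROR");
```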

toddbaert commented 1 year ago

I think I'm convinced of the value of adding PROVIDER_STALE (especially after the community meeting today). I've done that.