feat: internal provider state, new client events

toddbaert commented 4 months ago

moves provider state into SDK (SDK now maintains provider state/lifecycle; provider interface is now "stateless")
refine semantics around client state when context is pending/reconciled by using new events/states, instead of STALE
add PROVIDER_FATAL error code

Resolves: https://github.com/open-feature/spec/issues/238

For background discussion, see https://github.com/open-feature/spec/issues/238

This should make provider authorship less error prone, and simpler. Providers essentially become stateless as far as the SDK is concerned. The provider is only obligated to optionally implement the initialize, shutdown and onContextChange functions and maintain is own private state (caches, connection, etc).

This also refines the semantics around context reconciliation, which is particularly important for client SDKs where providers need to reconcile their flags or ruleset(s) when context changes.

I believe these changes can all be implemented by SDKs in a non-breaking fashion.

beeme1mr commented 4 months ago

I think we can reduce the implementation burden on provider developers and ensure greater consistency by leveraging the provider state in a few additional ways. If we add a FATAL state to the SDK, we could short circuit evaluations within the SDK itself. Hooks would still run, but the underlying provider would not be called. The SDK could return the value specified by the developer within the evaluation itself. The reason returned to the user would be ERROR, the error type would be FATAL, and the message would be whatever the provider defined (e.g. 'invalid auth token`). This would be useful because many existing feature flag SDKs will return a default value on error, which may look like a successful evaluation in OpenFeature unless the provider explicitly handles this scenario. I believe this will lead to inconsistent behavior across implementations unless the SDK handles it.

It may be worth performing the same short circuiting within the SDK if the provider isn't ready yet. This wouldn't only affect providers that have implemented an initialization function that hasn't completed prior to a flag evaluation. This would provide many of the same benefits as described above.

Finally, the error events should also include error code. Currently, only the error message is defined in the spec.

What do you think? @liran2000 @kinyoklion @lukas-reining @thomaspoignant @fabriziodemaria @toddbaert @jonathannorris

liran2000 commented 4 months ago

I think we can reduce the implementation burden on provider developers and ensure greater consistency by leveraging the provider state in a few additional ways. If we add a FATAL state to the SDK, we could short circuit evaluations within the SDK itself. Hooks would still run, but the underlying provider would not be called. The SDK could return the value specified by the developer within the evaluation itself. The reason returned to the user would be ERROR, the error type would be FATAL, and the message would be whatever the provider defined (e.g. 'invalid auth token`). This would be useful because many existing feature flag SDKs will return a default value on error, which may look like a successful evaluation in OpenFeature unless the provider explicitly handles this scenario. I believe this will lead to inconsistent behavior across implementations unless the SDK handles it.

It may be worth performing the same short circuiting within the SDK if the provider isn't ready yet. This wouldn't only affect providers that have implemented an initialization function that hasn't completed prior to a flag evaluation. This would provide many of the same benefits as described above.

Finally, the error events should also include error code. Currently, only the error message is defined in the spec.

What do you think? @liran2000 @kinyoklion @lukas-reining @thomaspoignant @fabriziodemaria @toddbaert @jonathannorris

Sounds good to me

lukas-reining commented 4 months ago

It may be worth performing the same short circuiting within the SDK if the provider isn't ready yet. This wouldn't only affect providers that have implemented an initialization function that hasn't completed prior to a flag evaluation. This would provide many of the same benefits as described above.

This sounds good to me!

JamieSinn commented 4 months ago

I think we can reduce the implementation burden on provider developers and ensure greater consistency by leveraging the provider state in a few additional ways. If we add a FATAL state to the SDK, we could short circuit evaluations within the SDK itself. Hooks would still run, but the underlying provider would not be called. The SDK could return the value specified by the developer within the evaluation itself. The reason returned to the user would be ERROR, the error type would be FATAL, and the message would be whatever the provider defined (e.g. 'invalid auth token`). This would be useful because many existing feature flag SDKs will return a default value on error, which may look like a successful evaluation in OpenFeature unless the provider explicitly handles this scenario. I believe this will lead to inconsistent behavior across implementations unless the SDK handles it.

It may be worth performing the same short circuiting within the SDK if the provider isn't ready yet. This wouldn't only affect providers that have implemented an initialization function that hasn't completed prior to a flag evaluation. This would provide many of the same benefits as described above.

Finally, the error events should also include error code. Currently, only the error message is defined in the spec.

What do you think? @liran2000 @kinyoklion @lukas-reining @thomaspoignant @fabriziodemaria @toddbaert @jonathannorris

This I think covers the cases I was thinking about

toddbaert commented 4 months ago

These changes make sense to me and would simplify the providers we've written thus far 👍🏻

One thing to clarify (I might be missing the convo somewhere): the SDK is maintaining the state of the given provider instance for a domain that can span multiple clients, right? If the same provider instance is given to multiple domains, should those each maintain separate state? Or is that an edge-case/unsupported behavior?

I think I lean toward leaving this unspecified. Different langauges have different sematics and paradigms WRT references, pointers, etc. I think I'd rather leave the particularities of this up to implementations. We can perhaps make some helpful suggestions in non-normative sections.

toddbaert commented 4 months ago

I think we can reduce the implementation burden on provider developers and ensure greater consistency by leveraging the provider state in a few additional ways. If we add a FATAL state to the SDK, we could short circuit evaluations within the SDK itself. Hooks would still run, but the underlying provider would not be called. The SDK could return the value specified by the developer within the evaluation itself. The reason returned to the user would be ERROR, the error type would be FATAL, and the message would be whatever the provider defined (e.g. 'invalid auth token`). This would be useful because many existing feature flag SDKs will return a default value on error, which may look like a successful evaluation in OpenFeature unless the provider explicitly handles this scenario. I believe this will lead to inconsistent behavior across implementations unless the SDK handles it.

It may be worth performing the same short circuiting within the SDK if the provider isn't ready yet. This wouldn't only affect providers that have implemented an initialization function that hasn't completed prior to a flag evaluation. This would provide many of the same benefits as described above.

Finally, the error events should also include error code. Currently, only the error message is defined in the spec.

What do you think? @liran2000 @kinyoklion @lukas-reining @thomaspoignant @fabriziodemaria @toddbaert @jonathannorris

It seems like there's agreement on this, and I think it's a really nice addition. I've added this, as well as changed the name of the reconciliation state/event from CONTEXT_PENDING to RECONCILING.

See new commit.

cc @JamieSinn

toddbaert commented 4 months ago

rebased to resolve a 1-word conflict.

toddbaert commented 4 months ago

I'd like to merge this early next week if possible. Let me know if you have any concerns!

toddbaert commented 4 months ago

I'll merge this by EOD today unless I hear objections.

open-feature / spec

feat: internal provider state, new client events #241