open-feature / spec

OpenFeature specification
https://openfeature.dev
Apache License 2.0
595 stars 35 forks source link

feat: emit events on context change (client-only) #200

Closed Billlynch closed 6 months ago

Billlynch commented 11 months ago

This PR adds points to specify the behaviour of on context changed with respect to eventing.

Billlynch commented 11 months ago

I wonder if we should also add the provider name into the provider event details?

If you have multiple providers and you change the context. One provider may take more time than the other to emit the ready event. If you've subscribed to the event via the OpenFeature.addHandler() function instead of on the provider/client itself you don't know if the provider you're about to evaluate against is ready or not.

toddbaert commented 11 months ago

I wonder if we should also add the provider name into the provider event details?

If you have multiple providers and you change the context. One provider may take more time than the other to emit the ready event. If you've subscribed to the event via the OpenFeature.addHandler() function instead of on the provider/client itself you don't know if the provider you're about to evaluate against is ready or not.

Agreed.

This is one of the changes proposed here.

Something worth mentioning is one of the design principles of the SDKs is for application authors (the persona most concerned with actually evaluating flags) not to worry about providers at all. Ideally, they just use a client, which is bound to a provider that the integrator configured. We'd expect event handlers added to the API to be used most for telemetry, error-reporting, etc. We'd discourage application authors to add global handlers, check if their event is associated with "their" provider, and then take some action. Using a client event handler would be a better solution for that.

beeme1mr commented 11 months ago

Hey @Billlynch, thanks for the PR. When you have a moment, could you please respond to the open threads? It would be great to get some clarification so we can work through this spec enhancement. Thanks!

Kavindu-Dodan commented 11 months ago

Given that we have considerable state changes, I would like to see an updated visualization of the state changes of a provider.

I see this fits well with our current state visualization here ^1

In general, my understanding is that the Provider can go from ready state to stale and back to ready state once remediation for context change is done. Further, we have error status that can be reached from ready status as well as from stale status

Context change: Ready -> Stale -> Ready Context change: Ready -> Stale -> Error .... -> Ready

toddbaert commented 10 months ago

I'm struggling with one corollary to this.

I think we need to add a STALE state to keep things consistent (that's addressed here). But I'm a bit torn on exactly how we want the provider state to be managed. At the moment, providers' state is mutable only from within, it's readonly from the outside. If we implement this proposal as things stand, we'd expect authors to set the state to STALE while their provider's onContextChanged is run. This seems to be incongruent to the fact that the SDK will be emitting the STALE event on the provider's behalf.

I feel like it should be all-or-nothing to be consistent - either the SDK should fire these events and manage the state itself, or the provider should be responsible for both.

I'm wondering if it wouldn't be better to have the state mutable from the outside, and have the SDK set it after init/error/onContextChanged. This would reduce the burden on providers (though we might have to be careful to make sure the change is as "non-breaking" as possible.

Thoughts?

Billlynch commented 10 months ago

I see your point.

I think a nice clear thing would be for the providers to emit the event PROVIDER_STALE and set the state to a new STALE state.

This then can actually work a bit more like the original proposal of only emitting if the provider decides the old and new contexts are actually different.

So the state of the provider can be:

1. NOT_READY -> READY
2. NOT_READY -> ERROR
3. ERROR -> READY
4. READY -> STALE
5. STALE -> READY
6. STALE -> ERROR

with corresponding events being fired:

1. PROVIDER_READY
2. PROVIDER_ERROR
3. PROVIDER_READY
4. PROVIDER_STALE
5. PROVIDER_READY
6. PROVIDER_ERROR

This aligns the setting of the status of the provider, and emission of the events under the same logic, and the same domain as that which will actually make a cache update if needed.

What do you think?

toddbaert commented 10 months ago

I see your point.

I think a nice clear thing would be for the providers to emit the event PROVIDER_STALE and set the state to a new STALE state.

This then can actually work a bit more like the original proposal of only emitting if the provider decides the old and new contexts are actually different.

So the state of the provider can be:

1. NOT_READY -> READY
2. NOT_READY -> ERROR
3. ERROR -> READY
4. READY -> STALE
5. STALE -> READY
6. STALE -> ERROR

with corresponding events being fired:

1. PROVIDER_READY
2. PROVIDER_ERROR
3. PROVIDER_READY
4. PROVIDER_STALE
5. PROVIDER_READY
6. PROVIDER_ERROR

This aligns the setting of the status of the provider, and emission of the events under the same logic, and the same domain as that which will actually make a cache update if needed.

What do you think?

I think this is a coherent proposal. We push a lot of responsibility onto the provider, but I think that's a safer move. On the SDK side, we would need to do our best to rigorously document the purpose of STALE and that is should be used when context is mutated in client implementations.

It would also be useful in server implementations that cache rulesets or flag values, which is a nice in that it maintains semantic consistency between both SDKs.

@Billlynch I think if you agree you should update the PR as above, then I'll approve it. You also might want to have a quick review yourself of https://github.com/open-feature/spec/pull/196, since it includes STALE. Make sure that all makes sense to you. Then we can merge that PR, followed with this one if nobody objects.

If you do make significant changes, please dismiss @Kavindu-Dodan 's approval.

toddbaert commented 10 months ago

@Billlynch I've invited you to the org. Please comment here if you'd like to join (no obligation, but this will allow you to be pinged in PRs, issues, etc).

EDIT: I merged the above PR, which created some non-substantial conflicts with your branch. I fixed them, I hope you don't mind!

beeme1mr commented 10 months ago

Hey @Billlynch, could you please update this proposal based on this comment? That should be the last remaining issue because it's approved and merged. 👍

toddbaert commented 10 months ago

Hey @Billlynch I'm hoping to release a spec v0.7.0 this week. If you have the time to update this PR we can include this content in the release - otherwise we can do another after this is merged.

toddbaert commented 6 months ago

@Billlynch

I've come back to this PR after a lot of thought based on my experience in the React SDK. I have squashed your commits into one, and then added my own, which makes the STALE transition optional (some providers might cache local rulesets and never go STALE, so I think we can leave it up to them) and uses a new PROVIDER_CONTEXT_CHANGED event.

I think this new event is needed to specifically differentiate from READY events (which don't indicate any flags changed) and PROVIDER_CONFIGURATION_CHANGED which specifically imply the change was propagated from the management system. PROVIDER_CONTEXT_CHANGED are unique to the client-side, and particularly relevant for prompting repaints in frameworks like React.

I have included an intro to the new section you added, as well as a mermaid diagram. Please take a look, and thanks a lot for your farsightedness here! :heart:

image

cc @lukas-reining @thomaspoignant @kinyoklion @jonathannorris @beeme1mr @Kavindu-Dodan

toddbaert commented 6 months ago

Semantically this updated set of events looks good to me

I am not sure if we agreed which component should fire the events, if the SDK or the Provider (or both?). But this is probably an implementation detail that goes outside the scope of this PR

In this case I did mention that (see last 2 points):

The PROVIDER_CONTEXT_CHANGED is not emitted from the provider itself; the SDK implementation must run the PROVIDER_CONTEXT_CHANGED handlers if the on context changed function terminates normally.

toddbaert commented 6 months ago

Looks like we have a decent consensus here. I'll merge this tomorrow, EOD unless I hear objections!