open-feature / ofep

A focal point for OpenFeature research, proposals and requests for comments
https://openfeature.dev

Research for "bulk evaluation" functionality #13

Closed: toddbaert closed this issue 10 months ago

toddbaert commented 2 years ago

We should do a quick survey of some of the vendor SDKs and open source solutions in terms of a means of getting many flags/values in a single API call. If this seems like something that could be reasonably implemented in most providers, and it's something we could abstract, we can consider adding it to the spec.

Definition of done:

beeme1mr commented 2 years ago

Would this be equivalent to a bulk flag evaluation or is this something different?

toddbaert commented 2 years ago

Would this be equivalent to a bulk flag evaluation or is this something different?

Same thing.

thomaspoignant commented 2 years ago

For context, this would mostly be used by frontend/mobile SDKs.

InTheCloudDan commented 2 years ago

This has evaluation implications for LaunchDarkly. Overall, we strongly recommend customers not use it. There are only a few specific use cases where it should be done, because it generally creates more headaches than it solves when checking whether flags are active.

justinabrahms commented 1 year ago

Chatted about this today and the API we discussed was something akin to:

bulkFetch({"key1": false, "key2": "value"}), and this would be optional for providers to implement. We would not provide a default implementation because of the terrifying performance concerns.

We also talked about how in some languages we could represent the key value pairs as a proper object which may unlock some other interesting use-cases.
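A rough sketch of what such an optional bulkFetch could look like. The class and method names here are purely illustrative, not part of any agreed OpenFeature API; the argument maps each flag key to the default to use when the provider can't resolve it:

```javascript
// Hypothetical provider supporting the bulkFetch shape discussed above.
// There is deliberately no default implementation in the SDK itself.
class InMemoryProvider {
  constructor(flags) {
    this.flags = flags; // e.g. { key1: true }
  }

  // Resolve many flags in one call; `defaults` maps each flag key
  // to the value returned when the key is unknown to the provider.
  bulkFetch(defaults) {
    const resolved = {};
    for (const [key, defaultValue] of Object.entries(defaults)) {
      resolved[key] = key in this.flags ? this.flags[key] : defaultValue;
    }
    return resolved;
  }
}

const provider = new InMemoryProvider({ key1: true });
const values = provider.bulkFetch({ key1: false, key2: "value" });
// values.key1 comes from the provider; values.key2 falls back to the default
```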

kyle-ssg commented 1 year ago

Hi all, from a frontend developer perspective I thought I'd add my 2 cents to this:

In pretty much all cases I’ve seen, a frontend application (some SSR with frameworks like Next, or compiled defaults via CI/CD) would get the flags it needs for a page once, and then the rest of the application would use that state from then on, refreshing again when it needs to. It's quite likely that a frontend page / SPA could use 20+ flags; I think fetching these individually would be quite odd.

In addition to this, I don't think the approach on the frontend would necessarily be to await a bunch of flags. It would more likely be to initialise once and access flags directly in a synchronous fashion.

Here are some examples of providers doing this; all of them asynchronously initialise their SDKs and then evaluate flags synchronously:

Flagsmith https://docs.flagsmith.com/clients/javascript#example-initialising-the-sdk

  await flagsmith.init(...);
  const flagValue = flagsmith.getValue('my_feature_key', {fallback: 'default value'});

Launch Darkly https://docs.launchdarkly.com/sdk/client-side/javascript#initializing-the-client

  await client.waitUntilReady(...);
  const flagValue = client.variation('my_feature_key', 'default value');

Cloudbees https://docs.cloudbees.com/docs/cloudbees-feature-management/latest/getting-started/javascript-ssr-sdk#_installing_javascript_ssr_sdk

  await Rox.setup(...);
  const flagValue = flags.my_feature_key.getValue();
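The pattern shared by all three providers above (async init, then synchronous evaluation against an in-memory snapshot) could be sketched like this. This is an illustration of the pattern, not the actual OpenFeature API:

```javascript
// Illustrative sketch: one async bulk retrieval, then synchronous
// evaluation from the cached snapshot. Names are hypothetical.
class ClientSideClient {
  constructor(fetchFlags) {
    this.fetchFlags = fetchFlags; // async bulk retrieval function
    this.cache = null;
  }

  async init() {
    this.cache = await this.fetchFlags(); // one network round trip
  }

  // Synchronous evaluation against the in-memory snapshot.
  getValue(key, defaultValue) {
    if (this.cache && key in this.cache) return this.cache[key];
    return defaultValue;
  }
}

(async () => {
  const client = new ClientSideClient(async () => ({ my_feature_key: "on" }));
  await client.init();
  client.getValue("my_feature_key", "default value"); // evaluated synchronously
})();
```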

moredip commented 1 year ago

It seems that there are two potentially separate design choices that we're discussing here.

  1. Should we allow bulk evaluation
  2. Should we allow synchronous evaluation

We should be careful to not get those two decisions conflated.

I get that they're related - in order to achieve synchronous evaluation we'd likely want some version of bulk evaluation - but we should still be treating them as separate decisions.

moredip commented 1 year ago

Regarding bulk evaluation, we will need to think about how that intersects with analytics. Flag management platforms typically want to capture an analytics event whenever a flag is evaluated, as a way of tracking flag usage (to detect stale flags) and as a cheap way to instrument multivariate experiments. If a client evaluates all flags at once then we'll probably want some other way to detect when a specific flag is actually used.
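One hypothetical way to reconcile bulk evaluation with per-flag analytics is to resolve all flags up front but only emit a usage event when a value is actually read. The helper below is an illustration of that idea, not anything from the spec or an existing SDK:

```javascript
// Wrap a bulk-resolved flag map so a usage event fires lazily,
// only when the application actually reads a flag's value.
function withUsageTracking(resolvedFlags, onUse) {
  return new Proxy(resolvedFlags, {
    get(target, key) {
      if (key in target) onUse(key); // record the evaluation lazily
      return target[key];
    },
  });
}

const used = [];
const flags = withUsageTracking(
  { newCheckout: true, darkMode: false },
  (key) => used.push(key)
);
flags.newCheckout; // triggers a usage event for this flag only
// `used` now contains only the flags the application touched
```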

dabeeeenster commented 1 year ago

I can't speak for other providers, but the way Flagsmith works is that there are 2 discrete stages.

Flag Retrieval. All the Flags for the Environment are received in a single call - 'getFlags()'. This is basically just a list of Flags and their states. The SDKs then hold this list in memory. We track these calls on our API.

Flag Evaluation. Then the SDK can request the boolean or string value of any flag whenever they like, just grabbing stuff from that list in memory. We do also track these as "evaluations" on the client and then send them back to the server regularly.

I think most providers work this way? Our API doesn't actually have a way of retrieving a single flag, as there's really no point given our model.
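The two-stage model described above could be sketched roughly like this. The names are illustrative and not Flagsmith's actual SDK internals:

```javascript
// Stage 1: bulk retrieval of all environment flags in one call.
// Stage 2: purely in-memory evaluations, logged for later reporting.
class TwoStageClient {
  constructor() {
    this.flags = {};
    this.evaluationLog = []; // sent back to the server periodically
  }

  // Stage 1: fetch every flag for the environment in a single call.
  loadFlags(environmentFlags) {
    this.flags = { ...environmentFlags };
  }

  // Stage 2: evaluate from memory; no network I/O here.
  getValue(key, defaultValue) {
    this.evaluationLog.push(key); // tracked as an "evaluation"
    return key in this.flags ? this.flags[key] : defaultValue;
  }
}

const client = new TwoStageClient();
client.loadFlags({ show_banner: true }); // one retrieval call
client.getValue("show_banner", false);   // evaluated from memory
```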

I think in some of this discussion people may be attaching different meanings to the same terms?

toddbaert commented 1 year ago

I think most providers work this way? Our API doesnt actually have a way of retrieving a single flag, as there's really no point given our model.

I think most client side / JS providers work this way. On the server side, many seem to (at least occasionally) perform I/O operations on evaluation. This is why many SDKs have async server-side APIs. I'd like to maintain a unified interface between both, if possible.

dabeeeenster commented 1 year ago

Sounds like we're about to open a big can of 🪱 🪱 stateful worms 🪱 🪱 🪱

justinabrahms commented 1 year ago

Luckily, any 🪱stateful worms🪱 are isolated to specific providers, not in the OF SDKs.

At eBay, our mobile folks have a locally stored copy of stale flags on second+ run. We issue a request for a "hey, give me these known important flags right now" and a second "give me the rest of the flags I'll need" sorts of API calls. So I think this dovetails with #34 as well.

Flag management platforms typically want to capture an analytics event whenever a flag is evaluated, as a way of tracking flag usage (to detect stale flags) and as a cheap way to instrument multivariate experiments.

We're currently using/planning to use provider hooks and hook hints to accomplish this. This also comes up in optimistic threading contexts where you might fetch a flag but not actually use the result and don't want to track it. (Not sure if I've made a ticket for this)

It seems like providers should have some mechanism to say "hey.. I'm going to be asking for flags soon. Maybe prefetch them?". Additionally, this seems in line with things I've seen some vendors do which is load flags, and then listen to an event stream for any updates. The question becomes "how much preloading/caching is the right amount".. and OF shouldn't really have an opinion on that. It should be between the app author and the provider. I think we're on the hook to provide APIs for developers to communicate prefetch intent.. and possibly provide a way to nicely tell developers that prefetch/caching isn't supported by a given provider.
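One hypothetical shape for that API: a prefetch method whose base implementation signals that prefetching isn't supported, so app authors can be told nicely, while caching providers override it. These names are illustrative, not proposed spec surface:

```javascript
// Base provider: prefetch is a graceful no-op that reports
// "not supported" so callers can surface that to developers.
class BasicProvider {
  prefetch(keys) {
    return Promise.resolve({ supported: false, prefetched: [] });
  }
}

// A provider that honours prefetch intent by warming its cache.
class CachingProvider extends BasicProvider {
  constructor(fetcher) {
    super();
    this.fetcher = fetcher; // async per-key fetch function
    this.cache = new Map();
  }

  async prefetch(keys) {
    for (const key of keys) {
      this.cache.set(key, await this.fetcher(key)); // warm the cache
    }
    return { supported: true, prefetched: keys };
  }
}
```

How much to preload or cache stays a decision between the app author and the provider; the SDK only carries the intent.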

dabeeeenster commented 1 year ago

Yes - agree with this. I think we need to do some research into the different operation modes of providers that effectively control the chattiness/caching behaviour of the provider SDK.

Just thinking of Flagsmith - we're more imperative - our flags are stored locally by the SDK and not retrieved again unless the developer explicitly calls getFlags. Maybe some providers are more declarative. I guess if we added a refreshFlags method some providers would just NOOP it.
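The NOOP idea above could look something like this. This is a sketch with illustrative names, not a spec proposal:

```javascript
// Default: refreshFlags is a no-op, suitable for providers whose
// flags arrive via push/streaming rather than explicit polling.
class ProviderBase {
  refreshFlags() {
    return Promise.resolve(false); // false: nothing was refreshed
  }
}

// An imperative provider that re-fetches everything on demand.
class PollingProvider extends ProviderBase {
  constructor(fetchAll) {
    super();
    this.fetchAll = fetchAll; // async bulk retrieval function
    this.flags = {};
  }

  async refreshFlags() {
    this.flags = await this.fetchAll(); // explicit, developer-driven refresh
    return true;
  }
}
```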

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in next 60 days.

liran2000 commented 1 year ago

I can't speak for other providers, but the way Flagsmith works is that there are 2 discrete stages.

Flag Retrieval. All the Flags for the Environment are received in a single call - 'getFlags()'. This is basically just a list of Flags and their states. The SDKs then hold this list in memory. We track these calls on our API.

Flag Evaluation. Then the SDK can request the boolean or string value of any flag whenever they like, just grabbing stuff from that list in memory. We do also track these as "evaluations" on the client and then send them back to the server regularly.

I think most providers work this way? Our API doesn't actually have a way of retrieving a single flag, as there's really no point given our model.

I think in some of this discussion people may be attaching different meanings to the same terms?

I agree, it is 2 items.
Sharing our use case. Without getting into specific implementation details, per existing behavior, for a specific request-properties context, the flow is: fetch all flag names from the provider, then iterate and evaluate each flag at the context level, resulting in a list of enabled boolean flags. With that list of enabled flags for the context, we can pass it to the creation of a "remote" object.
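The flow above could be sketched like this, with purely illustrative names and a toy rule set standing in for the provider:

```javascript
// Fetch all flag names, evaluate each against the request context,
// and collect the flags that are enabled for that context.
function enabledFlagsForContext(allFlagNames, evaluate, context) {
  const enabled = [];
  for (const name of allFlagNames) {
    if (evaluate(name, context) === true) enabled.push(name);
  }
  return enabled;
}

// Toy per-flag rules standing in for real provider evaluation.
const rules = { a: (ctx) => ctx.tier === "pro", b: () => true };

const result = enabledFlagsForContext(
  ["a", "b"],
  (name, ctx) => (rules[name] ? rules[name](ctx) : false),
  { tier: "pro" }
);
// `result` holds the enabled flags for this context
```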

I am not requesting to add anything like that to the spec, but sharing for the discussion.

cc @beeme1mr


github-actions[bot] commented 10 months ago

This issue was closed automatically because there has not been any activity for 90 days. You can reopen the issue if you would like to continue to work on it.